CN117687773A - Network segmentation for container orchestration platform - Google Patents
Network segmentation for container orchestration platform
- Publication number
- CN117687773A (Application CN202311149543.5A)
- Authority
- CN
- China
- Prior art keywords
- network
- virtual network
- virtual
- pod
- router
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5077—Logical partitioning of resources; Management or configuration of virtualized resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/4557—Distribution of virtual machine instances; Migration and load balancing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45587—Isolation or security of virtual machine instances
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45595—Network integration; Enabling network access in virtual machine instances
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Network segmentation for a container orchestration platform is disclosed herein. In general, techniques for performing network segmentation for a container orchestration platform are described. A network controller including memory and processing circuitry may be configured to perform these techniques. The memory may be configured to store a request, conforming to the container orchestration platform, to configure a new pod of the plurality of pods with a master interface for communicating over a virtual network so as to segment the network formed by the plurality of pods. The processing circuitry may be configured to, in response to the request, configure the new pod with the master interface to enable communication via the virtual network.
Description
Cross Reference to Related Applications
This application claims the benefit of U.S. Patent Application Ser. No. 18/146,799, filed December 27, 2022, and the benefit of U.S. Provisional Patent Application Ser. No. 63/375,091, filed in September 2022, each of which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to virtualized computing infrastructure and, more particularly, to container management platforms.
Background
Kubernetes (sometimes abbreviated as "K8s") is a container orchestration platform for automating software deployment, scaling, and management. Kubernetes may be deployed in a Software Defined Network (SDN) to manage containers, which provide lightweight virtualization of workloads (lightweight compared to virtual machines, which virtualize a complete execution stack, including an operating system, for the computing device on which they execute).
Kubernetes may be deployed in a data center or other environment to form and manage different network topologies. Kubernetes provides a default networking environment in which a default pod network facilitates communication between pods via a master interface. That is, each pod can use its master interface to interconnect with the default pod network. In this way, all pods can communicate with one another over the default pod network using their master interfaces. In this regard, a container orchestration platform such as Kubernetes may provide a flat networking architecture in which every pod can communicate with every other pod.
The container orchestration platform may thus prevent network segmentation via the master interface, which can introduce security issues: the default pod network effectively provides a back door with which malicious users may access data that was previously considered secure.
Disclosure of Invention
In general, techniques are described that enable network segmentation for container management platforms such as Kubernetes. Kubernetes operates as an example cloud orchestration platform for deploying containers to manage workload execution. As noted above, Kubernetes provides a default pod network to facilitate a flat network topology, where each pod can communicate with every other pod via the default pod network, which each pod reaches through a master interface configured for that pod.
Rather than implementing a default pod network and creating a master interface strictly for communications on the default pod network (and thus enabling unrestricted inter-pod communications), the techniques described in this disclosure may redefine the master interface as a custom Kubernetes resource to allow higher-level (e.g., non-micro) segmentation, thereby restricting inter-pod communications and closing the back door that the default pod network and the master interface provide when the master interface is defined as a standard (or, in other words, native) Kubernetes resource. In some examples, the pod manifest is annotated to specify the custom Kubernetes resource identifying the virtual network to be used in configuring the pod's master interface, and the network controller configures the master interface for the pod, at deployment, to enable communication on the virtual network specified in the pod manifest annotation. Alternatively, or in conjunction with the pod manifest, a namespace can be used to specify a virtual network for the master interface.
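For illustration only, the following sketch uses the Kubernetes Python client to deploy a pod whose manifest carries such an annotation. The annotation key, virtual network name, and namespace are hypothetical placeholders rather than the actual identifiers defined by the platform's custom resources.

```python
# Minimal sketch (assumed annotation key): request that a pod's master interface
# be attached to a specific virtual network rather than the default pod network,
# by annotating the pod manifest at deployment time.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="backend-pod",
        namespace="blue-tenant",
        annotations={
            # Hypothetical annotation key: the network controller would read this
            # and configure the master interface for virtual network "blue-vn".
            "example.net/virtual-network": "blue-vn",
        },
    ),
    spec=client.V1PodSpec(
        containers=[client.V1Container(name="app", image="nginx:1.25")],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="blue-tenant", body=pod)
```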
Networks formed over the redefined master interfaces may then be segmented using mesh and/or hub-and-spoke topologies (possibly relying on virtual network routers, or VNRs). In this case, the master interface may be used to facilitate intercommunication between the pods in a controlled and well-defined manner, according to a VNR configuration that references the master interface and follows a well-defined network routing paradigm.
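A hedged sketch of how such a VNR might be expressed declaratively follows; the API group, version, and field names are assumptions made for illustration and are not the platform's actual schema.

```python
# Illustrative only: a hypothetical VirtualNetworkRouter (VNR) custom resource
# that interconnects, in a mesh topology, the virtual networks selected by a
# label, so pods on those networks can intercommunicate via their master
# interfaces. Group/version/field names are assumed, not authoritative.
from kubernetes import client, config

config.load_kube_config()

vnr = {
    "apiVersion": "example.sdn.io/v1",   # hypothetical API group/version
    "kind": "VirtualNetworkRouter",
    "metadata": {"name": "web-tier-mesh", "namespace": "blue-tenant"},
    "spec": {
        "type": "mesh",                  # could instead model hub-and-spoke
        "virtualNetworkSelector": {      # which virtual networks to interconnect
            "matchLabels": {"segment-group": "web-tier"},
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="example.sdn.io",
    version="v1",
    namespace="blue-tenant",
    plural="virtualnetworkrouters",
    body=vnr,
)
```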
The redefined master interface can reduce the security issues associated with the back door previously introduced by the default pod network, given that the back door has been eliminated. Furthermore, the redefined master interface may still support standard master interface functionality, including VNRs, Border Gateway Protocol as a service (BGPaaS), layer two (L2) networking, and so forth.
These techniques may provide one or more technical advantages. For example, the container management platform may provide advanced networking that facilitates network segmentation at lower networking layers, such as the data link layer (including, as one example, a virtual data link layer represented by a virtual interface), the network layer, and the like. By removing the back door provided by the default master interface and the associated default pod network offered as standard Kubernetes resources, a network controller implementing a container management platform (such as Kubernetes) can redefine the master interface as a custom resource to reduce, if not eliminate, unrestricted inter-pod communication that would otherwise provide a security back door. Improving network security may reduce attacks that expose sensitive data (e.g., personal data) and other vulnerabilities, enabling a more sophisticated and secure networking architecture. Removing security vulnerabilities may also benefit the network controller itself, because malware may be unable to access sensitive data, and preventing such access may reduce consumption of computing resources (e.g., by not executing malware that consumes processing cycles, memory bus bandwidth, associated power, etc.).
In one example, aspects of the technology relate to a network controller comprising: a memory configured to store a request, conforming to the container orchestration platform, by which a new pod of the plurality of pods is configured with a master interface to communicate over a virtual network to segment a network formed by the plurality of pods; and processing circuitry configured to, in response to the request, configure the new pod with the master interface to enable communication via the virtual network.
In one example, aspects of the technology relate to a method comprising: storing, by the network controller, a request conforming to the container orchestration platform, by which a new pod of the plurality of pods is configured with a master interface to communicate over the virtual network to segment the network formed by the plurality of pods; and configuring, by the network controller in response to the request, the new pod with a master interface to enable communication via the virtual network.
In one example, aspects of the present technology relate to a non-transitory computer-readable storage medium storing instructions that, when executed, cause a processing circuit to: storing a request conforming to a container orchestration platform by which a new pod of the plurality of pods is configured with a master interface to communicate over a virtual network to segment a network formed by the plurality of pods; and in response to the request, configuring the new pod with a master interface to enable communication via the virtual network.
The details of one or more examples of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
FIG. 1 is a block diagram illustrating an example computing infrastructure in which examples of techniques described herein may be implemented.
Fig. 2 is a block diagram illustrating one example of a cloud native SDN architecture for cloud native networking in accordance with the techniques of this disclosure.
Fig. 3 is a block diagram illustrating another view of components of an SDN architecture in greater detail and in accordance with the techniques of this disclosure.
Fig. 4 is a block diagram illustrating example components of an SDN architecture in accordance with the techniques of this disclosure.
FIG. 5 is a block diagram of an example computing device in accordance with the techniques described in this disclosure.
Fig. 6 is a block diagram of an example computing device operating as a computing node of one or more clusters of an SDN architecture system in accordance with the techniques of this disclosure.
Fig. 7A is a block diagram illustrating a control/routing plane for an underlying network and overlay network configuration using an SDN architecture in accordance with the techniques of this disclosure.
Fig. 7B is a block diagram illustrating a configured virtual network that connects clusters using configured tunnels in an underlying network in accordance with the techniques of this disclosure.
Fig. 8 is a block diagram illustrating one example of a custom controller for custom resource(s) configured for an SDN architecture in accordance with the techniques of this disclosure.
FIG. 9 is a block diagram illustrating an example flow of creation, monitoring, and reconciliation between custom resource types that have dependencies on different custom resource types.
Fig. 10 is a diagram illustrating an example network topology using custom master interfaces in accordance with the network segmentation techniques described in this disclosure.
Fig. 11 is a diagram illustrating a network topology using a custom master interface in addition to a virtual network router in accordance with aspects of the network segmentation technique described in the present disclosure.
Fig. 12 is a diagram illustrating an example pod-to-pod networking configured in accordance with aspects of the network segmentation techniques described in this disclosure.
Fig. 13 is a diagram illustrating an example pod-to-service networking configured in accordance with aspects of the network segmentation techniques described in this disclosure.
FIG. 14 is a diagram illustrating an example container management platform feature extension in accordance with aspects of the network segmentation technique described in this disclosure.
FIG. 15 is a diagram illustrating another example container management platform feature extension in accordance with aspects of the network segmentation technique described in this disclosure.
Fig. 16 is a diagram illustrating one example of network-aware scheduling in accordance with aspects of the network segmentation technique described in this disclosure.
Fig. 17 is a flowchart illustrating example operations of the network controller shown in fig. 1 in performing aspects of the network segmentation techniques described in this disclosure.
Like reference numerals refer to like elements throughout the specification and drawings.
Detailed Description
FIG. 1 is a block diagram illustrating an example computing infrastructure 8 in which examples of techniques described herein may be implemented. Implementations of Software Defined Networking (SDN) architectures for virtual networks currently present challenges for cloud native adoption due to, for example, complexity of lifecycle management, a mandatory high-resource analytics component, scale limitations in configuration modules, and the lack of a command line interface (CLI)-based (kubectl-like) interface. The computing infrastructure 8 includes the cloud-native SDN architecture system described herein that addresses these challenges and is modernized for the telco cloud-native era. Example use cases for the cloud-native SDN architecture include 5G mobile networks as well as cloud and enterprise cloud-native use cases. The SDN architecture may include data plane elements implemented in computing nodes (e.g., servers 12) and network devices such as routers or switches, and may also include an SDN controller (e.g., network controller 24) for creating and managing virtual networks. The SDN architecture configuration and control plane is designed as scale-out cloud native software with a container-based microservices architecture that supports in-service upgrades.
As a result, the SDN architecture components are microservices and, in contrast to existing network controllers, the SDN architecture assumes a base container orchestration platform to manage the lifecycle of the SDN architecture components. The container orchestration platform is used to bring up SDN architecture components; the SDN architecture uses cloud native monitoring tools that can integrate with customer-provided cloud native options; and the SDN architecture uses aggregated APIs for SDN architecture objects (i.e., custom resources) to provide a declarative way of managing resources. SDN architecture upgrades may follow cloud native patterns, and the SDN architecture may utilize Kubernetes constructs such as Multus, authentication and authorization, Cluster API, KubeFederation, KubeVirt, and Kata Containers. The SDN architecture may support a Data Plane Development Kit (DPDK) pod, and the SDN architecture may be extended to support Kubernetes with virtual network policies and global security policies.
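As a rough, hedged sketch of this custom-resource style of configuration, the snippet below creates a hypothetical VirtualNetwork object through the Kubernetes custom objects API; the group, version, and spec fields are illustrative stand-ins, not the SDN architecture's actual schema.

```python
# Sketch of declarative, custom-resource configuration: a hypothetical
# VirtualNetwork object expressing intent ("this virtual network should exist
# with this subnet"), which a controller would then reconcile.
from kubernetes import client, config

config.load_kube_config()

virtual_network = {
    "apiVersion": "example.sdn.io/v1",   # hypothetical API group/version
    "kind": "VirtualNetwork",
    "metadata": {
        "name": "blue-vn",
        "namespace": "blue-tenant",
        "labels": {"segment-group": "web-tier"},
    },
    "spec": {"v4SubnetCIDR": "10.10.1.0/24"},  # hypothetical field name
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="example.sdn.io",
    version="v1",
    namespace="blue-tenant",
    plural="virtualnetworks",
    body=virtual_network,
)
```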
For service providers and enterprises, the SDN architecture automates network resource provisioning and orchestration to dynamically create highly scalable virtual networks and to link Virtualized Network Functions (VNFs) and Physical Network Functions (PNFs) to form differentiated service chains on demand. The SDN architecture may be integrated with orchestration platforms (e.g., orchestrator 23) such as Kubernetes, OpenShift, Mesos, OpenStack, and VMware vSphere, and with service provider operations support systems/business support systems (OSS/BSS).
In general, one or more data centers 10 provide an operating environment for applications and services, through the service provider network 7, for customer sites 11 (illustrated as "customers 11") having one or more customer networks coupled to the data centers. Each of the data center(s) 10 may, for example, host infrastructure equipment such as networking and storage systems, redundant power supplies, and environmental controls. The service provider network 7 is coupled to a public network 15, which may represent one or more networks managed by other providers and may thus form part of a large-scale public network infrastructure (e.g., the internet). Public network 15 may represent, for example, a Local Area Network (LAN), a Wide Area Network (WAN), the internet, a Virtual LAN (VLAN), an enterprise LAN, a layer 3 Virtual Private Network (VPN), an Internet Protocol (IP) intranet operated by the service provider that operates service provider network 7, an enterprise IP network, or some combination thereof.
Although the customer site 11 and public network 15 are primarily illustrated and described as edge networks of the service provider network 7, in some examples one or more of the customer site 11 and public network 15 may be a tenant network within any one of the data center(s) 10. For example, the data center(s) 10 may host multiple tenants (customers), each associated with one or more Virtual Private Networks (VPNs), each of which may implement one of the customer sites 11.
The service provider network 7 provides packet-based connectivity to the attached customer site 11, data center(s) 10 and public network 15. The service provider network 7 may represent a network owned and operated by a service provider to interconnect multiple networks. The service provider network 7 may implement multiprotocol label switching (MPLS) forwarding and in this case may be referred to as an MPLS network or MPLS backbone. In some examples, service provider network 7 represents a plurality of interconnected autonomous systems (such as the internet) that provide services from one or more service providers.
In some examples, each of the data center(s) 10 may represent one of many geographically distributed network data centers, which may be connected to one another via the service provider network 7, dedicated network links, dark fiber, or other connections. As illustrated in the example of fig. 1, data center(s) 10 may include facilities that provide network services to customers. Customers of the service provider may be collective entities such as enterprises and governments, or individuals. For example, a network data center may host web services for several enterprises and end users. Other exemplary services may include data storage, virtual private networks, traffic engineering, file services, data mining, scientific or super computing, and so forth. Although illustrated as a separate edge network of the service provider network 7, elements of the data center(s) 10, such as one or more Physical Network Functions (PNFs) or Virtualized Network Functions (VNFs), may be included within the service provider network 7 core.
In this example, the data center(s) 10 include storage and/or compute servers (or "nodes") interconnected via a switch fabric 14 provided by one or more tiers of physical network switches and routers, with servers 12A-12X (herein "servers 12") depicted as coupled to top-of-rack switches 16A-16N. The servers 12 are computing devices and may also be referred to herein as "computing nodes," "hosts," or "host devices." Although only server 12A coupled to TOR switch 16A is shown in detail in fig. 1, data center 10 may include many additional servers coupled to other TOR switches 16 of data center 10.
The switch fabric 14 in the illustrated example includes interconnected top-of-rack (TOR) (or other "leaf") switches 16A-16N (collectively, "TOR switches 16") coupled to a distribution layer of chassis (or "spine" or "core") switches 18A-18M (collectively, "chassis switches 18"). Although not shown, the data center 10 may also include, for example, one or more non-edge switches, routers, hubs, gateways, security devices such as firewalls, intrusion detection and/or prevention devices, servers, computer terminals, laptops, printers, databases, wireless mobile devices such as cellular telephones or personal digital assistants, wireless access points, bridges, cable modems, application accelerators, or other network devices. The data center(s) 10 may also include one or more Physical Network Functions (PNFs), such as physical firewalls, load balancers, routers, route reflectors, Broadband Network Gateways (BNGs), mobile core network elements, and other PNFs.
In this example, TOR switches 16 and chassis switches 18 provide servers 12 with redundant (multi-homed) connectivity to IP fabric 20 and service provider network 7. The chassis switches 18 aggregate traffic flows and provide connectivity between TOR switches 16. TOR switches 16 may be network devices that provide layer 2 (MAC) and/or layer 3 (e.g., IP) routing and/or switching functionality. The TOR switches 16 and the chassis switches 18 may each include one or more processors and memory and may execute one or more software processes. Chassis switches 18 are coupled to IP fabric 20, which may perform layer 3 routing to route network traffic between data center 10 and customer sites 11 through service provider network 7. The switching architecture of the data center(s) 10 is merely an example. For example, other switching fabrics may have more or fewer switching layers. IP fabric 20 may include one or more gateway routers.
The term "packet flow", "traffic flow" or simply "flow" refers to a group of packets originating from a particular source device or endpoint and sent to a particular destination device or endpoint. A single packet flow may be identified by a five tuple: for example, < source network address, destination network address, source port, destination port, protocol >. The five-tuple generally identifies the packet flow to which the received packet corresponds. n-tuple refers to any n-term extracted from a 5-tuple. For example, a tuple of a packet may refer to a combination of < source network address, destination network address > or < source network address, source port > of the packet.
The servers 12 may each represent a computing server or a storage server. For example, each server 12 may represent a computing device configured to operate in accordance with the techniques described herein, such as an x86 processor-based server. The server 12 may provide a Network Function Virtualization Infrastructure (NFVI) for NFV architecture.
Any of the servers 12 may be configured with virtual execution elements, such as pods or virtual machines, by virtualizing resources of the server to provide some measure of isolation among one or more processes (applications) executing on the server. "Hypervisor-based," "hardware-level," or "platform" virtualization refers to the creation of virtual machines, each of which includes a guest operating system for executing one or more processes. In general, a virtual machine provides a virtualized/guest operating system for executing applications in an isolated virtual environment. Because the virtual machine is virtualized from the physical hardware of the host server, executing applications are isolated from both the hardware of the host and other virtual machines. Each virtual machine may be configured with one or more virtual network interfaces for communicating over corresponding virtual networks.
Virtual networks are logical constructs implemented on top of physical networks. Virtual networks may be used to replace VLAN-based quarantine and provide multi-tenancy in a virtualized data center (e.g., one of the data center(s) 10). Each tenant or application may have one or more virtual networks. Each virtual network may be isolated from all other virtual networks unless explicitly allowed by the security policy.
The virtual network may be connected to and spread over a physical multiprotocol label switching (MPLS) layer 3 virtual private network (L3 VPN) and an Ethernet Virtual Private Network (EVPN) network using a data center 10 gateway router (not shown in fig. 1). Virtual networks may also be used to implement Network Function Virtualization (NFV) and service chaining.
Virtual networks may be implemented using a variety of mechanisms. For example, each virtual network may be implemented as a Virtual Local Area Network (VLAN), a Virtual Private Network (VPN), or the like. The virtual network may also be implemented using two networks-a physical underlay network consisting of IP fabric 20 and switch fabric 14, and a virtual overlay network. The role of the physical underlay network is to provide an "IP fabric" that provides unicast IP connectivity from any physical device (server, storage device, router, or switch) to any other physical device. The underlying network may provide uniform low-latency, non-blocking, high-bandwidth connectivity from any point in the network to any other point in the network.
As described further below with respect to virtual router 21 (illustrated as, and also referred to herein as, "vRouter 21"), virtual routers running in servers 12 create a virtual overlay network on top of the physical underlay network using a mesh of dynamic "tunnels" between them. These overlay tunnels may be, for example, MPLS-over-GRE/UDP tunnels, VXLAN tunnels, or NVGRE tunnels. The underlying physical routers and switches may not store any per-tenant state for virtual machines or other virtual execution elements, such as any Media Access Control (MAC) addresses, IP addresses, or policies. The forwarding tables of the underlying physical routers and switches may, for example, contain only the IP prefixes or MAC addresses of the physical servers 12. (Gateway routers or switches that connect a virtual network to a physical network are an exception and may contain tenant MAC or IP addresses.)
The virtual routers 21 of the servers 12 typically contain per-tenant state. For example, they may contain a separate forwarding table (routing instance) for each virtual network. The forwarding table contains the IP prefixes (in the case of layer 3 overlays) or MAC addresses (in the case of layer 2 overlays) of the virtual machines or other virtual execution elements (e.g., pods of containers). No single virtual router 21 needs to contain all IP prefixes or all MAC addresses for all virtual machines in the entire data center. A given virtual router 21 only needs to contain those routing instances that are present locally on server 12 (i.e., that have at least one virtual execution element present on server 12).
"Container-based" or "operating system" virtualization refers to the virtualization of an operating system to run multiple independent systems on a single machine (virtual or physical). Such stand-alone systems represent containers such as those provided by open source DOCKER container applications or CoreOS Rkt ("rock"). As with virtual machines, each container is virtualized and can remain isolated from hosts and other containers. However, unlike virtual machines, each container may omit an individual operating system, and instead provide application suites and application-specific libraries. In general, containers are executed by a host as separate user space instances, and may share an operating system and a common library with other containers executing on the host. Thus, the container may require less processing power, storage, and network resources than a virtual machine ("VM"). A set of one or more containers may be configured to share one or more virtual network interfaces for communicating over a corresponding virtual network.
In some examples, containers are managed by their host kernel to allow limiting and prioritizing of resources (CPU, memory, block I/O, network, etc.) without the need to start any virtual machines, in some cases using namespace isolation functionality that allows complete isolation of an application's (e.g., a given container's) view of the operating environment, including process trees, networking, user identifiers, and mounted file systems. In some examples, containers may be deployed according to Linux Containers (LXC), an operating-system-level virtualization method for running multiple isolated Linux systems (containers) on a control host using a single Linux kernel.
Server 12 hosts virtual network endpoints for one or more virtual networks that operate on the physical networks represented herein by IP fabric 20 and switch fabric 14. Although described primarily with respect to a data center based switching network, other physical networks, such as the service provider network 7, may underlie one or more virtual networks.
Each of servers 12 may host one or more virtual execution elements, each having at least one virtual network endpoint for one or more virtual networks configured in the physical network. A virtual network endpoint for a virtual network may represent one or more virtual execution elements that share a virtual network interface for the virtual network. For example, a virtual network endpoint may be a virtual machine, a set of one or more containers (e.g., a pod), or another virtual execution element(s), such as a layer 3 endpoint of a virtual network. The term "virtual execution element" encompasses virtual machines, containers, and other virtualized computing resources that provide an at least partially independent execution environment for an application. The term "virtual execution element" may also encompass a pod of one or more containers. A virtual execution element may represent an application workload. As shown in fig. 1, server 12A hosts a virtual network endpoint in the form of pod 22 having one or more containers. However, server 12 may execute as many virtual execution elements as is practical given the hardware resource limitations of server 12. Each virtual network endpoint may use one or more virtual network interfaces to perform packet I/O or otherwise process packets. For example, a virtual network endpoint may use one virtual hardware component (e.g., an SR-IOV virtual function) enabled by NIC 13A to perform packet I/O and receive/transmit packets over one or more communication links with TOR switch 16A. Other examples of virtual network interfaces are described below.
The servers 12 each include at least one Network Interface Card (NIC) 13, each of which includes at least one interface to exchange packets with the TOR switches 16 over a communication link. For example, server 12A includes NIC 13A. Any of the NICs 13 may provide one or more virtual hardware components 21 for virtualized input/output (I/O). A virtual hardware component for I/O may be a virtualization of a physical NIC (a "physical function"). For example, in Single Root I/O Virtualization (SR-IOV), which is described in the Peripheral Component Interconnect Special Interest Group SR-IOV specification, the PCIe physical function of the network interface card (or "network adapter") is virtualized to present one or more virtual network interfaces as "virtual functions" for use by respective endpoints executing on server 12. In this way, the virtual network endpoints may share the same PCIe physical hardware resources, and the virtual functions are examples of virtual hardware components 21.
As another example, one or more servers 12 may implement Virtio, a paravirtualization framework available, e.g., for the Linux operating system, that provides emulated NIC functionality as a type of virtual hardware component to provide virtual network interfaces to virtual network endpoints. As another example, one or more servers 12 may implement Open vSwitch to perform distributed virtual multilayer switching between one or more virtual NICs (vNICs) for hosted virtual machines, where such vNICs may also represent a type of virtual hardware component that provides virtual network interfaces to virtual network endpoints. In some examples, the virtual hardware components are virtual I/O (e.g., NIC) components. In some examples, the virtual hardware components are SR-IOV virtual functions.
In some examples, any of servers 12 may implement a Linux bridge that emulates a hardware bridge and forwards packets between virtual network interfaces of the server, or between a virtual network interface of the server and a physical network interface of the server. For Docker implementations of containers hosted by a server, a Linux bridge or other operating system bridge executing on the server that switches packets between containers may be referred to as a "Docker bridge." The term "virtual router" as used herein may encompass a Contrail or Tungsten Fabric virtual router, an Open vSwitch (OVS), an OVS bridge, a Linux bridge, a Docker bridge, or other device and/or software that is located on a host device and performs switching, bridging, or routing of packets among virtual network endpoints of one or more virtual networks, where the virtual network endpoints are hosted by one or more of servers 12.
Any of the NICs 13 may include an internal device switch to exchange data between virtual hardware components associated with the NIC. For example, for an NIC supporting SR-IOV, the internal device switch may be a Virtual Ethernet Bridge (VEB) for switching between SR-IOV virtual functions and, correspondingly, between endpoints configured to use the SR-IOV virtual functions, where each endpoint may include a guest operating system. The internal device switch may alternatively be referred to as a NIC switch, or for SR-IOV implementation, may be referred to as a SR-IOV NIC switch. The virtual hardware component associated with NIC 13A may be associated with a layer 2 destination address that may be assigned by NIC 13A or a software process responsible for configuring NIC 13A. A physical hardware component (or "physical function" for SR-IOV implementation) is also associated with the layer 2 destination address.
The one or more servers 12 may each include a virtual router 21 that executes one or more routing instances for corresponding virtual networks within data center 10 to provide virtual network interfaces and route packets among virtual network endpoints. Each routing instance may be associated with a network forwarding table. Each routing instance may represent a virtual routing and forwarding instance (VRF) for an Internet Protocol Virtual Private Network (IP-VPN). Packets received by the virtual router 21 of server 12A, for example, from the underlying physical network fabric of data center 10 (i.e., IP fabric 20 and switch fabric 14), may include an outer header to allow the physical network fabric to tunnel the payload or "inner packet" to the physical network address of the network interface card 13A of server 12A that executes the virtual router. The outer header may include not only the physical network address of the server's network interface card 13A but also a virtual network identifier, such as a VXLAN tag or a Multiprotocol Label Switching (MPLS) label, that identifies one of the virtual networks as well as the corresponding routing instance executed by the virtual router 21. The inner packet includes an inner header having a destination network address that conforms to the virtual network addressing space for the virtual network identified by the virtual network identifier.
The virtual router 21 terminates virtual network overlay tunnels, determines the virtual network for a received packet based on the packet's tunnel encapsulation header, and forwards the packet to the appropriate destination virtual network endpoint for the packet. For server 12A, for example, for each packet outbound from a virtual network endpoint hosted by server 12A (e.g., pod 22), the virtual router 21 attaches a tunnel encapsulation header indicating the virtual network for the packet to generate an encapsulated or "tunnel" packet, and the virtual router 21 outputs the encapsulated packet via an overlay tunnel for the virtual network to a physical destination computing device, such as another one of servers 12. As used herein, virtual router 21 may perform the operations of a tunnel endpoint to encapsulate inner packets originating from virtual network endpoints to generate tunnel packets, and to decapsulate tunnel packets to obtain inner packets for routing to other virtual network endpoints.
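A conceptual sketch of this encapsulation/decapsulation flow follows; it is not the vRouter implementation, and the VNI values, addresses, and interface names are invented for illustration.

```python
# Conceptual sketch: the outer header carries a virtual network identifier
# (e.g., a VXLAN VNI), which selects the per-tenant routing instance used to
# look up the local interface for the inner packet's destination.
from dataclasses import dataclass

@dataclass
class TunnelPacket:
    outer_dst: str        # physical address of the destination server's NIC
    vni: int              # virtual network identifier from the encapsulation header
    inner_packet: bytes   # original packet to/from a virtual network endpoint

# Per-tenant routing instances: only prefixes local to this server are present.
routing_instances = {
    10001: {"10.10.1.5": "veth-pod22"},   # e.g., virtual network "blue-vn"
    10002: {"10.10.2.7": "veth-pod37"},   # e.g., virtual network "red-vn"
}

def encapsulate(inner_packet, vni, remote_server):
    # Outbound: attach a tunnel header indicating the virtual network.
    return TunnelPacket(outer_dst=remote_server, vni=vni, inner_packet=inner_packet)

def decapsulate_and_forward(pkt, inner_dst_ip):
    # Inbound: strip the outer header, select the routing instance by VNI,
    # and look up the local interface for the inner destination address.
    table = routing_instances[pkt.vni]
    return table[inner_dst_ip]            # e.g., "veth-pod22"

tunnel = encapsulate(b"...payload...", vni=10001, remote_server="172.16.0.12")
print(decapsulate_and_forward(tunnel, "10.10.1.5"))   # veth-pod22
```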
In some examples, virtual router 21 may be kernel-based and execute as part of the kernel of the operating system of server 12A.
In some examples, virtual router 21 may be a virtual router that supports the Data Plane Development Kit (DPDK). In such an example, the virtual router 21 uses DPDK as its data plane. In this mode, the virtual router 21 operates as a user-space application linked to the DPDK library (not shown). This is a performance version of the virtual router and is typically used by carriers, where VNFs are often DPDK-based applications. The performance of virtual router 21 as a DPDK virtual router can achieve ten times the throughput of a virtual router operating as a kernel-based virtual router. The physical interface is used by DPDK's poll mode drivers (PMDs) instead of the Linux kernel's interrupt-based drivers.
A user I/O (UIO) kernel module, such as vfio or uio_pci_generic, may be used to expose the registers of the physical network interface into user space so that they are accessible by the DPDK PMD. When NIC 13A is bound to the UIO driver, it is moved from Linux kernel space to user space and is therefore no longer managed by or visible to the Linux operating system. Thus, the DPDK application (i.e., the virtual router 21A in this example) completely manages NIC 13A. This includes packet polling, packet processing, and packet forwarding. User packet processing steps may be performed by the virtual router 21 DPDK data plane with limited or no participation by the kernel (the kernel is not shown in fig. 1). The nature of this "polling mode" makes virtual router 21 DPDK data plane packet processing/forwarding more efficient than the interrupt mode, especially when packet rates are high. There are limited or no interrupts and context switches during packet I/O. Additional details of an example DPDK vRouter can be found in Kiran KN et al., "DAY ONE: CONTRAIL DPDK vROUTER," Juniper Networks, 2021, the entire contents of which are incorporated herein by reference.
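The toy sketch below illustrates only the poll-mode idea described above (busy-polling a receive ring and processing packets in bursts rather than waiting for interrupts); it is not DPDK code, and the queue, burst size, and processing hook are invented for illustration.

```python
# Toy poll-mode loop: drain the receive ring in bursts inside a tight loop,
# trading CPU cycles for low, predictable per-packet latency at high rates.
import collections

rx_ring = collections.deque()      # stands in for a NIC receive ring

def poll_mode_loop(process_packet, max_iterations=1_000_000, burst_size=32):
    for _ in range(max_iterations):
        burst = []
        while rx_ring and len(burst) < burst_size:
            burst.append(rx_ring.popleft())
        for packet in burst:
            process_packet(packet)  # forwarding/processing, no per-packet interrupt

# Example usage: enqueue a few fake packets and run a short poll loop.
rx_ring.extend([b"pkt-1", b"pkt-2", b"pkt-3"])
poll_mode_loop(lambda pkt: print("forwarded", pkt), max_iterations=10)
```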
The computing infrastructure 8 implements an automation platform for automating the deployment, scaling, and operation of virtual execution elements across servers 12 to provide a virtualized infrastructure for executing application workloads and services. In some examples, the platform may be a container orchestration system (or, in other words, a container management platform) that provides a container-centric infrastructure for automating the deployment, scaling, and operation of containers. In the context of a virtualized computing infrastructure, "orchestration" generally refers to the provisioning, scheduling, and management of virtual execution elements and/or the applications and services executing on such virtual execution elements, with respect to the host servers available to the orchestration platform. Container orchestration specifically refers to the deployment, management, scaling, and configuration of containers on host servers, e.g., by a container orchestration platform. Examples of orchestration platforms include Kubernetes (a container orchestration system), Docker Swarm, Mesos/Marathon, OpenShift, OpenStack, VMware, and Amazon ECS.
The elements of the automation platform of the computing infrastructure 8 include at least the servers 12, the orchestrator 23, and the network controller 24. Containers may be deployed to a virtualized environment using a cluster-based framework in which a cluster master node of a cluster manages the deployment and operation of containers onto one or more cluster slave nodes of the cluster. The terms "master node" and "slave node" as used herein encompass different orchestration platform terms for analogous devices that distinguish between the primary management element of a cluster and the primary container-hosting devices of the cluster. For example, the Kubernetes platform uses the terms "cluster master" and "slave node," while the Docker Swarm platform refers to cluster managers and cluster nodes.
The orchestrator 23 and the network controller 24 may execute on separate computing devices or on the same computing device. Each of orchestrator 23 and network controller 24 may be a distributed application executing on one or more computing devices. Orchestrator 23 and network controller 24 may implement respective master nodes for one or more clusters, each cluster having one or more slave nodes (also referred to as "compute nodes") implemented by respective servers 12.
In general, network controller 24 controls the network configuration of the data center 10 architecture, for example, to establish one or more virtual networks for packetized communications between virtual network endpoints. Network controller 24 provides a logically, and in some cases physically, centralized controller for facilitating the operation of one or more virtual networks within data center 10. In some examples, network controller 24 may operate in response to configuration inputs received from orchestrator 23 and/or an administrator/operator. Additional information regarding example operations of a network controller 24 operating in conjunction with data center 10 or other devices of other software defined networks may be found in International Application No. PCT/US2013/044378, entitled "PHYSICAL PATH DETERMINATION FOR VIRTUAL NETWORK PACKET FLOWS," filed June 5, 2013; and in U.S. Patent Application Ser. No. 14/226,509, entitled "TUNNELED PACKET AGGREGATION FOR VIRTUAL NETWORKS," filed March 26, 2014, each of which is incorporated by reference as if fully set forth herein.
In general, orchestrator 23 controls the deployment, scaling, and operation of containers across clusters of servers 12 and provides the computing infrastructure, which may include a container-centric computing infrastructure. Orchestrator 23, and in some cases network controller 24, may implement respective cluster masters for one or more Kubernetes clusters. For example, Kubernetes is a container management platform that provides portability across public and private clouds, each of which may provide a virtualized infrastructure to the container management platform. Example components of a Kubernetes orchestration system are described below with reference to fig. 3.
In one example, pod 22 is a Kubernetes pod and is one example of a virtual network endpoint. A pod is a group of one or more logically related containers (not shown in fig. 1), the shared storage for the containers, and options on how to run the containers. When instantiated for execution, a pod may also be referred to as a "pod replica." Each container of pod 22 is one example of a virtual execution element. The containers of a pod are always co-located on a single server, co-scheduled, and run in a shared context. The shared context of a pod may be a set of Linux namespaces, cgroups, and other facets of isolation.
Within the context of a pod, individual applications may have further sub-isolation applied. Typically, the containers within a pod share a common IP address and port space and are able to detect one another via localhost. Because containers within a pod share this context, they may also communicate with one another using inter-process communication (IPC). Examples of IPC include SystemV semaphores and POSIX shared memory. Typically, containers that are members of different pods have different IP addresses and cannot communicate by IPC absent a configuration enabling this feature. Containers that are members of different pods instead typically communicate with each other via pod IP addresses.
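As a small illustration of this shared context, the sketch below shows the localhost pattern two co-located containers could use (ports and payload are arbitrary); it runs as a single script here purely to keep the example self-contained.

```python
# Containers in the same pod share one network namespace, so a server bound to
# 127.0.0.1 in one container is reachable from the other container via localhost.
import socket

# "Container A": listen on localhost:8080 inside the pod's shared namespace.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 8080))
server.listen(1)

# "Container B": connect over the loopback interface of the same namespace.
peer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
peer.connect(("127.0.0.1", 8080))
peer.sendall(b"hello from the same pod\n")

conn, _ = server.accept()
print(conn.recv(64))   # b'hello from the same pod\n'
```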
The server 12A includes a container platform 19 for running containerized applications, such as the application of pod 22. Container platform 19 receives requests from orchestrator 23 to obtain and host containers in server 12A. The container platform 19 obtains and executes containers.
The Container Network Interface (CNI) 17 configures virtual network interfaces for virtual network endpoints. The orchestrator 23 and the container platform 19 use CNI 17 to manage networking for pods, including pod 22. For example, CNI 17 creates virtual network interfaces to connect pods to virtual router 21 and enables containers of such pods to communicate, via the virtual network interfaces, with other virtual network endpoints over the virtual networks. CNI 17 may, for example, insert a virtual network interface for a virtual network into the network namespace for a container in pod 22 and configure (or request configuration of) the virtual network interface for the virtual network in virtual router 21, such that virtual router 21 is configured to send packets received from the virtual network via the virtual network interface to containers of pod 22, and to send packets received via the virtual network interface from containers of pod 22 on the virtual network. CNI 17 may assign a network address (e.g., a virtual IP address for the virtual network) and may set up routes for the virtual network interface.
In Kubernetes, by default all pods can communicate with all other pods without using Network Address Translation (NAT). In some cases, the orchestrator 23 and the network controller 24 create a service virtual network and a pod virtual network that are shared by all namespaces, from which service and pod network addresses are allocated, respectively. In some cases, all pods in all namespaces spawned in the Kubernetes cluster may be able to communicate with one another, and the network addresses for all of the pods may be allocated from a pod subnet specified by the orchestrator 23. When a user creates an isolated namespace for a pod, the orchestrator 23 and the network controller 24 may create a new pod virtual network and a new shared service virtual network for the new isolated namespace. Pods in the isolated namespace that are spawned in the Kubernetes cluster draw network addresses from the new pod virtual network, and corresponding services for such pods draw network addresses from the new service virtual network.
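The sketch below illustrates this allocation behavior with Python's ipaddress module; the subnet values are illustrative and are not the defaults of any particular deployment.

```python
# Pods in the shared namespaces draw addresses from the shared pod subnet, while
# pods in a newly isolated namespace draw addresses from that namespace's own
# pod virtual network, so the two address pools do not overlap.
import ipaddress

shared_pod_subnet = ipaddress.ip_network("10.32.0.0/16")     # shared pod network
isolated_pod_subnet = ipaddress.ip_network("10.64.0.0/16")   # isolated namespace's pod network

shared_pool = shared_pod_subnet.hosts()
isolated_pool = isolated_pod_subnet.hosts()

pod_addresses = {
    ("default", "pod-a"): next(shared_pool),        # 10.32.0.1
    ("default", "pod-b"): next(shared_pool),        # 10.32.0.2
    ("isolated-ns", "pod-c"): next(isolated_pool),  # 10.64.0.1
}

for (namespace, pod), address in pod_addresses.items():
    print(namespace, pod, address)
```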
CNI 17 may represent a library, a plug-in, a module, a runtime, or other executable code of server 12A. CNI 17 may conform, at least in part, to the Container Network Interface (CNI) specification or the rkt networking proposal. CNI 17 may represent Contrail, OpenContrail, Multus, Calico, cRPD, or another CNI. CNI 17 may alternatively be referred to as a network plug-in or CNI instance. For example, a Multus CNI may call separate CNIs to establish different virtual network interfaces for pod 22.
CNI 17 may be invoked by orchestrator 23. For the purposes of the CNI specification, a container may be considered synonymous with a Linux network namespace. What unit this corresponds to depends on the particular container runtime implementation: for example, in implementations of an application container specification such as rkt, each pod runs in a unique network namespace, whereas in Docker each individual Docker container generally has its own network namespace. For the purposes of the CNI specification, a network refers to a group of entities that are uniquely addressable and that can communicate with each other. This may be an individual container, a machine/server (real or virtual), or some other network device (e.g., a router). Containers can conceptually be added to or removed from one or more networks. The CNI specification specifies a number of considerations for a conforming plug-in ("CNI plug-in").
pod 22 includes one or more containers. In some examples, pod 22 includes a containerized DPDK workload designed to use DPDK to accelerate packet processing, for example, by using a DPDK library to exchange data with other components. In some examples, virtual router 21 may execute as a containerized DPDK workload.
Pod 22 is configured with a virtual network interface 26 for sending and receiving packets with the virtual router 21. The virtual network interface 26 may be the default interface of pod 22. Pod 22 may implement virtual network interface 26 as an Ethernet interface (e.g., named "eth0"), while virtual router 21 may implement virtual network interface 26 as a tap interface, a virtio-user interface, or another type of interface.
Pod 22 and virtual router 21 exchange data packets using virtual network interface 26. Virtual network interface 26 may be a DPDK interface. Pod 22 and virtual router 21 may set up virtual network interface 26 using vhost. Pod 22 may operate according to an aggregation model. Pod 22 may use a virtual device, such as a virtio device with a vhost-user adapter, for user-space container inter-process communication over virtual network interface 26.
CNI 17 may configure virtual network interface 26 for pod 22 in conjunction with one or more of the other components shown in fig. 1. Any container of pod 22 can utilize (i.e., share) virtual network interface 26 of pod 22.
Virtual network interface 26 may represent a virtual Ethernet ("veth") pair, where each end of the pair is a separate device (e.g., a Linux/Unix device), with one end of the pair assigned to pod 22 and the other end assigned to virtual router 21. The veth pair, or an end of the veth pair, is sometimes referred to as a "port." A virtual network interface may also represent a macvlan network with Media Access Control (MAC) addresses assigned to pod 22 and to virtual router 21 for communication between the containers of pod 22 and virtual router 21. A virtual network interface may alternatively be referred to as, for example, a virtual machine interface (VMI), a pod interface, a pod network interface, a tap interface, a veth interface, or simply a network interface (in specific contexts).
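Purely as an illustration of how a veth pair of this kind can be created and one end moved into a pod's network namespace, the following Go sketch shells out to the iproute2 "ip" utility. The interface names and the network namespace name are placeholders, the commands require root privileges and an existing named namespace, and CNI 17 may implement the equivalent steps differently (e.g., via netlink rather than by invoking "ip").

package main

import (
    "fmt"
    "os/exec"
)

// run executes a single iproute2 command and reports any failure; it is a
// thin wrapper used only to keep this sketch readable.
func run(args ...string) error {
    out, err := exec.Command("ip", args...).CombinedOutput()
    if err != nil {
        return fmt.Errorf("ip %v: %v: %s", args, err, out)
    }
    return nil
}

func main() {
    // Create a veth pair; one end stays on the host side for the virtual
    // router, the other end is moved into the pod's network namespace.
    // Interface and namespace names are illustrative placeholders.
    cmds := [][]string{
        {"link", "add", "veth-host0", "type", "veth", "peer", "name", "eth0-pod22"},
        {"link", "set", "eth0-pod22", "netns", "pod22-netns"},
        {"link", "set", "veth-host0", "up"},
    }
    for _, c := range cmds {
        if err := run(c...); err != nil {
            fmt.Println("skipping (requires root and an existing named namespace):", err)
        }
    }
}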
In the example server 12A of fig. 1, pod 22 is a virtual network endpoint in one or more virtual networks. Orchestrator 23 may store or otherwise manage configuration data for application deployment that specifies the virtual network and that pod 22 (or one or more containers therein) is a virtual network endpoint of the virtual network. For example, orchestrator 23 may receive configuration data from a user, operator/administrator, or other computing system.
As part of the process of creating pod 22, orchestrator 23 requests network controller 24 to create a corresponding virtual network interface for one or more virtual networks (indicated in the configuration data). The pod 22 may have a different virtual network interface for each virtual network to which it belongs. For example, virtual network interface 26 may be a virtual network interface for a particular virtual network. Additional virtual network interfaces (not shown) may be configured for other virtual networks.
The network controller 24 processes the request to generate interface configuration data for the virtual network interfaces of pod 22. The interface configuration data may include a container or pod unique identifier, as well as a list or other data structure specifying, for each virtual network interface, the network configuration data for configuring that virtual network interface. The network configuration data for a virtual network interface may include a network name, an assigned virtual network address, a MAC address, and/or a domain name server value. An example of interface configuration data in JavaScript Object Notation (JSON) format is provided below.
The network controller 24 sends the interface configuration data to the server 12A and, more specifically in some cases, to the virtual router 21. To configure the virtual network interfaces for pod 22, orchestrator 23 may invoke CNI 17. CNI 17 obtains the interface configuration data from virtual router 21 and processes it. CNI 17 creates each virtual network interface specified in the interface configuration data. For example, CNI 17 may attach one end of a veth pair implementing virtual network interface 26 to virtual router 21 and attach the other end of the same veth pair to pod 22, which pod 22 may implement using virtio-user.
The following is example interface configuration data for pod 22 of virtual network interface 26.
[{
// virtual network interface 26
"id": "fe4bab62-a716-11e8-abd5-0cc47a698428",
"instance-id": "fe3edca5-a716-11e8-822c-0cc47a698428",
"ip-address": "10.47.255.250",
"plen": 12,
"vn-id": "56dda39c-5e99-4a28-855e-6ce378982888",
"vm-project-id": "00000000-0000-0000-0000-000000000000",
"mac-address": "02:fe:4b:ab:62:a7",
"system-name": "tapeth0fe3edca",
"rx-vlan-id": 65535,
"tx-vlan-id": 65535,
"vhostuser-mode": 0,
"v6-ip-address": "::",
"v6-plen": ,
"v6-dns-server": "::",
"v6-gateway": "::",
"dns-server": "10.47.255.253",
"gateway": "10.47.255.254",
"author": "/usr/bin/contrail-vrouter-agent",
"time": "426404:56:19.863169"
}]
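For purposes of illustration only, the following Go sketch shows one way a CNI plug-in might deserialize a subset of the interface configuration data shown above. The struct and field names mirror the JSON keys above but are otherwise assumptions of this example rather than part of any published interface definition.

package main

import (
    "encoding/json"
    "fmt"
)

// vrouterInterfaceConfig mirrors a subset of the per-interface configuration
// data shown above; the field names follow the JSON keys, while the type name
// itself is illustrative.
type vrouterInterfaceConfig struct {
    ID            string `json:"id"`
    InstanceID    string `json:"instance-id"`
    IPAddress     string `json:"ip-address"`
    PrefixLen     int    `json:"plen"`
    VnID          string `json:"vn-id"`
    MACAddress    string `json:"mac-address"`
    SystemName    string `json:"system-name"`
    VhostUserMode int    `json:"vhostuser-mode"`
    DNSServer     string `json:"dns-server"`
    Gateway       string `json:"gateway"`
}

func main() {
    raw := []byte(`[{"id":"fe4bab62-a716-11e8-abd5-0cc47a698428",
        "ip-address":"10.47.255.250","plen":12,
        "mac-address":"02:fe:4b:ab:62:a7","system-name":"tapeth0fe3edca",
        "dns-server":"10.47.255.253","gateway":"10.47.255.254"}]`)

    var interfaces []vrouterInterfaceConfig
    if err := json.Unmarshal(raw, &interfaces); err != nil {
        panic(err)
    }
    // Each entry describes one virtual network interface to be created for
    // the pod, e.g., virtual network interface 26.
    fmt.Printf("configure %s with %s/%d via gateway %s\n",
        interfaces[0].SystemName, interfaces[0].IPAddress,
        interfaces[0].PrefixLen, interfaces[0].Gateway)
}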
A CNI plug-in is invoked by the container platform/runtime: it receives an Add command from the container platform to add a container to a single virtual network, and the plug-in may subsequently be invoked with a Delete (Del) command from the container platform/runtime to remove the container from the virtual network. The term "invoke" may refer to the instantiation, as executable code, of a software component or module in memory for execution by processing circuitry.
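As a rough, non-limiting illustration of this Add/Del contract, the Go sketch below dispatches on the CNI_COMMAND environment variable defined by the CNI specification and reads the network configuration from stdin. The actual interface plumbing is omitted, and the result printed by cmdAdd is a placeholder rather than the output format of any particular CNI (such as CNI 17).

package main

import (
    "encoding/json"
    "fmt"
    "io"
    "os"
)

// cmdAdd would create the virtual network interface, attach it to the virtual
// router, and print a CNI result on stdout; the plumbing is omitted here and
// the printed result is a placeholder.
func cmdAdd(containerID, netns, ifName string, stdinConf []byte) error {
    fmt.Printf(`{"cniVersion":"0.4.0","interfaces":[{"name":%q}]}`, ifName)
    return nil
}

// cmdDel would remove the container's interface from the virtual network.
func cmdDel(containerID, netns, ifName string, stdinConf []byte) error {
    return nil
}

func main() {
    // Per the CNI specification, the runtime passes the operation and the
    // container details in environment variables and the network
    // configuration on stdin.
    cmd := os.Getenv("CNI_COMMAND")
    containerID := os.Getenv("CNI_CONTAINERID")
    netns := os.Getenv("CNI_NETNS")
    ifName := os.Getenv("CNI_IFNAME")
    conf, _ := io.ReadAll(os.Stdin)

    var err error
    switch cmd {
    case "ADD":
        err = cmdAdd(containerID, netns, ifName, conf)
    case "DEL":
        err = cmdDel(containerID, netns, ifName, conf)
    default:
        err = fmt.Errorf("unsupported CNI_COMMAND %q", cmd)
    }
    if err != nil {
        // CNI errors are reported as JSON on stdout with a non-zero exit code.
        json.NewEncoder(os.Stdout).Encode(map[string]string{"msg": err.Error()})
        os.Exit(1)
    }
}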
The network controller 24 is a cloud-native distributed network controller for Software Defined Networking (SDN) implemented using one or more configuration nodes 30 and one or more control nodes 32. Each configuration node 30 itself may be implemented using one or more cloud native component microservices. Each control node 32 itself may be implemented using one or more cloud native component microservices.
In some examples, configuration nodes 30 may be implemented by extending the native orchestration platform with custom resources for software-defined networking and, more specifically, by providing a northbound interface to the orchestration platform to support intent-driven/declarative creation and management of virtual networks, for example, by configuring virtual network interfaces for virtual execution elements, configuring the underlay network connecting servers 12, and configuring overlay routing functionality (including overlay tunnels for the virtual networks and overlay trees for layer 2 and layer 3 multicast).
As part of the SDN architecture illustrated in fig. 1, network controller 24 may be multi-tenant aware and support multiple tenants of the orchestration platform. For example, network controller 24 may support the Kubernetes role-based access control (RBAC) architecture, local Identity Access Management (IAM), and external IAM integration. The network controller 24 may also support the Kubernetes-defined networking constructs and advanced networking features such as virtual networking, BGPaaS, networking policies, service chaining, and other telco features. The network controller 24 may support network isolation using the virtual network construct and support layer 3 networking.
As discussed above, by default all pods can communicate with all other pods. In some cases, orchestrator 23 and network controller 24 create a service virtual network and a pod virtual network that are shared by all namespaces, from which service and pod network addresses are assigned, respectively. In some cases, all of the pods in all namespaces created in the Kubernetes cluster may be able to communicate with each other, and the network addresses for all of the pods may be allocated from the pod subnet specified by the orchestrator 23. The pod virtual network may be referred to as the default pod network.
In this sense, Kubernetes and other container management platforms provide a flat networking architecture in which all pods can communicate with all other pods (as well as node-to-pod and pod-to-service). When establishing the default pod network, the network controller 24 (possibly together with the orchestrator 23) may configure a master interface for each pod, through which the pod communicates with the default pod network. The master interface supports the Kubernetes-defined networking constructs and advanced networking features such as virtual networking, BGPaaS, networking policies, service chaining, and other telco features.
Such an architecture for a container management platform arose from the environment of large data centers, where lightweight containers are instantiated to perform discrete tasks (e.g., performing a lookup of a record stored to memory) and are then removed when the discrete tasks are completed. In that environment there is little need for a network fabric that segments the networks supporting the pods, which resulted in a master interface that facilitates inter-pod communication via the default pod network.
As container management platforms have begun to be deployed outside of this limited data center environment, the lack of segmentation provided by container management platforms has led to the development of complex extensions that attempt to add network segmentation back into the container management platform. These complex extensions may be referred to as secondary interfaces, which are manually defined interfaces through which a pod can communicate with only certain segments of the underlying network (e.g., a virtual network) formed by the pods. However, these secondary interfaces may not prevent the pod from communicating with other pods via the primary interface and the default pod network, exposing a potential backdoor that malware may exploit to access sensitive data (e.g., personal data, health records, government data, financial data, etc.). Furthermore, these secondary interfaces do not support the advanced networking features listed above (e.g., virtual networking, BGPaaS, networking policies, service chaining, and other telco features).
According to aspects of the techniques described in this disclosure, rather than implementing a default pod network and creating a master interface used strictly for communication over the default pod network (and thus for inter-pod communication), the network controller 24 may redefine the master interface as a custom Kubernetes resource to allow high-level (as opposed to micro-) segmentation that limits inter-pod communication, closing the backdoor that the default pod network and master interface present when the master interface is defined as a standard resource of Kubernetes or another cloud orchestration platform. In some examples, the pod manifest is annotated with the virtual network, specified as a custom resource, to be used for configuring the primary interface for the pod, and the network controller, upon deployment, configures the primary interface for the pod to enable communication over the virtual network specified in the pod manifest annotation. Alternatively, or in conjunction with the pod manifest, a namespace can be used to specify the virtual network for the primary interface.
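For purposes of illustration, the following Go sketch emits a pod manifest (as JSON) whose annotation names the virtual network to be used for the pod's primary interface. The annotation key "example.net/virtual-network", the network name "vn-50a", and the other names are hypothetical placeholders rather than keys defined by any particular SDN controller; a namespace object could carry an equivalent annotation instead.

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // Hypothetical annotation key and virtual network name; the key actually
    // recognized by a given SDN controller may differ.
    pod := map[string]interface{}{
        "apiVersion": "v1",
        "kind":       "Pod",
        "metadata": map[string]interface{}{
            "name":      "pod-22",
            "namespace": "tenant-a",
            "annotations": map[string]string{
                // Names the virtual network (e.g., VN 50A) to which the
                // primary interface should attach.
                "example.net/virtual-network": "vn-50a",
            },
        },
        "spec": map[string]interface{}{
            "containers": []map[string]interface{}{
                {"name": "app", "image": "nginx:stable"},
            },
        },
    }
    out, _ := json.MarshalIndent(pod, "", "  ")
    fmt.Println(string(out))
}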
The redefined master interface may then be segmented using mesh and/or hub-and-spoke topologies (which may rely on virtual network routers, or VNRs). Although described in more detail below, more information about VNRs can be found in U.S. patent application No. 17/809,659, entitled "VIRTUAL NETWORK ROUTERS FOR CLOUD NATIVE SOFTWARE-DEFINED NETWORK ARCHITECTURES," filed June 29, 2022, the entire contents of which are incorporated herein by reference. In this case, the master interface may be used to facilitate intercommunication between pods 22 in a controlled and well-defined manner according to the VNR configuration referencing the master interface and following a well-defined network routing paradigm.
The redefined master interface can reduce the security issues associated with the backdoor previously introduced by the default pod network, given that the backdoor has been eliminated. Furthermore, the redefined master interface may still support standard master interface functionality, including VNRs, Border Gateway Protocol as a service (BGPaaS), layer two (L2) networking, and so forth.
In operation, network controller 24 may receive a request, conforming to the container orchestration platform, by which to configure a new pod of the plurality of pods (e.g., including pod 22) with a master interface to communicate over a virtual network (e.g., VN 50A) that segments the network formed by the plurality of pods represented by, e.g., pod 22. The network controller 24 may, in response to the request, configure the new pod with the master interface to enable communication via VN 50A.
VN 50A may differ from the default pod network in that not all pods can access VN 50A, given that VN 50A segments the underlying network such that only a subset of pods 22 participate in VN 50A (where "subset" is used not in the strict mathematical sense, in which a subset may include zero or more elements, but in the traditional sense of including at least one but not all of pods 22). The network controller 24 may deploy a number of different VNs 50 to implement an advanced networking architecture that improves security between segments through active segmentation at lower layers of the network (e.g., the data link layer, the network layer, etc.), while also providing the advanced container orchestration constructs and networking typically associated with a master interface defined as a standard resource.
As such, the network controller 24 may define a custom resource for the primary interface (which may be referred to as a "custom primary interface") that extends the primary interface defined as a standard resource (which may be referred to as a "standard primary interface") to facilitate segmentation. To facilitate further advanced segmentation, in which different network topologies are used to stitch together different virtual networks according to defined policies, network controller 24 may provide VNRs 52, with VNRs 52 implementing the mesh and hub-and-spoke topologies common to networks deployed by companies (such as telcos), large national and/or global entities, universities, and the like. Because it extends the standard primary interface to support segmentation, the custom primary interface also supports the advanced networking features listed above.
The network controller 24 may configure the target for a custom primary interface to be any virtual network, including the default pod network. However, rather than instantiating the default pod network when instantiating the custom primary interface, the network controller 24 may process the request to identify the VN 50A over which the new pod (e.g., pod 22) is to communicate via the custom primary interface. As noted above, in some cases the pod manifest annotation may identify VN 50A, while in this or other instances the namespace may identify VN 50A.
In other words, a container management platform, such as Kubernetes, defines a default pod network as a single CIDR used by all pods in a cluster, which may not provide (network layer) segmentation between pods in the cluster. CN2 may provide an optional enhancement called an isolated namespace, in which a new default pod network and service network are created for each namespace to provide network isolation for the pods and services within that namespace. However, the granularity of such isolation is limited to a per-namespace basis.
To be able to create pods 22 with different pod networks on a per-namespace or even per-container basis, aspects of the techniques implement custom pod networks, which may include pod-specific virtual networks and subnets. Custom pod networks can represent standard virtual networks and subnets that are selected by defining annotations on the pod definition or the namespace definition.
In this regard, aspects of the techniques may address the following objectives:
● The host interface network of a pod should be customizable on a per-pod or per-namespace basis.
● A pod using a custom pod network should be isolated (possibly by default) from all other networks.
● A service that selects pods using a custom pod network should be isolated (possibly by default) from all other networks.
● A pod using a custom pod network should be reachable from the IP fabric.
● A pod using a custom pod network should be able to access services (e.g., kube-dns) in the default service network.
● Features that apply to NAD (network attachment definition) interfaces should apply to custom pod network interfaces.
Once the custom host interface is configured, the pod can communicate with other pods via the virtual network, outputting packets or other data via the configured custom host interface. To facilitate intercommunication between pods located in different VNs 50, network controller 24 may receive a request to instantiate one or more VNRs 52 mentioned above.
To interconnect multiple VNs 50, network controller 24 may use (and configure, in the underlay and/or virtual router 21) import and export policies defined using Virtual Network Router (VNR) resources. A virtual network router resource may be used to define connectivity between VNs 50 by configuring the import and export of routing information between the routing instances used to implement VNs 50 in the SDN architecture. A single network controller 24 may support multiple clusters, and VNRs 52 may thus allow connecting VNs 50 within a namespace, VNs 50 in different namespaces within a cluster, and VNs 50 across clusters. VNRs 52 may also be extended to support virtual network connectivity across multiple instances of network controller 24. VNR 52 may alternatively be referred to herein as a Virtual Network Policy (VNP) or a virtual network topology.
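For purposes of illustration, the following Go sketch models a virtual network router as a Kubernetes-style custom resource whose label selector picks the member virtual networks and whose type names the topology. The API group ("example.net/v1"), field names, and label values are assumptions of this example rather than the actual schema used by network controller 24.

package main

import (
    "encoding/json"
    "fmt"
)

// virtualNetworkRouterSpec is an illustrative spec for a VNR custom resource:
// the type selects the topology and the selector picks the member virtual
// networks by label.
type virtualNetworkRouterSpec struct {
    Type                   string            `json:"type"` // e.g., "mesh" or "spoke"/"hub"
    VirtualNetworkSelector map[string]string `json:"virtualNetworkSelector"`
}

type virtualNetworkRouter struct {
    APIVersion string                   `json:"apiVersion"`
    Kind       string                   `json:"kind"`
    Metadata   map[string]string        `json:"metadata"`
    Spec       virtualNetworkRouterSpec `json:"spec"`
}

func main() {
    vnr := virtualNetworkRouter{
        APIVersion: "example.net/v1", // hypothetical API group/version
        Kind:       "VirtualNetworkRouter",
        Metadata:   map[string]string{"name": "vnr-52a", "namespace": "tenant-a"},
        Spec: virtualNetworkRouterSpec{
            Type: "mesh",
            // Any virtual network labeled vnr=web (e.g., VN 50A and VN 50N)
            // exchanges routes through the common route target created for
            // this VNR.
            VirtualNetworkSelector: map[string]string{"vnr": "web"},
        },
    }
    out, _ := json.MarshalIndent(vnr, "", "  ")
    fmt.Println(string(out))
}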
As shown in the example of fig. 1, network controller 24 may maintain configuration data (e.g., configuration 30) representing virtual networks (VNs) 50A-50N ("VNs 50"), which represent policies and other configuration data for establishing VNs 50 within data center 10 over the physical underlay network and/or virtual routers (e.g., virtual router 21 ("vRouter 21")). Network controller 24 may also maintain configuration data (e.g., configuration 30) representing virtual network routers (VNRs) 52A-52N ("VNRs 52"), which may be implemented, at least in part, using policies and other configuration data for establishing interconnectivity between VNs 50.
A user, such as an administrator, may interact with UI 60 of network controller 24 to define VN 50 and VNR 52. In some examples, UI 60 represents a Graphical User Interface (GUI) that facilitates input of configuration data defining VNs 50 and VNRs 52. In other examples, UI 60 may represent a Command Line Interface (CLI) or other type of interface. Assuming UI 60 represents a graphical user interface, an administrator may define VN 50 by arranging graphical elements representing different pods (such as pod 22) to associate a pod with VN 50, wherein any VN 50 enables communication between one or more pods assigned to the VN.
In this regard, an administrator may understand Kubernetes or other orchestration platform, but does not fully understand the underlying infrastructure supporting VN 50. Some controller architectures, such as Contrail, may configure VN 50 based on a networking protocol that is similar, if not substantially similar, to the routing protocol in a traditional physical network. For example, contrail may utilize concepts from the Border Gateway Protocol (BGP), a routing protocol that is used to communicate routing information within, and sometimes between, so-called Autonomous Systems (ASs).
There are different versions of BGP, such as internal BGP (iBGP) for transporting routing information within an AS, and external BGP (eBGP) for transporting routing information between ASes. The AS may be related to the concept of a project in Contrail, which is in turn similar to a namespace in Kubernetes. In each case, the AS, the similar project, and the namespace may represent a set of one or more networks (e.g., one or more VNs 50) that may share routing information and thereby facilitate interconnectivity between the networks (here, VNs 50).
In its simplest form, VNR 52 represents a logical abstraction of a set of routers in the context of a container orchestration platform (e.g., Kubernetes), where VNR 52 may be defined as a custom resource to facilitate interconnectivity between VNs 50. Given the complex propagation of routing information according to complex routing protocols, such as BGP, that may not be fully understood by an administrator, aspects of the cloud-native networking techniques may provide abstractions, such as VNR 52, of the underlying routing protocols (or of the complementary processes of Contrail or other controller architectures).
That is, rather than resorting to defining how routing occurs between two or more VNs 50, an administrator may define one or more VNRs 52 to interconnect VNs 50 without manually developing and deploying extensive policies and/or routing instance configurations to enable routing information to be exchanged between such VNs 50. Instead, an administrator (who may have little understanding of routing protocols) may define custom resources (e.g., one or more of VNRs 52) using familiar Kubernetes syntax/semantics (or even merely by dragging a graphical element and specifying an interconnection between that graphical element, representing exemplary VNR 52A, and graphical elements representing exemplary VNs 50A and 50N).
In this regard, an administrator may readily interconnect VNs 50 using the logical abstraction shown in the example of fig. 1 as VNRs 52, whereupon network controller 24 may convert VNRs 52 into underlying route targets to automatically (meaning with little or no human intervention) cause the routing information of VNs 50A and 50N to be exchanged and thereby enable communication (meaning the exchange of packets or other data) between VNs 50A and 50N.
Given that an administrator may employ familiar container orchestration platform syntax/semantics to configure VNRs 52, rather than complex configuration data conforming to routing protocol syntax/semantics, network controller 24 may facilitate a better user experience while also facilitating more efficient operation of data center 8 itself. That is, having administrators enter configuration data with which they are unfamiliar may result in incorrect configurations that waste the underlying resources of data center 8 (in terms of processing cycles, memory, bus bandwidth, etc., and associated power), while also delaying proper implementation of the network topology (which may prevent successful routing of packets and other data between VNs 50). Such delay may frustrate not only the administrator but also the clients associated with VNs 50, who may require prompt operation of VNs 50 to achieve business goals. By enabling administrators to easily facilitate communication between VNs 50 using the logical abstraction shown as VNRs 52, data center 8 itself may experience more efficient operation (in terms of the computing resources noted above, including processor cycles, memory, bus bandwidth, and associated power) while providing a better user experience for administrators and clients.
Network controller 24, representing an SDN architecture system of data center 10, includes processing circuitry for implementing configuration nodes and control nodes (as described in more detail with respect to the example of fig. 3). The network controller 24 may be configured to interconnect a first virtual network (e.g., VN 50A) and a second virtual network (e.g., VN 50N) operating within an SDN architecture system represented by the data center 10. The network controller 24 may be configured to define a logical abstraction of one or more policies to perform such interconnection via one or more of the VNRs 52 (e.g., VNR 52A).
The policies may include import and export policies with respect to the routing information maintained by the virtual networks (VNs 50A and 50N in this example). That is, Kubernetes may be extended via a custom resource representing VNR 52A, and VNR 52A may be translated into one or more import and export policies deployed with respect to VNs 50A and 50N to configure intercommunication via the distribution of routing information between VN 50A and VN 50N. Once configured, VN 50A may export routing information (e.g., routes representing VN 50A) to VN 50N and import routing information (e.g., routes representing VN 50N) into VN 50A. Likewise, VN 50N may export routing information (e.g., routes representing VN 50N) to VN 50A and import routing information (e.g., routes representing VN 50A) into VN 50N.
This abstraction may hide the underlying routing configuration that enables such route leaking, such as the definition of route targets for importing and exporting routing information to and from the routing instances used to implement VNs 50A and 50N. The network controller 24 may instead convert VNR 52A into a common route target and configure the routing instances used to implement VNs 50A and 50N (in this example) to communicate routing information via the common route target.
To achieve mesh connectivity, network controller 24 may configure the import and export for the routing instances of VN 50A, VN 50N, and VNR 52A with the route targets associated with VN 50A, VN 50N, and VNR 52A. To achieve hub-and-spoke connectivity, network controller 24 may configure the export for the routing instances associated with VNs 50A and 50N so as to export routing information to the routing instance associated with VNR 52A (acting as the hub), and configure the import for the routing instance of VNR 52A so as to import routing information into the routing instances associated with VNs 50A and 50N. In such hub-and-spoke connectivity, VN 50A and VN 50N may not communicate directly with each other.
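To make the difference between the two topologies concrete, the following Go sketch computes illustrative import/export route-target sets for each routing instance. The route-target values, network names, and the particular two-target hub-and-spoke arrangement are assumptions chosen for this example and are not necessarily the exact policies generated by network controller 24.

package main

import "fmt"

// rtPolicy captures which route targets a routing instance imports and exports.
type rtPolicy struct {
    imports []string
    exports []string
}

// meshPolicies: every member VN both imports and exports the VNR's common
// route target, so routes leak symmetrically between all members.
func meshPolicies(vnrRT string, vns []string) map[string]rtPolicy {
    p := map[string]rtPolicy{}
    for _, vn := range vns {
        p[vn] = rtPolicy{imports: []string{vnrRT}, exports: []string{vnrRT}}
    }
    return p
}

// hubSpokePolicies: the hub imports what the spokes export and exports what
// the spokes import, so spokes reach the hub but not each other directly.
func hubSpokePolicies(hubExportRT, spokeExportRT, hub string, spokes []string) map[string]rtPolicy {
    p := map[string]rtPolicy{
        hub: {imports: []string{spokeExportRT}, exports: []string{hubExportRT}},
    }
    for _, vn := range spokes {
        p[vn] = rtPolicy{imports: []string{hubExportRT}, exports: []string{spokeExportRT}}
    }
    return p
}

func main() {
    fmt.Println(meshPolicies("target:64512:100", []string{"vn-50a", "vn-50n"}))
    fmt.Println(hubSpokePolicies("target:64512:200", "target:64512:201",
        "vnr-52a-hub", []string{"vn-50a", "vn-50n"}))
}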
In addition, network controller 24 may use network policies to implement multi-layer security. The default behavior of Kubernetes is that any pod can communicate with any other pod. In order to apply network security policies, the SDN architecture implemented by the network controller 24 and the virtual router 21 may operate as the CNI for Kubernetes through CNI 17. For layer 3, isolation occurs at the network layer, and the virtual networks operate at L3. The virtual networks are connected by policies. The Kubernetes native network policy provides security at layer 4. The SDN architecture may support Kubernetes network policies. Kubernetes network policies operate at the Kubernetes namespace boundary. The SDN architecture may add custom resources for enhanced network policies. The SDN architecture may support application-based security. (In some cases, these security policies may be based on metatags to apply granular security policies in an extensible manner.) For layer 4+, the SDN architecture may in some examples support integration with containerized security devices and/or Istio, and may provide encryption support.
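For reference, the Go sketch below emits a standard Kubernetes NetworkPolicy (networking.k8s.io/v1) of the kind such an architecture may support, restricting ingress to database pods so that only API pods in the same namespace can reach them. The namespace, labels, and port are illustrative placeholders.

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // A standard Kubernetes NetworkPolicy (networking.k8s.io/v1) restricting
    // ingress to pods labeled app=db so that only pods labeled app=api in the
    // same namespace may reach them on TCP port 5432.
    policy := map[string]interface{}{
        "apiVersion": "networking.k8s.io/v1",
        "kind":       "NetworkPolicy",
        "metadata":   map[string]string{"name": "allow-api-to-db", "namespace": "tenant-a"},
        "spec": map[string]interface{}{
            "podSelector": map[string]interface{}{
                "matchLabels": map[string]string{"app": "db"},
            },
            "policyTypes": []string{"Ingress"},
            "ingress": []map[string]interface{}{
                {
                    "from": []map[string]interface{}{
                        {"podSelector": map[string]interface{}{
                            "matchLabels": map[string]string{"app": "api"},
                        }},
                    },
                    "ports": []map[string]interface{}{
                        {"protocol": "TCP", "port": 5432},
                    },
                },
            },
        },
    }
    out, _ := json.MarshalIndent(policy, "", "  ")
    fmt.Println(string(out))
}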
The network controller 24, as part of the SDN architecture illustrated in fig. 1, may support multi-cluster deployments, which are important for telco and high-end enterprise use cases. For example, the SDN architecture may support multiple Kubernetes clusters. A Cluster API may be used to support lifecycle management of Kubernetes clusters. KubefedV2 may be used for configuration node 30 federation across Kubernetes clusters. Cluster API and KubefedV2 are optional components for supporting a single instance of network controller 24 that supports multiple Kubernetes clusters.
The SDN architecture may use web user interfaces and telemetry components to provide insight into the infrastructure, clusters, and applications. Telemetry nodes may be cloud-native and include micro-services that support insights.
Due to the above features and others that will be described elsewhere herein, computing infrastructure 8 implements a cloud-native SDN architecture and may present one or more of the following technical advantages. For example, the network controller 24 is a cloud-native, lightweight distributed application with a simplified installation footprint. This facilitates easier and modular upgrades of the various component microservices of configuration node(s) 30 and control node(s) 32 (and of any other components of the other examples of network controllers described in this disclosure). These techniques may further enable optional cloud-native monitoring (telemetry) and user interfaces, a high-performance data plane for containers using a DPDK-based virtual router connected to DPDK-enabled pods, and, in some cases, cloud-native configuration management that leverages a configuration framework for existing orchestration platforms (such as Kubernetes or Openstack). As a cloud-native architecture, the network controller 24 is a scalable and resilient architecture able to address and support multiple clusters. In some cases, the network controller 24 may also support scalability and performance requirements for key performance indicators (KPIs).
An SDN architecture having features and technical advantages such as those described herein may be used to implement cloud-native telco clouds to support, for example, 5G mobile networking (and subsequent generations) and edge computing, as well as enterprise Kubernetes platforms such as high-performance cloud-native application hosting. Telco cloud applications are rapidly turning to containerized, cloud-native approaches. 5G fixed and mobile networks are driving the need to deploy workloads as micro-services with significant disaggregation, especially in the 5G Next Generation RAN (5G NR). The 5G Next Generation Core (5GNC) is likely to be deployed as a set of micro-services-based applications corresponding to each of the different components described by 3GPP. When considered as groups of micro-services delivering applications, 5GNC may be a highly complex combination of pods with complex networking, security, and policy requirements. The cloud-native SDN architecture described herein has well-defined networking constructs, security, and policies that may be used for this use case. The network controller 24 may provide the relevant APIs to enable creation of these complex constructs.
Also, the user plane function (UPF) within the 5GNC will be an ultra-high-performance application. It may be delivered as a set of highly distributed, high-performance pods. The SDN architecture described herein may be capable of providing a very high-throughput data plane, both in terms of bits per second (bps) and packets per second (pps). Integration with the DPDK virtual router, eBPF, and SmartNICs, with the latest performance enhancements, will help achieve the required throughput. The DPDK-based virtual router is described in further detail in U.S. application No. 17/649,632, entitled "CONTAINERIZED ROUTER WITH VIRTUAL NETWORKING," filed February 1, 2022, the entire contents of which are incorporated herein by reference.
High-performance processing may also be relevant in the GiLAN as workloads there migrate from more traditional virtualized workloads to containerized micro-services. In the data plane of the UPF and of GiLAN services, such as GiLAN firewalls, intrusion detection and prevention, virtualized IP multimedia subsystem (vIMS) voice/video, and the like, throughput will be high and sustained in terms of both bps and pps. For the control plane of 5GNC functions, such as the access and mobility management function (AMF), the session management function (SMF), and the like, as well as for some GiLAN services (e.g., IMS), while the absolute traffic volume may not be large in terms of bps, the predominance of small packets means that pps will remain high. In some examples, the SDN controller and data plane provide millions of packets per second per virtual router 21, as implemented on servers 12. In the 5G radio access network (RAN), to move away from the proprietary, vertically integrated RAN stacks provided by legacy radio vendors, open RAN decouples the RAN hardware and software into a number of components, including the non-RT radio intelligent controller (RIC), the near-real-time RIC, the centralized unit (CU) control and user planes (CU-CP and CU-UP), the distributed unit (DU), and the radio unit (RU). Software components are deployed on commodity server architectures, supplemented with programmable accelerators where necessary. The SDN architecture described herein may support the O-RAN specifications.
Edge computing may be primarily targeted at two different use cases. The first is to support a containerized telco infrastructure (e.g., 5G RAN, UPF, security functions), and the second is containerized service workloads, both from the telco and from third parties such as vendors or enterprise customers. In both cases, edge computing is in effect a special case of the GiLAN, where traffic is broken out for special handling at highly distributed locations. In many cases, these locations are limited in resources (power, cooling, space).
The SDN architecture described herein may be well suited to supporting a very lightweight footprint, may support compute and storage resources in sites remote from the associated control functions, and may be location-aware in the way workloads and storage are deployed. Some sites may have as few as one or two compute nodes delivering a very specific set of services to a highly localized set of users or other services. There may be a hierarchy of sites in which the central sites are densely connected with many paths, regional sites are multiply connected with two to four uplink paths, and remote edge sites may have connections to only one or two upstream sites.
This calls for great flexibility in the way the SDN architecture may be deployed and in the way (and location) tunneled traffic in the overlay is terminated and bound into the core transport network (SRv6, MPLS, etc.). Likewise, in sites hosting telco cloud infrastructure workloads, the SDN architecture described herein may support the specialized hardware (GPUs, SmartNICs, etc.) required by high-performance workloads. There may also be workloads that require SR-IOV. As such, the SDN architecture may also support creating VTEPs at the ToR and linking them back into the overlay as VXLAN.
It is expected that there will be a mix of fully distributed Kubernetes micro-clusters, in which each site runs its own master(s), and that the SDN architecture can support such remote-compute-like scenarios.
For use cases involving enterprise Kubernetes platforms, high-performance cloud-native applications power financial services platforms, online gaming services, and hosted application service providers. Cloud platforms delivering these applications must provide high performance and resilience against failures, together with high security and visibility. Applications hosted on these platforms tend to be developed in-house. Application developers and platform owners work with the infrastructure teams to deploy and operate instances of the organization's applications. These applications tend to require high throughput (20 Gbps per server) and low latency. Some applications may also use multicast for signaling or payload traffic. Additional hardware and network infrastructure may be leveraged to ensure availability. Applications and micro-services will be partitioned using namespaces within the cluster. Isolation between namespaces is critical in high-security environments. While a default-deny policy is the standard posture in zero-trust application deployment environments, adding network segmentation using virtual routing and forwarding instances (VRFs) adds a further layer of security and allows the use of overlapping network ranges. Overlapping network ranges are a key requirement of hosted application hosting environments, which tend to be standardized on a set of reachable endpoints for all hosted clients.
Complex micro-services-based applications tend to utilize complex network filters. The SDN architecture described herein may deliver high-performance firewall filtering at scale. Such filtering may exhibit consistent forwarding performance, with less latency degradation regardless of the length or order of the rule sets. Some customers may also face, with respect to application separation, the same regulatory pressures as telcos, not only at the network layer but also in the kernel. The financial industry, among others, requires data plane encryption, particularly when running on public clouds. In some examples, the SDN architecture described herein may include features for meeting these requirements.
In some examples, the SDN architecture may provide a GitOps-friendly UX for strict change management control, auditing, and the reliability of making changes in production several times per day, even hundreds of times per day, when the SDN architecture is automated through an application dev/test/stage/prod continuous integration/continuous deployment (CI/CD) pipeline.
In addition, the custom master interface techniques may provide one or more technical advantages. For example, the container management platform may provide advanced networking that facilitates network segmentation at lower networking layers, such as the data link layer (including, as one example, a virtual data link layer represented by a virtual interface), the network layer, and the like. By removing the backdoor presented by the default master interface and the associated default pod network provided as standard Kubernetes resources, the network controller 24 implementing a container management platform (such as Kubernetes) may redefine the master interface as a custom resource to reduce, if not eliminate, the inter-pod communication that would otherwise provide a security backdoor. Improving network security may reduce attacks that expose sensitive data (e.g., personal data) and other vulnerabilities, thereby enabling more complex and secure networking architectures. Removing the security vulnerability may also benefit the network controller 24 itself, because malware may be unable to access sensitive data, and preventing such access may reduce the consumption of computing resources (e.g., by not executing malware that would otherwise consume processing cycles, memory, memory bus bandwidth, associated power, etc.).
Fig. 2 is a block diagram illustrating an example of a cloud-native SDN architecture for cloud-native networking, in accordance with the techniques of this disclosure. SDN architecture 200 is illustrated in a manner that abstracts the underlying connectivity between the various components. In this example, network controller 24 of SDN architecture 200 includes configuration nodes 230A-230N ("configuration nodes" or "config nodes" and collectively, "configuration nodes 230") and control nodes 232A-232K (collectively, "control nodes 232"). Configuration nodes 230 and control nodes 232 may represent example implementations of configuration nodes 30 and control nodes 32 of fig. 1, respectively. Configuration nodes 230 and control nodes 232, although illustrated as separate from servers 12, may be executed as one or more workloads on servers 12.
Configuration nodes 230 provide a northbound, Representational State Transfer (REST) interface to support intent-driven configuration of SDN architecture 200. Example platforms and applications that may be used to push intents to configuration nodes 230 include the virtual machine orchestrator 240 (e.g., Openstack), the container orchestrator 242 (e.g., Kubernetes), the user interface 244, or one or more other applications 246. In some examples, SDN architecture 200 has Kubernetes as its underlying platform.
SDN architecture 200 is divided into a configuration plane, a control plane, and a data plane, and optionally a telemetry (or analysis) plane. The configuration plane is implemented with a horizontally scalable configuration node 230, the control plane is implemented with a horizontally scalable control node 232, and the data plane is implemented with a compute node.
At a high level, configuration nodes 230 use configuration store 224 to manage the state of the configuration resources of SDN architecture 200. In general, a configuration resource (or, more simply, a "resource") is a named object schema that includes data and/or methods that describe the custom resource, and it defines an application programming interface (API) for creating and manipulating the data through an API server. A kind is the name of the object schema. Configuration resources may include Kubernetes native resources, such as pods, ingresses, config maps, services, roles, namespaces, nodes, network policies, or load balancers.
In accordance with the techniques of this disclosure, the configuration resources also include custom resources that are used to extend the Kubernetes platform by defining application programming interfaces (APIs) that may not be available in a default installation of the Kubernetes platform. In the example of SDN architecture 200, the custom resources may describe the physical infrastructure, the virtual infrastructure (e.g., custom master interfaces, VNs 50, and/or VNRs 52), the configuration, and/or other resources of SDN architecture 200. As part of configuring and operating SDN architecture 200, various custom resources (e.g., VNRs 52 within virtual router 21) may be instantiated. Instantiated resources (whether native or custom) may be referred to as objects, or as instances of the resources, which are persistent entities in SDN architecture 200 that represent the intent (desired state) and the status (actual state) of SDN architecture 200.
Configuration nodes 230 provide an aggregated API for performing operations (i.e., create, read, update, and delete) on the configuration resources of SDN architecture 200 in configuration store 224. Load balancer 226 represents one or more load balancer objects that load-balance configuration requests among configuration nodes 230. Configuration store 224 may represent one or more etcd databases. Configuration nodes 230 may be implemented using Nginx.
SDN architecture 200 may provide networking for both Openstack and Kubernetes. Openstack uses a plug-in architecture to support networking. With virtual machine orchestrator 240 being Openstack, the Openstack networking plug-in driver converts Openstack configuration objects into SDN architecture 200 configuration objects (resources). Compute nodes run Openstack nova to bring up virtual machines.
With container orchestrator 242 being Kubernetes, SDN architecture 200 functions as a Kubernetes CNI. As noted above, Kubernetes native resources (pods, services, ingress, external load balancers, etc.) may be supported, and SDN architecture 200 may support custom resources of Kubernetes for the advanced networking and security of SDN architecture 200.
Configuration nodes 230 provide REST watch to control nodes 232 to watch for configuration resource changes, which control nodes 232 effect within the computing infrastructure. Control nodes 232 receive configuration resource data from configuration nodes 230 by watching the resources and build a complete configuration graph. A given one of control nodes 232 consumes the configuration resource data relevant to that control node and distributes the required configurations to the compute nodes (servers 12) via control interface 254 to the control plane aspect of virtual router 21 (i.e., the virtual router agent — not shown in fig. 1). Any of control nodes 232 may receive only a partial graph, i.e., only the portion required for its processing. Control interface 254 may be XMPP. The number of deployed configuration nodes 230 and control nodes 232 may be a function of the number of clusters supported. To support high availability, the configuration plane may include 2N+1 configuration nodes 230 and 2N control nodes 232.
Control nodes 232 distribute routes among the compute nodes. Control nodes 232 exchange routes among themselves using iBGP, and control nodes 232 may peer with any external BGP-supporting gateway or other router. Control nodes 232 may use route reflectors.
Container 250 and virtual machine 252 are examples of workloads that may be deployed to computing nodes by virtual machine orchestrator 240 or container orchestrator 242 and interconnected by SDN architecture 200 using one or more virtual networks.
Fig. 3 is a block diagram illustrating another view of components of SDN architecture 200 in greater detail in accordance with the techniques of this disclosure. Configuration node 230, control node 232, and user interface 244 are illustrated with their respective component micro services for implementing network controller 24 and SDN architecture 200 as a cloud native SDN architecture. Each component microservice may be deployed to a compute node.
Fig. 3 illustrates a single cluster divided into network controller 24, user interface 244, compute (servers 12), and telemetry 260 features. Configuration nodes 230 and control nodes 232 together form network controller 24.
Configuration node 230 may include a component micro-service API server 300 (or "Kubernetes API server 300" —corresponding controller 406 is not shown in fig. 3), custom API server 301, custom resource controller 302, and SDN controller manager 303 (sometimes referred to as a "kube manager" or "SDN kube manager," where the orchestration platform for network controller 24 is Kubernetes). The Contrail-kube manager is one example of SDN controller manager 303. The configuration node 230 extends the interface of the API server 300 with the custom API server 301 to form an aggregation layer to support the data model of the SDN architecture 200. The SDN architecture 200 deployment intent may be a custom resource, as described above.
The control node 232 may include a component micro service control 320 and a core DNS 322. Control 320 performs configuration distribution and route learning and distribution as described above with respect to fig. 2.
The compute nodes are represented by servers 12. Each compute node includes a virtual router agent 316 and a virtual router forwarding component (vRouter) 318. Either or both of virtual router agent 316 and vRouter 318 may be component microservices. In general, virtual router agent 316 performs control-related functions. Virtual router agent 316 receives configuration data from control nodes 232 and converts the configuration data into forwarding information for vRouter 318. Virtual router agent 316 may also perform firewall rule processing, set up flows for vRouter 318, and interface with the orchestration plug-ins (the CNI for Kubernetes and the Nova plug-in for Openstack). When a workload (pod or VM) is launched on a compute node, virtual router agent 316 generates routes, and virtual router agent 316 exchanges such routes with control nodes 232 for distribution to the other compute nodes (control nodes 232 use BGP to distribute routes among control nodes 232). When a workload terminates, virtual router agent 316 withdraws the routes. vRouter 318 may support one or more forwarding modes, such as kernel mode, DPDK, SmartNIC offload, and so on. For container-architecture or virtual machine workloads, the compute nodes may be Kubernetes worker/minion nodes or Openstack nova compute nodes, depending on the particular orchestrator in use.
One or more optional telemetry nodes 260 provide metrics, alarms, logging, and flow analysis. The telemetry of SDN architecture 200 leverages cloud-native monitoring services, such as Prometheus, the Elastic/Fluentd/Kibana (EFK) stack, and Influx TSDB. The SDN architecture component microservices of configuration nodes 230, control nodes 232, the compute nodes, user interface 244, and analytics nodes (not shown) may generate telemetry data. The telemetry data may be consumed by services of telemetry node(s) 260. Telemetry node(s) 260 may expose REST endpoints for users and may support insights and event correlation.
The optional user interface 244 includes a web User Interface (UI) 306 and UI backend 308 services. In general, the user interface 244 provides configuration, monitoring, visualization, security, and troubleshooting for SDN architecture components.
Each of telemetry 260, user interface 244, configuration node 230, control node 232, and server 12/compute nodes may be considered SDN architecture 200 nodes, as each of these nodes is an entity for implementing the functionality of a configuration, control, or data plane, or the functionality of UI and telemetry nodes. The node scale is configured during "startup" and SDN architecture 200 supports automatically scaling SDN architecture 200 nodes using orchestration system operators (such as Kubernetes operators).
As noted above, SDN architecture 200 configuration intents may be custom resources. One such custom resource may be the custom host interface, through which communication is established between pods and the virtual networks used to segment the underlying network supporting a plurality of different containers. Additional custom resources include VNRs 52 (shown in the example of fig. 1), through which communication is established between two or more VNs 50 in the manner described above. As noted above, VNRs 52 may represent logical abstractions of the policies used to configure the import and export of routing information between VNs 50, whereby VNRs 52 may facilitate the exchange of routing information (referred to as asymmetric or symmetric import and export) using the common route targets established for each VNR 52. A common route target may be defined and associated with the routing instances used to implement VNs 50.
An administrator, such as a Kubernetes administrator, may interact with user interface 244 (e.g., web UI 306) to define VNRs 52, possibly via a graphical user interface with graphical elements representing pods, VNs 50, and so on. To define a VNR 52, an administrator may associate the VNR 52 with one or more labels assigned to VNs 50. Using these labels, the VNR 52 can establish import and export policies to and from the common route target created when the VNR 52 is instantiated. Web UI 306 may interact with configuration nodes 230 to create the common route target and to install the common route target into one or more vRouters 318 via control nodes 232. Web UI 306 may also, via configuration nodes 230 and control nodes 232, define a routing instance for the common route target (to distinguish the common route target from other common route targets), thereby utilizing routing domains to facilitate interconnection between VNs 50, as will be described in more detail below with respect to a number of different networking schemes.
Fig. 4 is a block diagram illustrating example components of an SDN architecture in accordance with the techniques of this disclosure. In this example, SDN architecture 400 extends and uses the Kubernetes API server for the network configuration objects that realize a user's intent for the network configuration. In Kubernetes terminology, such configuration objects are referred to as custom resources and, when persisted in the SDN architecture, are referred to simply as objects. Configuration objects are mainly user intents (e.g., virtual networks such as VNs 50, VNRs 52, BGPaaS, network policies, service chaining, etc.).
SDN architecture 400 configuration nodes 230 may use the Kubernetes API server for configuration objects. In Kubernetes terminology, these are referred to as custom resources.
Kubernetes provides two ways to add custom resources to a cluster:
● Custom Resource Definitions (CRDs) are simple and can be created without any programming.
● API aggregation requires programming, but allows more control over API behavior, such as data storage and conversion between API versions.
The aggregated API is a subordinate API server that sits behind the primary API server, which acts as a proxy. This arrangement is called API Aggregation (AA). To users, it simply appears that the Kubernetes API has been extended. CRDs allow users to create new types of resources without adding another API server. Regardless of how they are installed, the new resources are referred to as Custom Resources (CRs) to distinguish them from the native Kubernetes resources (e.g., pods). CRDs were used in the initial Config prototypes. The architecture may implement the aggregated API using the API Server Builder alpha library. API Server Builder is a collection of libraries and tools to build native Kubernetes aggregation extensions.
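For purposes of illustration, the following Go sketch emits a minimal CustomResourceDefinition (CRD) registering a hypothetical VirtualNetworkRouter kind. The API group "example.net" and the resource names are placeholders, and a production apiextensions.k8s.io/v1 CRD would also carry an openAPIV3Schema for each served version.

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // Illustrative CustomResourceDefinition registering a hypothetical
    // VirtualNetworkRouter kind under the API group "example.net". A real
    // apiextensions.k8s.io/v1 CRD would also carry an openAPIV3Schema for
    // each served version.
    crd := map[string]interface{}{
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind":       "CustomResourceDefinition",
        "metadata":   map[string]string{"name": "virtualnetworkrouters.example.net"},
        "spec": map[string]interface{}{
            "group": "example.net",
            "scope": "Namespaced",
            "names": map[string]string{
                "plural":   "virtualnetworkrouters",
                "singular": "virtualnetworkrouter",
                "kind":     "VirtualNetworkRouter",
            },
            "versions": []map[string]interface{}{
                {"name": "v1", "served": true, "storage": true},
            },
        },
    }
    out, _ := json.MarshalIndent(crd, "", "  ")
    fmt.Println(string(out))
}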
Typically, each resource in the Kubernetes API requires code to handle REST requests and to manage the persistent storage of objects. The primary Kubernetes API server 300 (implemented with API server microservices 300A-300J) handles native resources and may also handle custom resources, typically through CRDs. Aggregation API 402 represents an aggregation layer that extends Kubernetes API server 300 to allow custom implementations for custom resources by writing and deploying custom API server 301 (with custom API server microservices 301A-301M). The primary API server 300 delegates requests for the custom resources to custom API server 301, thereby making such resources available to all of its clients.
In this way, API server 300 (e.g., kube-apiserver) receives Kubernetes configuration objects, both native objects (pods, services) and custom resources. The custom resources for SDN architecture 400 may include configuration objects that, when the desired state of the configuration objects in SDN architecture 400 is realized, implement the desired network configuration of SDN architecture 400, including the implementation of each custom master interface, VN, and VNR 52. A VNR 52 may be implemented as one or more import policies and/or one or more export policies using a common route target (and routing instance). As described above, implementing a VNR 52 within SDN architecture 400 may result in import and/or export policies that interconnect two or more VNs 50, as described in more detail below.
In this regard, the custom resources may correspond to configuration schemas traditionally defined for network configuration but that, in accordance with the techniques of the present disclosure, are extended to be manipulable through aggregation API 402. Such custom resources may alternatively be referred to herein as "custom resources for SDN architecture configuration." These may include custom primary interfaces, VNs, VNRs, BGP-as-a-service (BGPaaS), subnets, virtual routers, service instances, projects, physical interfaces, logical interfaces, nodes, network IPAM, floating IPs, alarms, alias IPs, access control lists, firewall policies, firewall rules, network policies, route targets, and routing instances. Custom resources for SDN architecture configuration may correspond to configuration objects traditionally exposed by an SDN controller, but, in accordance with the techniques described herein, the configuration objects are exposed as custom resources and are integrated with the Kubernetes native/built-in resources to support a unified intent model, exposed by aggregation API 402, that is realized by Kubernetes controllers 406A-406N and by custom resource controller 302 (shown in fig. 4 with component microservices 302A-302L), which reconcile the actual state and the desired state of the computing infrastructure, including the network elements.
Given the unified nature of exposing custom resources that integrate with Kubernetes native/built-in resources, a Kubernetes administrator (or other Kubernetes user) may use common Kubernetes semantics to define custom master interfaces, VNs, and VNRs (e.g., VNR 52) that may then be translated into complex policies detailing the importation and exportation of routing information to facilitate interconnection of VNs 50 without requiring too much (if any) understanding of BGP and other routing protocols required to interconnect VNs 50. In this way, aspects of the technology may facilitate a more uniform user experience, which may result in fewer misconfigurations and trial-and-error, which may improve execution of SDN architecture 400 itself (in terms of utilizing fewer processing cycles, memory, bandwidth, etc., and associated power).
The API server 300 aggregation layer sends requests for custom resources to the corresponding, registered custom API server 301. There may be multiple custom API servers/custom resource controllers to support different kinds of custom resources. Custom API server 301 processes custom resources for SDN architecture configuration and writes them to configuration store(s) 304, which may be etcd. Custom API server 301 may host and expose the SDN controller identifier assignment service that custom resource controller 302 may need.
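As one hedged illustration of the aggregation-layer registration described above, the following sketch shows a standard Kubernetes APIService object that directs API server 300 to delegate an assumed custom resource group to custom API server 301; the group, version, and backing service names are assumptions for this example.

# Hypothetical APIService registration with the aggregation layer. The group,
# version, and Service names are assumptions; the APIService kind and its
# fields are standard Kubernetes aggregation-layer constructs.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.core.contrail.juniper.net       # <version>.<group>
spec:
  group: core.contrail.juniper.net               # assumed custom resource group
  version: v1alpha1
  service:
    name: contrail-custom-apiserver              # assumed Service fronting custom API server 301
    namespace: contrail
    port: 443
  groupPriorityMinimum: 1000
  versionPriority: 15
  insecureSkipTLSVerify: true                    # illustration only; a caBundle would be used in practice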
Custom resource controller(s) 302 begin applying business logic to achieve the user intent provided via the intent configuration. The business logic is implemented as a reconciliation loop. Fig. 8 is a block diagram illustrating an example of a custom controller for custom resource(s) for SDN architecture configuration, in accordance with the techniques of this disclosure. Custom controller 814 may represent an example instance of custom resource controller 302. In the example illustrated in fig. 8, custom controller 814 can be associated with custom resource 818. Custom resource 818 may be any custom resource for SDN architecture configuration. Custom controller 814 can include a reconciler 816, where reconciler 816 includes logic to execute a reconciliation loop in which custom controller 814 observes 834 (e.g., monitors) the current state 832 of custom resource 818. In response to determining that the desired state 836 does not match the current state 832, reconciler 816 can perform actions to adjust 838 the state of the custom resource such that the current state 832 matches the desired state 836. A request may be received by API server 300 and relayed to custom API server 301 to change the current state 832 of custom resource 818 to the desired state 836.
Where the API request is a create request for a custom resource, reconciler 816 can act on the create event for the instance data of the custom resource. Reconciler 816 can create the instance data for custom resources that the requested custom resource depends on. As one example, an edge node custom resource may depend on a virtual network custom resource, a virtual interface custom resource, and an IP address custom resource. In this example, when reconciler 816 receives a create event for an edge node custom resource, reconciler 816 can also create the custom resources that the edge node custom resource depends on, such as the virtual network custom resource, the virtual interface custom resource, and the IP address custom resource.
By default, custom resource controller 302 runs in an active-passive mode and uses leader election to achieve consistency. When a controller pod starts, it attempts to create a ConfigMap resource in Kubernetes using a specified key. If creation succeeds, that pod becomes the leader and begins processing reconciliation requests; otherwise, it blocks, repeatedly attempting to create the ConfigMap in a loop.
Custom resource controller 302 can track the status of the custom resources it creates. For example, a Virtual Network (VN) creates a Routing Instance (RI), which creates a Route Target (RT). If creation of the route target fails, the routing instance status is degraded, and the virtual network status is accordingly degraded as well. Custom resource controller 302 can therefore output custom messages indicating the status(es) of these custom resources for troubleshooting. Likewise, a VNR creates an RI that creates an RT, in a manner similar to that discussed above with respect to the VN, which is also described in more detail with respect to the example of fig. 17. An example flow of creation, monitoring, and reconciliation among custom resource types that have dependencies on different custom resource types is illustrated in FIG. 9.
The configuration plane implemented by configuration node 230 has high availability. Configuration node 230 may be based on Kubernetes, including the kube-apiserver service (e.g., API server 300) and the storage backend etcd (e.g., configuration store(s) 304). In effect, aggregation API 402 implemented by configuration node 230 operates as the front end of the control plane implemented by control node 232. The primary implementation of API server 300 is kube-apiserver, which is designed to scale horizontally by deploying more instances. As shown, several instances of API server 300 may be run to load balance API requests and processing.
Configuration store(s) 304 may be implemented as etcd. etcd is a consistent and highly available key-value store that is used as a Kubernetes backing store for cluster data.
In the example of fig. 4, servers 12 of SDN architecture 400 each include orchestration agent 420 and a containerized (or "cloud-native") routing protocol daemon 324. These components of SDN architecture 400 are described in further detail below.
SDN controller manager 303 may operate as an interface between Kubernetes core resources (services, namespaces, pods, network policies, network attachment definitions) and the extended SDN architecture resources (virtual networks, routing instances, etc.). SDN controller manager 303 watches the Kubernetes API to learn about changes on both the Kubernetes core resources and the user-defined (custom) resources for SDN architecture configuration and can, accordingly, perform CRUD operations on the relevant resources.
In some examples, SDN controller manager 303 is a collection of one or more Kubernetes custom controllers. In some examples, in single-cluster or multi-cluster deployments, SDN controller manager 303 may run on the Kubernetes cluster(s) it manages.
SDN controller manager 303 listens for create, delete, and update events for the following Kubernetes objects:
· Pod
· Service
· NodePort
· Ingress
· Endpoint
· Namespace
· Deployment
· Network policy
When these events are generated, SDN controller manager 303 creates the appropriate SDN architecture object, which in turn is defined as a custom resource for SDN architecture configuration. In response to detecting an event on an instance of a custom resource, whether instantiated by SDN controller manager 303 and/or by custom API server 301, control node 232 obtains configuration data for the instance of the custom resource and configures a corresponding instance of a configuration object in SDN architecture 400.
For example, SDN controller manager 303 monitors for pod creation events and, in response, may create the following SDN architecture objects: virtual machine, virtual machine interface, and instance IP. Control node 232 may then instantiate the SDN architecture objects, in this case in the selected compute node.
As one example, based on the monitoring, control node 232A may detect an event on an instance of a first custom resource exposed by custom API server 301A, where the first custom resource is for configuring some aspect of SDN architecture system 400 and corresponds to a type of configuration object of SDN architecture system 400. For instance, the type of configuration object may be a firewall rule corresponding to the first custom resource. In response to the event, control node 232A may obtain configuration data for the firewall rule instance (e.g., the firewall rule specification) and provision the firewall rule in the virtual router for server 12A. Configuration node 230 and control node 232 may perform similar operations for other custom resources corresponding to types of configuration objects of the SDN architecture, such as virtual networks, virtual network routers, BGP-as-a-service (BGPaaS), subnets, virtual routers, service instances, projects, physical interfaces, logical interfaces, nodes, network IPAM, floating IPs, alarms, alias IPs, access control lists, firewall policies, firewall rules, network policies, routing targets, routing instances, and the like.
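As a further hedged illustration, a virtual network custom resource instance of the kind referenced above might look like the following sketch; the API group/version and field names are assumptions, not a prescribed schema.

# Hypothetical VirtualNetwork custom resource (schema assumed for
# illustration). Control node 232 would translate such an instance into the
# corresponding SDN architecture configuration, e.g., a routing instance and
# route target, and the label allows a VNR selector to match it.
apiVersion: core.contrail.juniper.net/v1alpha1   # assumed API group/version
kind: VirtualNetwork
metadata:
  name: blue-network
  namespace: default
  labels:
    vnr: web-db                    # label a VNR's virtualNetworkSelector may match
spec:
  v4SubnetReference:               # assumed field name
    cidr: 10.20.0.0/24
    defaultGateway: 10.20.0.1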
FIG. 5 is a block diagram of an example computing device, in accordance with the techniques described in this disclosure. Computing device 500 of FIG. 5 may represent a real or virtual server, may represent an example instance of any of servers 12, and may be referred to as a computing node, master/slave node, or host. In this example, computing device 500 includes a bus 542 coupling the hardware components of the hardware environment of computing device 500. Bus 542 couples a Network Interface Card (NIC) 530, a storage disk 546, and one or more microprocessors 510 (hereinafter referred to as "microprocessors 510"). NIC 530 may have SR-IOV capability. In some cases, a front-side bus may couple microprocessor 510 and memory device 524. In some examples, bus 542 may couple memory device 524, microprocessor 510, and NIC 530. Bus 542 may represent a Peripheral Component Interconnect (PCI) Express (PCIe) bus. In some examples, a Direct Memory Access (DMA) controller may control DMA transfers between components coupled to bus 542. In some examples, components coupled to bus 542 control DMA transfers between components coupled to bus 542.
Microprocessor 510 may include one or more processors, each including a separate execution unit to execute instructions conforming to an instruction set architecture, the instructions being stored on a storage medium. The execution units may be implemented as separate Integrated Circuits (ICs) or may be combined within one or more multi-core processors (or "many-core" processors), each implemented using a single IC (i.e., a chip multiprocessor).
Disk 546 represents computer-readable storage media including volatile and/or nonvolatile, removable and/or non-removable media implemented in any method or technology for storage of information such as processor-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), EEPROM, flash memory, CD-ROM, Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by microprocessor 510.
Main memory 524 includes one or more computer-readable storage media, which may include Random Access Memory (RAM) such as various forms of Dynamic RAM (DRAM), e.g., DDR2/DDR3 SDRAM, or Static RAM (SRAM), flash memory, or any other form of fixed or removable storage medium that can be used to carry or store desired program code and program data in the form of instructions or data structures and that can be accessed by a computer. Main memory 524 provides a physical address space composed of addressable memory locations.
The Network Interface Card (NIC) 530 includes one or more interfaces 532 configured to exchange packets using links of an underlying physical network. Interfaces 532 may include a port interface card having one or more network ports. NIC 530 may also include on-card memory to, for example, store packet data. Direct memory access transfers between NIC 530 and other devices coupled to bus 542 may read from and write to the NIC memory.
Memory 524, NIC 530, storage disk 546, and microprocessor 510 may provide an operating environment for a software stack that includes an operating system kernel 580 executing in kernel space. Kernel 580 may represent, for example, Linux, Berkeley Software Distribution (BSD), another Unix-variant kernel, or a Windows server operating system kernel, available from Microsoft Corp. In some instances, the operating system may execute a hypervisor and one or more virtual machines managed by the hypervisor. Example hypervisors include Kernel-based Virtual Machine (KVM) for the Linux kernel, Xen, ESXi available from VMware, Windows Hyper-V available from Microsoft, and other open-source and proprietary hypervisors. The term hypervisor may encompass a Virtual Machine Manager (VMM). An operating system that includes kernel 580 provides an execution environment for one or more processes in user space 545.
Kernel 580 includes a physical driver 525 to use network interface card 530. Network interface card 530 may also implement SR-IOV to enable sharing the physical network function (I/O) among one or more virtual execution elements, such as container 529A or one or more virtual machines (not shown in fig. 5). Shared virtual devices, such as virtual functions, may provide dedicated resources such that each virtual execution element may access dedicated resources of NIC 530, which therefore appears to each virtual execution element as a dedicated NIC. Virtual functions may represent lightweight PCIe functions that share physical resources with the physical function used by physical driver 525 and with other virtual functions. For an SR-IOV-capable NIC 530, NIC 530 may have thousands of available virtual functions according to the SR-IOV standard, but for I/O-intensive applications the number of configured virtual functions is typically much smaller.
The computing device 500 may be coupled to a physical network switch fabric that includes an overlay network that extends the switch fabric from a physical switch to software or "virtual" routers, including virtual router 506, of physical servers coupled to the switch fabric. The virtual router may be a process or thread or component thereof executed by a physical server (e.g., server 12 of fig. 1) that dynamically creates and manages one or more virtual networks that may be used for communication between virtual network endpoints. In one example, the virtual router implements each virtual network using an overlay network that provides the ability to decouple the virtual address of the endpoint from the physical address (e.g., IP address) of the server on which the endpoint is executing.
Each virtual network may use its own addressing and security scheme and may be considered orthogonal to the physical network and its addressing scheme. Various techniques may be used to transport packets within and across virtual networks through a physical network. The term "virtual router" as used herein may encompass Open VSwitch (OVS), OVS bridges, linux bridges, docker bridges, or other devices and/or software located on host devices and performing switching, bridging, or routing of packets between virtual network endpoints of one or more virtual networks, where the virtual network endpoints are hosted by one or more of servers 12. In the example computing device of fig. 5, virtual router 506 is executed within the user space as a DPDK-based virtual router, but virtual router 506 may be executed within a hypervisor, host operating system, host application, or virtual machine in various implementations.
Virtual router 506 may replace and contain the virtual routing/bridging functionality of the Linux bridge/OVS module typically used for Kubernetes deployment of pod 502. Virtual router 506 may perform bridging (e.g., E-VPN) and routing (e.g., L3VPN, IP-VPN) for virtual networks. Virtual router 506 may perform networking services such as application security policies, NAT, multicasting, mirroring, and load balancing.
The virtual router 506 may be implemented as a kernel module or as a user space DPDK process (virtual router 506 is shown here in user space 545). The virtual router agent 514 may also execute in user space. The virtual router agent 514 has a connection to network controller 24 over a channel, which is used to download configurations and forwarding information. Virtual router agent 514 programs this forwarding state into the virtual router data (or "forwarding") plane represented by virtual router 506. Virtual router 506 and virtual router agent 514 may be processes. Virtual router 506 and virtual router agent 514 are containerized/cloud native.
Virtual router 506 may be multi-threaded and execute on one or more processor cores. Virtual router 506 may include multiple queues. Virtual router 506 may implement a packet processing pipeline. The pipeline can be stitched together by virtual router agent 514 from the simplest to the most complicated manner, depending on the operations to be applied to the packet. Virtual router 506 may maintain multiple instances of forwarding bases. Virtual router 506 may access and update tables using RCU (Read Copy Update) locks.
To send packets to other compute nodes or switches, virtual router 506 uses one or more physical interfaces 532. Typically, virtual router 506 exchanges overlay packets with workloads, such as VMs or pods 502. Virtual router 506 has multiple virtual network interfaces (e.g., vifs). These interfaces may include the kernel interface, vhost0, for exchanging packets with the host operating system, and an interface to virtual router agent 514, pkt0, to obtain forwarding state from the network controller and to send exception packets. There may be one or more virtual network interfaces corresponding to the one or more physical network interfaces 532. Additional virtual network interfaces of virtual router 506 are used for exchanging packets with the workloads.
In a kernel-based deployment of virtual router 506 (not shown), virtual router 506 is installed as a kernel module inside the operating system. Virtual router 506 registers itself with the TCP/IP stack to receive packets from any of the desired operating system interfaces. The interfaces may be bond, physical, tap (for VMs), veth (for containers), etc. Virtual router 506 in this mode relies on the operating system to send and receive packets from different interfaces. For example, the operating system may expose a tap interface backed by a vhost-net driver to communicate with VMs. Once virtual router 506 registers for packets from the tap interface, the TCP/IP stack sends all the packets to it. Virtual router 506 sends packets via the operating system interfaces. In addition, NIC queues (physical or virtual) are handled by the operating system. Packet processing may operate in interrupt mode, which generates interrupts and may lead to frequent context switching. When there is a high packet rate, the overhead attendant with frequent interrupts and context switching may overwhelm the operating system and lead to performance degradation.
In a DPDK-based deployment of virtual router 506 (as shown in fig. 5), virtual router 506 is installed as a user space 545 application that is linked to the DPDK library. This may lead to faster performance than a kernel-based deployment, particularly in the presence of high packet rates. The physical interfaces 532 are used by the poll mode drivers (PMDs) of DPDK rather than by the kernel's interrupt-based drivers. The registers of physical interfaces 532 may be exposed into user space 545 in order to be accessible to the PMDs; a physical interface 532 bound in this way is no longer managed by or visible to the host operating system, and the DPDK-based virtual router 506 manages the physical interface 532. This includes packet polling, packet processing, and packet forwarding. In other words, user packet processing steps are performed by the virtual router 506 DPDK data plane. The nature of this "polling mode" makes virtual router 506 DPDK data plane packet processing/forwarding much more efficient than interrupt mode when the packet rate is high. There are comparatively few interrupts and context switches during packet I/O, compared to kernel-mode virtual router 506, and interrupts and context switches during packet I/O may in some cases be avoided altogether.
In general, each of pods 502A-502B may be assigned one or more virtual network addresses for use within respective virtual networks, where each of the virtual networks may be associated with a different virtual subnet provided by virtual router 506. Pod 502B, for example, may be assigned its own virtual layer three (L3) IP address for sending and receiving communications, but may be unaware of the IP address of the computing device 500 on which pod 502B executes. The virtual network address may thus differ from the logical address of the underlying, physical computer system (e.g., computing device 500).
The computing device 500 includes a virtual router agent 514 that controls the overlay of virtual networks for computing device 500 and that coordinates the routing of data packets within computing device 500. In general, virtual router agent 514 communicates with network controller 24 for the virtualization infrastructure, which generates commands to create virtual networks and configure network virtualization endpoints, such as computing device 500 and, more specifically, virtual router 506, as well as virtual network interface 212. By configuring virtual router 506 based on information received from network controller 24, virtual router agent 514 may support configuring network isolation, policy-based security, a gateway, source network address translation (SNAT), a load balancer, and service chaining capability for orchestration.
In one example, network packets (e.g., layer three (L3) IP packets or layer two (L2) Ethernet packets) generated or consumed by containers 529A-529B within the virtual network domain may be encapsulated in another packet (e.g., another IP or Ethernet packet) that is transported by the physical network. A packet transported in a virtual network may be referred to herein as an "inner packet," while the physical network packet may be referred to herein as an "outer packet" or a "tunnel packet." Encapsulation and/or decapsulation of virtual network packets within physical network packets may be performed by virtual router 506. This functionality is referred to herein as tunneling and may be used to create one or more overlay networks. Besides IP-in-IP, other example tunneling protocols that may be used include IP over Generic Routing Encapsulation (GRE), VxLAN, Multiprotocol Label Switching (MPLS) over GRE, MPLS over User Datagram Protocol (UDP), and the like. Virtual router 506 performs tunnel encapsulation/decapsulation for packets sourced by/destined to any container of pods 502, and virtual router 506 exchanges packets with pods 502 via bus 542 and/or a bridge of NIC 530.
As noted above, network controller 24 may provide a logically centralized controller to facilitate the operation of one or more virtual networks. The network controller 24 may, for example, maintain a routing information base, such as one or more routing tables, that store routing information for the physical network and one or more overlay networks. Virtual router 506 implements one or more virtual routing and forwarding instances (VRFs), such as VRF 222A, for the respective virtual network in which virtual router 506 operates as a respective tunnel endpoint. In general, each VRF stores forwarding information for the corresponding virtual network and identifies where the data packet is to be forwarded and whether the packet is to be encapsulated in a tunneling protocol, such as with a tunnel header, which may include one or more headers for different layers of the virtual network protocol stack. Each VRF may include a network forwarding table that stores routing and forwarding information for the virtual network.
NIC 530 may receive tunnel packets. Virtual router 506 processes a tunnel packet to determine, from the tunnel encapsulation header, the virtual network of the source and destination endpoints for the inner packet. Virtual router 506 may strip the layer 2 header and the tunnel encapsulation header to internally forward only the inner packet. The tunnel encapsulation header may include a virtual network identifier, such as a VxLAN label or MPLS label, that indicates the virtual network, e.g., the virtual network corresponding to VRF 222A. VRF 222A may include forwarding information for the inner packet. For instance, VRF 222A may map a destination layer 3 address for the inner packet to virtual network interface 212. In response, VRF 222A forwards the inner packet to pod 502A via virtual network interface 212.
Container 529A may also source inner packets as a source virtual network endpoint. For example, container 529A may generate a layer 3 inner packet destined for a destination virtual network endpoint executed by another computing device (i.e., not computing device 500) or for another container. Container 529A may send the layer 3 inner packet to virtual router 506 via the virtual network interface attached to VRF 222A.
Virtual router 506 receives the inner packet and layer 2 header and determines the virtual network for the inner packet. Virtual router 506 may determine the virtual network using any of the virtual network interface implementation techniques described above (e.g., macvlan, veth, etc.). Virtual router 506 uses the VRF 222A corresponding to the virtual network for the inner packet to generate an outer header for the inner packet, the outer header including an outer IP header for the overlay tunnel and a tunnel encapsulation header identifying the virtual network. Virtual router 506 encapsulates the inner packet with the outer header. Virtual router 506 may encapsulate the tunnel packet with a new layer 2 header having a destination layer 2 address associated with a device external to computing device 500 (e.g., one of TOR switches 16 or servers 12). If external to computing device 500, virtual router 506 outputs the tunnel packet with the new layer 2 header to NIC 530 using physical function 221. NIC 530 outputs the packet on an outbound interface. If the destination is another virtual network endpoint executing on computing device 500, virtual router 506 routes the packet to the appropriate one of virtual network interfaces 212, 213.
In some examples, a controller of computing device 500 (e.g., network controller 24 of fig. 1) configures a default route in each pod 502 to cause virtual machine 224 to use virtual router 506 as an initial next hop for an outbound packet. In some examples, NIC 530 is configured with one or more forwarding rules to cause all packets received from virtual machine 224 to be switched to virtual router 506.
pod 502A includes one or more application containers 529A. pod 502B includes an instance of a containerized routing protocol daemon (cRPD) 560. Container platform 588 includes container runtime 590, orchestration agent 592, service agent 593, and CNI 570.
The container engine 590 includes code executable by the microprocessor 510. The container engine 590 can be one or more computer processes. The container engine 590 runs containerized applications in the form of containers 529A-529B. The container engine 590 may represent Docker, rkt, or another container engine for managing containers. In general, the container engine 590 receives requests and manages objects such as images, containers, networks, and volumes. An image is a template with instructions for creating a container. A container is an executable instance of an image. Based on directives from orchestration agent 592, the container engine 590 can obtain images and instantiate them as executable containers in pods 502A-502B.
The service proxy 593 includes code executable by the microprocessor 510. The service proxy 593 may be one or more computer processes. The service proxy 593 monitors for the addition and removal of service and endpoints objects, and it maintains the network configuration of computing device 500 to ensure communication among pods and containers, e.g., using services. The service proxy 593 may also manage iptables to capture traffic to a service's virtual IP address and port and to redirect the traffic to the proxy port that proxies a backing pod. The service proxy 593 may represent a kube-proxy for a slave node of a Kubernetes cluster. In some examples, container platform 588 does not include the service proxy 593, or the service proxy 593 is disabled in favor of configuration of virtual router 506 and pods 502 by CNI 570.
Orchestration agent 592 comprises code executable by microprocessor 510. Orchestration agent 592 can be one or more computer processes. Orchestration agent 592 may represent a kubelet for a slave node of a Kubernetes cluster. Orchestration agent 592 is an agent of an orchestrator (e.g., orchestrator 23 of fig. 1) that receives container specification data for containers and ensures the containers execute on computing device 500. The container specification data may be in the form of a manifest file sent from orchestrator 23 to orchestration agent 592, or received indirectly via a command line interface, an HTTP endpoint, or an HTTP server. The container specification data may be a pod specification (e.g., a PodSpec—a YAML (Yet Another Markup Language) or JSON object that describes a pod) for one of pods 502 of containers 529. Based on the container specification data, orchestration agent 592 directs container engine 590 to obtain and instantiate the container images of containers 529, for execution of containers 529 by computing device 500.
Orchestration agent 592 instantiates or otherwise invokes CNI 570 to configure one or more virtual network interfaces for each pod 502. For example, orchestration agent 592 receives container specification data for pod 502A and directs container engine 590 to create pod 502A with container 529A based on the container specification data for pod 502A. Orchestration agent 592 also invokes CNI 570 to configure, for pod 502A, a virtual network interface of the virtual network corresponding to VRF 222A. In this example, pod 502A is a virtual network endpoint of the virtual network corresponding to VRF 222A.
CNI 570 may obtain interface configuration data for configuring virtual network interfaces for pods 502. Virtual router agent 514 operates as a virtual network control plane module for enabling network controller 24 to configure virtual router 506. Unlike the orchestration control plane (including container platform 588 for slave nodes and the master node(s), e.g., orchestrator 23), which manages the provisioning, scheduling, and management of virtual execution elements, the virtual network control plane (including network controller 24 and virtual router agent 514 for slave nodes) manages the configuration of virtual networks implemented in the data plane, in part, by the slave node's virtual router 506. Virtual router agent 514 communicates interface configuration data for virtual network interfaces to CNI 570 to enable an orchestration control plane element (i.e., CNI 570) to configure the virtual network interfaces according to the configuration state determined by network controller 24, thereby bridging the gap between the orchestration control plane and the virtual network control plane. In addition, this may enable CNI 570 to obtain interface configuration data for, and configure, multiple virtual network interfaces of a pod, which may reduce the communication and resource overhead inherent in invoking a separate CNI 570 for configuring each virtual network interface.
The containerized routing protocol daemon is described in U.S. application Ser. No. 17/649,632, filed on 1/2/2022, the entire contents of which are incorporated herein by reference.
In addition, CNI 570, in conjunction with virtual router agent 514, may configure virtual router 506 to implement custom host interfaces that segment the network provided to pods, such as pods 502. A custom host interface (which may also be referred to as a custom virtual network interface) may not enable communication among pods 502 via the default cluster network, but may instead enable communication between a pod 502 and one or more virtual networks. As such, these pods 502 may not have a flat networking architecture in which each pod 502 can communicate with every other one of pods 502. Instead, a pod 502 may communicate with only the select subset of pods 502 defined by the virtual network(s) with which the respective custom host interface of each pod 502 is configured to interface.
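One common Kubernetes expression of such a non-default, custom host interface is a network attachment definition referenced from a pod annotation; the following sketch assumes that convention, and the CNI type, image, and network names are illustrative assumptions.

# Hypothetical NetworkAttachmentDefinition naming a virtual network; the CNI
# configuration payload is an assumption for illustration.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: blue-network
  namespace: default
spec:
  config: '{ "cniVersion": "0.3.1", "type": "contrail-k8s-cni" }'
---
# Pod requesting a custom host interface on blue-network rather than relying
# solely on the default cluster network (annotation key per the CNCF
# multi-network convention).
apiVersion: v1
kind: Pod
metadata:
  name: db-pod
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: blue-network
spec:
  containers:
  - name: db
    image: postgres:16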
To facilitate communication between the various virtual networks (and, as such, between the subsets of pods 502 configured to interface with each of the various virtual networks), CNI 570, possibly in conjunction with virtual router agent 514, may configure virtual router 506 to implement VNR 52. VNR 52 may cause the routing plane to be configured with one or more import policies and/or one or more export policies for exchanging routing information with the common routing target of VNR 52. These policies result in routing information maintained for the VNs being exchanged with the VNR 52 common routing target (or, in other words, leaked via VNR 52), which is in turn resolved into forwarding information. CNI 570 may obtain, from network controller 24, configuration data for installing the forwarding information and may interface with virtual router agent 514 to install the forwarding information by which packets from VNs 50 are forwarded.
In other words, creating the common routing target may enable routing information to be imported and exported from one of VNs 50 (e.g., VN 50A) to the common routing target provided by one or more of VNRs 52, and from the common routing target to another one of VNs 50 (e.g., VN 50N). Network controller 24 may resolve the routing information into the forwarding information described above and install the forwarding information within virtual router 506 to enable forwarding of packets between VNs 50A and 50N (in some configurations, such as the mesh configuration described above). In this way, virtual router 506 may establish the means by which routing information is imported and exported between VNs 50, which may then be used to transmit packets between one of VNs 50 and another of VNs 50.
Fig. 6 is a block diagram of an example computing device operating as a computing node of one or more clusters of an SDN architecture system in accordance with the techniques of this disclosure. Computing device 1300 may represent one or more real or virtual servers. In some examples, computing device 1300 may implement one or more master nodes for respective clusters or for multiple clusters.
Scheduler 1322, API server 300A, controller 406A, custom API server 301A, custom resource controller 302A, controller manager 1326, SDN controller manager 1325, control node 232A, and configuration store 1328, although illustrated and described as being executed by a single computing device 1300, may be distributed among multiple computing devices that make up a computing system or hardware/server cluster. In other words, each of the plurality of computing devices may provide a hardware operating environment for one or more instances of any one or more of scheduler 1322, API server 300A, controller 406A, custom API server 301A, custom resource controller 302A, network controller manager 1326, network controller 1324, SDN controller manager 1325, control node 232A, or configuration store 1328.
In this example, computing device 1300 includes a bus 1342 coupling hardware components of the computing device 1300 hardware environment. Bus 1342 couples a Network Interface Card (NIC) 1330, a storage disk 1346, and one or more microprocessors 1310 (hereinafter referred to as "microprocessors 1310"). In some cases, a front-side bus may couple microprocessor 1310 and memory device 1344. In some examples, bus 1342 may couple memory device 1344, microprocessor 1310, and NIC 1330. Bus 1342 may represent a Peripheral Component Interconnect (PCI) Express (PCIe) bus. In some examples, a Direct Memory Access (DMA) controller may control DMA transfers between components coupled to bus 1342. In some examples, components coupled to bus 1342 control DMA transfers between components coupled to bus 1342.
The microprocessor 1310 may include one or more processors, each including a separate execution unit to execute instructions conforming to an instruction set architecture, the instructions stored on a storage medium. The execution units may be implemented as separate Integrated Circuits (ICs) or may be combined within one or more multi-core processors (or "many-core" processors), each implemented using a single IC (i.e., a chip multiprocessor).
Disk 1346 represents computer-readable storage media including volatile and/or nonvolatile, removable and/or non-removable media implemented in any method or technology for storage of information such as processor-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), EEPROM, flash memory, CD-ROM, Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by microprocessor 1310.
Main memory 1344 includes one or more computer-readable storage media, which may include Random Access Memory (RAM) such as various forms of Dynamic RAM (DRAM), e.g., DDR2/DDR3 SDRAM, or Static RAM (SRAM), flash memory, or any other form of fixed or removable storage medium that can be used to carry or store desired program code and program data in the form of instructions or data structures and that can be accessed by a computer. Main memory 1344 provides a physical address space composed of addressable memory locations.
Network Interface Card (NIC) 1330 includes one or more interfaces 1332 configured to exchange packets using links of an underlying physical network. Interfaces 1332 may include a port interface card having one or more network ports. NIC 1330 may also include on-card memory to, for example, store packet data. Direct memory access transfers between NIC 1330 and other devices coupled to bus 1342 may read from and write to the NIC memory.
The memory 1344, NIC 1330, storage 1346, and microprocessor 1310 may provide an operating environment for a software stack that includes an operating system kernel 1314 executing in kernel space. Kernel 1314 may represent, for example, Linux, Berkeley Software Distribution (BSD), another Unix-variant kernel, or a Windows server operating system kernel, available from Microsoft Corp. In some examples, the operating system may execute a hypervisor and one or more virtual machines managed by the hypervisor. Example hypervisors include Kernel-based Virtual Machine (KVM) for the Linux kernel, Xen, ESXi available from VMware, Windows Hyper-V available from Microsoft, and other open-source and proprietary hypervisors. The term hypervisor may encompass a Virtual Machine Manager (VMM). An operating system that includes kernel 1314 provides an execution environment for one or more processes in user space 1345. The kernel 1314 includes a physical driver 1327 to use network interface card 1330.
Computing device 1300 may be coupled to a physical network switch fabric that includes an overlay network that extends the switch fabric from a physical switch to software or virtual routers, such as virtual router 21, of physical servers coupled to the switch fabric. Computing device 1300 can configure slave nodes of a cluster using one or more private virtual networks.
The API server 300A, scheduler 1322, controller 406A, custom API server 301A, custom resource controller 302A, controller manager 1326, and configuration store 1328 may implement the master nodes of the cluster and are alternatively referred to as "master components". The cluster may be a Kubernetes cluster and the master node is a Kubernetes master node, in which case the master component is a Kubernetes master component.
Each of API server 300A, controller 406A, custom API server 301A, and custom resource controller 302A includes code executable by microprocessor 1310. Custom API server 301A validates and configures data for custom resources for SDN architecture configuration, such as VN 50 and VNR 52. A service may be an abstraction that defines a logical set of pods and the policy used to access the pods. The set of pods implementing a service is selected based on the service definition. A service may be implemented in part as, or otherwise include, a load balancer. API server 300A and custom API server 301A may implement a Representational State Transfer (REST) interface to process REST operations and provide the front end, as part of the configuration plane for the SDN architecture, to the corresponding cluster's shared state stored to configuration store 1328. API server 300A may represent a Kubernetes API server.
Configuration store 1328 is a backing store for all cluster data. The cluster data may include cluster state and configuration data. The configuration data may also provide a back-end for service discovery and/or provide a locking service. Configuration store 1328 may be implemented as a key-value store. Configuration store 1328 may be a central database or a distributed database. Configuration store 1328 may represent etcd storage. Configuration store 1328 may represent a Kubernetes configuration store.
Scheduler 1322 includes code that may be executed by microprocessor 1310. Scheduler 1322 may be one or more computer processes. The scheduler 1322 monitors newly created or requested virtual execution elements (e.g., pod of container) and selects the slave node on which the virtual execution element will run. The scheduler 1322 may select slave nodes based on resource requirements, hardware constraints, software constraints, policy constraints, locality, and the like. Scheduler 1322 may represent a Kubernetes scheduler.
In general, the API server 1320 can call the scheduler 1322 to schedule the pod. The scheduler 1322 can select a slave node and return the identifier of the selected slave node to the API server 1320, which API server 1320 can write to the configuration store 1328 associated with the pod. API server 1320 may call orchestration agent 310 for the selected slave node, which may cause container engine 208 for the selected slave node to obtain the pod from the storage server and create a virtual execution element on the slave node. Orchestration agent 310 for the selected slave node may update the state of the pod to API server 1320, and API server 1320 saves the new state to configuration store 1328. In this way, computing device 1300 instantiates a new pod in computing infrastructure 8.
The controller manager 1326 includes code executable by the microprocessor 1310. The controller manager 1326 may be one or more computer processes. The controller manager 1326 may embed the core control loops, monitoring the shared state of the cluster by obtaining notifications from API server 1320. The controller manager 1326 may attempt to move the state of the cluster toward the desired state. The example controller 406A and the custom resource controller 302A may be managed by the controller manager 1326. Other controllers may include a replication controller, an endpoints controller, a namespace controller, and a service accounts controller. The controller manager 1326 can perform lifecycle functions such as namespace creation and lifecycle, event garbage collection, terminated-pod garbage collection, cascading-deletion garbage collection, node garbage collection, and the like. The controller manager 1326 may represent a Kubernetes controller manager for a Kubernetes cluster.
The network controller of the SDN architecture described herein may provide cloud networking for computing architectures operating on a network infrastructure. Cloud networking may include private clouds of enterprises or service providers, infrastructure as a service (IaaS), and Virtual Private Clouds (VPC) of Cloud Service Providers (CSP). Private cloud, VPC, and IaaS use cases may involve a multi-tenant virtualized data center, such as described with respect to fig. 1. In this case, multiple tenants within the data center share the same physical resources (physical servers, physical storage, physical networks). Each tenant is assigned its own logical resources (virtual machines, containers, or other forms of virtual execution elements; virtual storage; virtual networks). These logical resources are isolated from each other unless specifically allowed by security policies. Virtual networks within the data center may also be interconnected to physical IP VPNs or L2 VPNs.
Network controllers (or "SDN controllers") may provide Network Function Virtualization (NFV) to networks such as business edge networks, broadband subscriber management edge networks, and mobile edge networks. NFV involves orchestration and management of networking functions in virtual machines, containers, or other virtual execution elements, rather than physical hardware devices, such as firewalls, intrusion detection or prevention systems (IDS/IPS), deep Packet Inspection (DPI), caches, wide Area Network (WAN) optimizations, and so forth.
SDN controller manager 1325 includes code executable by microprocessor 1310. SDN controller manager 1325 may be one or more computer processes. SDN controller manager 1325 operates as an interface to the orchestration-oriented elements (e.g., scheduler 1322, API server 300A and custom API server 301A, controller manager 1326, and configuration store 1328). In general, SDN controller manager 1325 monitors the cluster for new Kubernetes native objects (e.g., pods and services). SDN controller manager 1325 may isolate pods in virtual networks and connect pods with services, and it may interconnect virtual networks using so-called virtual network routers (which should not be confused with the virtual routers that implement them; as described above, a virtual network router is realized as import and export policies with respect to a common routing target so as to facilitate interconnection between virtual networks).
SDN controller manager 1325 may be executed as a container of the master node(s) of a cluster. In some cases, using SDN controller manager 1325 enables disabling the service proxies (e.g., the Kubernetes kube-proxy) of slave nodes, such that all pod connectivity is implemented using virtual routers, as described herein.
The components of the network controller 24 may operate as the CNI for Kubernetes and may support multiple deployment modes. CNI 17 and CNI 750 are the compute node interfaces in this overall CNI framework for managing networking for Kubernetes. The deployment modes can be divided into two categories: (1) the SDN architecture cluster as a CNI integrated into the workload Kubernetes cluster, and (2) the SDN architecture cluster as a CNI separate from the workload Kubernetes cluster.
Integration with workload Kubernetes clusters
The components of the network controller 24 (e.g., custom API server 301, custom resource controller 302, SDN controller manager 1325, and control node 232) operate in the hosted Kubernetes cluster, on master nodes, in close proximity to the Kubernetes controller components. In this mode, the components of the network controller 24 are effectively part of the same Kubernetes cluster as the workloads.
Separate from workload Kubernetes clusters
The components of the network controller 24 are executed by a Kubernetes cluster that is separate from the workload Kubernetes clusters.
SDN controller manager 1325 may use the controller framework of the orchestration platform to listen for (or otherwise monitor) changes to objects defined in the Kubernetes native API and to add annotations to some of these objects. The annotations may be labels or other identifiers specifying properties of the objects (e.g., "virtual network green"). SDN controller manager 1325 is a component of the SDN architecture that listens to Kubernetes core resource (such as pod, network policy, and service) events and converts those events, as needed, into custom resources for SDN architecture configuration. The CNI plugin (e.g., CNI 17, 570) is the SDN architecture component that supports the Kubernetes networking plugin standard: the Container Network Interface.
SDN controller manager 1325 may use REST interfaces exposed by aggregation API 402 to create network solutions for applications to define network objects such as virtual networks, virtual network interfaces, and access control policies. The network controller 24 component may implement a network solution in a computing infrastructure by, for example, configuring one or more virtual networks and virtual network interfaces in a virtual router. (this is just one example of an SDN configuration.)
The following example deployment configuration for the application consists of the pod and virtual network information for the pod:
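A minimal sketch consistent with this description is shown below; the annotation key, image name, and exact manifest layout are assumptions for illustration, with the virtual network names matching those referenced in the following paragraph.

# Sketch of a deployment whose pod template carries virtual network
# information as an annotation (annotation key and image name are
# illustrative assumptions).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
      annotations:
        k8s.v1.cni.cncf.io/networks: red-network, blue-network, default/extns-network
    spec:
      containers:
      - name: webapp
        image: example/webapp:1.0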
This metadata information may be copied to each pod replica created by the controller manager 1326. When SDN controller manager 1325 is notified of these pods, SDN controller manager 1325 may create the virtual networks listed in the annotations ("red-network", "blue-network", and "default/extns-network" in the above example) and create, for each of the virtual networks, a virtual network interface per pod replica (e.g., pod 202A) with a unique private virtual network address from a cluster-wide address block (e.g., 10.0/16) of the virtual network.
Contrail is an example network controller architecture. Contrail CNI can be a CNI developed for Contrail. The cloud native Contrail controller may be one example of a network controller described in this disclosure, such as network controller 24.
Fig. 7A is a block diagram illustrating a control/routing plane for an underlying network and overlay network configuration using an SDN architecture in accordance with the techniques of this disclosure. Fig. 7B is a block diagram illustrating a configured virtual network for connecting pods using configured tunnels in an underlying network in accordance with the techniques of this disclosure.
The network controller 24 for the SDN architecture may use a distributed or centralized routing plane architecture. The SDN architecture may use a containerized routing protocol daemon (process).
From the perspective of network signaling, the routing plane may operate according to a distributed model, where cRPD runs on each compute node in the cluster. This essentially means that the intelligence is built into the compute nodes and involves complex configurations at each node. The Route Reflector (RR) in this model may not make intelligent routing decisions, but is used as a relay reflecting routes between nodes. A distributed container routing protocol daemon (cRPD) is a routing protocol process that may be used in which each computing node runs its own routing daemon instance. Meanwhile, the centralized cRPD master instance may act as an RR to relay routing information between computing nodes. Routing and configuration intelligence is distributed across nodes, with RRs located at a central location.
The routing plane may alternatively operate according to a more centralized model, in which components of the network controller run centrally and absorb the intelligence needed to process configuration information, construct the network topology, and program the forwarding plane into the virtual routers. The virtual router agent is a local agent that processes information programmed by the network controller. This design facilitates the more limited intelligence required at the compute nodes and tends to result in simpler configuration states.
The centralized control plane provides the following:
Allows the agent routing framework to be simpler and lighter weight. The complexity and limitations of BGP are hidden from the agent. The agent need not understand concepts such as route distinguishers, route targets, etc. The agent simply exchanges prefixes and builds its forwarding information accordingly.
The control nodes can do more than just routing. They build on the virtual network concept and can use route replication and re-origination to generate new routes (e.g., to support features such as service chaining and inter-VN routing, among other use cases).
Building a broadcast, unknown unicast, and multicast (BUM) tree to achieve optimal broadcast and multicast forwarding.
Note that the control plane has a distributed nature for certain aspects. As a control plane supporting distributed functionality, it allows each local virtual router agent to publish its local routes and subscribe to configurations on an as-needed basis.
It makes sense to consider the control plane design from a tooling point of view and to use the right tool where it is most appropriate. Consider the advantages and disadvantages of contrail-bgp and cRPD.
The following functionality may be provided by the cRPD or a control node of the network controller 24.
Routing daemon/process
Both the control node and cRPD may act as a routing daemon implementing different protocols and having the ability to program routing information in the forwarding plane.
cRPD implements routing protocols with a rich routing stack that includes Interior Gateway Protocols (IGPs) (e.g., Intermediate System to Intermediate System (IS-IS)), BGP-LU, BGP-CT, SR-MPLS/SRv6, Bidirectional Forwarding Detection (BFD), Path Computation Element Protocol (PCEP), and so on. It can also be deployed to provide control-plane-only services, such as a route reflector, and is popular in internet routing use cases because of these capabilities.
The control node 232 also implements routing protocols, but is primarily BGP-based. The control node 232 understands overlay networking. The control node 232 provides a rich feature set in overlay virtualization and caters to SDN use cases. Overlay features such as virtualization (using the virtual network abstraction) and service chaining are popular among telco and cloud providers. In some cases, cRPD may not support such overlay functionality. However, the rich feature set of cRPD provides strong support for the underlying network.
Network orchestration/automation
The routing functionality is only one part of the control node 232. An integral part of overlay networking is orchestration. In addition to providing overlay routing, control node 232 helps in modeling the orchestration functionality and provides network automation. Central to the orchestration capabilities of control node 232 is the ability to model network virtualization using virtual network (and related object) based abstractions, including the VNR described above. Control node 232 interfaces with configuration node 230 to relay configuration information to both the control plane and the data plane. Control node 232 also assists in building overlay trees for multicast at layer 2 and layer 3. For example, a control node may build a virtual topology of the cluster it serves to achieve this. cRPD typically does not include such orchestration capabilities.
High availability and horizontal scalability
The control node design is more centralized, while cRPD is more distributed. An instance of cRPD runs on each compute node. The control node 232, on the other hand, does not run on the compute nodes and may even run in a remote cluster (i.e., separate from, and in some cases geographically remote from, the workload cluster). The control node 232 also provides horizontal scalability for HA and operates in active-active mode. The compute load is shared among the control nodes 232. cRPD, on the other hand, typically does not provide horizontal scalability. Both control node 232 and cRPD may provide graceful restart for HA and may allow the data plane to operate in headless mode, in which the virtual router can operate even if the control plane restarts.
The control plane should not be just a routing daemon. It should support overlay routing and network orchestration/automation, while cRPD works well as a routing protocol process that manages the underlying routes. However, cRPD generally lacks network orchestration capabilities and does not provide strong support for overlay routing.
Thus, in some examples, the SDN architecture may have cRPD on the compute nodes, as shown in figs. 7A-7B. Fig. 7A illustrates SDN architecture 700, which may represent an example implementation of SDN architecture 200 or 400. In SDN architecture 700, cRPD 324 runs on the compute nodes and provides underlying routes to the forwarding plane, while a centralized (and horizontally scalable) set of control nodes 232 provides orchestration and overlay services. In some examples, a default gateway may be used instead of running cRPD 324 on the compute nodes.
cRPD 324 on a compute node interacts with virtual router agent 514 via interface 540 (which may be a gRPC interface) to provide rich underlying routes to the forwarding plane. The virtual router agent interface may allow programming of routes, configuring virtual network interfaces for the overlay, and otherwise configuring virtual router 506. This is described in more detail in U.S. application Ser. No. 17/649,632. At the same time, one or more control nodes 232 operate as separate pods providing overlay services. Thus, SDN architecture 700 may obtain both the rich overlay and orchestration provided by control node 232 and the modern underlying routing provided by cRPD 324 on the compute node to supplement control node 232. A separate cRPD controller 720 may be used to configure cRPD 324. cRPD controller 720 may be a device/element management system, network management system, orchestrator, user interface/CLI, or other controller. cRPD 324 runs routing protocols and exchanges routing protocol messages with routers, including other cRPDs 324. Each cRPD 324 may be a containerized routing protocol process and operate as, in effect, a pure software version of the router control plane.
The enhanced underlying routing provided by cRPD 324 may replace the default gateway at the forwarding plane and provide a rich routing stack for the use cases that can be supported. In some examples where cRPD 324 is not used, virtual router 506 relies on a default gateway for underlying routing. In some examples, cRPD 324, as the underlying routing process, is limited to programming only the default inet(6).0 fabric with control plane routing information. In such an example, non-default overlay VRFs may be programmed by control node 232.
In this context, control node 232 may obtain custom resources defining custom master interfaces, VNs 50, and VNRs 52. When obtaining a custom resource defining VNR 52, control node 232 may instantiate individual routing instances (with a corresponding common route target). The control node 232 may create import and/or export policies for the common route target that result in the import and export of routes from the various virtual networks (e.g., VNs 50A and 50N) to the common route target. Control node 232 may resolve the common route target to obtain forwarding information, which may then be pushed down to virtual router 506 via virtual router agent 514. Using this forwarding information, virtual router 506 may forward packets between VNs 50A and 50N (in some interconnectivity schemes, such as the mesh interconnectivity scheme mentioned above).
Fig. 7A-7B illustrate the dual routing/control plane solution described above. In fig. 7A, cRPD 324 provides the underlying routing/forwarding information to virtual router agent 514, similar in some respects to how the router control plane programs the router forwarding/data plane.
As shown in fig. 7B, cRPD 324 exchanges routing information that can be used to create tunnels for the VRFs through underlying network 702. Tunnel 710 is one example and connects virtual router 506 of server 12X and virtual router 506 of server 12A. Tunnel 710 may represent a Segment Routing (SR-MPLS) or SRv6 tunnel, a Generic Routing Encapsulation (GRE) tunnel, an IP-in-IP tunnel, an LSP, or another type of tunnel. Control node 232 utilizes tunnel 710 to create virtual network 712 connecting server 12X and pod 22 of server 12A, which is attached to the VRF of the virtual network.
As noted above, cRPD 324 and virtual router agent 514 may exchange routing information using a gRPC interface, and virtual router agent 514 may program virtual router 506 with configuration using a gRPC interface. As also noted, control node 232 may be used for overlay and orchestration, while cRPD 324 may be used to manage the underlying routing protocols. The virtual router agent 514 may use a gRPC interface with cRPD 324 while using XMPP to communicate with the control node and the Domain Name Service (DNS).
The gRPC model works well for cRPD 324 because there is a worker running on each compute node, and virtual router agent 314 acts as a gRPC server exposing services to the client (cRPD 324) for programming routing and configuration information (for the underlay). gRPC is therefore an attractive solution compared to XMPP. In particular, it transmits data as a binary stream and adds no extra overhead for encoding/decoding the data transmitted over it.
In some examples, the control node 232 may use XMPP to interface with the virtual router agent 514. Where virtual router agent 514 acts as a gRPC server, cRPD 324 acts as a gRPC client. This means that the client (cRPD) must initiate the connection toward the server (the virtual router agent). In SDN architecture 700, virtual router agent 514 selects the set of control nodes 232 to which it will subscribe (because there are multiple control nodes). In this regard, the control node 232 acts as a server, and the virtual router agent 514 connects and subscribes to updates as a client.
With gRPC, the control node 232 would need to pick the virtual router agents 514 to which it needs to connect and then subscribe as a client. Since the control node 232 does not run on each compute node, this would require implementing an algorithm to select the virtual router agents 514 to which it can subscribe. Furthermore, the control nodes 232 would need to synchronize this information with each other. This also complicates matters when a restart occurs and synchronization between the control nodes 232 is required to pick the agents they serve. Features such as graceful restart (GR) and fast convergence have already been implemented on top of XMPP. XMPP is already lightweight and efficient. Thus, XMPP may be preferred over gRPC for communication from the control node 232 to the virtual router agent 514.
Additional enhancements to control nodes 232 and their use are as follows. Regarding HA and horizontal scalability, rather than three control nodes, just two control nodes 232 are sufficient to meet HA requirements, as with any routing platform. In many cases this is advantageous. (However, one or more control nodes 232 may be used.) For example, it provides a more deterministic infrastructure and meets standard routing best practices. Each virtual router agent 514 is attached to a unique pair of control nodes 232 to avoid randomness. With two control nodes 232, debugging may be simpler. In addition, edge replication for constructing the multicast/broadcast trees may be simplified with only two control nodes 232. Currently, since the virtual router agent 314 is connected to only two of the three control nodes, all control nodes may not have a complete picture of the trees for a period of time and rely on BGP to synchronize state between them. This is exacerbated with three control nodes 232 because the virtual router agent 314 may randomly select two of them. If there are only two control nodes 232, each virtual router agent 314 will be connected to the same control nodes. This in turn means that the control nodes 232 do not need to rely on BGP to synchronize state and will have the same picture of the multicast trees.
SDN architecture 200 may provide ingress replication as an alternative to edge replication and provide the option to users. Ingress replication may be regarded as a special, degenerate case of general overlay multicast trees. In practice, however, the signaling of ingress replication trees is much simpler than the signaling of general overlay multicast trees. With ingress replication, each virtual router 21 ends up with a tree with itself as the root and every other virtual router as a leaf. A virtual router 21 going down should not, in theory, result in the tree being rebuilt. Note that the performance of ingress replication degrades as the cluster size increases. However, it works well for smaller clusters. Furthermore, multicast is not a popular and widespread requirement for many customers; it is mainly limited to carrying broadcast/unknown-unicast/multicast (BUM) traffic, which typically occurs only initially.
Configuration processing module enhancement
In a conventional SDN architecture, a network controller handles orchestration of all use cases. The configuration node converts the intent into configuration objects based on the data model and writes them into a database (e.g., Cassandra). In some cases, notifications are sent at the same time to all clients waiting for the configuration, e.g., via RabbitMQ.
The control node not only acts as a BGP speaker, but also has a configuration processing module that reads configuration objects from the database in the following manner. First, when a control node starts (or restarts), it connects to the database and reads all configuration directly from the database. Second, the control node may also be a messaging client. When there is an update to a configuration object, the control node receives a messaging notification listing the updated object. This again causes the configuration processing module to read the object from the database.
The configuration processing module reads configuration objects for both the control plane (BGP-related configuration) and the vrouter forwarding plane. The configuration may be stored as a graph with objects as nodes and relationships as links. The graph may then be downloaded to the clients (BGP/cRPD and/or the vrouter agent).
In accordance with the techniques of this disclosure, the conventional configuration API server and messaging service are replaced in some examples by Kubernetes API servers (API server 300 and custom API server 301), and the previous Cassandra database is replaced by etcd in Kubernetes. With this change, clients interested in configuration objects can watch the etcd database directly for updates instead of relying on RabbitMQ notifications.
Controller orchestration for cRPD
BGP configuration may be provided to cRPD 324. In some examples, cRPD controller 720 may be a Kubernetes controller developed to fit into the Kubernetes space and to implement the custom resource definitions (CRDs) required to orchestrate and provision cRPD 324.
Distributed configuration processing
As mentioned earlier in this section, the configuration processing module may be part of the control node 232. It reads the configuration directly from the database, converts the data into JSON format, and stores it as a graph in its local IFMAP database, with the objects as nodes and the relationships between them as links. The graph is then downloaded through XMPP to the interested virtual router agents 514 on the compute nodes. The virtual router agent 514 also constructs an IFMAP-based dependency graph locally to store the objects.
By having the virtual router agent 514 directly watch the etcd server backing API server 300, the need for IFMAP as an intermediary module and for storing dependency graphs can be avoided. The same model may be used by cRPD 324 running on a compute node. This would avoid the need for the IFMAP-XMPP configuration channel. A Kubernetes configuration client (for control node 232) may be used as part of this configuration. The client may also be used by the virtual router agent.
However, this increases the number of clients reading configuration from the etcd server, especially in clusters with hundreds of compute nodes. Adding more watchers eventually reduces the write rate and leads to less-than-ideal event rates. The etcd gRPC proxy rebroadcasts events from one server watcher to many client watchers. The gRPC proxy coalesces multiple client watchers (c-watchers) on the same key or range into a single watcher (s-watcher) connected to the etcd server. The proxy broadcasts all events from the s-watcher to its c-watchers. Assuming N clients watch the same key, one gRPC proxy can reduce the watch load on the etcd server from N to 1. A user may deploy multiple gRPC proxies to further distribute server load. The clients share one server watcher; the proxy effectively offloads resource pressure from the core cluster. By adding proxies, etcd can serve one million events per second.
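For illustration only, the sketch below shows one way such a proxy tier could be deployed as an ordinary Kubernetes workload in front of the etcd cluster; the image tag, endpoint addresses, namespace, and replica count are assumptions and are not taken from this disclosure.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-grpc-proxy
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: etcd-grpc-proxy
  template:
    metadata:
      labels:
        app: etcd-grpc-proxy
    spec:
      containers:
      - name: grpc-proxy
        image: quay.io/coreos/etcd:v3.5.9
        command:
        - etcd
        - grpc-proxy
        - start
        # One s-watcher toward etcd serves the many c-watchers (agents, cRPDs) behind the proxy.
        - --endpoints=https://etcd-0.etcd:2379,https://etcd-1.etcd:2379,https://etcd-2.etcd:2379
        - --listen-addr=0.0.0.0:23790

Clients (e.g., virtual router agents) would then point their watch connections at the proxy's listen address rather than at the etcd members directly.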
Naming/DNS in SDN architecture
In previous architectures, DNS services were provided by the contrail-dns and contrail-named processes working together to provide DNS services to the VMs in the network. The named process serves as the DNS server and provides an implementation of the BIND protocol. contrail-dns receives updates from the vrouter agent and pushes these records to named.
The system supports four DNS modes, and the IPAM configuration can select the desired DNS mode.
1. None - DNS is not supported for the VMs.
2. Default DNS server - DNS resolution for the VMs is done based on the name server configuration in the server infrastructure. When a VM gets a DHCP response, the subnet default gateway is configured as the DNS server for the VM. DNS requests sent by the VM to the default gateway are resolved via the (fabric) name servers configured on the corresponding compute node, and responses are sent back to the VM.
3. Tenant DNS servers - tenants can use this mode to use their own DNS servers. The server list may be configured in the IPAM and is then sent to the VMs as the DNS server(s) in the DHCP response. DNS requests sent by the VMs are routed like any other data packet based on the available routing information.
4. Virtual DNS server - in this mode, the system supports virtual DNS servers that resolve DNS requests from the VMs. Multiple virtual DNS servers can be defined under each domain in the system. Each virtual DNS server is an authoritative server for its configured DNS domain.
The SDN architecture described herein is efficient in terms of the DNS services it provides. Clients in the cloud native world benefit from a variety of DNS services. However, with the move to the next-generation Kubernetes-based architecture, the SDN architecture may use CoreDNS to provide any DNS services.
Data plane
The data plane consists of two components: virtual router agent 514 (also referred to as the agent) and virtual router forwarding plane 506 (also referred to as the DPDK vrouter or kernel vrouter). The agent 514 in the SDN architecture solution is responsible for managing the data plane components. The agent 514 establishes XMPP neighbor relationships with two control nodes 232 and then exchanges routing information with them. The vrouter agent 514 also dynamically generates flow entries and injects them into virtual router 506. This gives virtual router 506 instructions on how to forward packets.
Responsibilities of agent 514 may include: interfacing with control node 232 to obtain the configuration and converting the received configuration into a form that the data path can understand (e.g., translating the data model from IFMap to the data model used by the data path); interfacing with control node 232 to manage routes; and collecting statistics from the data path and exporting them to the monitoring solution.
Virtual router 506 implements data plane functionality that may allow virtual network interfaces to be associated with VRFs. Each VRF has its own forwarding and flow tables, while the MPLS and VXLAN tables are global within virtual router 506. The forwarding tables may contain routes for both the IP and MAC addresses of destinations, and the IP-to-MAC associations are used to provide proxy ARP capability. The label values in the MPLS table are selected by virtual router 506 when a VM/container interface comes up and have local meaning only for that virtual router. The VXLAN network identifiers are global across all VRFs of the same virtual network in different virtual routers 506 within the domain.
In some examples, each virtual network has a default gateway address assigned to it, and each VM or container interface receives this address in a DHCP response received at initialization. When the workload sends a packet to an address outside its subnet, it will ARP for the MAC corresponding to the gateway's IP address, and virtual router 506 responds with its own MAC address. Thus, virtual router 506 may support a fully distributed default gateway function for all virtual networks.
The following is an example of packet flow forwarding implemented by virtual router 506.
Packet flows between container interfaces/VMs in the same subnet.
The workloads may be VMs or container interfaces. In some examples, packet processing proceeds as follows:
VM1/container interface needs to send a packet to VM2, so VM1 first looks up the IP address in its own DNS cache, but since this is the first packet, there is no entry.
VM1 sends a DNS request to the DNS server address that was supplied in the DHCP response when its interface was brought up.
Virtual router 506 captures DNS requests and forwards them to DNS servers running in the SDN architecture controller.
The DNS server in the controller responds with the IP address of VM2.
Virtual router 506 sends the DNS response to VM1.
VM1 needs to form an Ethernet frame and therefore needs the MAC address of VM2. It checks its own ARP cache, but there is no entry, since this is the first packet.
VM1 issues an ARP request.
Virtual router 506 captures the ARP request and looks up the MAC address for the IP of VM2 in its own forwarding table, finding the association in the L2/L3 routes that the controller sent for VM2. Virtual router 506 sends an ARP reply with the MAC address of VM2 to VM1. Meanwhile, a TCP timeout occurs in the network stack of VM1.
The network stack of VM1 retries sending the packet, and this time it finds the MAC address of VM2 in the ARP cache and can form an Ethernet frame and send it out.
Virtual router 506 looks up the MAC address of VM2 and finds an encapsulation route. Virtual router 506 builds the outer header and sends the resulting packet to server S2.
Virtual router 506 on server S2 decapsulates the packet and looks up the MPLS label to identify the virtual interface to which the original Ethernet frame should be sent. The Ethernet frame is sent to that interface and received by VM2.
Packet flow between VMs in different subnets
In some examples, the sequence when sending packets to destinations in a different subnet is similar, except that virtual router 506 responds as the default gateway. VM1 sends the packet in an Ethernet frame with the MAC address of the default gateway, which was provided in the DHCP response supplied by virtual router 506 when VM1 started. When VM1 issues an ARP request for the gateway IP address, virtual router 506 responds with its own MAC address. When VM1 sends an Ethernet frame using that gateway MAC address, virtual router 506 uses the destination IP address of the packet within the frame to look up the forwarding table in the VRF and find a route to the host on which the destination is running, which is reached via an encapsulation tunnel.
Fig. 10 is a diagram illustrating an example network topology using custom master interfaces in accordance with the network segmentation techniques described in this disclosure. In the example of fig. 10, network controller 24 may configure an isolated namespace in which pods 1000A-1000C ("pods 1000") are coupled to respective virtual networks (VNs) 1002A-1002C ("VNs 1002") via custom master interfaces 1004A-1004C ("custom master interfaces 1004"). Custom master interfaces 1004 may provide the Kubernetes features listed above, including services, load balancing (LB), cloud native network features, and the like.
For isolated namespaces (where the namespace's network is isolated from other namespaces), the network controller 24 may perform the following operations:
Automatically provide a pod network and a service network for each namespace.
Connect the pod network to one or more service networks.
Connect the pod network to the IP fabric network.
Connect the pod network to the default service network.
Use a namespace tag to enable this feature.
Enable namespace-to-namespace communication.
To enable namespace-to-namespace communication, network controller 24 may use a VNR to connect the networks, with an example provided below.
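The following is a minimal sketch of such a VNR custom resource; the API group/version, kind name, and selector field names are assumptions based on the VNR description in this disclosure rather than a definitive schema.

apiVersion: core.juniper.net/v1
kind: VirtualNetworkRouter
metadata:
  name: ns-a-to-ns-b
  namespace: ns-a
spec:
  # mesh type implies symmetric import/export between the selected virtual networks
  type: mesh
  virtualNetworkSelector:
    matchLabels:
      vnr: ns-a-to-ns-b   # label applied to the pod networks of both namespaces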
For pod and service networking, the network controller may define the following for the pod network and the service network. For the pod network, the network controller may support annotations defining the pod network for the primary interface at both the pod level and the namespace level, where the pod-level annotations take precedence. An example of such a pod annotation is as follows:
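The compact, illustrative pod-level annotation below follows the same style as the service annotation that comes next; the cni-args key is an assumed reconstruction of the podnetwork flag described later in this disclosure.

Pod
  annotations:
    k8s.v1.cni.cncf.io/networks: '[{"name": "vn1", "namespace": "custom-podnet-svc-demo", "cni-args": {"net.juniper.contrail.podnetwork": "true"}}]'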
For a service network, the network controller may define service network annotations for services, where the service IP uses the default service CIDR and is reached via the pod network. The following is an example of a service network annotation.
Service
  annotations:
    core.juniper.net/endpoint-network: "custom-podnet-svc-demo/vn1"
Fig. 11 is a diagram illustrating a network topology using custom master interfaces in addition to virtual network routers in accordance with aspects of the network segmentation techniques described in this disclosure. In the example of fig. 11, network topology 1100A is characterized by VNs 1002 coupled to each other (via VNRs as described above) to form a mesh topology. Network topology 1100B is characterized by VNs 1002 coupled to each other (via VNRs as described above) to form a hub-and-spoke topology, with VN 1002A as the hub and VNs 1002B and 1002C as the spokes.
Fig. 12 is a diagram illustrating example pod-to-pod networking configured in accordance with aspects of the network segmentation techniques described in this disclosure. In this example, the network controller 24 configures networking for each nad-pod-{0,1} via conventional NAD annotations, such that the primary interface is from the default pod network and the secondary interface is from nad-vn. The network controller 24 may also configure networking for each pod-level-podnet-pod-{0,1} to potentially ensure that the new fields exist in the annotation, with the primary interface from pod-level-podnet-vn and no interface from the default pod network. The network controller 24 may also configure networking for each ns-level-podnet-pod-{0,1} to potentially ensure that the namespace has the new annotation, with the primary interface from ns-level-podnet-vn and no interface from the default pod network. By default, pods within the same network are allowed to communicate. VNRs may be created to allow communication across virtual networks.
Fig. 13 is a diagram illustrating example pod-to-service networking configured in accordance with aspects of the network segmentation techniques described in this disclosure. In the example shown in fig. 13, the network controller 24 may perform the following operations to configure pod-to-service networking:
Create a client nginx pod in the default pod network.
o Create a service default-podnet-svc that selects the nginx pod.
Create a client nginx pod in the podnet-vn0 network and create a service that selects nginx.
o Create a service podnet-vn0-svc that selects the nginx pod.
o Ensure that the service annotation specifies the podnet-vn0 network.
Create a client nginx pod in the podnet-vn1 network and create a service that selects nginx.
o Create a service podnet-vn1-svc that selects the nginx pod.
o Ensure that the service annotation specifies the podnet-vn1 network.
All three services are assigned their cluster IP from the default service network.
The cluster IP can only be reached by pods in the network specified in the service annotation.
FIG. 14 is a diagram illustrating an example container management platform feature extension in accordance with aspects of the network segmentation technique described in this disclosure. In the example shown in fig. 14, the network controller 24 may perform the following operations to configure the Kubernetes feature (network policy):
Two pods are created: ns-level-podnet-pod-0 and ns-level-podnet-pod-1.
Both of these pods are created in the same podnet-netpol namespace.
By default, these pods can communicate with each other.
A deny-all network policy may be created that blocks all traffic within the namespace, as shown in the example below.
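A minimal sketch of such a deny-all policy, using the standard Kubernetes NetworkPolicy resource and the namespace from this example, is:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: podnet-netpol
spec:
  podSelector: {}        # matches every pod in the namespace
  policyTypes:
  - Ingress
  - Egress
  # no ingress or egress rules are listed, so all traffic is denied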
FIG. 15 is a diagram illustrating another example container management platform feature extension in accordance with aspects of the network segmentation techniques described in this disclosure. In the example shown in fig. 15, the network controller 24 may perform the following operations to configure this feature (e.g., VLAN sub-interfaces). Two pods are created: podnet-pod-0 and podnet-pod-1.
Each pod has a network annotation set such that:
o it will have a master interface created from parent-vn, and
o it will have a VLAN 100 subinterface created from subintf-vn.
Each pod will have interfaces eth0 and eth0.100, from parent-vn and subintf-vn, respectively.
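A hypothetical sketch of such a pod annotation is shown below; the interface names follow the description above, but the VLAN-related cni-args key is purely illustrative and is not taken from this disclosure.

apiVersion: v1
kind: Pod
metadata:
  name: podnet-pod-0
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [
        { "name": "parent-vn", "interface": "eth0" },
        { "name": "subintf-vn", "interface": "eth0.100",
          "cni-args": { "net.juniper.contrail.vlan": "100" } }
      ]
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]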
Fig. 16 is a diagram illustrating one example of network-aware scheduling in accordance with aspects of the network segmentation techniques described in this disclosure. The network controller 24 may configure the network aware scheduling in view of the following:
Kubernetes pod placement scheduling takes into account the CPU and memory utilization of the available nodes.
It ignores network utilization.
Even if a node meets the CPU and memory requirements, the network on that node may be congested.
Cloud native networking (CN2) can obtain an overall view of the network utilization of each individual node by collecting a rich set of metrics.
A scheduling plugin may add this information to the scheduling decision process.
Example: a Containerized Network Function (CNF) must be deployed, and it is known that it will have to support 20k sessions. The cloud native networking scheduling plugin will filter out all nodes that cannot provide capacity for 20k sessions of traffic.
In this regard, various aspects of the techniques may provide the following. The network controller 24 may mark a virtual network (VirtualNetwork) as a custom default pod network, where a virtual network to be used as the pod network must have the label core.juniper.net/virtualnetwork: custom-default-podnetwork. If the virtual network (VirtualNetwork) is created manually, the user can add this label. If it is created as part of a Network Attachment Definition (NAD), the NAD controller may add this label as part of VN creation.
The network controller 24 can use a NAD to create a custom default pod network. In this instance, a custom default pod network can be created at the time the NAD is installed. The juniper.net/networks annotation will have a new boolean field podNetwork, which should be set to true when the network is to be used as a custom default pod network. One example of such an annotation is provided below:
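The sketch below is illustrative only; the juniper.net/networks annotation and the podNetwork boolean come from the description above, while the subnet field name and the CNI type string in spec.config are assumptions.

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: custom-podnet-vn
  namespace: custom-podnet-demo
  annotations:
    juniper.net/networks: '{"ipamV4Subnet": "10.20.30.0/24", "podNetwork": true}'
spec:
  config: '{"cniVersion": "0.3.1", "name": "custom-podnet-vn", "type": "contrail-k8s-cni"}'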
The network controller 24 may also specify the pod network for each pod. In this example, "net.juniper.contrail.podnetwork" is specified in the cni-args of the NetworkSelectionElement (NSE) for the network that is to be the pod's pod network. In some examples, only one NSE per pod may have this entry in its cni-args; otherwise, an error is raised. An annotation example is as follows:
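An illustrative pod manifest carrying this NSE is shown below; the annotation key inside cni-args is an assumed reconstruction of the garbled key in the text above.

apiVersion: v1
kind: Pod
metadata:
  name: custom-podnet-pod-0
  namespace: custom-podnet-demo
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [{
        "name": "custom-podnet-vn",
        "namespace": "custom-podnet-demo",
        "cni-args": { "net.juniper.contrail.podnetwork": "true" }
      }]
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]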
The network controller 24 may also specify the pod network for each namespace. In this example, if all of the pods within a Namespace should use the network network-namespace/network-name as the pod network, then a pod-network annotation with the value network-namespace/network-name may be set on the Namespace. If the pods use only that network for the primary interface, the pods within the namespace may not need a NAD annotation. The following is an example of such an annotation.
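A sketch of the namespace-level form is shown below; the annotation key is an assumed reconstruction of the garbled key in the text above, and the value follows the network-namespace/network-name format.

apiVersion: v1
kind: Namespace
metadata:
  name: ns-level-podnet-demo
  annotations:
    core.juniper.net/pod-network: "custom-podnet-demo/custom-podnet-vn"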
The network controller 24 may also maintain pod and namespace annotation priorities. Annotations may be specified at the pod or namespace level to select a non-default pod network for the primary interface. If annotations are specified at both the pod and namespace levels, the pod-level annotation takes precedence.
The network controller 24 can specify a network for a service, where a service that is to select pods with a custom pod network requires the specified network as an annotation. The format of the annotation value is network-namespace/network-name. The following is one example of such an annotation.
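The sketch below reuses the core.juniper.net/endpoint-network annotation key shown earlier in this disclosure; the service name, selector, and port are illustrative.

apiVersion: v1
kind: Service
metadata:
  name: custom-podnet-svc
  annotations:
    core.juniper.net/endpoint-network: "custom-podnet-demo/custom-podnet-vn"
spec:
  selector:
    app: custom-podnet-app
  ports:
  - port: 80
    targetPort: 80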
In this example, the network controller 24 may create the service with a cluster IP from the default service network; however, the cluster IP may only be reached by pods with an interface from the network specified by the annotation.
With respect to implementation, various changes are provided. The pod controller changes include the following:
Logic will be added to the pod controller to also use the custom network for the idx0 (primary) interface.
In some examples, network controller 24 picks only the default pod network or the isolated namespace pod network.
The VirtualNetwork controller changes include the following. For each VirtualNetwork having the label core.juniper.net/virtualnetwork: custom-default-podnetwork, the VirtualNetwork controller may:
Create a spoke VNR connecting the VN to the default IP fabric network (DefaultIPFabricNetwork) hub VNR.
This may be necessary for node-to-pod connectivity.
Create a spoke VNR connecting the VN to the default service network (DefaultServiceNetwork) hub VNR.
This is necessary for pods in the custom pod network to access services in the DefaultServiceNetwork.
Services that provide functionality, such as kube-dns, reside in networks that require this connectivity.
If these VNRs are accidentally deleted while still needed, they will be recreated.
These VNRs will be automatically deleted when the virtual network no longer has the custom-default-podnetwork label.
The service controller changes include the following:
A service that will select pods with a custom pod network needs to explicitly indicate this via an annotation at creation time.
The service controller will set the VN reference (VN ref) of the InstanceIP (IIP) created for the service to the VN provided via the annotation.
Although the VN ref of the IIP comes from the custom pod network, the subnet reference will be explicitly requested from default-servicenetwork-{v4,v6}-subnet.
InstanceIP (IIP) controller changes are discussed next. InstanceIP (IIP) objects allow a configurable virtual network (VirtualNetwork) reference in their spec. However, the InstanceIP object may have a subnet reference, which is not user-configurable, in its status. The network controller 24 processes the subnets of the referenced virtual network and sets the subnet reference on the status.
In some use cases, the IIP created for a service on a custom pod network will have a VN reference to an arbitrary network, while the cluster IP is always from the default service network. In this case, the subnet of the VN is different from the subnet of the IIP's IP address. These modifications are illustrated below:
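A hypothetical InstanceIP sketch illustrating this situation is shown below; the API group/version and field names are assumptions, and the addresses are example values only.

apiVersion: core.juniper.net/v1
kind: InstanceIP
metadata:
  name: custom-podnet-svc-iip
spec:
  ipAddress: 10.96.52.14          # cluster IP drawn from the default service CIDR
  virtualNetworkReference:        # VN ref points at the custom pod network, not the service network
    kind: VirtualNetwork
    name: custom-podnet-vn
    namespace: custom-podnet-demo
status:
  subnetReference:                # not user-configurable; set by the controller
    name: default-servicenetwork-v4-subnet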
To allow such use cases, the network controller 24 may be extended to allow a manual subnet reference to be specified on the IIP object. This is done by using the annotation "core.juniper.net/iip-core-subnet", whose value is in the format subnet-namespace/subnet-name. Using the same IIP shown above as an example, the service controller would set the annotation as follows:
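Continuing the same hypothetical IIP, the annotated metadata could look as follows; the subnet namespace in the value is an assumption.

kind: InstanceIP
metadata:
  name: custom-podnet-svc-iip
  annotations:
    core.juniper.net/iip-core-subnet: "default/default-servicenetwork-v4-subnet"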
The InstanceIP controller first checks for the annotation and, if found, uses the specified subnet to create the subnet reference.
In terms of interactions with other features, the following may be considered. Multiple interfaces within a pod: a pod with a custom default pod network may still contain multiple interfaces. An NSE example for this can look as follows:
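The sketch below is illustrative and matches the interface layout described in the next sentence; the podnetwork cni-args key is the same assumed reconstruction used earlier.

apiVersion: v1
kind: Pod
metadata:
  name: multi-intf-pod
  namespace: custom-podnet-demo
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [
        { "name": "custom-podnet-vn", "namespace": "custom-podnet-demo",
          "cni-args": { "net.juniper.contrail.podnetwork": "true" } },
        { "name": "vn1" },
        { "name": "vn2" }
      ]
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]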
A pod created with the annotation described above will have eth0 from custom-podnet-vn, eth1 from vn1, and eth2 from vn2. Replicating this exact configuration by specifying the pod network at the namespace level would look like the following:
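A sketch of the namespace-level equivalent is shown below; the namespace annotation key is the same assumed reconstruction used earlier, and only the secondary networks remain in the pod's NSE annotation.

apiVersion: v1
kind: Namespace
metadata:
  name: multi-intf-ns
  annotations:
    core.juniper.net/pod-network: "custom-podnet-demo/custom-podnet-vn"
---
apiVersion: v1
kind: Pod
metadata:
  name: multi-intf-pod
  namespace: multi-intf-ns
  annotations:
    k8s.v1.cni.cncf.io/networks: '[{ "name": "vn1" }, { "name": "vn2" }]'
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]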
Isolated namespaces: if a namespace is specified to be an isolated namespace, the creation of a pod with a custom default pod network within that namespace is not allowed.
Fig. 17 is a flowchart illustrating example operations of the network controller shown in fig. 1 in performing aspects of the network segmentation techniques described in this disclosure. As described above, network controller 24 may first receive a request conforming to the container orchestration platform, by which to configure a new pod of the plurality of pods with a master interface to communicate over the virtual network, thereby segmenting the network formed by the plurality of pods (1700). The network controller 24 may configure the new pod with a master interface to enable communication via the virtual network in response to the request (1702).
In this way, aspects of the technology may implement the following examples:
example 1. A network controller, comprising: a memory configured to store a request conforming to the containerization platform to configure a new pod of the plurality of pods with a master interface to communicate over the virtual network to segment a network formed by the plurality of pods; a processing circuit configured to configure the new pod with a master interface to enable communication via the virtual network in response to the request.
Example 2. The network controller of example 1, wherein the processing circuitry is configured to, when configuring the new pod, configure the master interface to enable communication via the virtual network without being configured to communicate with a default pod network.
Example 3. The network controller of example 2, wherein the request indicates that the container orchestration platform is to configure the new pod with custom resources that redefine the master interface to enable communication via the virtual network instead of the default pod network.
Example 4. The network controller of example 1, wherein the request indicates that the container orchestration platform is to configure the new pod with custom resources that redefine the master interface to enable communication via the virtual network.
Example 5. The network controller of any combination of examples 1-4, wherein the request comprises a first request, wherein the virtual network comprises a first virtual network, wherein the processing circuitry is further configured to process a second request to create a virtual network router, wherein the virtual network router is configured to cause the network controller to interconnect the first virtual network and a second virtual network of the network formed by the plurality of pods, wherein the virtual network router represents a logical abstraction of one or more policies that cause one or more of an import and an export of routing information between the first virtual network and the second virtual network, and wherein the processing circuitry is further configured to configure the first virtual network and the second virtual network in accordance with the one or more policies to enable one or more of the import and the export of routing information between the first virtual network and the second virtual network via the virtual network router.
Example 6 the network controller of example 5, wherein the second request includes a tag associated with the first virtual network and the second virtual network, and wherein the configuration node identifies the first virtual network and the second virtual network based on the tag to configure a routing instance corresponding to the virtual network router according to one or more policies to cause import and export of routing information between the first virtual network and the second virtual network.
Example 7 the network controller of any combination of examples 5 and 6, wherein the second request indicates that the virtual network router is a mesh virtual network router, and wherein the one or more policies represented by the mesh virtual network router include symmetric import and export policies that cause import and export of routing information between the first virtual network and the second virtual network.
Example 8. The network controller of any combination of examples 5 and 6, wherein the second request indicates that the virtual network router is a hub virtual network router, the first virtual network is a first spoke virtual network, and the second virtual network is a second spoke virtual network, and wherein the one or more policies represented by the hub virtual network router include asymmetric import and export policies that cause export of routing information from both the first spoke virtual network and the second spoke virtual network to the virtual network router, but do not cause import of routing information between the first spoke virtual network and the second spoke virtual network.
Example 9. The network controller of any combination of examples 1-8, wherein the memory stores a pod manifest annotation for the new pod identifying the virtual network over which the master interface configured for the new pod is to communicate, and wherein the processing circuitry processes the request to parse the pod manifest annotation to identify the virtual network over which the master interface configured for the new pod is to communicate.
Example 10. The network controller of any combination of examples 1-9, wherein a namespace identifies the virtual network over which the master interface configured for the new pod is to communicate.
Example 11. A method, comprising: storing, by the network controller, a request conforming to the container orchestration platform, by which a new pod of the plurality of pods is configured with a master interface to communicate over the virtual network to segment the network formed by the plurality of pods; and configuring, by the network controller in response to the request, the new pod with a master interface to enable communication via the virtual network.
Example 12. The method of example 11, wherein configuring the new pod comprises configuring the master interface to enable communication via the virtual network without being configured to communicate with a default pod network.
Example 13. The method of example 12, wherein the request indicates that the container orchestration platform is to configure the new pod with a custom resource that redefines the master interface to enable communication via the virtual network instead of the default pod network.
Example 14. The method of example 11, wherein the request indicates that the container orchestration platform is to configure the new pod with a custom resource that redefines the master interface to enable communication via the virtual network.
Example 15 the method of any combination of examples 11-14, wherein the request comprises a first request, wherein the virtual network comprises a first virtual network, wherein the method further comprises processing a second request to create a virtual network router through the second request, wherein the virtual network router is configured to cause the network controller to interconnect the first virtual network and a second virtual network of the network formed by the plurality of pods, wherein the virtual network router represents a logical abstraction of one or more policies that cause one or more of an import and export of routing information between the first virtual network and the second virtual network, and wherein the method further comprises configuring the first virtual network and the second virtual network according to the one or more policies to enable one or more of an import and export of routing information between the first virtual network and the second virtual network via the virtual network router.
Example 16 the method of example 15, wherein the second request includes a tag associated with the first virtual network and the second virtual network, and wherein the configuration node identifies the first virtual network and the second virtual network based on the tag to configure a routing instance corresponding to the virtual network router according to one or more policies to cause import and export of routing information between the first virtual network and the second virtual network.
Example 17 the method of any combination of examples 15 and 16, wherein the second request indicates that the virtual network router is a mesh virtual network router, and wherein the one or more policies represented by the mesh virtual network router include symmetric import and export policies that result in import and export of routing information between the first virtual network and the second virtual network.
Example 18. The method of any combination of examples 15 and 16, wherein the second request indicates that the virtual network router is a hub virtual network router, the first virtual network is a first spoke virtual network, and the second virtual network is a second spoke virtual network, and wherein the one or more policies represented by the hub virtual network router include asymmetric import and export policies that result in exporting routing information from both the first spoke virtual network and the second spoke virtual network to the virtual network router, but do not result in importation of routing information between the first spoke virtual network and the second spoke virtual network.
Example 19. The method of any combination of examples 11-18, further comprising: storing a pod manifest annotation for the new pod identifying the virtual network over which the master interface configured for the new pod is to communicate; and processing the request to parse the pod manifest annotation to identify the virtual network over which the master interface configured for the new pod is to communicate.
Example 20. A non-transitory computer-readable storage medium storing instructions that, when executed, cause a processing circuit to: store a request conforming to a container orchestration platform by which a new pod of the plurality of pods is configured with a master interface to communicate over a virtual network to segment a network formed by the plurality of pods; and, in response to the request, configure the new pod with the master interface to enable communication via the virtual network.
If implemented in hardware, the present disclosure may relate to an apparatus such as a processor or an integrated circuit device (such as an integrated circuit chip or chipset).
Alternatively or additionally, if implemented in software or firmware, the techniques may be realized at least in part by a computer-readable data storage medium comprising instructions that, when executed, cause a processor to perform one or more of the methods described above. For example, a computer-readable data storage medium may store such instructions for execution by a processor.
The computer readable medium may form part of a computer program product, which may include packaging material. The computer-readable medium may include computer data storage media such as Random Access Memory (RAM), Read-Only Memory (ROM), Non-Volatile Random Access Memory (NVRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory, magnetic or optical data storage media, and the like. In some examples, an article of manufacture may comprise one or more computer-readable storage media.
In some examples, the computer-readable storage medium may include a non-transitory medium. The term "non-transitory" may indicate that the storage medium is not embodied in a carrier wave or propagated signal. In some examples, a non-transitory storage medium may store data (e.g., in RAM or cache) that may change over time.
The code or instructions may be software and/or firmware executed by a processing circuit comprising one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functionality described in this disclosure may be provided within software modules or hardware modules.
Claims (20)
1. A network controller, comprising:
a memory configured to store a request conforming to a container orchestration platform to configure a new pod of a plurality of pods with a master interface to communicate over a virtual network to segment a network formed by the plurality of pods; and
processing circuitry configured to configure the new pod with the master interface to enable communication via the virtual network in response to the request.
2. The network controller of claim 1, wherein the processing circuit is configured to, when configuring the new pod, configure the master interface to enable the communication via the virtual network without being configured to communicate with a default pod network.
3. The network controller of claim 2, wherein the request indicates that the container orchestration platform is to configure the new pod with a custom resource that redefines the master interface to enable the communication via the virtual network instead of the default pod network.
4. A network controller according to any of claims 1-3, wherein the request indicates that the container orchestration platform is to configure the new pod with custom resources that redefine the master interface to enable the communication via the virtual network.
5. The network controller according to any one of claims 1 to 3,
wherein the request comprises a first request,
wherein the virtual network comprises a first virtual network,
Wherein the processing circuit is further configured to process a second request to create a virtual network router,
wherein the virtual network router is configured to cause the network controller to interconnect the first virtual network and a second virtual network of the network formed by the plurality of pods,
wherein the virtual network router represents a logical abstraction of one or more policies that cause one or more of importation and exportation of routing information between the first virtual network and the second virtual network, and
wherein the processing circuitry is further configured to configure the first virtual network and the second virtual network according to the one or more policies to enable one or more of the importation and exportation of routing information between the first virtual network and the second virtual network via the virtual network router.
6. The network controller of claim 5,
wherein the second request includes a tag associated with the first virtual network and the second virtual network, and
wherein the configuration node identifies the first virtual network and the second virtual network based on the label to configure a routing instance corresponding to the virtual network router according to the one or more policies to cause the importing and the exporting of the routing information between the first virtual network and the second virtual network.
7. The network controller of claim 5,
wherein the second request indicates that the virtual network router is a mesh virtual network router, and
wherein the one or more policies represented by the mesh virtual network router include symmetric import and export policies that cause both the import and the export of the routing information between the first virtual network and the second virtual network.
8. The network controller of claim 5,
wherein the second request indicates that the virtual network router is a hub virtual network router, the first virtual network is a first spoke virtual network, and the second virtual network is a second spoke virtual network, and
wherein the one or more policies represented by the hub virtual network router include asymmetric import and export policies that cause export of the routing information from both the first and second spoke virtual networks to the virtual network router, but do not cause import of the routing information between the first and second spoke virtual networks.
9. The network controller according to any one of claims 1 to 3,
wherein the memory stores a pod manifest annotation for the new pod, the pod manifest annotation identifying the virtual network over which the master interface configured for the new pod is to communicate, and
wherein the processing circuitry processes the request to parse the pod manifest annotation to identify the virtual network over which the master interface configured for the new pod is to communicate.
10. A network controller according to any of claims 1-3, wherein a namespace identifies the virtual network over which the master interface configured for the new pod is to communicate.
11. A method, comprising:
storing, by a network controller, a request conforming to a container orchestration platform, by which a new pod of a plurality of pods is configured with a master interface to communicate over a virtual network to segment a network formed by the plurality of pods; and
the new pod is configured with the master interface by the network controller in response to the request to enable communication via the virtual network.
12. The method of claim 11, wherein configuring the new pod comprises configuring the master interface to enable the communication via the virtual network without being configured to communicate with a default pod network.
13. The method of claim 12, wherein the request indicates that the container orchestration platform is to configure the new pod with a custom resource that redefines the master interface to enable the communication via the virtual network instead of the default pod network.
14. The method of any of claims 11-13, wherein the request indicates that the container orchestration platform is to configure the new pod with a custom resource that redefines the master interface to enable the communication via the virtual network.
15. The method according to any one of claims 11 to 13,
wherein the request comprises a first request,
wherein the virtual network comprises a first virtual network,
wherein the method further comprises processing a second request, creating a virtual network router by the second request,
wherein the virtual network router is configured to cause the network controller to interconnect the first virtual network and a second virtual network of the network formed by the plurality of pods,
wherein the virtual network router represents a logical abstraction of one or more policies that cause one or more of importation and exportation of routing information between the first virtual network and the second virtual network, and
Wherein the method further comprises configuring the first virtual network and the second virtual network according to the one or more policies to enable one or more of the importing and the exporting of routing information between the first virtual network and the second virtual network via the virtual network router.
16. The method according to claim 15,
wherein the second request includes a tag associated with the first virtual network and the second virtual network, and
wherein the configuration node identifies the first virtual network and the second virtual network based on the label to configure a routing instance corresponding to the virtual network router according to the one or more policies to cause the importing and the exporting of the routing information between the first virtual network and the second virtual network.
17. The method according to claim 15,
wherein the second request indicates that the virtual network router is a mesh virtual network router, and
wherein the one or more policies represented by the mesh virtual network router include symmetric import and export policies that cause both the import and the export of the routing information between the first virtual network and the second virtual network.
18. The method according to claim 15,
wherein the second request indicates that the virtual network router is a hub virtual network router, the first virtual network is a first spoke virtual network, and the second virtual network is a second spoke virtual network, and
wherein the one or more policies represented by the hub virtual network router include asymmetric import and export policies that cause export of the routing information from both the first and second spoke virtual networks to the virtual network router, but do not cause import of the routing information between the first and second spoke virtual networks.
19. The method of any of claims 11-13, further comprising:
storing a pod manifest annotation for the new pod, the pod manifest annotation identifying the virtual network over which the master interface configured for the new pod is to communicate; and
the request is processed to parse the pod manifest annotation to identify the virtual network over which the master interface configured for the new pod is to communicate.
20. A non-transitory computer-readable storage medium storing instructions that, when executed, cause a processing circuit to:
storing a request conforming to a container orchestration platform by which a new pod of a plurality of pods is configured with a master interface to communicate over a virtual network to segment a network formed by the plurality of pods; and
the new pod is configured with the master interface to enable communication via the virtual network in response to the request.
Applications Claiming Priority (3)
- US 63/375,091, priority date 2022-09-09
- US 18/146,799, priority date 2022-12-27
- US 18/146,799 (US12101204B2), priority date 2022-09-09, filing date 2022-12-27: Network segmentation for container orchestration platforms
Publications (1)
- CN117687773A, published 2024-03-12
Family
ID=90132681
Family Applications (1)
- CN202311149543.5A (CN117687773A), filed 2023-09-07: Network segmentation for container orchestration platform (status: Pending)
Country Status (1)
- CN: CN117687773A
- 2023-09-07: CN application CN202311149543.5A filed, published as CN117687773A, status Pending
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination