CN111492628A - Techniques for NIC port reduction with accelerated switching

Techniques for NIC port reduction with accelerated switching

Info

Publication number
CN111492628A
Authority
CN
China
Prior art keywords
network
computing device
virtual
accelerator
network traffic
Prior art date
Legal status
Pending
Application number
CN201980006768.0A
Other languages
Chinese (zh)
Inventor
G.罗杰斯
S.T.帕莱尔莫
S-W.钱
N.N.文卡特桑
I.廖
D.梅塔
R.加迪亚尔
Current Assignee
Intel Corp
Original Assignee
Intel Corp
Priority date
Filing date
Publication date
Application filed by Intel Corp
Publication of CN111492628A

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 - Data switching networks
    • H04L12/28 - Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/46 - Interconnection of networks
    • H04L12/4633 - Interconnection of networks using encapsulation techniques, e.g. tunneling

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Multi Processors (AREA)

Abstract

A technique for accelerating network processing includes a computing device having a processor and an accelerator. The accelerator may be a Field Programmable Gate Array (FPGA). The accelerator includes a virtual switch and a network port, such as an Ethernet physical interface. The network port of the accelerator is coupled to a network port of an external switch. The processor executes a plurality of virtual network functions, and the virtual switch processes network traffic associated with the virtual network functions. For example, the virtual switch may forward traffic generated by the virtual network functions to the switch via the ports of the accelerator and the ports of the switch. Each virtual network function may be coupled to a para-virtualized interface of the accelerator, such as a virtual I/O queue. Network traffic may be processed within a coherency domain shared by the processor and the accelerator. Other embodiments are described and claimed.

Description

Techniques for NIC port reduction with accelerated switching
Cross Reference to Related Applications
This application claims the benefit of U.S. Provisional Patent Application No. 62/634,874, filed February 25, 2018.
Background
Modern computing devices may include general purpose processor cores, as well as various hardware accelerators for performing specialized tasks. Some computing devices may include one or more Field Programmable Gate Arrays (FPGAs), which may include programmable digital logic resources that are configurable by an end user or system integrator. In some computing devices, instead of using a general purpose computing core, an FPGA may be used to perform network packet processing tasks.
Drawings
The concepts described herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. For simplicity and clarity of illustration, elements illustrated in the figures have not necessarily been drawn to scale. Where considered appropriate, reference numerals have been repeated among the figures to indicate corresponding or analogous elements.
FIG. 1 is a simplified block diagram of at least one embodiment of a system for network acceleration;
FIG. 2 is a simplified block diagram of at least one embodiment of a computing device of the system of FIG. 1;
FIG. 3 is a simplified block diagram of at least one embodiment of an environment of the computing device of FIGS. 1 and 2;
FIG. 4 is a simplified block diagram of at least one embodiment of a virtual switch application function of the computing device of FIGS. 1-3;
FIG. 5 is a simplified flow diagram of at least one embodiment of a method for network acceleration, which may be performed by the computing device of FIGS. 1-4;
FIG. 6 is a chart illustrating exemplary test results that may be achieved using the system of FIGS. 1-4; and
FIG. 7 is a simplified block diagram of a typical system.
Detailed Description
While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intention to limit the concepts of the present disclosure to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.
References in the specification to "one embodiment," "an illustrative embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not include that particular feature, structure, or characteristic. Moreover, such phrases do not necessarily refer to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described. Additionally, it should be appreciated that items included in a list in the form of "at least one of A, B, and C" can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C). Similarly, items listed in the form of "at least one of A, B, or C" can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C).
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on a transitory or non-transitory machine-readable (e.g., computer-readable) storage medium, which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disk, or other media device).
In the drawings, some structural or methodical features may be shown in a particular arrangement and/or ordering. However, it should be appreciated that such a particular arrangement and/or ordering may not be required. Rather, in some embodiments, such features may be arranged in a manner and/or order different from that shown in the illustrative figures. Additionally, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, such feature may not be included or may be combined with other features.
Referring now to fig. 7, an exemplary system 700 for network processing may require one or more dedicated network cards assigned to each virtual network function. In the illustrative system 700, the computing device 702 includes a processor 720 and a plurality of Network Interface Controllers (NICs) 726. The processor 720 executes a plurality of Virtual Network Functions (VNFs) 722. Each VNF 722 of the computing device 702 is assigned one or more dedicated ports of a legacy NIC 726. Each VNF 722 may have direct access to a NIC 726 (or a portion of a NIC 726, such as a PCI virtual function) using a hardware interface, such as single-root I/O virtualization (SR-IOV). For example, in the illustrative system, each VNF 722 accesses a dedicated NIC 726 using Intel VT-d technology 724 provided by the processor 720. As shown, each illustrative NIC 726 includes two network ports, and each of those network ports is coupled to a corresponding port 742 of the network switch 704. Thus, in the illustrative system 700, to execute four VNFs 722, the computing device 702 occupies eight ports 742 of the switch 704.
Referring now to fig. 1, a system 100 for accelerated networking includes a plurality of computing devices 102 communicating over a network 104. Each computing device 102 has a processor 120 and an accelerator 128, such as a Field Programmable Gate Array (FPGA) 128. The processor 120 and the accelerator 128 are coupled together by a coherent interconnect 124 and a non-coherent interconnect 126. In use, as described below, the computing device 102 executes one or more Virtual Network Functions (VNFs) or other Virtual Machines (VMs). Network traffic associated with the VNFs is handled by a virtual switch (vSwitch) of the accelerator 128. The accelerator 128 includes one or more ports or other physical interfaces coupled to the switch 106 of the network 104. Each VNF does not require a dedicated port on the switch 106. Thus, the system 100 can perform high-throughput, scalable network workloads with reduced top-of-rack (ToR) switch port consumption compared to conventional systems that require traditional NICs and dedicated ports for each VNF. Accordingly, fewer NICs may be required per computing device 102, which may reduce cost and power consumption. Additionally, reducing the number of NICs required may overcome server form factor limitations on the number of physical NIC expansion cards, chassis space, and/or other physical resources of the computing device 102. Further, the flexibility of users or tenants of the system 100 may be improved, because a user is not required to purchase and install a predetermined number of NICs in each server, and performance is not limited to the capabilities provided by those NICs. Rather, the performance of the system 100 may scale with the overall throughput capability of a particular server platform and network fabric. Additionally and unexpectedly, tests have shown that the disclosed system 100 can achieve performance comparable to single-root I/O virtualization (SR-IOV) implementations without using standard NICs and with fewer switch ports. Additionally, testing has shown that the system 100 can achieve better performance than software-based systems.
Referring now to FIG. 6, a chart 600 illustrates test results that may be achieved by the system 100 as compared to typical systems. Bar 602 illustrates the throughput achieved by a system using SR-IOV/VT-d PCI Express virtualization, which is similar to the system 700 of FIG. 7. Bar 604 illustrates the throughput that may be achieved by the system 100 with the FPGA accelerator 128 as disclosed herein. Bar 606 illustrates the throughput achieved by a system using the Intel Data Plane Development Kit (DPDK), which is a high-performance software packet processing framework. As shown, the SR-IOV system 700 achieves approximately 40 Gbps, the FPGA system 100 achieves approximately 36.4 Gbps, and the DPDK system achieves approximately 15 Gbps. The SR-IOV system 700 achieves about 2.67 times the throughput of the DPDK (software) system and about 1.1 times the throughput of the FPGA system 100. The FPGA system 100 achieves approximately 2.4 times the throughput of the DPDK (software) system. Curve 608 illustrates the switch ports used by each system. As shown, the SR-IOV system 700 uses four ports, the FPGA system 100 uses two ports, and the DPDK system uses two ports. Curve 610 illustrates the processor cores used by each system. As shown, the SR-IOV system 700 uses zero cores, the FPGA system 100 uses zero cores, and the DPDK system uses six cores. Thus, as shown in chart 600, the FPGA system 100 provides throughput performance comparable to the typical SR-IOV system 700 with reduced NIC port usage and without the use of additional processor cores.
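These ratios follow directly from the approximate throughput values reported above; as a quick check:

```latex
% Ratio check using the approximate throughputs shown in chart 600
\frac{40\ \text{Gbps}}{15\ \text{Gbps}} \approx 2.67, \qquad
\frac{40\ \text{Gbps}}{36.4\ \text{Gbps}} \approx 1.10, \qquad
\frac{36.4\ \text{Gbps}}{15\ \text{Gbps}} \approx 2.43
```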
Referring back to fig. 1, each computing device 102 may be embodied as any type of computing or computer device capable of performing the functions described herein, including, but not limited to, a computer, a server, a workstation, a desktop computer, a laptop computer, a notebook computer, a tablet computer, a mobile computing device, a wearable computing device, a network appliance, a web appliance, a distributed computing system, a processor-based system, and/or a consumer electronics device. As shown in fig. 1, computing device 102 illustratively includes a processor 120, an accelerator 128, an input/output subsystem 130, a memory 132, a data storage device 134, and a communication subsystem 136, and/or other components and devices common in a server or similar computing device. Of course, in other embodiments, computing device 102 may include other or additional components, such as those commonly found in a server computer (e.g., various input/output devices). Additionally, in some embodiments, one or more illustrative components may be incorporated into, or otherwise form a part of, another component. For example, in some embodiments, memory 132, or portions thereof, may be incorporated into processor 120.
The processor 120 may be embodied as any type of processor capable of performing the functions described herein. Illustratively, the processor 120 is a multi-core processor 120 having two processor cores 122. Of course, in other embodiments, the processor 120 may be embodied as a single or multi-core processor(s), digital signal processor, microcontroller, or other processor or processing/control circuit. Similarly, the memory 132 may be embodied as any type of volatile or non-volatile memory or data storage device capable of performing the functions described herein. In operation, the memory 132 may store various data and software used during operation of the computing device 102, such as operating systems, applications, programs, libraries, and drivers. The memory 132 is communicatively coupled to the processor 120 via the I/O subsystem 130, which may be embodied as circuitry and/or components that facilitate input/output operations with the processor 120, the accelerator 128, the memory 132, and other components of the computing device 102. For example, the I/O subsystem 130 may be embodied as, or otherwise include, a memory controller hub, an input/output control hub, a sensor hub, firmware devices, communication links (i.e., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.), and/or other components and subsystems that facilitate the input/output operations. In some embodiments, the I/O subsystem 130 may form part of a system-on-a-chip (SoC) and be incorporated on a single integrated circuit chip with the processor 120, the memory 132, and other components of the computing device 102.
The data storage device 134 may be embodied as any type of device or devices configured for short-term or long-term storage of data, such as, for example, memory devices and circuits, memory cards, hard drives, solid state drives, non-volatile flash memory, or other data storage devices. The computing device 102 also includes a communication subsystem 136, which may be embodied as any communication circuit, device, or collection thereof that enables communication between the computing device 102 and other remote devices over the computer network 104.
As shown in fig. 1, the computing device 102 includes an accelerator 128. The accelerator 128 may be embodied as a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), an Artificial Intelligence (AI) accelerator, a coprocessor, or other digital logic device capable of performing accelerated network functions. Illustratively, the accelerator 128 is an FPGA included in a multi-chip package with the processor 120, as described further below in connection with fig. 2. The accelerator 128 may be coupled to the processor 120 via a plurality of high-speed connection interfaces including the coherent interconnect 124 and one or more non-coherent interconnects 126.
Coherent interconnect 124 may be embodied as a high-speed data interconnect capable of maintaining data coherency between a last-level cache of the processor 120, any cache or other local memory of the accelerator 128, and the memory 132. For example, coherent interconnect 124 may be embodied as an intra-die interconnect (IDI), an Intel Ultra Path Interconnect (UPI), a QuickPath Interconnect (QPI), an Intel Accelerator Link (IAL), or other coherent interconnect. Non-coherent interconnect 126 may be embodied as a high-speed data interconnect that does not provide data coherency, such as a peripheral bus (e.g., a PCI Express bus), a fabric interconnect such as Intel Omni-Path Architecture, or other non-coherent interconnect.
Computing device 102 may further include one or more peripheral devices 138. The peripheral devices 138 may include any number of additional input/output devices, interface devices, and/or other peripheral devices. For example, in some embodiments, peripheral devices 138 may include touch screens, graphics circuits, Graphics Processing Units (GPUs) and/or processor graphics, audio devices, microphones, cameras, keyboards, mice, network interfaces, and/or other input/output devices, interface devices, and/or peripheral devices.
The network 104 may be embodied as, or otherwise include, a wired or wireless local area network (LAN) and/or a wired or wireless Wide Area Network (WAN). As such, the network 104 may include any number of additional devices, such as additional computers, routers, and switches, to facilitate communications among the devices of the system 100. In the illustrative embodiment, the network 104 is embodied as a local Ethernet network. The network 104 includes an illustrative switch 106, which may be embodied as a top-of-rack (ToR) switch, a middle-of-rack (MoR) switch, or other switch. The network 104 may, of course, include a plurality of switches 106 and other network devices.
Referring now to FIG. 2, a diagram 200 illustrates one potential embodiment of the computing device 102. As shown, the computing device 102 includes a multi-chip package (MCP) 202. The MCP 202 includes the processor 120 and the accelerator 128, as well as the coherent interconnect 124 and the non-coherent interconnect 126. Illustratively, the accelerators 128 are FPGAs, which may be embodied as integrated circuits that include programmable digital logic resources that may be configured after manufacture. The FPGA 128 may include an array of configurable logic blocks that communicate, for example, through configurable data interchange. As shown, computing device 102 further includes memory 132 and a communication subsystem 136. The FPGA 128 is coupled to the communication subsystem 136 and can therefore send and/or receive network data. Additionally, although the memory 132 and/or the communication subsystem 136 are illustrated in fig. 2 as separate components from the MCP 202, it should be understood that in some embodiments, the memory 132 and/or the communication subsystem 136 may also be incorporated into the MCP 202.
As shown, the FPGA 128 includes an FPGA Interface Unit (FIU) 204, which may be embodied as digital logic resources configured by a manufacturer, vendor, or other entity associated with the computing device 102. The FIU 204 implements interface protocols and manageability for the link between the processor 120 and the FPGA 128. In some embodiments, the FIU 204 may also provide platform capabilities, such as Intel Virtualization Technology for Directed I/O (Intel VT-d), security, fault monitoring, performance monitoring, power and thermal management, partial reconfiguration, and the like. As shown, the FIU 204 further includes an Ultra Path Interconnect (UPI) block 206 coupled to the coherent interconnect 124 and a PCI Express (PCIe) block 208 coupled to the non-coherent interconnect 126. The UPI block 206 and the PCIe block 208 may be embodied as digital logic configured to carry data between the FPGA 128 and the processor 120 over the physical interconnects 124, 126, respectively. The physical coherent UPI block 206 and the physical non-coherent PCIe block 208 and their associated interconnects may be multiplexed into a set of virtual channels (VCs) that are connected to a VC steering block.
The FPGA 128 further includes one or more Acceleration Function Units (AFUs) 210. Each AFU 210 may be embodied as digital logic configured to perform one or more accelerated networking functions. For example, each AFU 210 may be embodied as intelligent NIC logic, intelligent vSwitch logic, or other logic that performs one or more network workloads (e.g., user-designed custom data path logic such as forwarding, classification, packet steering, encapsulation, security, quality of service, etc.). Illustratively, each AFU 210 may be configured by a user of the computing device 102. Each AFU 210 may access data in the memory 132 using one or more Virtual Channels (VCs) supported by the coherent interconnect 124 and/or the non-coherent interconnect 126 via the FIU 204. Although the accelerator 128 is illustrated in fig. 2 as an FPGA 128 that includes a plurality of AFUs 210, it should be understood that in some embodiments, the accelerator 128 may be embodied as an ASIC, coprocessor, or other accelerator 128 that also includes one or more AFUs 210 to provide accelerated networking functionality. In those embodiments, the AFUs 210 may be fixed-function or otherwise not user-configurable.
Referring now to FIG. 3, in an illustrative embodiment, the computing device 102 establishes an environment 300 during operation. The illustrative environment 300 includes one or more Virtual Network Functions (VNFs) 302, a Virtual Machine Monitor (VMM) 304, a virtual I/O block 306, a vSwitch 308, and physical interfaces 310. As shown, the various components of the environment 300 may be embodied as hardware, firmware, software, or a combination thereof. Thus, in some embodiments, one or more components of the environment 300 may be embodied as a set of circuits or electrical devices (e.g., VNF circuitry 302, VMM circuitry 304, virtual I/O block circuitry 306, vSwitch circuitry 308, and/or physical interface circuitry 310). It should be appreciated that in such embodiments, one or more of the VNF circuitry 302, the VMM circuitry 304, the virtual I/O block circuitry 306, the vSwitch circuitry 308, and/or the physical interface circuitry 310 may form part of the processor 120, the accelerator 128, the I/O subsystem 130, and/or other components of the computing device 102. Additionally, in some embodiments, one or more illustrative components may form a part of another component, and/or one or more illustrative components may be independent of each other.
The VMM 304 may be embodied as any virtual machine monitor, hypervisor, or other component that allows virtualized workloads to be executed on the computing device 102. The VMM 304 may have full control of the computing device 102, for example, by executing in a non-virtualized host mode, such as ring level 0 and/or VMX root mode. Each VNF 302 may be embodied as any guest virtual machine, guest operating system, or other guest software configured to execute a virtualized workload on the computing device 102. For example, each VNF 302 may be embodied as a Virtual Network Function (VNF) or other network workload (e.g., user-designed customized data path logic such as forwarding, classification, packet steering, encapsulation, security, quality of service, etc.). The VMM 304 may enforce isolation between the VNFs 302 and otherwise enforce platform security. Thus, the computing device 102 may host guests executed by multiple users or other tenants. The VNFs 302 and the VMM 304 are executed by the processor 120. The VMM 304 may be configured to configure the accelerator 128 with the vSwitch 308.
Virtual I/O block 306 may be embodied as one or more I/O ports, queues, or other I/O interfaces accessible by the VNFs 302. Virtual I/O block 306 may be coupled with, embodied as, or otherwise include one or more para-virtualized drivers that may provide high-performance I/O for the VNFs 302. Illustratively, virtual I/O block 306 may be embodied as one or more virtio (para-virtualized) queues, drivers, and/or other associated components. The VMM 304 may be further configured to couple each VNF 302 to a para-virtualization interface provided by the virtual I/O block 306.
Each physical interface 310 may be embodied as an Ethernet PHY, MAC, or other physical interface. Each physical interface 310 is coupled to a port 312 of an external switch 106 (e.g., a ToR switch) by a network link. The network link may include one or more communication channels, wires, backplanes, optical links, and/or other communication components.
The vSwitch 308 is configured to handle network traffic associated with the VNFs 302. The accelerator 128 and/or the processor 120 may access network traffic via the virtual I/O block 306. Network traffic may be processed within a coherency domain shared by the accelerator 128 and the processor 120. For example, network traffic may be communicated between the processor 120 and the accelerator 128 via the coherent interconnect 124. The vSwitch 308 may forward network traffic from the VNFs 302 to the switch 106 and/or from the switch 106 to the VNFs 302 via the physical interface 310 and the corresponding port 312. The vSwitch 308 may also forward network traffic between multiple VNFs 302.
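For illustration only, the forwarding behavior described above can be modeled as a small dispatch routine. This is a conceptual sketch rather than the vSwitch 308 implementation; the port names, MAC addresses, and table contents below are hypothetical and do not appear in this disclosure.

```python
# Hypothetical model of the forwarding decision made by the vSwitch 308:
# a frame from a VNF either reaches another local VNF's virtio queue or
# leaves through the accelerator's single physical interface 310 toward
# the external switch 106.

PHYSICAL_PORT = "phy0"               # assumed name for physical interface 310
LOCAL_VNF_PORTS = {"vnf0", "vnf1"}   # assumed virtio-backed VNF ports

# Assumed forwarding table: destination MAC address -> output port.
fib = {
    "52:54:00:00:00:01": "vnf0",
    "52:54:00:00:00:02": "vnf1",
}

def forward(src_port: str, dst_mac: str) -> str:
    """Return the output port for a frame arriving on src_port."""
    out = fib.get(dst_mac, PHYSICAL_PORT)   # unknown destinations go to the ToR switch
    if out in LOCAL_VNF_PORTS and out != src_port:
        return out                          # VNF-to-VNF traffic never leaves the server
    return PHYSICAL_PORT                    # everything else uses the single shared uplink

# A frame from vnf0 addressed to vnf1 stays on the accelerator.
assert forward("vnf0", "52:54:00:00:00:02") == "vnf1"
# A frame to an unknown (external) MAC exits via the one shared physical port.
assert forward("vnf1", "aa:bb:cc:dd:ee:ff") == "phy0"
```

The point of the sketch is the port arithmetic: however many VNFs are dispatched to virtio-backed ports, only the single uplink consumes a port 312 on the external switch 106.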
Referring now to fig. 4, one potential embodiment of the accelerator 128 is shown. The illustrated accelerator 128 may include the virtual I/O block 306, the vSwitch 308, and/or the physical interface 310 as described above. The illustrated accelerator 128 may be embodied as, for example, an AFU 210 of the accelerator 128. As shown, the accelerator 128 includes a complete packet processing pipeline. In particular, the illustrative accelerator 128 includes a gDMA block 402, a pipeline configuration block 404, an Open vSwitch (OVS) virtio handler block 406, an I/O configuration block 408, a retimer card 410, a 10 Gb MAC block 412, an ingress rate limiter 414, a tunnel block 416 with a network virtualization using generic routing encapsulation (NVGRE) block 420, a packet infrastructure 432 including an OpenFlow classifier 422 (with an exact match block 426 and a megaflow block 428) and a FIB/action table 424, a tunnel block 434, a crossbar switch 440, and an egress QoS/traffic shaping block 442.
In an illustrative embodiment, the physical interface 310 may be embodied as the MAC 412 and/or the retimer 410. As described above, those components are coupled to the ports 312 of the external switch 106 via network links. Similarly, in the illustrative embodiment, the virtual I/O block 306 may be embodied as the gDMA 402 and/or the virtio handler 406. The vSwitch 308 may be embodied as the remaining components of the accelerator 128. For example, the illustrative accelerator 128 may receive incoming network traffic via the retimer 410 and the MAC 412 and provide the data to the tunnel 416. Similarly, the accelerator may receive network traffic generated by the VNFs 302 via the gDMA 402, the virtio handler 406, and the ingress rate limiter 414 and provide the data to the tunnel 416. The accelerator 128 processes network traffic using the tunnel 416, the packet infrastructure 432 (including the OpenFlow (OF) classifier 422 and the FIB/action table 424), and the tunnel 434. After processing, the network traffic is provided to the crossbar switch 440 and the egress QoS/traffic shaping 442. Network traffic destined for the switch 106 is provided to the MAC 412 for egress. Network traffic destined for a VNF 302 is provided to the virtio handler 406.
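Purely as a conceptual aid (and not the actual AFU logic, which is parallel hardware), the ingress-to-egress path just described can be pictured as a chain of stages. The stage names follow the description above; the frame fields and table contents are hypothetical.

```python
# Conceptual sketch of the FIG. 4 packet path: tunnel termination, OpenFlow
# classification, FIB/action lookup, re-encapsulation, then crossbar and
# egress QoS. This sequential Python only mirrors the order of the stages.

def tunnel_terminate(frame):        # tunnel block 416 (decapsulation)
    frame["decapsulated"] = True
    return frame

def classify(frame):                # OpenFlow classifier 422
    frame["flow"] = (frame["src"], frame["dst"])
    return frame

def fib_lookup(frame, fib):         # FIB/action table 424
    frame["out_port"] = fib.get(frame["dst"], "mac412")
    return frame

def encapsulate(frame):             # tunnel block 434 (re-encapsulation toward the fabric)
    frame["encapsulated"] = frame["out_port"] == "mac412"
    return frame

def egress(frame):                  # crossbar switch 440 + egress QoS/traffic shaping 442
    return frame["out_port"], frame

FIB = {"vnf-a": "virtio0"}          # assumed table contents, for illustration only

def process(frame):
    frame = tunnel_terminate(frame)
    frame = classify(frame)
    frame = fib_lookup(frame, FIB)
    frame = encapsulate(frame)
    return egress(frame)

port, _ = process({"src": "tor", "dst": "vnf-a"})
assert port == "virtio0"            # VNF-bound traffic exits toward the virtio handler 406
```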
Referring now to fig. 5, in use, the computing device 102 may perform a method 500 for accelerating network processing. It should be appreciated that in some embodiments, the operations of method 500 may be performed by one or more components of the environment 300 of the computing device 102 as shown in fig. 3. The method 500 begins at block 502, where the computing device 102 configures the AFU 210 of the accelerator 128 for vSwitch 308 operation. The computing device 102 may perform the configuration or partial configuration of the FPGA 128, for example, using a bitstream or other code for the vSwitch 308 functionality. The computing device 102 may also configure network routing rules, flow rules, actions, QoS, and other network configurations of the vSwitch 308.
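A minimal sketch of what block 502 might look like from host software is shown below, assuming a hypothetical management API; AcceleratorHandle, load_bitstream, and add_flow are placeholder names (not OPAE or any real SDK), and the rule contents are illustrative only.

```python
# Hypothetical host-side view of block 502: load a vSwitch bitstream into the
# AFU 210 and install flow rules. The API shown here is a placeholder for
# whatever driver or SDK a given platform actually provides.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AcceleratorHandle:
    bitstream: Optional[bytes] = None
    flows: list = field(default_factory=list)

    def load_bitstream(self, image: bytes) -> None:
        """Full or partial reconfiguration of the AFU 210 with vSwitch logic."""
        self.bitstream = image

    def add_flow(self, match: dict, actions: list) -> None:
        """Install a routing/flow rule into the vSwitch forwarding tables."""
        self.flows.append((match, actions))

def configure_vswitch(accel: AcceleratorHandle, image: bytes) -> None:
    accel.load_bitstream(image)                                   # vSwitch 308 bitstream
    # Example rules; port names and MAC addresses are illustrative only.
    accel.add_flow({"in_port": "virtio0"}, ["output:phy0"])
    accel.add_flow({"dl_dst": "52:54:00:00:00:02"}, ["output:virtio1", "set_queue:1"])

accel = AcceleratorHandle()
configure_vswitch(accel, b"placeholder-vswitch-afu-image")
assert accel.bitstream is not None and len(accel.flows) == 2
```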
In block 504, the computing device 102 binds one or more ports or other physical interfaces 310 of the accelerator 128 to the external switch 106. For example, the computing device 102 may bind one or more MAC, PHY, or other ethernet ports of the accelerator 128 to corresponding port(s) of the external switch 106. The ports of the accelerator 128 may be embodied as fixed-function hardware ports, or reconfigurable "soft" ports. In some embodiments, the ports of the accelerators 128 may be preconfigured or otherwise provided by a manufacturer or other entity associated with the accelerators 128.
In block 506, computing device 102 configures VNF 302 for network processing. Computing device 102 may, for example, load VNF 302 or otherwise initialize VNF 302. VNF 302 may be provided by a tenant or other user of computing device 102. In block 508, the computing device 102 binds the VNF 302 to the virtio queue of the accelerator 128 or other para-virtualized interface of the accelerator 128. Computing device 102 may configure VNF 302, for example, with one or more para-virtualized drivers, queues, buffers, or other interfaces to accelerators 128.
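One common way to realize the binding in block 508 is to hand the guest a virtio-net device backed by a vhost-user socket served on the accelerator side. The sketch below builds such a QEMU invocation for a single VNF; the socket path, MAC address, disk image, and memory sizes are illustrative assumptions, not values from this disclosure.

```python
# Sketch of block 508: attach one VNF guest to a para-virtualized (virtio-net)
# interface backed by a vhost-user socket that the accelerator-side vSwitch
# serves. Paths, names, and sizes below are illustrative placeholders.

import subprocess

def qemu_args(name: str, vhost_socket: str, mac: str, image: str) -> list:
    return [
        "qemu-system-x86_64", "-name", name, "-enable-kvm", "-m", "4096",
        # vhost-user requires the guest RAM to live in shared, file-backed memory.
        "-object", "memory-backend-file,id=mem0,size=4096M,mem-path=/dev/hugepages,share=on",
        "-numa", "node,memdev=mem0",
        # Character device bound to the vhost-user socket exported for this VNF.
        "-chardev", f"socket,id=chr0,path={vhost_socket}",
        "-netdev", "type=vhost-user,id=net0,chardev=chr0",
        # The guest sees an ordinary virtio-net device (the para-virtualized interface).
        "-device", f"virtio-net-pci,netdev=net0,mac={mac}",
        "-drive", f"file={image},format=qcow2",
    ]

args = qemu_args("vnf0", "/tmp/vhost-user-vnf0.sock", "52:54:00:00:00:01", "vnf0.qcow2")
print(" ".join(args))
# subprocess.Popen(args)   # uncomment to actually launch the guest
```

Each additional VNF would repeat this with its own socket and MAC address, without consuming any additional port on the external switch 106.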
In block 510, computing device 102 determines whether additional VNFs 302 should be configured. If so, method 500 loops back to block 506 to load additional VNFs 302. Each additional VNF 302 may be bound to one or more dedicated virtio queues or other interfaces of the accelerator 128. However, the accelerator 128 need not be bound to an additional port of the switch 106. Referring back to block 510, if no additional VNFs 302 remain to be configured, method 500 proceeds to block 512.
In block 512, the computing device 102 processes network workloads with the VNFs 302 and processes network traffic with the virtual switch 308 of the accelerator 128. Each of the VNFs 302 may generate and/or receive network traffic (e.g., packet frames). For example, each VNF 302 may read and/or write network packet data to a buffer in the system memory 132 corresponding to a virtual I/O queue. The vSwitch 308 may perform complete packet processing pipeline operations on the network traffic data. In some embodiments, in block 514, the computing device 102 processes network traffic using the VNFs 302 and the vSwitch 308 in the same coherency domain. For example, the computing device 102 may communicate data (e.g., virtio queue data) between the processor 120 and the accelerator 128 via the coherent interconnect 124. The VNFs 302 and the vSwitch 308 may concurrently, simultaneously, or otherwise process network data utilizing the coherent interconnect 124, which provides data coherency between a last-level cache of the processor 120, a cache or other local memory of the accelerator 128, and the memory 132. In some embodiments, complete packet frames may be passed between the processor 120 and the accelerator 128 such that multiple switching actions may occur simultaneously. After processing the network data, the method 500 loops back to block 512 to continue processing network traffic using the VNFs 302 and the vSwitch 308. In some embodiments, the computing device 102 may dynamically load and/or unload VNFs 302 during execution of the method 500.
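Purely as a mental model of the hand-off in block 514 (not hardware behavior): the processor-side VNF and the accelerator-side vSwitch can be pictured as producer and consumer on a queue in ordinary system memory, with the coherent interconnect 124 keeping both views consistent. The queue size and frame contents below are made up for illustration.

```python
# Toy producer/consumer model of the virtio-queue hand-off in block 514. In
# the real system, cache coherency over interconnect 124 plays the role that
# a single Python process plays here.

from collections import deque

class VirtQueueModel:
    """Tiny stand-in for a virtio queue shared by a VNF 302 and the vSwitch 308."""

    def __init__(self, size: int = 256):
        self.descriptors = deque(maxlen=size)

    def post(self, frame: bytes) -> None:
        """Processor side: the VNF 302 writes a complete frame into the queue."""
        self.descriptors.append(frame)

    def poll(self):
        """Accelerator side: the vSwitch 308 drains and switches pending frames."""
        while self.descriptors:
            yield self.descriptors.popleft()

queue = VirtQueueModel()
queue.post(b"\x52\x54\x00\x00\x00\x02" + b"payload")   # VNF posts a frame
for frame in queue.poll():
    assert frame.endswith(b"payload")                   # vSwitch observes the same bytes
```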
It should be appreciated that in some embodiments, the method 500 may be embodied as various instructions stored on a computer-readable medium that may be executed by the processor 120, the accelerator 128, and/or other components of the computing device 102 to cause the computing device 102 to perform the method 500. The computer-readable medium may be embodied as any type of medium capable of being read by computing device 102, including but not limited to memory 132, data storage device 134, a firmware device, other memory or data storage devices of computing device 102, a portable medium readable by peripheral devices 138 of computing device 102, and/or other media.
Examples
Illustrative examples of the techniques disclosed herein are provided below. Embodiments of the technology may include any one or more of the examples described below, as well as any combination thereof.
Example 1 includes a computing device to accelerate network processing, the computing device comprising: an accelerator to couple a first network port of a virtual switch of the accelerator with a second network port of the network switch via a network link; and a processor for performing a plurality of virtual network functions in response to the coupling of the first network port and the second network port; wherein the virtual switch is to process network traffic associated with the plurality of virtual network functions in response to execution of the plurality of virtual network functions.
Example 2 includes the subject matter of example 1, and further includes a virtual machine monitor to configure the accelerator with the virtual switch.
Example 3 includes the subject matter of any of examples 1 and 2, and wherein the accelerator comprises a field programmable gate array, and wherein the virtual switch comprises an application function of the field programmable gate array.
Example 4 includes the subject matter of any of examples 1-3, and wherein processing network traffic comprises processing network traffic within a coherency domain shared by the accelerator and the processor.
Example 5 includes the subject matter of any one of examples 1-4, and further comprising: a coherent interconnect coupling the processor and the accelerator; wherein processing the network traffic comprises transferring the network traffic between the processor and the accelerator via a coherent interconnect.
Example 6 includes the subject matter of any of examples 1-5, and further comprising: a virtual machine monitor to couple each of the virtual network functions to a para-virtualization interface of the accelerator; wherein processing network traffic comprises processing network traffic associated with the para-virtualized interface.
Example 7 includes the subject matter of any of examples 1-6, and wherein processing network traffic comprises forwarding network traffic from the plurality of network functions to the network switch via the first network port and the second network port.
Example 8 includes the subject matter of any of examples 1-7, and wherein processing network traffic comprises forwarding network traffic received from the network switch to the plurality of network functions via the first network port and the second network port.
Example 9 includes the subject matter of any of examples 1-8, and wherein processing network traffic comprises forwarding network traffic between the first virtual network function and the second virtual network function.
Example 10 includes the subject matter of any of examples 1-9, and wherein each of the virtual network functions comprises a virtual machine.
Example 11 includes the subject matter of any of examples 1-10, and wherein the accelerator comprises an application specific integrated circuit.
Example 12 includes the subject matter of any of examples 1-11, and wherein the processor and the accelerator are included in a multi-chip package of the computing device.
Example 13 includes a method for accelerating network processing, the method comprising: coupling, by a computing device, a first network port of a virtual switch of an accelerator of the computing device with a second network port of a network switch via a network link; responsive to coupling the first network port with the second network port, performing, by the computing device, a plurality of virtual network functions with a processor of the computing device; and responsive to executing the plurality of virtual network functions, processing, by the computing device, network traffic associated with the plurality of virtual network functions with a virtual switch of the accelerator.
Example 14 includes the subject matter of example 13, and further comprising configuring, by the computing device, the accelerator with the virtual switch.
Example 15 includes the subject matter of any one of examples 13 and 14, and wherein the accelerator comprises a field programmable gate array, and wherein the virtual switch comprises an application function of the field programmable gate array.
Example 16 includes the subject matter of any of examples 13-15, and wherein processing network traffic comprises processing network traffic within a coherency domain shared by the accelerator and the processor.
Example 17 includes the subject matter of any of examples 13-16, and wherein processing network traffic comprises transmitting the network traffic between the processor and the accelerator via a coherent interconnect of the computing device.
Example 18 includes the subject matter of any one of examples 13-17, and further comprising: coupling, by the computing device, each of the virtual network functions to a para-virtualization interface of the accelerator; wherein processing network traffic comprises processing network traffic associated with the para-virtualized interface.
Example 19 includes the subject matter of any of examples 13-18, and wherein processing network traffic comprises forwarding network traffic from the plurality of network functions to the network switch via the first network port and the second network port.
Example 20 includes the subject matter of any of examples 13-19, and wherein processing network traffic comprises forwarding network traffic received from a network switch to the plurality of network functions via the first network port and the second network port.
Example 21 includes the subject matter of any of examples 13-20, and wherein processing network traffic comprises forwarding network traffic between the first virtual network function and the second virtual network function.
Example 22 includes the subject matter of any one of examples 13-21, and wherein each of the virtual network functions comprises a virtual machine.
Example 23 includes the subject matter of any one of examples 13-22, and wherein the accelerator comprises an application specific integrated circuit.
Example 24 includes the subject matter of any of examples 13-23, and wherein the processor and the accelerator are included in a multi-chip package of the computing device.
Example 25 includes a computing device comprising: a processor; and a memory having stored therein a plurality of instructions that, when executed by the processor, cause the computing device to perform the method of any of examples 13-24.
Example 26 includes one or more non-transitory computer-readable storage media comprising a plurality of instructions stored thereon that, in response to being executed, cause a computing device to perform the method of any of examples 13-24.
Example 27 includes a computing device comprising means for performing the method of any of examples 13-24.

Claims (25)

1. A computing device for accelerating network processing, the computing device comprising:
an accelerator to couple a first network port of a virtual switch of the accelerator with a second network port of the network switch via a network link; and
a processor to perform a plurality of virtual network functions in response to coupling of the first network port with the second network port;
wherein the virtual switch is to process network traffic associated with the plurality of virtual network functions in response to execution of the plurality of virtual network functions.
2. The computing device of claim 1, further comprising a virtual machine monitor to configure the accelerator with the virtual switch.
3. The computing device of claim 2, wherein the accelerator comprises a field programmable gate array, and wherein the virtual switch comprises an application function unit of the field programmable gate array.
4. The computing device of claim 1, wherein to process network traffic comprises to process network traffic within a coherency domain shared by the accelerator and the processor.
5. The computing device of claim 4, further comprising:
a coherent interconnect coupling the processor and the accelerator;
wherein processing the network traffic comprises transferring the network traffic between the processor and the accelerator via a coherent interconnect.
6. The computing device of claim 1, further comprising:
a virtual machine monitor to couple each of the virtual network functions to a para-virtualization interface of the accelerator;
wherein processing network traffic comprises processing network traffic associated with the para-virtualized interface.
7. The computing device of claim 1, wherein processing network traffic comprises forwarding network traffic from the plurality of network functions to a network switch via a first network port and a second network port.
8. The computing device of claim 1, wherein to process network traffic comprises to forward network traffic received from a network switch to the plurality of network functions via a first network port and a second network port.
9. The computing device of claim 1, wherein processing network traffic comprises forwarding network traffic between a first virtual network function and a second virtual network function.
10. The computing device of claim 1, wherein each of the virtual network functions comprises a virtual machine.
11. The computing device of claim 1, wherein the accelerator comprises an application specific integrated circuit.
12. The computing device of claim 1, wherein the processor and accelerator are included in a multi-chip package of the computing device.
13. A method for accelerating network processing, the method comprising:
coupling, by a computing device, a first network port of a virtual switch of an accelerator of the computing device with a second network port of a network switch via a network link;
responsive to coupling the first network port with the second network port, performing, by the computing device, a plurality of virtual network functions with a processor of the computing device; and
in response to executing the plurality of virtual network functions, processing, by the computing device, network traffic associated with the plurality of virtual network functions with a virtual switch of the accelerator.
14. The method of claim 13, further comprising configuring, by the computing device, the accelerator with the virtual switch.
15. The method of claim 14, wherein the accelerator comprises a field programmable gate array, and wherein the virtual switch comprises an application function unit of the field programmable gate array.
16. The method of claim 13, wherein processing network traffic comprises processing network traffic within a coherency domain shared by an accelerator and a processor.
17. The method of claim 16, wherein processing network traffic comprises transferring network traffic between a processor and an accelerator via a coherent interconnect of a computing device.
18. The method of claim 13, further comprising:
coupling, by the computing device, each of the virtual network functions to a para-virtualization interface of the accelerator;
wherein processing network traffic comprises processing network traffic associated with the para-virtualized interface.
19. The method of claim 13, wherein processing network traffic comprises forwarding network traffic from the plurality of network functions to a network switch via a first network port and a second network port.
20. The method of claim 13, wherein processing network traffic comprises forwarding network traffic received from a network switch to the plurality of network functions via a first network port and a second network port.
21. The method of claim 13, wherein processing network traffic comprises forwarding network traffic between a first virtual network function and a second virtual network function.
22. The method of claim 13, wherein each of the virtual network functions comprises a virtual machine.
23. A computing device, comprising:
a processor; and
a memory having stored therein a plurality of instructions that, when executed by the processor, cause the computing device to perform the method of any of claims 13-22.
24. One or more non-transitory computer-readable storage media comprising a plurality of instructions stored thereon that in response to being executed result in a computing device performing the method of any of claims 13-22.
25. A computing device comprising means for performing the method of any of claims 13-22.
CN201980006768.0A 2018-02-25 2019-02-25 Techniques for NIC port reduction with accelerated switching Pending CN111492628A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201862634874P 2018-02-25 2018-02-25
US62/634874 2018-02-25
PCT/US2019/019377 WO2019165355A1 (en) 2018-02-25 2019-02-25 Technologies for nic port reduction with accelerated switching

Publications (1)

Publication Number Publication Date
CN111492628A true CN111492628A (en) 2020-08-04

Family

ID=67687342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980006768.0A Pending CN111492628A (en) 2018-02-25 2019-02-25 Techniques for NIC port reduction with accelerated switching

Country Status (3)

Country Link
CN (1) CN111492628A (en)
DE (1) DE112019000965T5 (en)
WO (1) WO2019165355A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114915598A (en) * 2021-02-08 2022-08-16 腾讯科技(深圳)有限公司 Network acceleration method and device of application program and electronic equipment

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11184278B2 (en) * 2019-12-30 2021-11-23 Avago Technologies International Sales Pte. Limited Hyperscalar packet processing
CN111262917A (en) 2020-01-13 2020-06-09 苏州浪潮智能科技有限公司 Remote data moving device and method based on FPGA cloud platform
KR102607421B1 (en) * 2020-04-27 2023-11-29 한국전자통신연구원 Computing resource disaggregated collaboration system of interconnected an optical line and, resource disaggregated collaboration method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9264384B1 (en) * 2004-07-22 2016-02-16 Oracle International Corporation Resource virtualization mechanism including virtual host bus adapters
EP2722767B1 (en) * 2012-10-16 2018-03-21 Solarflare Communications Inc Encapsulated accelerator
US9317310B2 (en) * 2013-01-31 2016-04-19 Broadcom Corporation Systems and methods for handling virtual machine packets
US9479457B2 (en) * 2014-03-31 2016-10-25 Juniper Networks, Inc. High-performance, scalable and drop-free data center switch fabric
US10812632B2 (en) * 2015-02-09 2020-10-20 Avago Technologies International Sales Pte. Limited Network interface controller with integrated network flow processing

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114915598A (en) * 2021-02-08 2022-08-16 腾讯科技(深圳)有限公司 Network acceleration method and device of application program and electronic equipment
CN114915598B (en) * 2021-02-08 2023-10-20 腾讯科技(深圳)有限公司 Network acceleration method and device of application program and electronic equipment

Also Published As

Publication number Publication date
DE112019000965T5 (en) 2021-04-15
WO2019165355A1 (en) 2019-08-29

Similar Documents

Publication Publication Date Title
US20220197685A1 (en) Technologies for application-specific network acceleration with unified coherency domain
US10997106B1 (en) Inter-smartNIC virtual-link for control and datapath connectivity
US10095645B2 (en) Presenting multiple endpoints from an enhanced PCI express endpoint device
CN111492628A (en) Techniques for NIC port reduction with accelerated switching
US8806025B2 (en) Systems and methods for input/output virtualization
US20180109471A1 (en) Generalized packet processing offload in a datacenter
US11750533B2 (en) Hardware assisted virtual switch
KR20190040884A (en) SYSTEM AND Method for PROVIDING IN-Storage Acceleration(ISA) in DATA STORAGE Devices
US9154451B2 (en) Systems and methods for sharing devices in a virtualization environment
WO2016209502A1 (en) Netflow collection and export offload using network silicon
US11327918B2 (en) CPU hot-swapping
US11269801B2 (en) System decoder for training accelerators
US11303638B2 (en) Atomic update of access control list rules
US20230185732A1 (en) Transparent encryption
KR20110083518A (en) Virtualizing a host usb adapter
US9483290B1 (en) Method and system for virtual machine communication
WO2018148934A1 (en) Merged input/output operations
CA3167334C (en) Zero packet loss upgrade of an io device
US20190042432A1 (en) Reducing cache line collisions
US10073725B2 (en) Distributed input/output virtualization
US9344376B2 (en) Quality of service in multi-tenant network
US10761939B1 (en) Powering-down or rebooting a device in a system fabric
US11138072B2 (en) Protected runtime mode
US20240028381A1 (en) Virtual i/o device management
US11886356B2 (en) Local instantiation of remote peripheral devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination