WO2015121750A1 - System and method for data communication between virtual interfaces - Google Patents

System and method for data communication between virtual interfaces

Info

Publication number
WO2015121750A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual
shared memory
data
virtual interface
memory space
Prior art date
Application number
PCT/IB2015/000343
Other languages
English (en)
Inventor
Vincent Jardin
Olivier MATZ
David Marchand
Original Assignee
6Wind
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 6Wind filed Critical 6Wind
Publication of WO2015121750A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45595Network integration; Enabling network access in virtual machine instances

Definitions

  • the present disclosure relates to the field of data communication in virtualized environments.
  • Such virtualized environments typically include virtual machines which are software emulations of a physical machine executing software through a software system usually referred to as a hypervisor or a virtual machine manager. These software emulations may also use some on-CPU specific acceleration instructions.
  • the first challenge relates to the number of Virtual Machines (VMs) per computer (e.g. a server blade).
  • This VM density is rapidly increasing, leveraging ongoing improvements in the performance of the processors (e.g. x86, Power or ARM) used on those blades.
  • a typical server blade in a service provider data center hosts at least 50 VMs, with that number expected to grow to hundreds within a few years. Because of this growth in the number of VMs running on each server blade, the data center network needs to expand beyond its current limit at the Top-of-Rack (ToR), to a model where a virtual switch on each server blade is used to distribute the increasing volume of network traffic to virtualized applications.
  • This virtual switch function is typically implemented using the open-source Open vSwitch (OVS) or an equivalent proprietary virtual switch.
  • the second challenge is the network bandwidth required by VMs.
  • With the constant growth in rich media applications, individual VMs can require sustained network bandwidth of 10 Gbps or more. As VM density increases, this bandwidth requirement can quickly outstrip the capacity of a standard virtual switch, constraining either the number of VMs that can be instantiated on a blade or the performance seen by the users of those VMs.
  • the new extensions to the 6WINDGate networking software provide solutions to these three challenges, delivering a data plane solution that achieves 5x - 10x acceleration for the baseline Layer 2 switching function.
  • data center operators can achieve the increases in VM density that are enabled by on-going improvements in the performance of server processors. They can also deliver high network bandwidth to individual VMs, addressing the performance needs of users running streaming media applications or other bandwidth-hungry workloads.
  • the communication between virtual machines and the host side of the hypervisors is based on virtualized or para-virtualized interfaces between the virtual machines and hypervisors.
  • virtualized and para-virtualized interfaces typically emulate the behaviors of hardware interfaces (e.g. Peripheral Component Interconnect (PCI) interfaces or Ethernet ports) for interruptions and data polling from the interfaces.
  • the VM-to-VM traffic is processed through virtual switches or virtual routers that interconnect the VMs. This avoids setting up a full-mesh interconnect between the VMs, but it adds some computing pressure on the hypervisors.
  • the network throughput performance may be considered good enough for most Operating Systems (OS) running on the virtual machine.
  • the guest systems, i.e. the operating systems running on VMs that process the packets, remain substantially idle until they are notified of new packets or new events being queued for processing.
  • the notification is triggered by the hypervisor which provides the emulation of the virtualized and para-virtualized interfaces.
  • Such technologies, e.g. the open-source project dpdk.org (Intel® DPDK - Data Plane Development Kit), rely on polling for packet and event processing.
  • While polling technology provides performance that is satisfactory in terms of packet processing rate, it has significant CPU usage drawbacks. Indeed, it involves dedicating CPU resources to queue management and polling, so that such resources are never idle and available for other tasks.
  • CPUs (whether virtual or not) handling the polling are exclusively dedicated to the polling and packet processing tasks.
  • Another object of the present subject disclosure is to provide systems, software and methods for alleviating the drawbacks of conventional hypervisor software systems.
  • a method is proposed that uses a hypervisor-provided shared memory in order to create memory-based communication between virtual machines (also referred to in the present subject disclosure as "guests") and hypervisors, while combining it with the external messaging interfaces of the hypervisors.
  • the proposed method provides the benefit of avoiding the CPU load of dedicating CPU resources to communication tasks when using polling technologies.
  • the present disclosure also provides a new set of interfaces for packet and event processing in order to provide efficient data plane for virtualized environments.
  • a method is proposed for data communication between virtual interfaces in a virtualized environment running on a computer host comprising a processor operatively coupled with a memory, the virtualized environment comprising first and second virtual interfaces associated with one or several virtual nodes.
  • the method comprises: allocating in the memory a shared memory space of the virtualized environment; assigning in the shared memory space a sub-space associated with the second virtual interface for data communication; upon receipt at the first virtual interface of data to be communicated to the second virtual interface, storing the data in the sub-space of the shared memory space associated with the second virtual interface; and extracting the data from the sub-space of the shared memory space for processing by the second virtual interface.
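  • By way of illustration only, a minimal C sketch of one possible organization of such a sub-space is given below: a single-producer/single-consumer ring in which the first virtual interface stores data and from which the second virtual interface extracts it. The names (vnic_ring, vnic_store, vnic_extract) and the fixed-size layout are hypothetical and are not prescribed by the present subject disclosure.

      #include <stdatomic.h>
      #include <stdint.h>
      #include <string.h>

      #define RING_SIZE 256              /* power of two so indices wrap via mask */
      #define PKT_MAX   2048

      /* One slot per packet stored in the recipient's sub-space. */
      struct vnic_slot {
          uint32_t len;
          uint8_t  data[PKT_MAX];
      };

      /* Single-producer/single-consumer ring living in the sub-space
       * assigned to the second (recipient) virtual interface. */
      struct vnic_ring {
          _Atomic uint32_t head;         /* advanced by the first interface  */
          _Atomic uint32_t tail;         /* advanced by the second interface */
          struct vnic_slot slot[RING_SIZE];
      };

      /* Store data in the recipient's sub-space (step 204 of Fig. 2). */
      static int vnic_store(struct vnic_ring *r, const void *buf, uint32_t len)
      {
          uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
          uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

          if (head - tail == RING_SIZE || len > PKT_MAX)
              return -1;                 /* ring full, or oversized packet */

          struct vnic_slot *s = &r->slot[head & (RING_SIZE - 1)];
          s->len = len;
          memcpy(s->data, buf, len);
          /* Publish the payload before the new head becomes visible. */
          atomic_store_explicit(&r->head, head + 1, memory_order_release);
          return 0;
      }

      /* Extract data for processing by the recipient (step 205 of Fig. 2). */
      static int vnic_extract(struct vnic_ring *r, void *buf, uint32_t *len)
      {
          uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
          uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

          if (tail == head)
              return -1;                 /* nothing queued */

          struct vnic_slot *s = &r->slot[tail & (RING_SIZE - 1)];
          *len = s->len;
          memcpy(buf, s->data, s->len);
          atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
          return 0;
      }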
  • the process of the present subject disclosure contributes to minimizing the processor usage of virtual nodes (e.g. virtual machines), and as a consequence to minimizing usage of the host processor.
  • the present subject disclosure provides improved virtual interfaces for communication between virtual machines, or between a virtual machine and the host, using a shared memory space, and in some embodiments using a notification mechanism implemented externally to the hypervisor.
  • since the notification function may be implemented on the host, it can be independent from the hypervisor management.
  • the hypervisor function may be viewed as a client of the notification function which provides notification services to various types of clients, including virtual machines.
  • the first and second virtual interfaces may be associated with first and second virtual machines of the virtualized environment, respectively.
  • alternatively, the first and second virtual interfaces may be associated with a virtual machine of the virtualized environment and a virtual switch or virtual router of the virtualized environment, respectively.
  • the data communication method further comprises: notifying the second virtual interface of storage of data in the shared memory space, and the second virtual interface extracting the data from the shared memory space responsive to the notifying.
  • the data communication method further comprises: the second virtual interface polling the shared memory and extracting the packet from the shared memory space responsive to the polling.
  • the data communication method further comprises: communicating to the first and second virtual interfaces information for accessing the shared memory space, wherein information for accessing the shared memory space includes a file name identifying the shared memory space. In one or more embodiments, the data communication method further comprises: notifying the second virtual interface of incoming data packets further to the storing of a plurality of data packets in the shared memory space.
  • a system for data communication between virtual interfaces comprising: a computer host comprising a processor operatively coupled with a memory, configured to: run a virtualized environment comprising first and second virtual interfaces associated with one or several virtual nodes; allocate in the memory a shared memory space of the virtualized environment; assign in the shared memory space a sub-space associated with the second virtual interface for data communication; upon receipt at the first virtual interface of data to be communicated to the second virtual interface, store the data in the sub-space of the shared memory space associated with the second virtual interface; and extract the data from the sub-space of the shared memory space for processing by the second virtual interface.
  • a computer program product comprising computer program code tangibly embodied in a computer readable medium, said computer program code comprising instructions to, when provided to a computer system and executed, cause said computer to perform a method for data communication according to the present subject disclosure.
  • a set of data representing, through compression or encoding, a computer program according to the present subject disclosure.
  • a non-transitory computer-readable storage medium storing a computer program that, when executed, causes a system comprising a processor operatively coupled with a memory, to perform a method for data communication between virtual interfaces in a virtualized environment running on a computer host comprising a processor operatively coupled with a memory, the virtualized environment comprising first and second virtual interfaces associated with one or several virtual nodes, the method comprising: allocating in the memory a shared memory space of the virtualized environment; assigning in the shared memory space a sub-space associated with the second virtual interface for data communication; upon receipt at the first virtual interface of data to be communicated to the second virtual interface, storing the data in the sub-space of the shared memory space associated with the second virtual interface; and extracting the data from the sub-space of the shared memory space for processing by the second virtual interface.
  • Figure 1 is a schematic diagram illustrating a host computing system running a virtual environment in accordance with one or more embodiments.
  • Figure 2 is a flow-chart illustrating an exemplary process of data communication between two virtual interfaces in accordance with one or more embodiments.
  • Figure 3 is a flow-chart illustrating an exemplary process of creating and configuring a virtual interface in accordance with one or more embodiments.
  • Figure 4 is a flow-chart illustrating an exemplary process of data communication between two virtual interfaces in accordance with one or more embodiments.
  • Figure 5 shows a computer system in accordance with one or more embodiments.

Description of embodiments
  • each described function, engine, block of the block diagrams and flowchart illustrations can be implemented in hardware, software, firmware, middleware, microcode, or any suitable combination thereof. If implemented in software, the functions, engines, blocks of the block diagrams and/or flowchart illustrations can be implemented by computer program instructions or software code, which may be stored or transmitted over a computer-readable medium, or loaded onto a general purpose computer, special purpose computer or other programmable data processing apparatus to produce a machine, such that the computer program instructions or software code which execute on the computer or other programmable data processing apparatus, create the means for implementing the functions described herein.
  • Embodiments of computer-readable media include, but are not limited to, both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another.
  • a "computer storage media” may be any physical media that can be accessed by a computer. Examples of computer storage media include, but are not limited to, a flash drive or other flash memory devices (e.g. memory keys, memory sticks, key drive), CD-ROM or other optical storage, DVD, magnetic disk storage or other magnetic storage devices, memory chip, RAM, ROM, EEPROM, smart cards, or any other suitable medium from that can be used to carry or store program code in the form of instructions or data structures which can be read by a computer processor.
  • various forms of computer-readable media may transmit or carry instructions to a computer, including a router, gateway, server, or other transmission device, wired (coaxial cable, fiber, twisted pair, DSL cable) or wireless (infrared, radio, cellular, microwave).
  • the instructions may comprise code from any computer-programming language, including, but not limited to, assembly, C, C++, Visual Basic, HTML, PHP, Java, Javascript, Python, and bash scripting.
  • the fast path environment or any equivalent is used as an example of a set of data-plane primitives that can be implemented either in hardware (FPGA, ASICs) or in software.
  • the words data-plane and fast path can be used to describe the same technology as long as they describe technology that offloads packet processing from the networking stacks provided with an operating system (which may be referred to as a "slow path").
  • the Intel® DPDK environment or any equivalent is used as an example of a set of libraries providing primitives for building packet processing environments. Therefore, the proposed method may be implemented with software tools such as the Intel® DPDK environment. It can be based on source code from dpdk.org, or on any derivative or original software implementation of a packet processing environment. For example, ODP (Open Data Plane) is another alternative mainly focused on ARM families of processors.
  • "exemplary" means serving as an example, instance, or illustration. Any aspect or design described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects or designs.
  • Figure 1 shows a host computer (100) running a virtualized environment comprising virtual machines VM1 (103), VM2 (104), and VM3 (105), which may be instantiated by a hypervisor software running on the host (100).
  • the architecture of the hypervisor software may comprise a virtual machine side (101) in which virtual machines of the virtualized environment are instantiated, and a host side (102) in which other software processes of the virtualized environment are instantiated.
  • virtual nodes executed on the machine side (101) of the hypervisor are sometimes referred to as "guests", whereas virtual nodes executed on the host side (102) of the hypervisor are sometimes referred to as "hosts".
  • data transmissions from a virtual node running on the machine side (101) of the hypervisor to a virtual node running on the host side (102) of the hypervisor are sometimes referred to as "guest-to-host" or "VM-to-host" transmission or communication.
  • data transmissions from a virtual node running on the host side (102) of the hypervisor to a virtual node running on the machine side (101) of the hypervisor are sometimes referred to as "host-to-guest" or "host-to-VM" transmission or communication.
  • data communication between two virtual nodes running on the machine side (101) of the hypervisor is sometimes referred to as "guest-to-guest" or "VM-to-VM" transmission or communication.
  • data communication between two virtual nodes running on the host side (102) of the hypervisor is sometimes referred to as "host-to-host" transmission or communication.
  • the virtualized environment shown on Fig. 1 includes a virtual switch node (106) which may also run on the host and be instantiated by the hypervisor on the host side of the hypervisor (102).
  • the virtual switch node provides the functions of a network switch for data packet or data unit switching between interfaces, and may in addition be configured for providing routing services, in which case it may be referred to as a virtual switch/virtual router, or "vSwitch/vRouter", as is the case on Fig. 1.
  • the virtual switch node may implement a fast data plane technology, for example based on the DPDK development environment. However the present subject disclosure is not limited to the DPDK development environment and any virtual switch and/or virtual router may be used.
  • Each of the virtual machines VM1 (103), VM2 (104), and VM3 (105) may comprise one or several virtual interfaces (VNIC1^VM1 (103a), VNIC2^VM1 (103b), VNIC3^VM1 (103c), VNIC1^VM2 (104a), VNIC2^VM2 (104b), and VNIC1^VM3 (105a)).
  • the virtual switch (106) may also include one or several virtual interfaces (VNIC1^vS (106a) and VNIC2^vS (106b)), as illustrated in Fig. 1.
  • Each of the virtual interfaces (VNIC1^VM1 (103a), VNIC2^VM1 (103b), VNIC3^VM1 (103c), VNIC1^VM2 (104a), VNIC2^VM2 (104b), VNIC1^VM3 (105a), VNIC1^vS (106a), and VNIC2^vS (106b)) is associated with a virtual node (e.g. a virtual machine, a virtual switch, or a virtual router, etc.).
  • each virtual interface is emulated by the virtualized environment for data communication between the virtual node with which it is associated and another interface, which may be virtualized, para-virtualized, or non-virtualized, associated with a node, which may be the same or another virtual node, or a non-virtual node, internal or external to the virtualized environment and/or the host computer (100).
  • the virtual interfaces (VNIC1^VM1 (103a), VNIC2^VM1 (103b), VNIC3^VM1 (103c), VNIC1^VM2 (104a), VNIC2^VM2 (104b), VNIC1^VM3 (105a), VNIC1^vS (106a), and VNIC2^vS (106b)) may implement a virtual network interface card (vNIC) function, such as, for example, a virtual Ethernet port function.
  • a virtual network interface function may implement a virtual interface on which a virtual node (a virtual machine such as VM1, VM2, VM3, or the virtual switch (106) vSwitch/vRouter) is logically connected so as to send or receive data.
  • the virtual machine VM1 (103) runs 3 virtual network interfaces (VNIC1^VM1 (103a), VNIC2^VM1 (103b), and VNIC3^VM1 (103c)), the virtual machine VM2 (104) runs 2 virtual interfaces (VNIC1^VM2 (104a) and VNIC2^VM2 (104b)), and the virtual machine VM3 (105) runs one virtual interface (VNIC1^VM3 (105a)).
  • the virtual switch (106) runs two virtual interfaces (VNIC1^vS (106a) and VNIC2^vS (106b)).
  • the virtual machines VM1 (103), VM2 (104), VM3 (105) may also be provided with respective operating system software (OSS) (103d, 104c, 105d), such as Linux, Windows, Solaris, Android, etc., which may provide typical OS functions for operating a computer (such as memory management, task management, CPU load management, etc.), for example through a kernel OS software (herein referred to as an "OS kernel" or a "kernel"), and network stack functions (103f, 104d) with an interface driver (103e, 104e).
  • the interface driver (103e, 104e) may be configured for driving respective one or several virtual interfaces (VNIC1^VM1 (103a), VNIC2^VM1 (103b), VNIC3^VM1 (103c), VNIC1^VM2 (104a), VNIC2^VM2 (104b), and VNIC1^VM3 (105a)).
  • the virtual machines VM1 (103), VM2 (104), VM3 (105) may also be provided with one or several applications (103g, 104f, 105e) whose instantiation is managed by the respective operating system software (OSS) (103d, 104c, 105d).
  • one or several of the virtual nodes may be configured so as to include a fast path data plane technology.
  • each of the virtual interfaces may be configured to be driven by an OS kernel bypass port which implements a virtual interface function through fast path interface drivers and network stacks, instead of or in addition to being configured as an OS kernel port which implements a virtual interface function through interface drivers and network stacks provided in the OS kernel.
  • the virtual interface driver may be configured to be an OS kernel bypass port or driver (105b), such as, for example, the DPDK Poll Mode Driver (DPDK PMD), in which case network stack functions (105c) customized to be interoperable with the DPDK network interface driver (105b) may be used in place of network stacks and interface driver (not shown in the figure) provided in the OS (105d) running on the corresponding virtual node (105).
  • the network stacks and interface driver provided in the OS (105d) may in turn be used when the OS kernel bypass driver (105b) is not used, or is not in operation.
  • the OS kernel bypass driver (105b) may be configured to operate in two different modes: a sleep mode and an operation mode.
  • in the operation mode, the OS kernel bypass driver (105b) uses a polling mechanism in order to poll the corresponding virtual interface (105a) for new data to be processed.
  • the polling rate may be high in order to ensure high-rate data processing.
  • in the sleep mode, the OS kernel bypass driver (105b) may perform polling of the virtual interface (105a) at a rate smaller than the operation mode rate, or may temporarily stop polling the virtual interface (105a).
  • Wake-up and sleeping mechanisms may be implemented in order to transition from the sleep mode to the operation mode, and from the operation mode to the sleep mode, respectively.
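  • A minimal sketch of how such a dual-mode driver loop could be arranged is given below, assuming hypothetical vnic_poll_once() and vnic_should_stop() hooks; in a complete implementation, the wake-up would typically rely on the notification framework described further below rather than on a fixed sleep period.

      #include <stdbool.h>
      #include <time.h>

      /* Hypothetical hooks: vnic_poll_once() polls the virtual interface once
       * and returns the number of packets handled; vnic_should_stop() lets
       * the driver terminate cleanly. */
      extern int  vnic_poll_once(void);
      extern bool vnic_should_stop(void);

      #define IDLE_POLLS_BEFORE_SLEEP 10000
      #define SLEEP_POLL_PERIOD_NS    (1000L * 1000L)  /* 1 ms between sleep-mode polls */

      static void vnic_poll_loop(void)
      {
          unsigned idle = 0;

          while (!vnic_should_stop()) {
              if (vnic_poll_once() > 0) {
                  idle = 0;              /* traffic seen: stay in operation mode */
                  continue;              /* busy-poll at full rate */
              }
              if (++idle < IDLE_POLLS_BEFORE_SLEEP)
                  continue;              /* operation mode: keep polling */

              /* Sleep mode: poll at a much lower rate, freeing the CPU.
               * A wake-up notification could interrupt this wait and
               * return the driver to operation mode at once. */
              struct timespec ts = { 0, SLEEP_POLL_PERIOD_NS };
              nanosleep(&ts, NULL);
          }
      }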
  • Shown on Fig. 1 are virtual interfaces (VNIC3^VM1 (103c) and VNIC1^VM2 (104a)) which may be used for communication between two virtual machines, such as, for example, VM1 (103) and VM2 (104), and virtual interfaces (VNIC2^VM2 (104b), VNIC1^VM3 (105a), and VNIC1^vS (106a)) which may be used for communication between the virtual switch (106) vSwitch/vRouter and a virtual machine, such as, for example, VM2 (104) and VM3 (105).
  • when fast path data plane functions are provided on the host (for example in the virtual switch (106) vSwitch/vRouter shown on Fig. 1), such virtual interfaces may be used for communication between these fast data plane functions running on the host and any virtual machine using a virtual interface according to the present subject disclosure.
  • Figure 2 is a flow diagram illustrating a proposed process in accordance with one or more embodiments.
  • the virtualized environment is configured (201) with the creation of one or several virtual nodes, which may be instantiated as discussed above by a hypervisor, and the creation of first and second virtual interfaces associated with the one or several virtual nodes of the virtualized environment.
  • the first and/or second virtual interface may be included in a virtual node, and the creation of such virtual interface may include instantiating the virtual node and configuring the same so as to create a virtual interface associated thereto.
  • the first and/or second virtual interface may not be included in a virtual node, and the creation of such virtual interface may include instantiating the virtual interface independently from that of the virtual node, and configuring an association between the virtual interface and a virtual node associated thereto.
  • a shared memory space of the virtualized environment is allocated (202) in the memory of the host computer on which the virtualized environment is executed. In one or more embodiments, this shared memory space is used for data communication between the first and second virtual interfaces of the virtualized environment. In one or more embodiments, the allocation and management of shared memory space may be performed through a shared memory management application, referred to in the following as shared memory server, shared memory framework API, or SHMEM.
  • the allocated shared memory space may then be mapped through the assignment (203), in the shared memory space, of a sub-space associated with the second virtual interface for data communication.
  • a first sub-space may be allocated in the shared memory space for use by the first virtual interface, and a second sub-space may be allocated in the shared memory space for use by the second virtual interface.
  • the first and second sub-spaces may preferably not overlap.
  • the assigned memory sub-spaces may be organized for data communication in various ways. For example, memory queues may be created in the assigned sub-spaces. Such memory queues may be managed by the shared memory server provided by the present subject disclosure.
  • the definition of the memory queues in the allocated shared memory space may occur together with the allocation of the shared memory space, depending on the configuration of the shared memory server. At least two memory queues may be created in the shared memory space for each virtual interface, one queue being used for storing received data, and the other being used for storing data to be transmitted, as in the sketch below.
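  • Under the assumption of a POSIX shared memory object such as the one created by the shared memory server described further below, the sub-space of one virtual interface, holding its two queues, could be laid out and mapped as in the following sketch; the vnic_subspace layout and the vnic_map() helper are hypothetical, and vnic_ring is the ring type from the earlier sketch.

      #include <fcntl.h>
      #include <sys/mman.h>
      #include <unistd.h>

      /* Hypothetical layout of the sub-space assigned to one virtual
       * interface: two queues, one for received data and one for data
       * to be transmitted, as described above. */
      struct vnic_subspace {
          struct vnic_ring rx;           /* data received by this interface   */
          struct vnic_ring tx;           /* data this interface will transmit */
      };

      /* Map the shared memory object created by the shared memory server.
       * With POSIX shm, the name "/fast-vnic-shm1" is typically backed by
       * the file /dev/shm/fast-vnic-shm1 on Linux. */
      static struct vnic_subspace *vnic_map(const char *shm_name)
      {
          int fd = shm_open(shm_name, O_RDWR, 0);
          if (fd < 0)
              return NULL;

          struct vnic_subspace *sub =
              mmap(NULL, sizeof(*sub), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          close(fd);                     /* the mapping survives the close */
          return sub == MAP_FAILED ? NULL : sub;
      }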
  • When outgoing data is received by the first virtual interface, to be communicated (transmitted) to the second virtual interface (the recipient virtual interface), such outgoing data may be stored (204) in the sub-space of the shared memory space associated with the second virtual interface.
  • the data is extracted (205) from the shared memory space and provided to the second virtual interface for further processing.
  • the extracting the data from the sub-space associated with the second virtual interface may be performed further to a polling of the shared memory space, and/or a notification provided to the second virtual interface.
  • the second virtual interface is configured to poll the shared memory space for detecting new data to be extracted from the sub-space to which it is associated.
  • a polling mechanism is typically used in environments which provide OS kernel bypass data plane processing, such as the dpdk.org (Intel® DPDK) environment.
  • Such environments provide a virtual interface driver interoperable with network stacks so as to bypass the virtual interface driver and network stacks provided by the OS kernel.
  • the present subject disclosure provides a Poll Mode Driver (referred to as "PMD") scheme for virtual interfaces running in such environments.
  • PMD-type virtual interfaces may therefore be configured to poll the shared memory space used for data communication between virtual interfaces according to the present subject disclosure. In order to minimize CPU usage, this PMD-type virtual interface can work in sleep/operation mode along with notification as already described for the virtual interface (105a) associated with an OS kernel bypass driver (105b).
  • At least some virtual interfaces may not operate in an environment which provides OS kernel bypass data plane processing, but instead in an environment which requires processing the data (e.g. packets, events, etc.) through the OS kernel interfaces (including the virtual interface driver and network stacks provided in the OS kernel).
  • the present subject disclosure provides an OS kernel interface driver (referred to as "vNIC") process that uses a notification framework for informing a recipient virtual interface that data is to be extracted from the shared memory space.
  • the notification framework provides notification services for handling communications through virtual interfaces according to the present subject disclosure.
  • the availability of data (e.g. packets, events, etc.) may first be notified to the recipient through a notification function.
  • the hypervisor's trigger function may therefore not be used in the proposed process as opposed to the virtualized or para-virtualized technologies which require some use of the hypervisor to provide the virtual network devices, along with their interruption emulation.
  • for example, packets processed by the vSwitch/vRouter fast path stack (106) may be sent through the virtual interfaces VNIC1^vS (106a) and VNIC2^vS (106b).
  • the virtual switch/virtual router (106) then requests from the application managing the shared memory, e.g. from the shared memory server, the posting of a notification towards the recipient guest VMs.
  • the guest VMs, upon receipt of a notification, can start processing one or more packets from their respective receive queues without overusing the CPUs, as they do not necessarily have to keep polling, thanks to the combined use of the notification and of the enqueuing of packets through the shared memory mechanism of the hypervisor.
  • the communication between virtual interfaces through a shared memory according to the proposed data communication method may be performed between any combination of virtual interfaces in a virtualized environment (for example, referring to Fig. 1, any combination of the virtual interfaces VNIC1^VM1 (103a), VNIC2^VM1 (103b), VNIC3^VM1 (103c), VNIC1^VM2 (104a), VNIC2^VM2 (104b), VNIC1^VM3 (105a), VNIC1^vS (106a), and VNIC2^vS (106b)), each of which may be of the PMD type or the vNIC type as described above.
  • Figure 3 is a flow diagram illustrating the creation and mapping (300) of a vNIC type virtual interface. As described above, a vNIC type virtual interface may be configured to use the notification framework provided by the present subject disclosure.
  • the hypervisor software environment includes one or more shared memory framework APIs (herein referred to as "SHMEM") for managing (including creating) shared memory between the guests and the host side of the hypervisor.
  • Examples of such APIs include Qemu's ivshmem or VMware's VMCI. Hypervisor software environments also usually include notification APIs (herein referred to as "NOTIF") for managing (including creating, sending and receiving) guest-to-guest, guest-to-host, and/or host-to-guest notifications.
  • Examples of such a notification API framework include Qemu's ivshmem MSI-X for external applications.
  • a shared memory space is created (301) using a shared memory API, for instance as provided by the hypervisor operating the virtualized environment.
  • the Qemu shared memory API "Nahanni" and/or "ivshmem" may be used for allocating the shared memory space according to the present subject disclosure.
  • the VMware API "VMCI" may be used for allocating the shared memory space according to the present subject disclosure.
  • the virtual interface data communication process according to the present subject disclosure may be implemented using a virtual interface data communication software (here-below referred to as "fast-vnic-ivshmem-server") running on the computer host on which the virtualized environment is running, for example on the host side of the hypervisor of such virtualized environment, in which case it may be referred to as a host/hypervisor server.
  • the creation (301) of a shared memory space in a virtualized environment running on the computer host, in the memory of such computer host, may be performed using a virtual interface data communication software, for example through routine calls to the shared memory framework API SHMEM.
  • the creation of notifications (302) may also be performed using the virtual interface data communication software, for example through routine calls to the notification API NOTIF.
  • the Qemu notification API "Nahanni MSI-X" may be used for triggering notifications according to the present subject disclosure.
  • IRQ or semaphore based technologies may also be used for implementing the notification framework according to the present subject disclosure.
  • this exemplary "fast-vnic-ivshmem- server" virtual interface data communication software application may run on the host side of the hypervisor.
  • fast-vnic-ivshmem-server is run per virtual network interface.
  • the "-I 16M -m /dev/shm/fast-vnic-shm1 " arguments are used to require the creation of a set of communication queues of size 16 Mbytes for the received data to be queued.
  • the 7dev/shm/fast-vnic-shm1 " argument designates the name of a shared memory file to be created.
  • the "-S /tmp/ivshm-sock1 -n 32" arguments are used to require the creation of a notification channel made of 32 mailboxes through the descriptor named 7tmp/ivshm-sock1".
  • this fast-vnic-ivshmem server may be running for each virtual interface to be created since each interface has its own shared memory and notification channel with its respective name.
  • a virtual interface may then be created and configured for using the allocated notification mailbox and shared memory space for data communication, for example through the indication of the file name of the shared memory file.
  • an interface for the event model which is being used is created.
  • the VNIC1^vS or VNIC2^vS virtual interface may be created using the following fp-rte routine call (specific aspects related to the present subject disclosure are emphasized):
  • This command creates two virtual interfaces into the host for VNIC1^vS and VNIC2^vS, assuming respectively that their shared memories - fast-vnic-shm1 and fast-vnic-shm2 - and their notification channels - /tmp/ivshm-sock1 and /tmp/ivshm-sock2 - have been created by two fast-vnic-ivshmem-server processes, one for each of them.
  • Equivalent commands may be used to create virtual interfaces into the guest either using a userland kernel bypass application (another DPDK based fast path running into the guest for instance) or using a kernel driver.
  • a Linux kernel netdevice for virtual interfaces VNIC1^VM1, VNIC2^VM1, VNIC3^VM1, VNIC1^VM2, or VNIC2^VM2 shown on Fig. 1 may be created using: guest$ modprobe fast_vnic.ko.
  • Figure 4 illustrates the use of virtual interfaces for data communication according to the present subject disclosure.
  • Shown on the right-hand side of the diagram of figure 4 is an exemplary process that may be used on the data transmitter virtual interface side, while the left-hand side shows an exemplary process that may be used on the data receiver or recipient virtual interface side.
  • a process is executed as a loop (401) comprising the following operations:
  • Data (for example in the form of one or several packets, or one or several events) is received (402) for transmission to another virtual interface.
  • the data to be transmitted is then stored (403) in the shared memory (enqueued in the shared memory), and a notification of received data to the recipient virtual interface is triggered (404).
  • the notification (405) may be provided to the recipient virtual interface through a mailbox flag, a notification message, an MSI-X notification or any other appropriate notification scheme.
  • a process is also executed as a loop (406) comprising the following operations:
  • the recipient virtual interface waits (407) for a notification of received data.
  • such a waiting operation may comprise a loop through which the virtual interface regularly checks a mailbox flag or the receipt of a message.
  • Alternatively, the virtual interface may wait for an interrupt request (IRQ), for an MSI-X message, or for other triggers.
  • the received data is extracted (408) from the shared memory, and then processed (409).
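  • The two loops of Fig. 4 may thus be sketched as follows, reusing the hypothetical helpers from the earlier sketches; vnic_notify(), vnic_wait() and process_packet() stand for the notification posting (404), notification waiting (407) and data processing (409) steps, and are placeholders rather than actual APIs of the present subject disclosure.

      #include <stdint.h>

      /* Hypothetical notification and processing hooks; the mailbox-based
       * notification itself is sketched further below. */
      extern void vnic_notify(struct vnic_subspace *peer);
      extern void vnic_wait(struct vnic_subspace *self);
      extern void process_packet(const uint8_t *buf, uint32_t len);

      /* Transmitter side (right-hand loop of Fig. 4): steps 402 to 404. */
      static void tx_path(struct vnic_subspace *peer, const void *pkt, uint32_t len)
      {
          if (vnic_store(&peer->rx, pkt, len) == 0)  /* enqueue in shared memory */
              vnic_notify(peer);                     /* then post the notification */
      }

      /* Receiver side (left-hand loop of Fig. 4): steps 407 to 409. A single
       * notification may cover a whole batch of queued packets, which are
       * drained in one pass. */
      static void rx_path(struct vnic_subspace *self)
      {
          uint8_t  buf[PKT_MAX];
          uint32_t len;

          for (;;) {
              vnic_wait(self);                       /* step 407: wait for notification */
              while (vnic_extract(&self->rx, buf, &len) == 0)  /* step 408 */
                  process_packet(buf, len);          /* step 409 */
          }
      }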
  • the kernel driver fast_vnic.ko can be notified of the readiness of packets thanks to either IRQ or MSI-X notifications sent toward the guests.
  • the notifications are triggered by the fp-rte process that posts (404) such events toward the guests through the mailboxes (notification channels) that have been created by the fast-vnic-ivshmem-server.
  • the mailboxes (notification channels) /tmp/ivshm-sock1 and /tmp/ivshm-sock2 are Unix domain sockets with event file descriptors that allow such two-way notifications (404).
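  • As an illustration of such a mailbox, the following sketch shows how a notification could be posted and waited on using an eventfd; the exchange of the eventfd descriptor over the Unix domain socket (via SCM_RIGHTS ancillary data) is omitted for brevity, and the mailbox_* helpers are hypothetical.

      #include <stdint.h>
      #include <sys/eventfd.h>
      #include <unistd.h>

      static int mailbox_create(void)
      {
          return eventfd(0, 0);          /* counter-based eventfd; read drains it */
      }

      static void mailbox_notify(int efd) /* posting side, step 404 */
      {
          uint64_t one = 1;
          (void)!write(efd, &one, sizeof(one));
      }

      static void mailbox_wait(int efd)   /* waiting side, step 407 */
      {
          uint64_t count;                 /* blocks until notified; several */
          (void)!read(efd, &count, sizeof(count)); /* posts may be coalesced */
      }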
  • 32 mailboxes for notifications are created for the virtual interface VNIC1^vS, which allows the sending of notifications on specific queues of the shared memory (/dev/shm/fast-vnic-shm1).
  • a notification may be directed to a packet, an event, or a plurality thereof.
  • the set of packets/events (also referred to as “batch” or “bulk”) being queued in memory are extracted from the queue and processed by the recipient.
  • the virtual interface VNIC1^vS (106a) may request the transmission of a notification from a notification routine [NOTIF] (107) of the virtual interface data communication software fast-vnic-ivshmem-server, which, responsive to the request, notifies the recipient virtual interface (for instance VNIC1^VM1, VNIC2^VM1, VNIC3^VM1, VNIC1^VM2, VNIC2^VM2, or VNIC1^VM3) so as to wake up the corresponding guest recipient for processing the received data.
  • This notification scheme offers the advantage of avoiding a polling mechanism which may result in a severe computing overhead. It allows the virtual machines to remain idle until reception of a notification.
  • the virtual interface VNIC1^VM2 (104a) may also request the transmission of a notification from a notification routine [NOTIF] (107) of the virtual interface data communication software fast-vnic-ivshmem-server, which, responsive to the request, notifies the recipient virtual interface VNIC3^VM1 (103c) of the virtual machine VM1 (103) so as to wake up the recipient virtual interface VNIC3^VM1 (103c) for processing the received data.
  • this VM-to-VM service uses the relay of notifications provided by the framework of the hypervisor. It allows the VMs to remain idle until a packet and/or event has to be processed.
  • when the recipient virtual interface uses a polling mechanism, a notification to such recipient virtual interface may not be made, as the polling mechanism may be used for polling the shared memory space in which received data has been stored.
  • This has the advantage of minimizing the processing latency at the recipient virtual interface for received data.
  • the proposed method does not require the use of any acceleration services which may be integrated into the virtualized or para-virtualized interfaces, or into the back-end of these interfaces.
  • the VNIC1^VM1 (103a), VNIC2^VM1 (103b), VNIC3^VM1 (103c), VNIC1^VM2 (104a), VNIC2^VM2 (104b), VNIC1^VM3 (105a), VNIC1^vS (106a), and VNIC2^vS (106b) virtual interfaces (PMD or vNIC), and their respective back-ends, may provide offloading services such as TSO (TCP Segmentation Offload), LRO (Large Receive Offload), RSS (Receive Side Scaling) or any packet and event distribution mechanism, GRO (Generic Receive Offload), RSC (Receive Side Coalescing), TOE (TCP Offload Engine), or any other mechanism that can minimize the number of transactions for transmitting and receiving network packets and events, without necessarily using acceleration services.
  • the usual purpose of these back-end engines is to provide a segmentation and reassembly layer of network packets
  • While the VNIC1^VM2 (104a), VNIC2^VM2 (104b), VNIC1^VM3 (105a), VNIC1^vS (106a), and VNIC2^vS (106b) virtual interfaces are providing usual network services, other driver level services may be included in an embodiment of the present subject disclosure, such as a watchdog of the [NOTIF] framework in order to restart the processing in case of lack of [NOTIF], link up/down management, MTU management, counters, and any other state-of-the-art management infrastructure of an interface.
  • a computer system (500), which may correspond to the computer host (100) shown on Fig. 1, includes a processing unit (501), which includes one or several processors (502), such as a central processing unit (CPU) or any other hardware processor, associated memory (503) (for example, a RAM memory, a cache memory, a flash memory, etc.), a storage unit (504) (for example a hard drive, an optical disk such as a CD or a DVD, a flash memory key, etc.), and numerous other elements and functionalities typical of today's computers (not shown on the figure).
  • the processing unit (501) may also comprise an input/output interface unit (505) for driving the interfaces between the processing unit (501) and input/output means of the system (500).
  • the system (500) may include input means, such as a keyboard (506), a mouse (507), or a microphone (not shown).
  • the system (500) may also include output means, such as a monitor (508) (for example, an LCD monitor, an LED display, or a CRT display, etc.).
  • the computer system (500) may also be connected to a network (509) (for example, a local area network (LAN), a Wide Area Network (WAN) such as the Internet, or any other similar type of network) via a network interface connection (not shown).
  • Information and signals described herein can be represented using any of a variety of different technologies and techniques.
  • data, instructions, commands, information, signals, bits, symbols, and chips can be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a method for data communication between virtual interfaces using a shared memory in a virtualized environment, the virtualized environment comprising first and second virtual interfaces associated with one or several virtual nodes. The method comprises the following steps: allocating in the memory a shared memory space of the virtualized environment; assigning in the shared memory space a sub-space associated with the second virtual interface for data communication; upon receipt at the first virtual interface of data to be communicated to the second virtual interface (402), storing the data in the sub-space of the shared memory space associated with the second virtual interface (403); and extracting the data from the sub-space of the shared memory space for processing by the second virtual interface (408).
PCT/IB2015/000343 2014-02-14 2015-02-11 System and method for data communication between virtual interfaces WO2015121750A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201461939969P 2014-02-14 2014-02-14
US61/939969 2014-02-14

Publications (1)

Publication Number Publication Date
WO2015121750A1 (fr) 2015-08-20

Family

ID=53052893

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2015/000343 WO2015121750A1 (fr) System and method for data communication between virtual interfaces

Country Status (1)

Country Link
WO (1) WO2015121750A1 (fr)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040083481A1 (en) * 2002-10-24 2004-04-29 International Business Machines Corporation System and method for transferring data between virtual machines or other computer entities
US20050114855A1 (en) * 2003-11-25 2005-05-26 Baumberger Daniel P. Virtual direct memory acces crossover
US20100217916A1 (en) * 2009-02-26 2010-08-26 International Business Machines Corporation Method and apparatus for facilitating communication between virtual machines
US20110010428A1 (en) * 2007-12-21 2011-01-13 Kevin Rui Peer-to-peer streaming and api services for plural applications


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10437523B2 (en) 2016-02-25 2019-10-08 Red Hat Israel, Ltd. Secure receive packet processing for network function virtualization applications
WO2017155545A1 (fr) * 2016-03-11 2017-09-14 Tektronix Texas, Llc. Timestamping of data received by a monitoring system in an NFV
US10908941B2 (en) 2016-03-11 2021-02-02 Tektronix Texas, Llc Timestamping data received by monitoring system in NFV
EP3554025A4 (fr) * 2016-12-27 2019-11-20 Huawei Technologies Co., Ltd. Packet transmission method and physical host
CN108243118A (zh) * 2016-12-27 2018-07-03 Huawei Technologies Co., Ltd. Method for forwarding packets and physical host
US20200004572A1 (en) * 2018-06-28 2020-01-02 Cable Television Laboratories, Inc Systems and methods for secure network management of virtual network functions
US11822946B2 (en) * 2018-06-28 2023-11-21 Cable Television Laboratories, Inc. Systems and methods for secure network management of virtual network functions
CN111143199A (zh) * 2019-12-11 2020-05-12 Fiberhome Telecommunication Technologies Co., Ltd. Method for detecting out-of-bounds memory accesses of DPDK applications in a cloud platform
CN111143199B (zh) * 2019-12-11 2022-08-05 Fiberhome Telecommunication Technologies Co., Ltd. Method for detecting out-of-bounds memory accesses of DPDK applications in a cloud platform
US20210357242A1 (en) * 2020-05-18 2021-11-18 Dell Products, Lp System and method for hardware offloading of nested virtual switches
US11740919B2 (en) * 2020-05-18 2023-08-29 Dell Products L.P. System and method for hardware offloading of nested virtual switches
CN114416292A (zh) * 2021-12-31 2022-04-29 Beijing ByteDance Network Technology Co., Ltd. Virtualization method, device, apparatus, medium and product for a positioning device serial port
CN114416292B (zh) * 2021-12-31 2024-05-28 Beijing ByteDance Network Technology Co., Ltd. Virtualization method, device, apparatus, medium and product for a positioning device serial port

Similar Documents

Publication Publication Date Title
US10579426B2 (en) Partitioning processes across clusters by process type to optimize use of cluster specific configurations
WO2015121750A1 (fr) 2015-08-20 System and method for data communication between virtual interfaces
CN108964959B (zh) 一种用于虚拟化平台的网卡直通系统及数据包监管方法
US9031081B2 (en) Method and system for switching in a virtualized platform
US20180109471A1 (en) Generalized packet processing offload in a datacenter
US10237354B2 (en) Technologies for offloading a virtual service endpoint to a network interface card
US9176767B2 (en) Network interface card device pass-through with multiple nested hypervisors
US8761187B2 (en) System and method for an in-server virtual switch
US8776090B2 (en) Method and system for network abstraction and virtualization for a single operating system (OS)
US10693801B2 (en) Packet drop reduction in virtual machine migration
WO2019195003A1 (fr) Commutation rdma virtuelle pour applications conteneurisées
EP3472988A1 (fr) Fourniture de services de plan de données destinée à des applications
WO2018111987A1 (fr) Serveur reconfigurable
US20150355946A1 (en) “Systems of System” and method for Virtualization and Cloud Computing System
KR101636308B1 (ko) 전기 통신 네트워크 애플리케이션들을 위한 코어 추상화 계층
US20140219287A1 (en) Virtual switching based flow control
Ram et al. Hyper-Switch: A Scalable Software Virtual Switching Architecture
US20150319250A1 (en) Technologies for accelerating network virtualization
US8914803B2 (en) Flow control-based virtual machine request queuing
US10541842B2 (en) Methods and apparatus for enhancing virtual switch capabilities in a direct-access configured network interface card
Wang et al. vSocket: virtual socket interface for RDMA in public clouds
US20220276809A1 (en) Interface between control planes
Schultz et al. Performance analysis of packet capture methods in a 10 gbps virtualized environment
Suzuki et al. Device-level IoT with virtual I/O device interconnection
US20230319133A1 (en) Network interface device to select a target service and boot an application

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15720776

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15720776

Country of ref document: EP

Kind code of ref document: A1