WO2011072423A1 - Cooperated interrupt moderation for a virtualization environment - Google Patents

Cooperated interrupt moderation for a virtualization environment

Publication number
WO2011072423A1
Authority
WO
WIPO (PCT)
Prior art keywords
interrupt
overflow
guest
latency
host
Application number
PCT/CN2009/001480
Other languages
French (fr)
Inventor
Yaozu Dong
Yunhong Jiang
Kun TIAN
Original Assignee
Intel Corporation
Application filed by Intel Corporation
Priority to US13/516,149 (US9176770B2)
Priority to EP16182820.7A (EP3115894A1)
Priority to EP09852161.0A (EP2513792B1)
Priority to PCT/CN2009/001480 (WO2011072423A1)
Publication of WO2011072423A1
Priority to US14/930,413 (US9921868B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/24 Handling requests for interconnection or transfer for access to input/output bus using interrupt
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45545 Guest-host, i.e. hypervisor is an application program itself, e.g. VirtualBox
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4812 Task transfer initiation or dispatching by interrupt, e.g. masked
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/321 Interlayer communication protocols or service data unit [SDU] definitions; Interfaces between layers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/24 Interrupt
    • G06F2213/2408 Reducing the frequency of interrupts generated from peripheral to a CPU

Definitions

  • the disclosure relates to interrupt moderation in a virtualization environment.
  • a processor and/or a device and/or interface may be shared by a plurality of guests.
  • a single physical processor may be configured as a plurality of virtual CPUs. Each virtual CPU may be configured to share the physical processor resources with other virtual CPU(s).
  • a guest may include a virtual CPU and a guest process configured to execute on the virtual CPU.
  • a single physical device and/or interface may be configured as a plurality of virtual devices and/or interfaces, with each virtual device and/or interface configured to share the resources of the device and/or interface with other virtual device(s) and/or interface(s).
  • One technique for sharing resources includes sharing processor resources in "time slices".
  • active means executing on at least a portion of a processor.
  • a virtual machine monitor may be configured to manage scheduling guest access to the processor.
  • the VMM may schedule a guest in ("active") or schedule a guest out ("inactive") based on time, e.g., at particular time intervals.
  • the VMM may also be configured to schedule a guest in or out in response to an "event", e.g., an interrupt from the device.
  • Interrupt frequency may depend on the type of device and/or interface and/or the number of virtual devices and/or interfaces active on the device and/or interface.
  • a network adapter may be configured to interrupt a host to indicate that packets have been received.
  • a network adapter may interrupt relatively frequently.
  • Figure 1 illustrates one exemplary system embodiment consistent with the present disclosure.
  • Figure 2 illustrates a functional block diagram of two exemplary embodiments of interrupt moderation and interrupt moderation circuitry consistent with the present disclosure.
  • FIGS. 3A and 3B illustrate two flowcharts of exemplary operations consistent with the present disclosure.
  • FIGS. 4A and 4B illustrate two more flowcharts of exemplary operations consistent with the present disclosure.
  • An overflow interrupt interval is defined.
  • the overflow interrupt interval is used to trigger activation of an inactive guest so that the guest may respond to a critical event.
  • the overflow interrupt interval may be used to prevent receive buffer overflow when a guest is or has been inactive on a processor.
  • the guest, including a network application, may be active for a first time interval and inactive for a second time interval.
  • a latency interrupt interval may be defined. The latency interrupt interval is configured for interrupt moderation when the network application associated with a packet flow is active, i.e., when the guest including the network application is active on a processor.
  • a network adapter may be configured to interrupt a host based on network traffic, e.g., receiving one or more packets. Typically, packets are received in bursts. In order to reduce the number of interrupts, interrupts may be moderated. For example, the network adapter may be configured to send an interrupt to the host if a time period corresponding to the latency interrupt interval has passed since a prior interrupt and a packet in an identified packet flow has been received ("event"). In another example, a device driver in the host may be configured to delay processing received packets for a time interval. In both examples, a plurality of packets associated with the identified packet flow may be received during the time interval. The plurality of packets may then be processed by the device driver in the host.
  • interrupt moderation using only the latency interrupt interval may be inadequate.
  • a guest associated with an identified packet flow may or may not be active when packet(s) corresponding to the identified flow are received.
  • a guest includes a virtual CPU and an associated guest process configured to execute on the virtual CPU.
  • the time slice (or "scheduler tick") at which the VMM is configured to schedule guests in and out may be longer than the latency interrupt interval. If the guest associated with the packet flow is inactive, the VMM may schedule the guest in response to the interrupt.
  • the VMM shares processor resources with the guest(s). If there are a relatively large number of interrupts, the VMM may consume a significant portion of processor resources handling the interrupts.
  • a receive buffer may overflow.
  • Embodiments consistent with the present disclosure are configured to provide interrupts at the latency interrupt interval when packet(s) are received and an associated guest is active.
  • Embodiments are further configured to trigger activation of an inactive guest so that the guest may respond to a critical event. For example, an interrupt at the overflow interrupt interval may be used to prevent receive buffer overflow when the guest is or has been inactive.
  • System 100 of this embodiment generally includes a host system 102 and a network adapter 104 in communication with the host system 102.
  • the host system 102 of this embodiment includes a host processor 106 and system memory 108.
  • the host processor 106 may include at least one core processing unit (hereinafter "core"), generally labeled CPU 1,..., CPU z.
  • a core may host one or more virtual processing units, e.g., VCPU A and VCPU B.
  • the virtual CPUs may share the core in time slices.
  • System memory 108 may host virtual machine monitor (VMM) 110, operating system code 113 (e.g., OS kernel code) and network adapter device driver code 112.
  • the VMM 110 may include the OS kernel code 113.
  • Network adapter device driver code 112 may be included in the VMM 110 and/or the OS kernel code 113.
  • the OS kernel code 113 and the VMM 110 may be combined.
  • VMM may be implemented in circuitry, for example, in processor 106.
  • System memory may be configured to host at least one guest process.
  • Each guest process 111A, B,..., n may include a guest device driver 117A, B,..., n, a guest operating system (Guest OS) 115A, B,..., n, and a plurality of applications.
  • Device driver 112 and/or guest device drivers 117A, B,..., n, when executed, are configured to communicate with the network adapter 104, as will be explained in greater detail below.
  • device driver 112 may not be present and/or may not be utilized. Instead, guest device drivers 117A,..., n may communicate with the network adapter 104.
  • a guest including a VCPU and an associated guest process may be executed in a core of processor 106 when the guest is scheduled in.
  • a guest is active when it is scheduled in and inactive when it is scheduled out.
  • VCPU A of Guest A or VCPU B of Guest B may be scheduled on CPU 1, meaning that CPU 1 has the primary responsibility for executing instructions and exchanging commands and data related to the guest operating system, guest device driver and applications associated with Guest A and Guest B.
  • Guest A and Guest B may share CPU 1 using, for example, different time slices.
  • At least one application associated with each guest process 111A, B,..., n running in system memory 108 may include a "network application," meaning that such an application involves receiving and/or sending packets from/to the network adapter 104.
  • other system applications, including non-network applications, may be running in system memory 108.
  • Virtual machine monitor 110 is configured to manage sharing the host processor 106 among the plurality of guest processes residing in system memory 108. Specifically, VMM 110 is configured to schedule a guest, including a guest process and virtual CPU, in a core for processing. Scheduling a guest may occur, for example, upon system
  • VMM 110 may be configured to activate ("schedule in") a guest at a time interval.
  • an interrupt may be received indicating that network traffic, e.g., received packets, is available for processing by a network application and/or guest device driver of Guest A.
  • the VMM 110 may activate Guest A in response to the interrupt.
  • the VMM 110 may be configured to manage a state, i.e., active or inactive, of each guest. In this manner, the VMM 110 may manage scheduling associated with sharing one or more cores between a plurality of guests.
  • Network adapter 104 may comprise a network interface card (NIC) 114 that generally includes media access control (MAC) circuitry 116 and physical interface (PHY) circuitry 118.
  • MAC circuitry 116 may be configured to assemble data to be transmitted into packets that include destination and source addresses along with network control information and error detection hash values.
  • PHY circuitry 118 may include encoding and decoding circuitry (not shown) to encode and decode data packets.
  • NIC 114 may be coupled to a medium to receive one or more packet flows, as indicated by packet flow 130.
  • NIC 114 may also include a plurality of receive queues, labeled Queue A, Queue B,..., Queue n. Receive queues Queue A, Queue B,..., Queue n are configured to reference packets associated with a particular application received by the NIC 114 (via incoming packet flow 130).
  • Network traffic associated with a packet flow may be identified based on one or more fields in each packet in the packet flow.
  • the packet flow ID for a TCP packet may include a sequence of source IP address, destination IP address, source port number, and destination port number, L2/L4 data, etc., any of which can be utilized to identify the packet flow.
  • other packet protocols may be identified, e.g., using UDP packet information.
  • An identified packet flow may be associated with a network application in a guest. When a packet associated with the packet flow ID is received, the guest may be active (corresponding to executing on at least a portion of processor 106) or the guest may be inactive.
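As a rough illustration of the flow identification described above, the following Python sketch builds a flow ID from the TCP/IP 4-tuple and looks up the receive queue associated with that flow. The `FlowId` type, the `classify` helper and the contents of `flow_table` are hypothetical names introduced only for this example; the disclosure does not specify how the adapter performs the lookup.

```python
from typing import NamedTuple, Optional


class FlowId(NamedTuple):
    """Hypothetical flow ID built from the TCP/IP 4-tuple."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def classify(packet: dict, flow_table: dict) -> Optional[str]:
    """Return the receive queue for a packet, or None if its flow is not identified."""
    flow = FlowId(packet["src_ip"], packet["dst_ip"],
                  packet["src_port"], packet["dst_port"])
    return flow_table.get(flow)


# Assumed mapping from an identified flow to the queue of the guest that owns it.
flow_table = {FlowId("10.0.0.1", "10.0.0.2", 40000, 80): "Queue A"}

pkt = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
       "src_port": 40000, "dst_port": 80, "payload": b"hello"}
queue = classify(pkt, flow_table)
other = classify({**pkt, "dst_port": 443}, flow_table)
```

Here `queue` names the receive queue of the matching flow, while `other` is `None` because no flow with destination port 443 was registered.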
  • a network adapter 104 consistent with the present disclosure may include a physical interface (PI) 126 and a plurality of virtual interfaces (VIs) 124A,..., n.
  • the physical interface 126 is configured to manage physical resources of the network adapter 104 and may be configured to communicate with device driver 112 associated with host OS 113 and VMM 110.
  • the physical interface 126 may be configured to manage sharing the network adapter between the virtual interfaces 124A, B,..., n.
  • Each virtual interface 124A, B,..., n may include interrupt moderation circuitry 120A,..., n.
  • Interrupt moderation circuitry 120A,..., n may be configured to moderate interrupts for packet flows associated with each virtual interface 124A, B,..., n.
  • a guest in the host and a virtual function and queue in the network adapter may be configured as a "complete" system.
  • although the guest and virtual function share physical resources, it may appear to the guest and virtual function that each "owns" its respective physical resource.
  • the scheduling and processing associated with sharing the physical resources may generally be performed by the VMM 110 and PI 126.
  • the VMM 110 may be configured to manage the PI 126.
  • Figure 2 is a functional block diagram 200 illustrating two exemplary embodiments of interrupt moderation circuitry 120A. For simplicity, in Figure 2, the designators, i.e., A, B,..., n, have been omitted.
  • the functional block diagram 200 applies to any one or more of the virtual functions 120A,..., n in Figure 1.
  • certain portions of the system 100 depicted in Figure 1 have been omitted for clarity (for example, CPU 106, system memory 108, MAC circuitry 116 and PHY circuitry 118), but it is to be understood that like parts of Figure 2 can be implemented in a manner consistent with an embodiment depicted in Figure 1, or alternatively in other system implementations, without departing from this embodiment.
  • a first exemplary embodiment includes an overflow control register 230 while a second exemplary embodiment does not include the overflow control register 230.
  • Both embodiments include a latency interrupt register "latency ITR" 202, an overflow interrupt register “overflow ITR” 212, control circuitry 220 and an event flag(s) register 222.
  • the latency ITR 202 may include a latency counter 204 and a latency interrupt interval 206.
  • the overflow ITR 212 may include an overflow counter 214 and an overflow interrupt interval 216.
  • Latency ITR 202 is configured to facilitate interrupt moderation at a latency interrupt interval.
  • Overflow ITR 212 is configured to facilitate providing an interrupt at an overflow interrupt interval for the virtual function associated with interrupt moderation circuitry 120 and the virtual function's associated guest.
  • the latency interrupt interval 206 may be determined based on interrupt moderation in a native environment.
  • the latency interrupt interval 206 is configured for a guest that is active.
  • the overflow interrupt interval 216 is configured to trigger activation of an associated guest so that the associated guest may respond to a critical event.
  • the overflow interrupt interval may be used to prevent receive buffer overflow, when the associated guest is or has/been inactive.
  • the overflow interrupt interval 216 may be determined based, at least in part, on a size of the receive buffer and a speed of the network adapter. If network traffic is received, destined for the associated guest, and the associated guest is not active, the received packets may be placed in the associated guest's receive buffer by direct memory access.
  • the interrupt moderation circuitry 120 is configured to cause an interrupt to the VMM at the expiration of the overflow interrupt interval. This interrupt is configured to result in the VMM scheduling in the associated guest and removal of the packets from the receive buffer for processing.
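The dependence of the overflow interrupt interval on buffer size and adapter speed can be sketched as follows. This is only an illustrative calculation: the function name and the `safety_factor` margin are assumptions made for this example, and the disclosure states only that the interval is based, at least in part, on the receive buffer size and the adapter speed.

```python
def overflow_interval_seconds(buffer_bytes: int,
                              link_bits_per_second: float,
                              safety_factor: float = 0.5) -> float:
    """Estimate an overflow interrupt interval.

    The worst-case fill time is the time a link running at full speed needs
    to fill an empty receive buffer. The interval is a fraction of that time
    (safety_factor is an assumed margin, not a value from the disclosure).
    """
    if buffer_bytes <= 0 or link_bits_per_second <= 0:
        raise ValueError("buffer size and link speed must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    fill_time = buffer_bytes * 8 / link_bits_per_second
    return fill_time * safety_factor


# Example: a 64 KiB receive buffer behind a 1 Gbit/s adapter.
interval = overflow_interval_seconds(64 * 1024, 1e9)
```

With these example numbers the buffer fills in roughly 524 microseconds at line rate, so the sketch yields an interval of roughly 262 microseconds.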
  • latency interrupt interval 206 and overflow interrupt interval 216 may be configured to store an interval count corresponding to a time duration of a latency interrupt interval and a time duration of an overflow interrupt interval, respectively.
  • Each respective counter 204, 214 may then be configured to count down from latency interrupt interval 206 and overflow interrupt interval 216, respectively, to zero.
  • Control circuitry 220 may be configured to determine whether an event flag in the event flag(s) register 222 indicates that a packet associated with a packet flow ID has been received. If such a packet has been received, control circuitry 220 is configured to generate interrupt 224 to VMM 110. VMM 110 and/or device driver 112 may then send an interrupt to guest device driver 117.
  • the latency counter 204 and overflow counter 214 are configured to be reset (and counting commenced) by guest device driver 117 and/or control circuitry 220, as will be described in more detail below.
  • each counter 204, 214 may be reset when an interrupt is triggered.
  • latency counter 204 may be reset by the guest device driver based, at least in part, on packet processing in the guest.
  • although counters 204, 214 and interrupt intervals 206, 216 have been described above as count-down counters and counting intervals, respectively, other configurations are possible.
  • counters 204, 214 may count up to interrupt intervals 206, 216, respectively.
  • counters 204, 214 may correspond to timers and interrupt intervals 206, 216 may correspond to time out intervals.
  • Control circuitry 220 is configured to receive incoming packet flow(s) 130 and/or to receive an indication of incoming packet flow(s) 130.
  • Event flag(s) register 222 is configured to store an event flag associated with a respective packet flow destined for an application in an associated guest.
  • Control circuitry 220 is configured to set an event flag in the event flag(s) register 222 indicating that a packet corresponding to a packet flow ID has been received. For example, control circuitry 220 may set the associated event flag when a first packet is received corresponding to an associated packet flow ID. "First packet" means the first packet received following an associated interrupt. The event flag may be cleared when an interrupt is triggered.
  • Control circuitry 220 may be configured to generate an interrupt 224 to VMM 110 if a latency interrupt interval expires and/or an overflow interrupt interval expires, and the event flag indicates that an associated packet has been received. If the associated guest is active, VMM 110 may forward the interrupt to the associated guest device driver 117. In an embodiment, control circuitry 220 may be configured to generate an interrupt 224 to associated guest device driver 117 if a latency interrupt interval expires and to VMM 110 if an overflow interrupt interval expires, and the event flag indicates that an associated packet has been received. In this embodiment, an interrupt vector associated with the interrupt may indicate (identify) the associated guest driver.
  • control circuitry 220 may be configured to reset latency counter 204 and/or overflow counter 214.
  • guest device driver 117 may be configured to reset latency counter 204 and/or overflow counter 214 and/or event flag(s).
  • overflow control register 230 may be configured to indicate whether a guest is active or inactive.
  • VMM 110 and/or device driver 112 may be configured to set and/or reset a guest state indicator in overflow control register 230 when VMM 110 schedules the guest in (active) or out (inactive).
  • the associated guest device driver 117 may be configured to set the guest state indicator when the guest becomes active.
  • the guest state indicator may be set and/or reset using a memory mapped input/output ("MMIO") operation.
  • interrupt moderation circuitry 120 is configured to generate an interrupt to its associated guest device driver 117 and/or to VMM 110 when latency interrupt interval and/or overflow interrupt interval expires and a packet corresponding to an associated packet flow ID has been received.
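The register layout of Figure 2 can be modeled in a few lines of Python. The sketch below is a simplified software model, not the circuitry itself: the `Itr` class, the `should_interrupt` helper and the use of an abstract "tick" as the counting unit are assumptions made for illustration. It shows the count-down behavior of counters 204 and 214 and the condition under which control circuitry 220 would raise interrupt 224.

```python
from dataclasses import dataclass


@dataclass
class Itr:
    """Interrupt register (ITR): an interval count plus a count-down counter."""
    interval: int      # interval length in ticks (the tick unit is an assumption)
    counter: int = 0   # zero means the interval has expired

    def reset(self) -> None:
        """Reload the counter, starting a new interval."""
        self.counter = self.interval

    def tick(self) -> None:
        """Advance time by one tick; the counter stops at zero."""
        if self.counter > 0:
            self.counter -= 1

    @property
    def expired(self) -> bool:
        return self.counter == 0


def should_interrupt(event_flag: bool, latency: Itr, overflow: Itr) -> bool:
    """Interrupt only if a packet was flagged and at least one interval expired."""
    return event_flag and (latency.expired or overflow.expired)


latency = Itr(interval=2)    # stands in for latency ITR 202
overflow = Itr(interval=5)   # stands in for overflow ITR 212
latency.reset()
overflow.reset()

decisions = []
for _ in range(3):
    latency.tick()
    overflow.tick()
    decisions.append(should_interrupt(True, latency, overflow))
```

In this run the latency interval expires on the second tick, so the decision is negative for the first tick and positive afterwards; with the event flag cleared, no interrupt would be generated regardless of the counters.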
  • Exemplary Methodology: Figures 3A and 3B illustrate flowcharts 300, 350 of exemplary operations consistent with the present disclosure.
  • the operations illustrated in this embodiment may be performed by circuitry and/or software modules associated with a network adapter (e.g., adapter 104 depicted in Fig. 1), or such operations may be performed by circuitry and/or software modules associated with a host system (or other components, e.g., Guest/VCPU), or a combination thereof.
  • operations of this embodiment may be performed by network adapter 104, e.g., by interrupt moderation circuitry 120.
  • operations of this embodiment may be performed by interrupt moderation circuitry 120A in virtual function 124A for associated guest 111A.
  • Operations of this embodiment may begin at start 305.
  • whether a packet has been received may be determined.
  • a packet associated with a packet flow may be received by virtual function 124A.
  • An event flag in event flag(s) register 222 of interrupt moderation circuitry 120 may be set.
  • Operation 310 may read the event flag to determine whether a packet has been received. If a packet has not been received, e.g., event flag is not set, program flow may pause at operation 310 until a packet has been received.
  • whether an interrupt interval has expired may be determined at operation 315.
  • the interrupt interval may be the latency interrupt interval or the overflow interrupt interval. If an interrupt interval has not expired, program flow may pause at operation 315 until an interrupt interval expires. If an interrupt interval has expired, an interrupt may be triggered at operation 320.
  • overflow control register 230 may be queried to determine the state of the associated guest. If the associated guest is active, latency counter 204 may be reset at operation 330 and may begin counting corresponding to starting a latency interrupt interval. If the associated guest is not active, overflow counter 214 may be reset at operation 335 and may begin counting corresponding to starting an overflow interrupt interval.
  • These exemplary operations are configured to trigger an interrupt at the overflow interrupt interval if the guest associated with a packet flow ID is inactive and an associated packet is received or to trigger an interrupt at the latency interrupt interval if the guest is active and an associated packet is received.
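The adapter-side flow of flowchart 300 can be sketched as a small simulation. The class below is a hypothetical model, not the claimed circuitry: it assumes a single armed counter that is reloaded with the latency or overflow interval according to the guest state indicator, measures time in abstract ticks, and clears the event flag when an interrupt is triggered.

```python
class StateRegisterModeration:
    """Hypothetical model of the embodiment that uses overflow control register 230."""

    def __init__(self, latency_interval: int, overflow_interval: int,
                 guest_active: bool) -> None:
        self.latency_interval = latency_interval
        self.overflow_interval = overflow_interval
        self.guest_active = guest_active   # guest state indicator in register 230
        self.event_flag = False            # event flag(s) register 222
        self.counter = self._interval_for_state()

    def _interval_for_state(self) -> int:
        # Active guest: latency interval. Inactive guest: overflow interval.
        return self.latency_interval if self.guest_active else self.overflow_interval

    def set_guest_state(self, active: bool) -> None:
        """Called by the VMM or guest driver when the guest is scheduled in or out."""
        self.guest_active = active

    def packet_received(self) -> None:
        self.event_flag = True

    def tick(self) -> bool:
        """Advance one tick; return True if an interrupt is triggered."""
        if self.counter > 0:
            self.counter -= 1
        if not self.event_flag or self.counter > 0:
            return False                           # operations 310 and 315: keep waiting
        self.event_flag = False                    # operation 320: trigger the interrupt
        self.counter = self._interval_for_state()  # operation 330 or 335
        return True


circuit = StateRegisterModeration(latency_interval=2, overflow_interval=6,
                                  guest_active=True)
interrupt_ticks = []
for t in range(1, 15):
    circuit.set_guest_state(t <= 6)   # guest active for ticks 1-6, inactive afterwards
    circuit.packet_received()         # assume a packet arrives on every tick
    if circuit.tick():
        interrupt_ticks.append(t)
```

In this scenario interrupts are spaced by the latency interval while the guest is active and by the overflow interval once it has been scheduled out, even though packets arrive on every tick.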
  • operations of this embodiment may be performed, for example, by a VMM in the host system and/or an associated guest device driver.
  • the VMM may be implemented in circuitry and/or software. Operations of this embodiment may begin at start 345.
  • the state of an associated guest may be changed. For example, an active guest may be scheduled out or the inactive guest may be scheduled in.
  • the state of the guest may be changed (i.e., scheduled) based on a timer.
  • the state of the guest may be changed based on an event, e.g., an interrupt to the VMM.
  • the overflow control register 230 may be updated.
  • the overflow control register 230 in the interrupt moderation circuitry 120 is configured to indicate the guest state to control circuitry 220 to support resetting and starting the appropriate interval counter.
  • the overflow control register 230 may be updated by the VMM 110 and/or a guest device driver, when a guest is scheduled in.
  • Whether an interrupt from a device, e.g., network adapter 104, has been received may then be determined 360. If such an interrupt has not been received, program flow may pause at operation 360 until an interrupt is received. If an interrupt is received, the VMM may provide a virtual interrupt to the associated guest, so that received packets associated with the interrupt may be processed by, e.g., the associated guest device driver and/or network application running in the associated guest, if the guest is active. If the guest is not active when the interrupt is received, the guest may be scheduled in by the VMM.
  • Whether to change the guest state may be determined at operation 370. If the guest state is to be changed, program flow may proceed to operation 350. If the guest state is not to be changed, program flow may proceed to operation 360 to determine whether an interrupt from a device has been received.
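The host-side operations of flowchart 350 can be sketched in the same spirit. The `Vmm` class and its method names are hypothetical, and the overflow control register is represented by a plain dictionary; in a real system the update would be an MMIO write to the adapter and the virtual interrupt would be delivered to the guest device driver.

```python
class Vmm:
    """Hypothetical VMM-side handler for one guest and its virtual interface."""

    def __init__(self, overflow_control: dict) -> None:
        # Stand-in for overflow control register 230 (an MMIO register in hardware).
        self.overflow_control = overflow_control
        self.guest_active = False
        self.processed = []   # packets handed to the guest device driver
        self.schedule(False)

    def schedule(self, active: bool) -> None:
        """Change the guest state and mirror it in the overflow control register."""
        self.guest_active = active
        self.overflow_control["guest_active"] = active

    def on_device_interrupt(self, packets: list) -> None:
        """Handle an interrupt from the adapter (operation 360 onward)."""
        if not self.guest_active:
            self.schedule(True)          # schedule in the inactive guest
        self.processed.extend(packets)   # virtual interrupt: the guest processes packets


register = {}
vmm = Vmm(register)
vmm.on_device_interrupt(["pkt0", "pkt1"])   # guest was inactive, so it is scheduled in
state_after_interrupt = register["guest_active"]
vmm.schedule(False)                         # e.g., the guest's time slice ends
state_after_slice = register["guest_active"]
```

Keeping the register update inside `schedule` reflects the description that the guest state indicator is updated whenever the guest state changes, so the adapter always sees the current state.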
  • FIGS. 4A and 4B illustrate flowcharts 400, 450 of exemplary operations of another embodiment consistent with the present disclosure.
  • the operations illustrated in this embodiment may be performed by circuitry and/or software modules associated with a network adapter (e.g., adapter 104 depicted in Fig. 1), or such operations may be performed by circuitry and/or software modules associated with a host system (or other components, e.g., Guest/VCPU), or a combination thereof.
  • operations of this embodiment may be performed by network adapter 104, e.g., by interrupt moderation circuitry 120.
  • the latency counter is reset by a guest device driver when the associated guest is active.
  • the guest device driver may also reset the overflow counter.
  • the overflow counter has been reset. Operations according to this embodiment may begin at start 405.
  • Whether a packet has been received may be determined 410.
  • a packet associated with a packet flow may be received by virtual interface 124A.
  • An event flag in event flag(s) register 222 of interrupt moderation circuitry 120 may be set.
  • Operation 410 may read the event flag to determine whether a packet has been received. If a packet has not been received, e.g., event flag is not set, program flow may pause at operation 410 until a packet has been received.
  • whether an interrupt interval has expired may be determined at operation 415. For example, the overflow interrupt interval and/or the latency interrupt interval may have expired. If an interrupt interval has not expired, program flow may pause at operation 415. If an interrupt interval has expired, flow may proceed to operation 420 and an interrupt may be triggered. For example, the interrupt may be provided from interrupt moderation circuitry 120 to an associated guest device driver and/or to the VMM 110. At operation 425, the overflow counter may be reset, starting an overflow interrupt interval. Flow may then proceed to operation 410.
  • Operation 460 may include determining whether an interrupt from a device, e.g., network adapter 104, has been received. If an interrupt has not been received, program flow may pause at operation 460 until an interrupt is received. If an interrupt has been received, received packets may be processed at operation 465. For example, if the guest associated with the packets is active, the guest device driver and/or a network application may process the received packets. If the guest is inactive, the VMM 110 may schedule in the guest to process the packets.
  • a device e.g., network adapter 104
  • Operation 470 may include resetting the latency counter.
  • the guest is active.
  • the guest device driver and/or network application may be configured to reset the latency counter upon completion of packet processing.
  • Operation 475 may be included in some embodiments.
  • Operation 475 includes resetting the overflow counter. The overflow counter may be reset at the completion of packet processing, similar to resetting the latency counter. Program flow may then proceed to operation 460.
  • the embodiments illustrated in Figures 4A and 4B are configured to provide interrupt moderation at the latency interrupt interval when a guest is active and to provide an interrupt at the overflow interrupt interval, e.g., to prevent receive buffer overflow.
  • the embodiments illustrated in Figures 4A and 4B do not include an explicit guest state register. Rather a guest device driver may be configured to reset the latency counter when it completes packet processing, thereby implicitly "informing" a network adapter that the guest associated with a packet flow is active.
  • operating system 113, VMM 110 and/or guest operating system(s) 115A, ..., n may manage system resources and control tasks that are run on system 102.
  • guest OS 115A, ..., n may be implemented using Microsoft Windows, HP-UX, Linux, or UNIX, although other operating systems may be used.
  • the ndis.sys driver may be utilized at least by guest device driver 117A, ..., n and an intermediate driver (not shown).
  • the ndis.sys driver may be utilized to define application programming interfaces (APIs) that can be used for transferring packets between layers.
  • Guest operating system 115A, ..., n may implement one or more protocol stacks (not shown).
  • a protocol stack may execute one or more programs to process packets.
  • An example of a protocol stack is a TCP/IP (Transport Control Protocol/Internet Protocol) protocol stack comprising one or more programs for handling (e.g., processing or generating) packets to transmit and/or receive over a network.
  • a protocol stack may alternatively be comprised on a dedicated sub-system such as, for example, a TCP offload engine.
  • memory 108 and/or memory associated with the network adaptor 104 may comprise one or more of the following types of memory: semiconductor firmware memory, programmable memory, non-volatile memory, read only memory, electrically programmable memory, random access memory, flash memory, magnetic disk memory, and/or optical disk memory.
  • memory 108 and/or memory associated with the network adaptor 104 may comprise other and/or later-developed types of computer-readable memory.
  • Embodiments of the methods described herein may be implemented in a system that includes one or more storage mediums having stored thereon, individually or in combination, instructions that when executed by one or more processors perform the methods.
  • the processor may include, for example, a system CPU (e.g., core processor of Fig. 1) and/or programmable circuitry such as the MAC circuitry.
  • the storage medium may include any type of tangible medium, for example, any type of disk including floppy disks, optical disks, compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic and static RAMs, erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), flash memories, magnetic or optical cards, or any type of media suitable for storing electronic instructions.
  • Ethernet communications protocol may be capable of permitting communication using a Transmission Control Protocol/Internet Protocol
  • Ethernet protocol may comply or be compatible with the Ethernet standard published by the Institute of Electrical and Electronics Engineers (IEEE) titled "IEEE 802.3 Standard", published in March, 2002 and/or later versions of this standard.
  • a "PHY" may be defined as an object and/or circuitry used to interface to one or more devices, and such object and/or circuitry may be defined by one or more of the communication protocols set forth herein.
  • the PHY may comprise a physical PHY comprising transceiver circuitry to interface to the applicable
  • the PHY may alternately and/or additionally comprise a virtual PHY to interface to another virtual PHY or to a physical PHY.
  • PHY circuitry 224 may comply or be compatible with the aforementioned IEEE 802.3 Ethernet communications protocol, which may include, for example, 100BASE-TX, 100BASE-T, 10GBASE-T, 10GBASE-KR, 10GBASE-KX4/XAUI, 40GbE and/or 100GbE compliant PHY circuitry, and/or PHY circuitry that is compliant with an after-developed communications protocol.
  • Circuitry may comprise, for example, singly or in any combination, hardwired circuitry, programmable circuitry, state machine circuitry, and/or firmware that stores instructions executed by programmable circuitry.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Bus Control (AREA)
  • Stored Programmes (AREA)

Abstract

Generally, this disclosure describes systems (and methods) for moderating interrupts in a virtualization environment. An overflow interrupt interval is defined. The overflow interrupt interval is used for triggering activation of an inactive guest so that the guest may respond to a critical event. The guest, including a network application, may be active for a first time interval and inactive for a second time interval. A latency interrupt interval may be defined. The latency interrupt interval is configured for interrupt moderation when the network application associated with a packet flow is active, i.e., when the guest including the network application is active on a processor. Of course, many alternatives, variations, and modifications are possible without departing from this embodiment.

Description

COOPERATED INTERRUPT MODERATION FOR A VIRTUALIZATION ENVIRONMENT
FIELD
The disclosure relates to interrupt moderation in a virtualization environment.
BACKGROUND
In a virtualization environment, a processor and/or a device and/or interface may be shared by a plurality of guests. A single physical processor may be configured as a plurality of virtual CPUs. Each virtual CPU may be configured to share the physical processor resources with other virtual CPU(s). A guest may include a virtual CPU and a guest process configured to execute on the virtual CPU. Similarly, a single physical device and/or interface may be configured as a plurality of virtual devices and/or interfaces, with each virtual device and/or interface configured to share the resources of the device and/or interface with other virtual device(s) and/or interface(s).
One technique for sharing resources includes sharing processor resources in "time slices". In other words, for a plurality of guests, a subset of the plurality may be active at any point in time. As used herein, "active" means executing on at least a portion of a processor. A virtual machine monitor ("VMM") may be configured to manage scheduling guest access to the processor. The VMM may schedule a guest in ("active") or schedule a guest out ("inactive") based on time, e.g., at particular time intervals. The VMM may also be configured to schedule a guest in or out in response to an "event", e.g., an interrupt from the device.
Interrupt frequency may depend on the type of device and/or interface and/or the number of virtual devices and/or interfaces active on the device and/or interface. For example, a network adapter may be configured to interrupt a host to indicate that packets have been received. Depending on the speed of the adapter and/or the number of active virtual devices and/or interfaces and network traffic, i.e., frequency at which packets are received, a network adapter may interrupt relatively frequently.
BRIEF DESCRIPTION OF THE DRAWINGS
Features and advantages of embodiments of the claimed subject matter will become apparent as the following Detailed Description proceeds, and upon reference to the Drawings, wherein like numerals depict like parts, and in which:
Figure 1 illustrates one exemplary system embodiment consistent with the present disclosure;
Figure 2 illustrates a functional block diagram of two exemplary embodiments of interrupt moderation and interrupt moderation circuitry consistent with the present disclosure;
Figures 3A and 3B illustrate two flowcharts of exemplary operations consistent with the present disclosure; and
Figures 4A and 4B illustrate two more flowcharts of exemplary operations consistent with the present disclosure.
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art.
DETAILED DESCRIPTION
Generally, this disclosure describes systems (and methods) of moderating interrupts in a virtualization environment. An overflow interrupt interval is defined. The overflow interrupt interval is used to trigger activation of an inactive guest so that the guest may respond to a critical event. For example, the overflow interrupt interval may be used to prevent receive buffer overflow when a guest is or has been inactive on a processor. The guest, including a network application, may be active for a first time interval and inactive for a second time interval. A latency interrupt interval may be defined. The latency interrupt interval is configured for interrupt moderation when the network application associated with a packet flow is active, i.e., when the guest including the network application is active on a processor.
A network adapter may be configured to interrupt a host based on network traffic, e.g., receiving one or more packets. Typically, packets are received in bursts. In order to reduce the number of interrupts, interrupts may be moderated. For example, the network adapter may be configured to send an interrupt to the host if a time period corresponding to the latency interrupt interval has passed since a prior interrupt and a packet in an identified packet flow has been received ("event"). In another example, a device driver in the host may be configured to delay processing received packets for a time interval. In both examples, a plurality of packets associated with the identified packet flow may be received during the time interval. The plurality of packets may then be processed by the device driver in the host.
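The first example above (an interrupt only when the latency interrupt interval has passed since the prior interrupt and a packet has been received) may be sketched as follows. This is a minimal illustrative model, not the adapter's actual logic; the class name `LatencyModerator`, the helper `count_interrupts`, and the microsecond time unit are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class LatencyModerator:
    """Toy model: at most one interrupt per latency interval, and only
    if a packet has been received since the prior interrupt."""
    interval_us: int               # latency interrupt interval (assumed unit)
    last_interrupt_us: int = 0     # time of the prior interrupt
    packet_pending: bool = False   # the "event": a packet in the flow arrived

    def on_packet(self) -> None:
        self.packet_pending = True

    def poll(self, now_us: int) -> bool:
        """Return True if an interrupt should be sent at time now_us."""
        elapsed = now_us - self.last_interrupt_us
        if self.packet_pending and elapsed >= self.interval_us:
            self.last_interrupt_us = now_us
            self.packet_pending = False
            return True
        return False


def count_interrupts(arrivals_us, interval_us, end_us):
    """Replay packet arrival times and count the moderated interrupts."""
    mod = LatencyModerator(interval_us)
    pending = sorted(arrivals_us)
    interrupts = 0
    for now in range(end_us + 1):
        while pending and pending[0] <= now:
            pending.pop(0)
            mod.on_packet()
        if mod.poll(now):
            interrupts += 1
    return interrupts
```

In this sketch a burst of several packets arriving within one interval is covered by a single interrupt, which is the reduction the paragraph above describes.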
In a virtualization environment, interrupt moderation using only the latency interrupt interval may be inadequate. For example, a guest associated with an identified packet flow may or may not be active when packet(s) corresponding to the identified flow are received. As used herein, "guest" includes a virtual CPU and an associated guest process configured to execute on the virtual CPU. Further, the "time slice" or "scheduler tick" at which the VMM is configured to schedule guests in and out may be longer than the latency interrupt interval. If the guest associated with the packet flow is inactive, the VMM may schedule the guest in response to the interrupt. The VMM shares processor resources with the guest(s). If there are a relatively large number of interrupts, the VMM may consume a significant portion of processor resources handling the interrupts. If the device is configured to reduce its interrupt frequency and/or interrupts are disabled, and packets are received, a receive buffer may overflow. Embodiments consistent with the present disclosure are configured to provide interrupts at the latency interrupt interval when packet(s) are received and an associated guest is active. Embodiments are further configured to trigger activation of an inactive guest so that the guest may respond to a critical event. For example, an interrupt at the overflow interrupt interval may be used to prevent receive buffer overflow when the guest is or has been inactive.
System Architecture
Figure 1 illustrates one exemplary system embodiment consistent with the present disclosure. System 100 of this embodiment generally includes a host system 102 and a network adapter 104 in communication with the host system 102. The host system 102 of this embodiment includes a host processor 106 and system memory 108. The host processor 106 may include at least one core processing unit (hereinafter "core"), generally labeled CPU 1, ..., CPU z. A core may host one or more virtual processing units, e.g., VCPU A and VCPU B. In this example, the virtual CPUs may share the core in time slices.
System memory 108 may host virtual machine monitor (VMM) 110, operating system code 113 (e.g., OS kernel code) and network adapter device driver code 112. The VMM 110 may include the OS kernel code 113. Network adapter device driver code 112 may be included in the VMM 110 and/or the OS kernel code 113. In some embodiments, the OS kernel code 113 and the VMM 110 may be combined. In some configurations, the VMM may be implemented in circuitry, for example, in processor 106.
System memory may be configured to host at least one guest process. Each guest process 111A, B, ..., n may include a guest device driver 117A, B, ..., n, a guest operating system (Guest OS) 115A, B, ..., n, and a plurality of applications. Device driver 112 and/or guest device drivers 117A, B, ..., n, when executed, are configured to communicate with the network adapter 104, as will be explained in greater detail below.
In some embodiments, a device, e.g., network adapter 104, may be dedicated to, i.e., assigned to, one guest. In this embodiment, device driver 112 may not be present and/or may not be utilized. Instead, guest device drivers 117A, ..., n may communicate with the network adapter 104.
A guest including a VCPU and an associated guest process may be executed in a core of processor 106 when the guest is scheduled in. In other words, a guest is active when it is scheduled in and inactive when it is scheduled out. For example, as depicted in Figure 1, VCPU A of Guest A or VCPU B of Guest B may be scheduled on CPU 1, meaning that CPU 1 has the primary responsibility for executing instructions and exchanging commands and data related to the guest operating system, guest device driver and applications associated with Guest A and Guest B. In other words, Guest A and Guest B may share CPU 1 using, for example, different time slices. It should be noted at the outset that at least one application associated with each guest process 111A, B, ..., n running in system memory 108 may include a "network application," meaning that such an application involves receiving and/or sending packets from/to the network adaptor 104. Of course, other system applications, including non-network applications, may be running in system memory 108.
Virtual machine monitor 110 is configured to manage sharing the host processor 106 among the plurality of guest processes residing in system memory 108. Specifically, VMM 110 is configured to schedule a guest, including a guest process and virtual CPU, in a core for processing. Scheduling a guest may occur, for example, upon system initialization and may also be performed dynamically during operation of the system 100. For example, VMM 110 may be configured to activate ("schedule in") a guest at a time interval. In another example, a guest, e.g., Guest A, may be inactive and an interrupt may be received indicating that network traffic, e.g., received packets, is available for processing by a network application and/or guest device driver of Guest A. The VMM 110 may activate Guest A in response to the interrupt. The VMM 110 may be configured to manage a state, i.e., active or inactive, of each guest. In this manner, the VMM 110 may manage scheduling associated with sharing one or more cores between a plurality of guests.
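The two scheduling triggers described above (a time interval and an interrupt) may be illustrated with a toy single-core scheduler. This is a sketch under stated assumptions: the round-robin policy, the function name `schedule`, and the tick-based time model are hypothetical and are not taken from the disclosure.

```python
def schedule(ticks, guests, slice_len, interrupts):
    """Return the guest that is active on one core at each tick.

    guests     -- list of guest names sharing the core
    slice_len  -- time slice length in ticks (time-based scheduling)
    interrupts -- dict mapping a tick to the guest an interrupt targets
                  (event-based scheduling)
    """
    timeline = []
    current = 0
    slice_left = slice_len
    for t in range(ticks):
        if t in interrupts:
            # Event: schedule in the guest the interrupt is destined for.
            current = guests.index(interrupts[t])
            slice_left = slice_len
        elif slice_left == 0:
            # Time slice used up: schedule the next guest in (round robin).
            current = (current + 1) % len(guests)
            slice_left = slice_len
        timeline.append(guests[current])
        slice_left -= 1
    return timeline
```

For instance, with guests A and B and a two-tick slice, an interrupt for B at tick 1 activates B before A's slice has finished.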
Network adapter 104 may comprise a network interface card (NIC) 114 that generally includes media access control (MAC) circuitry 116 and physical interface (PHY) circuitry 118. MAC circuitry 116 may be configured to assemble data to be transmitted into packets that include destination and source addresses along with network control information and error detection hash values. PHY circuitry 118 may include encoding and decoding circuitry (not shown) to encode and decode data packets. NIC 114 may be coupled to a medium to receive one or more packet flows, as indicated by packet flow 130. NIC 114 may also include a plurality of receive queues, labeled Queue A, Queue B, ..., Queue n. Receive queues Queue A, Queue B, ..., Queue n are configured to reference packets associated with a particular application received by the NIC 114 (via incoming packet flow 130).
Network traffic associated with a packet flow may be identified based on one or more fields in each packet in the packet flow. For example, the packet flow ID for a TCP packet may include a sequence of source IP address, destination IP address, source port number, and destination port number, L2/L4 data, etc., any of which can be utilized to ID the packet flow. Of course, other packet protocols may be identified, e.g., using UDP packet information. An identified packet flow may be associated with a network application in a guest. When a packet associated with the packet flow ID is received, the guest may be active (corresponding to executing on at least a portion of processor 106) or the guest may be inactive.
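One possible way to derive a packet flow ID from the fields named above and to associate it with a guest is sketched below. The dictionary-based packet representation, the names `FlowId`, `flow_id` and `guest_for`, and the example addresses are assumptions made only for illustration.

```python
from typing import NamedTuple, Optional


class FlowId(NamedTuple):
    """Flow ID built from the TCP/IP fields mentioned in the text."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def flow_id(packet: dict) -> FlowId:
    """Extract the flow ID from a parsed packet (hypothetical dict layout)."""
    return FlowId(packet["src_ip"], packet["dst_ip"],
                  packet["src_port"], packet["dst_port"])


def guest_for(packet: dict, flow_table: dict) -> Optional[str]:
    """Look up the guest whose network application owns this flow, if any."""
    return flow_table.get(flow_id(packet))


# Example table: one identified flow associated with Guest A.
FLOW_TABLE = {FlowId("10.0.0.1", "10.0.0.2", 40000, 80): "Guest A"}
```

Whether the guest returned by such a lookup is active or inactive at that moment is a separate question, which the interrupt moderation circuitry described below addresses.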
A network adapter 104 consistent with the present disclosure may include a physical interface PI 126 and a plurality of virtual interfaces VIs 124A, ..., n. The physical interface 126 is configured to manage physical resources of the network adapter 104 and may be configured to communicate with device driver 112 associated with host OS 113 and VMM 110. The physical interface 126 may be configured to manage sharing the network adapter between the virtual interfaces 124A, B, ..., n. Each virtual interface 124A, B, ..., n may include interrupt moderation circuitry 120A, ..., n. Interrupt moderation circuitry 120A, ..., n may be configured to moderate interrupts for packet flows associated with each virtual interface 124A, B, ..., n.
In a virtualization environment, a guest in the host and a virtual function and queue in the network adapter may be configured as a "complete" system. Although the guest and virtual function are sharing physical resources, it may appear to the guest and virtual function that each "owns" its respective physical resource. The scheduling and processing associated with sharing the physical resources may generally be performed by the VMM 110 and PI 126. The VMM 110 may be configured to manage the PI 126.
Figure 2 is a functional block diagram 200 illustrating two exemplary embodiments of interrupt moderation circuitry 120A, ..., n. For simplicity, in Figure 2, the designators, i.e., A, B, ..., n have been omitted. The functional block diagram 200 applies to any one or more of the virtual functions 120A, ..., n in Figure 1. In Figure 2, certain portions of the system 100 depicted in Figure 1 have been omitted for clarity (for example, CPU 106, system memory 108, MAC circuitry 116 and PHY circuitry 118), but it is to be understood that like parts of Figure 2 can be implemented in a manner consistent with an embodiment depicted in Figure 1, or alternatively in other system implementations, without departing from this embodiment.
A first exemplary embodiment includes an overflow control register 230 while a second exemplary embodiment does not include the overflow control register 230. Both embodiments include a latency interrupt register "latency ITR" 202, an overflow interrupt register "overflow ITR" 212, control circuitry 220 and an event flag(s) register 222. The latency ITR 202 may include a latency counter 204 and a latency interrupt interval 206. Similarly, the overflow ITR 212 may include an overflow counter 214 and an overflow interrupt interval 216. Latency ITR 202 is configured to facilitate interrupt moderation at a latency interrupt interval. Overflow ITR 212 is configured to facilitate providing an interrupt at an overflow interrupt interval for the virtual function associated with interrupt moderation circuitry 120 and the virtual function's associated guest.
The latency interrupt interval 206 may be determined based on interrupt moderation in a native environment. In other words, the latency interrupt interval 206 is configured for a guest that is active. The overflow interrupt interval 216 is configured to trigger activation of an associated guest so that the associated guest may respond to a critical event. For example, the overflow interrupt interval may be used to prevent receive buffer overflow when the associated guest is or has been inactive. For example, the overflow interrupt interval 216 may be determined based, at least in part, on a size of the receive buffer and a speed of the network adapter. If network traffic is received, destined for the associated guest, and the associated guest is not active, the received packets may be placed in the associated guest's receive buffer by direct memory access. If the associated guest is not scheduled in so that the guest device driver and/or network application may process the packets, the interrupt moderation circuitry 120 is configured to cause an interrupt to the VMM at the expiration of the overflow interrupt interval. This interrupt is configured to result in the VMM scheduling in the associated guest and removal of the packets from the receive buffer for processing.
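As a rough illustration of how an overflow interrupt interval might be derived from the receive buffer size and the adapter speed, the sketch below computes the time the adapter would need to fill the buffer at line rate and scales it by a safety margin. The formula, the function name `overflow_interval_us`, and the `safety` factor are assumptions; the disclosure states only that the interval may be based, at least in part, on these two quantities.

```python
def overflow_interval_us(buffer_bytes: int, link_bps: int,
                         safety: float = 0.5) -> float:
    """Upper bound (in microseconds) for the overflow interrupt interval.

    buffer_bytes -- size of the guest's receive buffer
    link_bps     -- adapter speed in bits per second
    safety       -- assumed fraction of the worst-case fill time to use,
                    leaving time to schedule the guest in and drain the buffer
    """
    fill_time_s = buffer_bytes * 8 / link_bps   # worst case: line-rate traffic
    return fill_time_s * safety * 1e6
```

For example, a 1,250,000-byte buffer on a 10 Gb/s adapter fills in about 1 ms at line rate, so a safety factor of 0.5 gives an interval of about 500 microseconds.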
For example, latency interrupt interval 206 and overflow interrupt interval 216 may be configured to store an interval count corresponding to a time duration of a latency interrupt interval and a time duration of an overflow interrupt interval, respectively. Each respective counter 204, 214 may then be configured to count down from latency interrupt interval 206 and overflow interrupt interval 216, respectively, to zero. When latency counter 204 and/or overflow counter 214 reach(es) zero, control circuitry 220 may be configured to determine whether an event flag in the event flag(s) register 222 indicates that a packet associated with a packet flow ID has been received. If such a packet has been received, control circuitry 220 is configured to generate interrupt 224 to VMM 110. VMM 110 and/or device driver 112 may then send an interrupt to guest device driver 117.
The latency counter 204 and overflow counter 214 are configured to be reset (and counting commenced) by guest device driver 117 and/or control circuitry 220, as will be described in more detail below. For example, each counter 204, 214 may be reset when an interrupt is triggered. In another example, latency counter 204 may be reset by the guest device driver based, at least in part, on packet processing in the guest.
Although counters 204, 214 and interrupt intervals 206, 216 have been described above as count-down counters and counting intervals, respectively, other configurations are possible. For example, counters 204, 214 may count up to interrupt intervals 206, 216, respectively. In another example, counters 204, 214 may correspond to timers and interrupt intervals 206, 216 may correspond to time out intervals.
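The count-down configuration described above may be modeled in software as follows. This is only a sketch of the described behavior; the class `IntervalCounter` and the function `should_interrupt` are hypothetical names, and one call to `tick` stands in for one counting step of the hardware.

```python
class IntervalCounter:
    """Count-down counter loaded from an interrupt interval register."""

    def __init__(self, interval: int):
        self.interval = interval   # e.g., latency interval 206 or overflow interval 216
        self.count = interval      # e.g., latency counter 204 or overflow counter 214

    def reset(self) -> None:
        """Reload the counter so that a new interval starts."""
        self.count = self.interval

    def tick(self) -> bool:
        """Count down one step; return True once zero has been reached."""
        if self.count > 0:
            self.count -= 1
        return self.count == 0


def should_interrupt(latency: IntervalCounter, overflow: IntervalCounter,
                     event_flag: bool) -> bool:
    """Advance both counters one step and apply the control circuitry check:
    interrupt only if a counter reached zero AND a packet was received."""
    latency_expired = latency.tick()
    overflow_expired = overflow.tick()
    return bool(event_flag and (latency_expired or overflow_expired))
```

A count-up variant would differ only in `tick` and `reset`; the check against the event flag would stay the same.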
Control circuitry 220 is configured to receive incoming packet flow(s) 130 and/or an indication of incoming packet flow(s) 130. Event flag(s) register 222 is configured to store an event flag associated with a respective packet flow destined for an application in an associated guest. Control circuitry 220 is configured to set an event flag in the event flag(s) register 222 indicating that a packet corresponding to a packet flow ID has been received. For example, control circuitry 220 may set the associated event flag when a first packet is received corresponding to an associated packet flow ID. "First packet" means the first packet received following an associated interrupt. The event flag may be cleared when an interrupt is triggered.
Control circuitry 220 may be configured to generate an interrupt 224 to VMM 110 if a latency interrupt interval expires and/or an overflow interrupt interval expires, and the event flag indicates that an associated packet has been received. If the associated guest is active, VMM 110 may forward the interrupt to the associated guest device driver 117. In an embodiment, control circuitry 220 may be configured to generate an interrupt 224 to associated guest device driver 117 if a latency interrupt interval expires and to VMM 110 if an overflow interrupt interval expires, and the event flag indicates that an associated packet has been received. In this embodiment, an interrupt vector associated with the interrupt may indicate (identify) the associated guest driver. In some embodiments, control circuitry 220 may be configured to reset latency counter 204 and/or overflow counter 214. In some embodiments, guest device driver 117 may be configured to reset latency counter 204 and/or overflow counter 214 and/or event flag(s).
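The embodiment in which a latency expiry is signaled to the guest device driver and an overflow expiry is signaled to the VMM may be sketched as a small decision function. The function name `interrupt_target`, the string labels, and the choice to give the overflow expiry priority when both intervals have expired are assumptions; the disclosure does not state which takes precedence.

```python
from typing import Optional


def interrupt_target(latency_expired: bool, overflow_expired: bool,
                     event_flag: bool) -> Optional[str]:
    """Return where an interrupt would be sent, or None if none is generated."""
    if not event_flag:
        return None              # no packet received: nothing to signal
    if overflow_expired:
        return "vmm"             # assumed priority: VMM may need to schedule the guest in
    if latency_expired:
        return "guest_driver"    # guest assumed active: moderate at the latency interval
    return None
```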
In the first exemplary embodiment, overflow control register 230 may be configured to indicate whether a guest is active or inactive. VMM 110 and/or device driver 112 may be configured to set and/or reset a guest state indicator in overflow control register 230 when VMM 110 schedules the guest in (active) or out (inactive). The associated guest device driver 117 may be configured to set the guest state indicator when the guest becomes active. For example, the guest state indicator may be set and/or reset using a memory mapped input/output ("MMIO") operation.
As described herein with respect to Figures 1 and 2, interrupt moderation circuitry 120 is configured to generate an interrupt to its associated guest device driver 117 and/or to VMM 110 when the latency interrupt interval and/or overflow interrupt interval expires and a packet corresponding to an associated packet flow ID has been received.
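A guest state indicator of this kind may be pictured as a single bit in a register that software writes and the control circuitry reads. The sketch below models the register as a plain Python object rather than real MMIO; the bit position `GUEST_ACTIVE_BIT`, the 32-bit width, and all names are hypothetical.

```python
GUEST_ACTIVE_BIT = 0x1  # hypothetical bit position of the guest state indicator


class OverflowControlRegister:
    """Software stand-in for overflow control register 230 (assumed 32 bits)."""

    def __init__(self) -> None:
        self.value = 0

    def write(self, value: int) -> None:
        self.value = value & 0xFFFFFFFF

    def read(self) -> int:
        return self.value


def set_guest_state(reg: OverflowControlRegister, active: bool) -> None:
    """What the VMM or a driver might do when scheduling the guest in or out."""
    current = reg.read()
    if active:
        reg.write(current | GUEST_ACTIVE_BIT)
    else:
        reg.write(current & ~GUEST_ACTIVE_BIT)


def guest_is_active(reg: OverflowControlRegister) -> bool:
    """What the control circuitry might query before choosing a counter."""
    return bool(reg.read() & GUEST_ACTIVE_BIT)
```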
Exemplary Methodology
Figures 3A and 3B illustrate flowcharts 300, 350 of exemplary operations consistent with the present disclosure. The operations illustrated in this embodiment may be performed by circuitry and/or software modules associated with a network adaptor (e.g., adapter 104 depicted in Fig. 1), or such operations may be performed by circuitry and/or software modules associated with a host system (or other components, e.g., Guest/VCPU), or a combination thereof.
Turning to Figure 3A, operations of this embodiment may be performed by network adapter 104, e.g., by interrupt moderation circuitry 120. For example, operations of this embodiment may be performed by interrupt moderation circuitry 120A in virtual function 124A for associated guest 111A. Operations of this embodiment may begin at start 305. At operation 310 of this embodiment, whether a packet has been received may be determined. For example, a packet associated with a packet flow may be received by virtual function 124A. An event flag in event flag(s) register 222 of interrupt moderation circuitry 120 may be set. Operation 310 may read the event flag to determine whether a packet has been received. If a packet has not been received, e.g., event flag is not set, program flow may pause at operation 310 until a packet has been received.
If a packet has been received, i.e., event flag is set, whether an interrupt interval has expired may be determined at operation 315. The interrupt interval may be the latency interrupt interval or the overflow interrupt interval. If an interrupt interval has not expired, program flow may pause at operation 315 until an interrupt interval expires. If an interrupt interval has expired, an interrupt may be triggered 320.
At operation 325, whether an associated guest, e.g., Guest 111A, is active may be determined. For example, overflow control register 230 may be queried to determine the state of the associated guest. If the associated guest is active, latency counter 204 may be reset at operation 330 and may begin counting corresponding to starting a latency interrupt interval. If the associated guest is not active, overflow counter 214 may be reset at operation 335 and may begin counting corresponding to starting an overflow interrupt interval.
These exemplary operations are configured to trigger an interrupt at the overflow interrupt interval if the guest associated with a packet flow ID is inactive and an associated packet is received or to trigger an interrupt at the latency interrupt interval if the guest is active and an associated packet is received.
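The flow of Figure 3A may be simulated in a few lines. The sketch below is a simplification under stated assumptions: time advances in discrete ticks, a single counter is timed at any moment (the one selected at operations 325 to 335), and the initial interval is chosen from the guest state at tick 0. The function name `run_fig_3a` and its parameters are hypothetical.

```python
def run_fig_3a(ticks, packet_ticks, guest_active_at,
               latency_interval, overflow_interval):
    """Return the ticks at which an interrupt is triggered.

    packet_ticks    -- set of ticks at which a packet arrives
    guest_active_at -- callable(tick) -> bool, stands in for reading
                       overflow control register 230
    """
    event_flag = False
    # Assumption: start by timing the interval matching the initial guest state.
    remaining = latency_interval if guest_active_at(0) else overflow_interval
    interrupts = []
    for t in range(1, ticks + 1):
        if t in packet_ticks:
            event_flag = True                  # operation 310: packet received
        if remaining > 0:
            remaining -= 1
        if event_flag and remaining == 0:      # operation 315: interval expired
            interrupts.append(t)               # operation 320: trigger interrupt
            event_flag = False
            # Operations 325-335: reset the counter that matches the guest state.
            if guest_active_at(t):
                remaining = latency_interval
            else:
                remaining = overflow_interval
    return interrupts
```

With an always-active guest the interrupts follow the shorter latency interval; with an always-inactive guest the same packets produce a single interrupt at the overflow interval.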
Turning to Figure 3B, operations of this embodiment may be performed, for example, by a VMM in the host system and/or an associated guest device driver. As described herein, the VMM may be implemented in circuitry and/or software. Operations of this embodiment may begin at start 345. At operation 350 of this embodiment, the state of an associated guest may be changed. For example, an active guest may be scheduled out or the inactive guest may be scheduled in. For example, the state of the guest may be changed (i.e., scheduled) based on a timer. In another example, the state of the guest may be changed based on an event, e.g., an interrupt to the VMM. At operation 355, the overflow control register 230 may be updated. The overflow control register 230 in the interrupt moderation circuitry 120 is configured to indicate the guest state to control circuitry 220 to support resetting and starting the appropriate interval counter. The overflow control register 230 may be updated by the VMM 110 and/or a guest device driver, when a guest is scheduled in.
Whether an interrupt from a device, e.g., network adapter 104, has been received may then be determined 360. If such an interrupt has not been received, program flow may pause at operation 360 until an interrupt is received. If an interrupt is received, the VMM may provide a virtual interrupt to the associated guest, so that received packets associated with the interrupt may be processed by, e.g., the associated guest device driver and/or network application running in the associated guest, if the guest is active. If the guest is not active when the interrupt is received, the guest may be scheduled in by the VMM.
Whether to change the guest state may be determined at operation 370. If the guest state is to be changed, program flow may proceed to operation 350. If the guest state is not to be changed, program flow may proceed to operation 360 to determine whether an interrupt from a device has been received.
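The host-side operations of Figure 3B may be sketched as a small state holder. The class name `Vmm`, its attributes, and the representation of the overflow control register as an integer field are assumptions made for illustration only.

```python
class Vmm:
    """Toy model of the VMM side of Figure 3B for one guest."""

    def __init__(self) -> None:
        self.guest_active = False
        self.overflow_control = 0   # stands in for overflow control register 230
        self.delivered = 0          # virtual interrupts provided to the guest

    def set_guest_state(self, active: bool) -> None:
        # Operation 350: change the guest state (schedule in or out).
        self.guest_active = active
        # Operation 355: update the overflow control register to match.
        self.overflow_control = 1 if active else 0

    def on_device_interrupt(self) -> None:
        # Operation 360 and following: an interrupt from the device arrived.
        if not self.guest_active:
            self.set_guest_state(True)   # schedule the inactive guest in
        self.delivered += 1              # provide a virtual interrupt to the guest
```

Keeping the register update inside `set_guest_state` reflects the requirement that the adapter always sees the current guest state when it chooses which counter to reset.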
Figures 4A and 4B illustrate flowcharts 400, 450 of exemplary operations of another embodiment consistent with the present disclosure. The operations illustrated in this embodiment may be performed by circuitry and/or software modules associated with a network adaptor (e.g., adapter 104 depicted in Fig. 1), or such operations may be performed by circuitry and/or software modules associated with a host system (or other components, e.g., Guest/VCPU), or a combination thereof.
Turning to Figure 4A, operations of this embodiment may be performed by network adapter 104, e.g., by interrupt moderation circuitry 120. For this embodiment, it is assumed that the latency overflow counter is reset by a guest device driver when the associated guest is active. The guest device driver may also reset the overflow counter. It is further assumed that the overflow counter has been reset. Operations according to this embodiment may begin at start 405. Whether a packet has been received may be determined 410. For example, a packet associated with a packet flow may be received by virtual interface 124A. An event flag in event flag(s) register 222 of interrupt moderation circuitry 120 may be set. Operation 410 may read the event flag to determine whether a packet has been received. If a packet has not been received, e.g., event flag is not set, program flow may pause at operation 410 until a packet has been received.
If a packet has been received, i.e., the event flag is set, whether an interrupt interval has expired may be determined at operation 415. For example, the overflow interrupt interval and/or the latency interrupt interval may have expired. If an interrupt interval has not expired, program flow may pause at operation 415. If an interrupt interval has expired, flow may proceed to operation 420 and an interrupt may be triggered. For example, the interrupt may be provided from interrupt moderation circuitry 120 to an associated guest device driver and/or to the VMM 110. At operation 425, the overflow counter may be reset, starting an overflow interrupt interval. Flow may then proceed to operation 410.
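The adapter-side loop of Figure 4A (operations 410 through 425) might be sketched as follows. This is an illustrative model only, not the disclosed circuitry. The class `InterruptModerator`, the tick-based countdown timers, and the interval values are assumptions introduced here. Two further assumptions are not stated in the text: the event flag is cleared when an interrupt is triggered, and the latency timer stays stopped after an interrupt until the guest device driver resets it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InterruptModerator:
    """Toy model of the Figure 4A loop (hypothetical names, tick-based time)."""
    latency_interval: int                # ticks
    overflow_interval: int               # ticks, assumed longer than latency_interval
    latency_left: Optional[int] = None   # None means the latency timer is not running
    overflow_left: Optional[int] = None  # None means the overflow timer is not running
    event_flag: bool = False             # set when a packet arrives (read at operation 410)
    fired_at: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The text assumes the overflow counter has already been reset.
        self.overflow_left = self.overflow_interval

    def reset_latency_counter(self) -> None:
        """Called by the guest device driver, not by the adapter itself."""
        self.latency_left = self.latency_interval

    def packet_received(self) -> None:
        self.event_flag = True

    def tick(self, now: int) -> bool:
        """Advance time by one tick; return True if an interrupt was triggered."""
        if self.latency_left:            # running and not yet at zero
            self.latency_left -= 1
        if self.overflow_left:
            self.overflow_left -= 1
        # Operation 415: expiry of either interval is sufficient.
        expired = self.latency_left == 0 or self.overflow_left == 0
        if self.event_flag and expired:
            self.fired_at.append(now)                    # operation 420
            self.event_flag = False                      # assumption: flag cleared on interrupt
            self.latency_left = None                     # assumption: stopped until driver reset
            self.overflow_left = self.overflow_interval  # operation 425
            return True
        return False


mod = InterruptModerator(latency_interval=2, overflow_interval=10)
mod.packet_received()
for t in range(1, 21):
    mod.tick(t)
```

With no driver reset of the latency counter, the single pending packet in this run is signaled only when the overflow interval expires, at tick 10.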
Turning to Figure 4B, operations of this embodiment may be performed, for example, by a VMM in the host system and/or a guest device driver. Operations according to this embodiment may begin at start 455. Operation 460 may include determining whether an interrupt from a device, e.g., network adapter 104, has been received. If an interrupt has not been received, program flow may pause at operation 460 until an interrupt is received. If an interrupt has been received, received packets may be processed at operation 465. For example, if the guest associated with the packets is active, the guest device driver and/or a network application may process the received packets. If the guest is inactive, the VMM 110 may schedule in the guest to process the packets.
Operation 470 may include resetting the latency counter. For operation 470, it is assumed that the guest is active. For example, the guest device driver and/or network application may be configured to reset the latency counter upon completion of packet processing. Operation 475 may be included in some embodiments. Operation 475 includes resetting the overflow counter. The overflow counter may be reset at the completion of packet processing, similar to resetting the latency counter. Program flow may then proceed to operation 460.
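The host-side handling of Figure 4B (operations 465 through 475) might be sketched as follows. This is illustrative only. `CounterRegisters`, `Guest`, `handle_interrupt`, and the `also_reset_overflow` flag are hypothetical names, and the receive ring is modeled as a simple queue of byte strings rather than as adapter descriptors.

```python
from collections import deque
from typing import Callable, Deque, List


class CounterRegisters:
    """Hypothetical stand-in for the adapter's counter-reset interface."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def reset_latency_counter(self) -> None:
        self.log.append("latency_reset")

    def reset_overflow_counter(self) -> None:
        self.log.append("overflow_reset")


class Guest:
    def __init__(self, active: bool) -> None:
        self.active = active
        self.processed: List[bytes] = []


def handle_interrupt(
    guest: Guest,
    rx_ring: Deque[bytes],
    regs: CounterRegisters,
    schedule_in: Callable[[Guest], None],
    also_reset_overflow: bool = False,
) -> None:
    # Operation 465: an inactive guest is scheduled in by the VMM first.
    if not guest.active:
        schedule_in(guest)
    # The guest device driver and/or network application drains the ring.
    while rx_ring:
        guest.processed.append(rx_ring.popleft())
    # Operation 470: the guest, now active, resets the latency counter
    # once packet processing is complete.
    regs.reset_latency_counter()
    # Operation 475 (optional in some embodiments): reset the overflow counter too.
    if also_reset_overflow:
        regs.reset_overflow_counter()


def schedule_in(guest: Guest) -> None:
    guest.active = True  # stand-in for the VMM scheduler


regs = CounterRegisters()
guest = Guest(active=False)
handle_interrupt(guest, deque([b"pkt0", b"pkt1"]), regs, schedule_in)
```

Resetting the latency counter only after the ring has been drained follows the statement that the reset occurs upon completion of packet processing.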
The embodiments illustrated in Figures 4A and 4B are configured to provide interrupt moderation at the latency interrupt interval when a guest is active and to provide an interrupt at the overflow interrupt interval, e.g., to prevent receive buffer overflow. The embodiments illustrated in Figures 4A and 4B do not include an explicit guest state register. Rather, a guest device driver may be configured to reset the latency counter when it completes packet processing, thereby implicitly "informing" a network adapter that the guest associated with a packet flow is active.
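The implicit signaling described above can be illustrated with a small simulation. It is a sketch under stated assumptions rather than the disclosed design. The function `simulate`, the tick granularity, and the interval values are hypothetical. A packet is assumed to be pending on every tick, and the callback `driver_resets(t)` states whether the guest device driver reset the latency counter after the interrupt at tick `t`.

```python
from typing import Callable, List, Optional


def simulate(
    driver_resets: Callable[[int], bool],
    total_ticks: int,
    latency_interval: int = 2,
    overflow_interval: int = 8,
) -> List[int]:
    """Return the ticks at which the adapter interrupts the host."""
    latency_left: Optional[int] = None  # not running until the driver resets it
    overflow_left: int = overflow_interval  # assumed reset before the start
    interrupts: List[int] = []
    for t in range(1, total_ticks + 1):
        if latency_left:
            latency_left -= 1
        if overflow_left:
            overflow_left -= 1
        if latency_left == 0 or overflow_left == 0:
            interrupts.append(t)
            # The adapter starts another overflow interrupt interval.
            overflow_left = overflow_interval
            # A driver that completes processing restarts the latency
            # interval; otherwise only the overflow interval keeps running.
            latency_left = latency_interval if driver_resets(t) else None
    return interrupts


always = simulate(lambda t: True, 12)
never = simulate(lambda t: False, 24)
mixed = simulate(lambda t: t < 12, 30)
```

Under these assumptions, a driver that keeps resetting the latency counter is interrupted every latency interval after the first overflow expiry, while a guest whose driver never runs is interrupted only once per overflow interval.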
While the foregoing is provided as exemplary system architectures and methodologies, modifications to the present disclosure are possible. For example, operating system 113, VMM 110 and/or guest operating system(s) 115A,..., n may manage system resources and control tasks that are run on system 102. For example, guest OS 115A,..., n may be implemented using Microsoft Windows, HP-UX, Linux, or UNIX, although other operating systems may be used. When a Microsoft Windows operating system is used, the ndis.sys driver may be utilized at least by guest device driver 117A,..., n and an intermediate driver (not shown). For example, the ndis.sys driver may be utilized to define application programming interfaces (APIs) that can be used for transferring packets between layers.
Guest operating system 115A,..., n may implement one or more protocol stacks (not shown). A protocol stack may execute one or more programs to process packets. An example of a protocol stack is a TCP/IP (Transmission Control Protocol/Internet Protocol) protocol stack comprising one or more programs for handling (e.g., processing or generating) packets to transmit and/or receive over a network. A protocol stack may alternatively be comprised on a dedicated sub-system such as, for example, a TCP offload engine.
Other modifications are possible. For example, memory 108 and/or memory associated with the network adaptor 104 (not shown) may comprise one or more of the following types of memory: semiconductor firmware memory, programmable memory, non-volatile memory, read only memory, electrically programmable memory, random access memory, flash memory, magnetic disk memory, and/or optical disk memory.
Either additionally or alternatively, memory 108 and/or memory associated with the network adaptor 104 (not shown) may comprise other and/or later-developed types of computer-readable memory.
Embodiments of the methods described herein may be implemented in a system that includes one or more storage mediums having stored thereon, individually or in combination, instructions that when executed by one or more processors perform the methods. Here, the processor may include, for example, a system CPU (e.g., core processor of Fig. 1) and/or programmable circuitry such as the MAC circuitry. Thus, it is intended that operations according to the methods described herein may be distributed across a plurality of physical devices, such as processing structures at several different physical locations. The storage medium may include any type of tangible medium, for example, any type of disk including floppy disks, optical disks, compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic and static RAMs, erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), flash memories, magnetic or optical cards, or any type of media suitable for storing electronic instructions.
The Ethernet communications protocol, described herein, may be capable of permitting communication using a Transmission Control Protocol/Internet Protocol (TCP/IP). The Ethernet protocol may comply or be compatible with the Ethernet standard published by the Institute of Electrical and Electronics Engineers (IEEE) titled "IEEE 802.3 Standard", published in March, 2002 and/or later versions of this standard.
As used herein, a "PHY" may be defined as an object and/or circuitry used to interface to one or more devices, and such object and/or circuitry may be defined by one or more of the communication protocols set forth herein. The PHY may comprise a physical PHY comprising transceiver circuitry to interface to the applicable communication link. The PHY may alternately and/or additionally comprise a virtual PHY to interface to another virtual PHY or to a physical PHY. PHY circuitry 224 may comply or be compatible with the aforementioned IEEE 802.3 Ethernet communications protocol, which may include, for example, 100BASE-TX, 100BASE-T, 10GBASE-T, 10GBASE-KR, 10GBASE-KX4/XAUI, 40GbE and/or 100GbE compliant PHY circuitry, and/or PHY circuitry that is compliant with an after-developed communications protocol.
"Circuitry", as used in any embodiment herein, may comprise, for example, singly or in any combination, hardwired circuitry, programmable circuitry, state machine circuitry, and/or firmware that stores instructions executed by programmable circuitry.
The terms and expressions which have been employed herein are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications are possible within the scope of the claims. Accordingly, the claims are intended to cover all such equivalents.

Claims

What is claimed is:
1. A method, comprising:
defining an overflow interrupt interval for a device, the overflow interrupt interval related to a critical event in a host, the critical event related to a guest comprising a network application, wherein the guest is active for a first time interval and inactive for a second time interval; and
generating a first interrupt from the device to the host if the overflow interrupt interval expires and a packet associated with the network application is received by the device.
2. The method of claim 1, further comprising:
defining a latency interrupt interval for the device;
moderating interrupts, based on the latency interrupt interval, from the device to the host when the guest is active on the host, wherein the interrupts are moderated so that each interrupt occurs at the latency interrupt interval; and
generating a second interrupt from the device to the host if the latency interrupt interval expires and a packet has been received by the device.
3. The method of claim 2, further comprising:
indicating whether the guest is active or inactive using a register in the device, wherein the host is configured to update the register when a state of the guest changes; and
determining whether the guest is active or inactive based, at least in part, on the register.
4. The method of claim 3, further comprising:
resetting a latency counter if the guest is active, wherein resetting the latency counter corresponds to starting another latency interrupt interval and the latency counter is related to the latency interrupt interval; and
resetting an overflow counter if the guest is inactive, wherein resetting the overflow counter corresponds to starting another overflow interrupt interval and the overflow counter is related to the overflow interrupt interval.
5. The method of claim 2, further comprising:
resetting an overflow counter related to the overflow interrupt interval, wherein the overflow counter is reset based on the first or the second interrupt and resetting the overflow counter corresponds to starting another overflow interrupt interval.
6. The method of claim 2, further comprising:
receiving the first or second interrupt at the host;
processing the received packets; and
resetting a latency counter related to the latency interrupt interval, wherein the packets are processed and the latency counter is reset by the host and resetting the latency counter corresponds to starting another latency interrupt interval.
7. The method of claim 6, further comprising:
resetting an overflow counter related to the overflow interrupt interval, wherein the overflow counter is reset by the host.
8. A system comprising, one or more storage mediums having stored thereon, individually or in combination, instructions that when executed by one or more processors result in the following operations comprising:
defining an overflow interrupt interval for a device, the overflow interrupt interval related to a critical event in a host, the critical event related to a guest comprising a network application, wherein the guest is active for a first time interval and inactive for a second time interval; and
generating a first interrupt from the device to the host if the overflow interrupt interval expires and a packet associated with the network application is received by the device.
9. The system of claim 8, wherein the instructions that when executed by one or more of the processors result in the following additional operations comprising:
defining a latency interrupt interval for the device;
moderating interrupts, based on the latency interrupt interval, from the device to the host when the guest is active on the host, wherein the interrupts are moderated so that each interrupt occurs at the latency interrupt interval; and
generating a second interrupt from the device to the host if the latency interrupt interval expires and a packet has been received by the device.
10. The system of claim 9, wherein the instructions that when executed by one or more of the processors result in the following additional operations comprising:
indicating whether the guest is active or inactive using a register in the device, wherein the host is configured to update the register when a state of the guest changes; and
determining whether the guest is active or inactive based, at least in part, on the register.
11. The system of claim 10, wherein the instructions that when executed by one or more of the processors result in the following additional operations comprising:
resetting a latency counter if the guest is active, wherein resetting the latency counter corresponds to starting another latency interrupt interval and the latency counter is related to the latency interrupt interval; and
resetting an overflow counter if the guest is inactive, wherein resetting the overflow counter corresponds to starting another overflow interrupt interval and the overflow counter is related to the overflow interrupt interval.
12. The system of claim 9, wherein the instructions that when executed by one or more of the processors result in the following additional operations comprising:
resetting an overflow counter related to the overflow interrupt interval, wherein the overflow counter is reset based on the first or the second interrupt and resetting the overflow counter corresponds to starting another overflow interrupt interval.
13. The system of claim 9, wherein the instructions that when executed by one or more of the processors result in the following additional operations comprising:
receiving the first or second interrupt;
processing the received packets; and
resetting a latency counter related to the latency interrupt interval, wherein the packets are processed and the latency counter is reset by the host and resetting the latency counter corresponds to starting another latency interrupt interval.
14. The system of claim 13, wherein the instructions that when executed by one or more of the processors result in the following additional operations comprising:
resetting an overflow counter related to the overflow interrupt interval, wherein the overflow counter is reset by the host.
15. A system comprising:
a host comprising a processor coupled to host memory, wherein the host memory is configured to store a guest comprising a network application; and
a network adapter coupled to the host, the network adapter comprising interrupt moderation circuitry configured to:
store an overflow interrupt interval for the network adapter, the overflow interrupt interval related to a critical event in the host, the critical event related to the guest, wherein the guest is active for a first time interval and inactive for a second time interval; and
generate a first interrupt from the network adapter to the host if the overflow interrupt interval expires and a packet associated with the network application is received by the network adapter.
16. The system of claim 15, wherein:
the network adapter is configured to:
store a latency interrupt interval for the network adapter;
moderate interrupts, based on the latency interrupt interval, from the network adapter to the host when the guest is active on the host, wherein the interrupts are moderated so that each interrupt occurs at the latency interrupt interval; and
generate a second interrupt from the network adapter to the host if the latency interrupt interval expires and a packet has been received by the network adapter.
17. The system of claim 16, wherein:
the host is configured to:
indicate whether the guest is active or inactive using a register in the interrupt moderation circuitry, and
update the register when a state of the guest changes; and
the interrupt moderation circuitry is configured to:
determine whether the guest is active or inactive based, at least in part, on the register.
18. The system of claim 17, wherein the interrupt moderation circuitry is further configured to:
reset a latency counter if the guest is active, wherein resetting the latency counter corresponds to starting another latency interrupt interval and the latency counter is related to the latency interrupt interval; and
reset an overflow counter if the guest is inactive, wherein resetting the overflow counter corresponds to starting another overflow interrupt interval.
19. The system of claim 16, wherein the interrupt moderation circuitry is further configured to:
reset an overflow counter related to the overflow interrupt interval, wherein the overflow counter is reset based on the first or the second interrupt and resetting the overflow counter corresponds to starting another overflow interrupt interval.
20. The system of claim 16, wherein:
the host is further configured to:
receive the first or the second interrupt;
process the received packets; and
reset a latency counter related to the latency interrupt interval, wherein resetting the latency counter corresponds to starting another latency interrupt interval.
21. The network adapter of claim 20, wherein the host is further configured to:
reset an overflow counter related to the overflow interrupt interval.
PCT/CN2009/001480 2009-12-17 2009-12-17 Cooperated interrupt moderation for a virtualization environment WO2011072423A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US13/516,149 US9176770B2 (en) 2009-12-17 2009-12-17 Cooperated interrupt moderation for a virtualization environment
EP16182820.7A EP3115894A1 (en) 2009-12-17 2009-12-17 Cooperated interrupt moderation for a virtualization environment
EP09852161.0A EP2513792B1 (en) 2009-12-17 2009-12-17 Cooperated interrupt moderation for a virtualization environment
PCT/CN2009/001480 WO2011072423A1 (en) 2009-12-17 2009-12-17 Cooperated interrupt moderation for a virtualization environment
US14/930,413 US9921868B2 (en) 2009-12-17 2015-11-02 Cooperated interrupt moderation for a virtualization environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2009/001480 WO2011072423A1 (en) 2009-12-17 2009-12-17 Cooperated interrupt moderation for a virtualization environment

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US13/516,149 A-371-Of-International US9176770B2 (en) 2009-12-17 2009-12-17 Cooperated interrupt moderation for a virtualization environment
US14/930,413 Continuation US9921868B2 (en) 2009-12-17 2015-11-02 Cooperated interrupt moderation for a virtualization environment

Publications (1)

Publication Number Publication Date
WO2011072423A1 true WO2011072423A1 (en) 2011-06-23

Family

ID=44166707

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2009/001480 WO2011072423A1 (en) 2009-12-17 2009-12-17 Cooperated interrupt moderation for a virtualization environment

Country Status (3)

Country Link
US (2) US9176770B2 (en)
EP (2) EP3115894A1 (en)
WO (1) WO2011072423A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011072423A1 (en) * 2009-12-17 2011-06-23 Intel Corporation Cooperated interrupt moderation for a virtualization environment
CN103414535B (en) * 2013-07-31 2017-04-19 华为技术有限公司 Data sending method, data receiving method and relevant devices
US9424216B2 (en) * 2014-03-14 2016-08-23 International Business Machines Corporation Ascertaining configuration of a virtual adapter in a computing environment
US9374324B2 (en) 2014-03-14 2016-06-21 International Business Machines Corporation Determining virtual adapter access controls in a computing environment
US20160124874A1 (en) * 2014-10-30 2016-05-05 Sandisk Technologies Inc. Method and apparatus for interrupt coalescing
US10430220B1 (en) * 2015-09-24 2019-10-01 EMC IP Holding Company LLC Virtual devices as protocol neutral communications mediators
JP6594533B2 (en) * 2016-05-17 2019-10-23 三菱電機株式会社 Controller system
CN108139925B (en) 2016-05-31 2022-06-03 安华高科技股份有限公司 High availability of virtual machines
JP7000088B2 (en) * 2017-09-15 2022-01-19 株式会社東芝 Notification control device, notification control method and program
GB2571922B (en) * 2018-03-05 2020-03-25 Advanced Risc Mach Ltd External exception handling
JP6974254B2 (en) 2018-05-18 2021-12-01 ルネサスエレクトロニクス株式会社 Data processing device
US11080088B2 (en) * 2018-12-19 2021-08-03 Intel Corporation Posted interrupt processing in virtual machine monitor
US11768696B2 (en) * 2020-12-14 2023-09-26 Ati Technologies Ulc Security for microengine access

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007115425A1 (en) * 2006-03-30 2007-10-18 Intel Corporation Method and apparatus for supporting heterogeneous virtualization
CN101266635A (en) * 2006-12-27 2008-09-17 英特尔公司 Providing protected access to critical memory regions
US20080235426A1 (en) * 2007-03-23 2008-09-25 Debkumar De Handling shared interrupts in bios under a virtualization technology environment
CN101373443A (en) * 2008-09-23 2009-02-25 北京中星微电子有限公司 Method for responding and stopping response of host computer and processing peripheral interrupt

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7934020B1 (en) * 2003-09-19 2011-04-26 Vmware, Inc. Managing network data transfers in a virtual computer system
US7209994B1 (en) * 2004-05-11 2007-04-24 Advanced Micro Devices, Inc. Processor that maintains virtual interrupt state and injects virtual interrupts into virtual machine guests
US7707341B1 (en) * 2004-05-11 2010-04-27 Advanced Micro Devices, Inc. Virtualizing an interrupt controller
US20060064529A1 (en) * 2004-09-23 2006-03-23 International Business Machines Corporation Method and system for controlling peripheral adapter interrupt frequency by transferring processor load information to the peripheral adapter
US7853960B1 (en) * 2005-02-25 2010-12-14 Vmware, Inc. Efficient virtualization of input/output completions for a virtual device
US7779282B2 (en) 2006-12-29 2010-08-17 Intel Corporation Maintaining network connectivity while operating in low power mode
US8453143B2 (en) * 2007-09-19 2013-05-28 Vmware, Inc. Reducing the latency of virtual interrupt delivery in virtual machines
GB2462258B (en) * 2008-07-28 2012-02-08 Advanced Risc Mach Ltd Interrupt control for virtual processing apparatus
US10521265B2 (en) * 2008-09-19 2019-12-31 Microsoft Technology Licensing, Llc Coalescing periodic timer expiration in guest operating systems in a virtualized environment
US9407550B2 (en) * 2008-11-24 2016-08-02 Avago Technologies General Ip (Singapore) Pte. Ltd. Method and system for controlling traffic over a computer network
US8234432B2 (en) * 2009-01-26 2012-07-31 Advanced Micro Devices, Inc. Memory structure to store interrupt state for inactive guests
US8478924B2 (en) * 2009-04-24 2013-07-02 Vmware, Inc. Interrupt coalescing for outstanding input/output completions
US8244946B2 (en) * 2009-10-16 2012-08-14 Brocade Communications Systems, Inc. Interrupt moderation
WO2011072423A1 (en) * 2009-12-17 2011-06-23 Intel Corporation Cooperated interrupt moderation for a virtualization environment
US8291135B2 (en) * 2010-01-15 2012-10-16 Vmware, Inc. Guest/hypervisor interrupt coalescing for storage adapter virtual function in guest passthrough mode
US8612658B1 (en) * 2010-08-24 2013-12-17 Amazon Technologies, Inc. Interrupt reduction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2513792A4 *

Also Published As

Publication number Publication date
EP3115894A1 (en) 2017-01-11
US9921868B2 (en) 2018-03-20
US20130159580A1 (en) 2013-06-20
EP2513792A4 (en) 2013-06-05
EP2513792B1 (en) 2016-08-17
US9176770B2 (en) 2015-11-03
EP2513792A1 (en) 2012-10-24
US20160124766A1 (en) 2016-05-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09852161

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2009852161

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2009852161

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 13516149

Country of ref document: US