WO2014116215A1 - Shared resource contention - Google Patents

Shared resource contention

Info

Publication number
WO2014116215A1
WO2014116215A1
Authority
WO
WIPO (PCT)
Prior art keywords
contention
probe
region
vms
measurements
Prior art date
Application number
PCT/US2013/022766
Other languages
French (fr)
Inventor
Chris D. Hyser
Jerome Rolia
Diwakar Krishnamurthy
Joydeep Mukherjee
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P.
Priority to US14/759,659 (granted as US9954757B2)
Priority to PCT/US2013/022766
Priority to EP13872320.0A (published as EP2948841A4)
Priority to CN201380070722.8A (published as CN105074651A)
Publication of WO2014116215A1

Classifications

    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • H04L43/12 Network monitoring probes
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/301 Monitoring arrangements where the computing system is a virtual computing platform, e.g. logically partitioned systems
    • G06F11/3093 Configuration details of monitoring probes, e.g. installation, enabling, spatial arrangement
    • G06F11/3428 Benchmarking
    • G06F11/3433 Recording or statistical evaluation of computer activity for performance assessment for load management
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L47/78 Architectures of resource allocation
    • H04L47/83 Admission control; Resource allocation based on usage prediction
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2009/45591 Monitoring or debugging support
    • G06F2201/815 Indexing scheme relating to error detection, error correction and monitoring: Virtual
    • G06F2209/508 Indexing scheme relating to G06F9/50: Monitor

Definitions

  • VMs: Virtual machines
  • A VM is a software implementation of a machine that executes programs like a physical machine.
  • VMs may be used to run multiple OS environments on the same computer in isolation from each other, and each VM may host one or more applications. It is common for modern processors to run virtual machines, and the VMs may share caches, memory, and other resources.
  • Figure 1 illustrates a system
  • Figure 2 illustrates a system including sockets sharing resources.
  • Figure 3 illustrates a flow chart of a method.
  • Figure 4 illustrates a computer system that is operable to be used for the system in figures 1 or 2.
  • a probe is used to determine contention amongst resource consumers in a contention region in a shared resource environment.
  • the contention region is comprised of shared resources in the shared resource environment.
  • Resource consumers include any item utilizing the shared resources, such as applications running on a server or VMs and applications running on the VMs.
  • the probe for example comprises code running in the contention region simulating a resource consumer.
  • the probe utilizes the shared resources concurrently with the resource consumers to take measurements used to determine contention amongst the resource consumers for the shared resources.
  • Contention for example is the interference between the resource consumers concurrently utilizing the shared resources that causes the resource consumers to behave differently together than in isolation. Contention can cause performance or quality of service (QoS) of the resource consumers to degrade below a benchmark (e.g., threshold).
  • QoS quality of service
  • Resource consumers may include VMs.
  • Shared resources such as an L3 cache, a virtual machine monitor, memory or a network interface, typically are not provisioned for each VM but are still used by all the VMs.
  • the resource sharing may cause degradation of VM QoS. For example, if multiple VMs share an L3 cache or share a memory and one VM is dominating the cache and/or memory access, the performance of applications executed by the other VMs may suffer.
  • Probe measurements may be used to determine contention, and VM provisioning decisions may be made based on the performance measured by the probe. For example, a VM may be migrated outside the contention region, such as to another socket or machine, to improve QoS. In a cloud system, provisioning based on the probe measurements can be used to improve QoS and to meet service level agreement performance benchmarks.
  • FIG. 1 illustrates a shared resource environment contention determination system 100.
  • the system 100 comprises a shared resource environment including a contention region 110.
  • the contention region 110 includes shared resources 130 that are shared by resource consumers 108, which in this example include VMs 101a-n and the probe 102.
  • the contention region does not include VMs and may comprise a single machine or multiple machines where multiple workloads are being consolidated.
  • a resource consumer may include an application utilizing the shared resources but not running in a VM.
  • the shared resources 130 comprise shared resources 131a-f, such as memory, hierarchical caches (e.g., L3 cache and/or L1-L2 caches), cores, virtual machine monitor, network interface (e.g., NIC), servers, paths through a hypervisor or any resource that is shared.
  • a probe 102 also shares the shared resources 130 with the VMs 101a-n.
  • the VMs 101a-n may each run one or more applications, such as applications 102a-n.
  • the contention region 110 may be provided in a distributed computing system, such as a cloud system.
  • the VMs may be provisioned to run workloads of users of the distributed computing system. For example, users may need to run the applications 102a-n and VMs are provisioned on the shared resources 130 to run the applications to accommodate the demand of the users.
  • the probe 102 for example comprises machine readable instructions executed to simulate a workload on the shared resources 130.
  • the probe 102 is computer code that simulates operations of an application running on a VM in the contention region 110.
  • the probe 102 may contend for the shared resources 130 with the VMs 101a-n. As the contention for the shared resources 130 increases, it may cause degradation in performance of the applications 102a-n and the probe 102.
  • the measurements performed by the probe 102 may indicate an amount of contention and the degradation in performance.
  • the probe 102 may be tuned to make the measurements depending on the shared resources being measured.
  • the probe may be tuned according to the type of resource and attributes of the resource. For example, the probe 102 may make measurements for accessing an L3 cache.
  • the probe 102 is tuned for the cache size attribute. If the L3 cache is 8MB and the probe is tuned for an incorrect size, such as 16MB, then the measurements may not be accurate.
  • the probe 102 may be tuned through an automated process or a process involving user input to vary parameters during tuning.
  • a load generator 120 determines measurements from or using the probe 102, compares the measurements to benchmarks, and can determine a contention value indicative of an amount of contention among the resource consumers 108 running in the contention region 110.
  • the benchmarks may be baseline measurements made by the probe 102 if the probe was not in contention with the VMs 101a-n. For example, the probe 102 is executed without the resource consumers 108 and makes performance measurements of the shared resources 130 to determine the benchmarks.
  • the load generator 120 may compare the measurements made by the probe 102 when sharing the resources 130 with the benchmarks.
  • the load generator 120 controls the probe 102 to execute at specific times to measure performance of the shared resources.
  • the load generator 120 for example is deployed on a separate host distinct from the contention region 110.
  • the load generator 120 may support multiple probes. A contention region may have more than one probe; however, the probe 102 may perform multiple different measurements to characterize a number of shared resources. Also, the load generator 120 may support probes in different contention regions. Some examples of the measurements determined using the probe 102 include response time and throughput.
  • the probe 102 can alternate between phases of measuring different metrics, and the alternating may be controlled by the load generator 120 instructing the probe 102 or may be determined by the probe itself.
  • one phase may be a connection phase, which includes measuring probe response times.
  • the load generator 120 submits requests to the probe 102 and measures the probe's response time.
  • contention is determined from this external measure of the probe's response time.
  • This connection phase may include sensing contention at a virtual machine monitor related to TCP/IP connections.
  • one example of a virtual machine monitor is the Kernel-based Virtual Machine (KVM) in LINUX.
  • KVM: Kernel-based Virtual Machine
  • the virtual machine monitor is one of the shared resources 130.
  • a memory phase may measure contention for a memory or a cache (e.g., L3 cache).
  • the response time to retrieve requested data from a memory or a cache is determined.
  • the response time is measured by the probe 102.
  • the probe 102 may execute a script in a loop that accesses memory in a way that uses a lot of L3 cache. The higher the response time, the greater the number of inferred cache misses.
  • the probe 102 can report its response times to the load generator 120.
  • the load generator 120 can control which phase the probe 102 is in and is able to communicate any inferred performance degradations to the management system 140.
  • FIG. 2 shows an example of the system 100 including a socket or multiple sockets comprising the shared resources 130 in the contention region 110.
  • the contention region 110 in other examples may comprise multiple servers or another set of shared resources.
  • sockets 201 and 202 are shown.
  • the shared resources 130 include CPU cores 210a-h and 220a-h.
  • Other shared resources include the L3 caches 211 and 221 and memories 212 and 222.
  • a socket may comprise a physical package or chip including the CPU cores and L3 cache. Sockets may participate in inter-socket communications.
  • in one example, each of the sockets 201 and 202 is a separate contention region, and in another example, the sockets 201 and 202 together represent the contention region 110.
  • VMs may be assigned per socket and each socket may include a probe.
  • VMs 230a-h run on cores 210a-h and probe 240 runs on core 210a.
  • VMs 231a-h run on cores 220a-h and probe 241 runs on core 220a.
  • VMs may be pinned to the near memory to improve performance.
  • the probes 240 and 241 may report measures of their own internal performance to the load generator 120.
  • the load generator 120 may measure response times or throughputs of probes 240 and 241.
  • the load generator 120 may send contention values to the management system 140 and the management system 140 may make VM provisioning decisions based on the contention values.
  • Tuning of a probe is now described according to an example with respect to the connection phase and the memory phase for probe measurements.
  • Other techniques may be used to tune the probe and the probe may be tuned for determining measurements other than connection rates and memory access time/cache misses.
  • the probe may comprise a low overhead application capable of measuring metrics indicating contention among unmanaged shared resources.
  • Parameters for the probe, e.g., c, t1, and t2 described below, may be optimized in a one-factor-at-a-time manner until the probe has low resource utilization yet correctly reports the degradations incurred by the micro-benchmark VMs.
  • micro-benchmarks are applications, each designed to stress a particular shared resource. To tune one of the probe's phases, the load of a corresponding micro-benchmark is increased until performance degradation is observed. This process is then repeated with the probe in place.
  • in the connection phase, a web server instance within the probe is subjected to a burst of c web connections per second (cps) over a period of t1 seconds.
  • the web server instance services a connection request by executing a CPU-bound PHP script.
  • tuning the probe in the memory phase comprises the probe executing a script that traverses an array of integers of size n for t2 seconds, where n is related to the L3 cache size for the socket. In this example, the value of n is chosen so as to cause an L3 miss rate of 0.1.
  • the values of c, t1, n, and t2 are carefully tuned for each type of server in a resource pool such that the probe imposes low overhead while having the ability to identify contention among VMs.
  • the parameters of the probe are automatically tuned so that the probe 102 alternates between its connection and memory phases.
  • a controlled experiment is setup involving the application-intensive web microbenchmark.
  • two scenarios are identified with one VM per core, one with an arrival rate per VM that does not have a performance discontinuity and one with an arrival rate that does.
  • the discontinuity is a performance discontinuity. For example, based on an experiment, up to a load of 160 cps per VM there is no significant change in the mean response time of VMs. The per-core utilization at this load level is 0.75.
  • t1 is set to be the entire duration of the micro-benchmark's execution.
  • the probe is executed for the scenario without the discontinuity.
  • the c value is progressively decreased until the probe does not introduce any performance degradation for the VMs.
  • the probe is run with this setting for the other scenario with the performance discontinuity.
  • the value of t1 is now progressively decreased up to the point where the probe is still able to identify the problem.
  • a similar approach of using a caching or memory micro-benchmark for selecting n first followed by t2 later is adopted for tuning the memory phase of the probe.
  • the probes may be used to determine measurements, for example, in a connection phase and a memory phase.
  • the load generator 120 determines the measurements and determines a contention value from the measurements.
  • the contention value for example is indicative of an amount of contention for a shared resource amongst the resource consumers and can be used to determine an actual capacity of the shared resources.
  • Contention values can be sent to the management system 140.
  • the contention values may indicate whether the contention exceeds a threshold. For example, the amount of contention may be used to report degradations beyond a threshold to the management system 140.
  • the management system 140 may include a scheduler that provisions VMs. Provisioning decisions may be based on the measurements.
  • a contention value determined from the measurements may include the actual capacity, which can be used to determine whether another VM can be provisioned in the contention region and still maintain a particular QoS.
  • f is a contention value
  • an f of greater than 1 means there is contention for the shared resources.
  • No contention means a resource consumer gets access to a shared resource as if no other resource consumer is using it.
  • Contention may include some degradation in performance from the no-contention operation. For example, instead of completing 1 job per hour, an application completes 0.8 of a job per hour.
  • a method 300 is described with respect to the system 100 shown in figure 1 or 2 by way of example. The method may be performed by other systems.
  • Figure 3 shows the method 300 of estimating contention among resource consumers in a contention region in a shared resource environment.
  • the load generator 120 determines measurements of performance of the shared resources 130 in the contention region 110 from the probe 102. Examples of performance measurements include response time to establish a new TCP/IP connection and serve a web page, or estimated cache misses.
  • the load generator compares the measurements to benchmarks.
  • the benchmarks may include values representing a desired performance.
  • the benchmarks may be determined from measurements of the probe 102 running in isolation in the contention region 110 where there is no contention with other resource consumers.
  • the load generator 120 determines a contention value representative of an amount of contention among the resource consumers running in the contention region based on the comparison.
  • the contention value may be the f value described above or an amount of contention above a benchmark.
  • Some or all of the methods, operations, and functions described above may be provided as machine readable instructions executable by a processor and stored on a non-transitory computer readable storage medium. For example, they may exist as program(s) comprised of program instructions in source code, object code, executable code or other formats.
  • Figure 4 shows a computer platform 400 for the load generator 120. It is understood that the illustration of the platform 400 is a generalized illustration and that the platform 400 may include additional components and that some of the components described may be removed and/or modified without departing from a scope of the platform 400.
  • the platform 400 includes processor(s) 401, such as a central processing unit, ASIC or other type of processing circuit; a display 402 and/or other input/output devices; an interface 403, such as a network interface to a Local Area Network (LAN), a wireless 802.11x LAN, a 3G or 4G mobile WAN or a WiMax WAN; and a computer-readable medium 404. Each of these components may be operatively coupled to a bus 408.
  • a non-transitory computer readable medium (CRM), such as CRM 404, may be any suitable medium which stores instructions for execution by the processor(s) 401.
  • the CRM 404 may be non-volatile media, such as a magnetic disk or solid-state non-volatile memory or volatile media.
  • the CRM 404 may include machine instructions 405 for the load generator 120.
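The contention value f described above can be illustrated with a small sketch. The ratio form below is an assumption consistent with "an f of greater than 1 means there is contention"; the document does not state the exact formula, and the numeric inputs are hypothetical.

```python
def contention_value(measured, baseline):
    """Assumed ratio form of the contention value f: the probe's measured
    metric (e.g., response time in ms) divided by its no-contention
    benchmark. f == 1 means no contention; f > 1 means contention."""
    return measured / baseline

def actual_capacity(nominal_capacity, f):
    """Assumed derating: under contention, the shared resources deliver
    roughly 1/f of their isolated (nominal) capacity."""
    return nominal_capacity / f

# Hypothetical measurements: a probe response time of 25 ms under load
# versus a 20 ms no-contention baseline.
f = contention_value(measured=25, baseline=20)
print(f)                         # 1.25, i.e., contention is present
print(actual_capacity(8, f))     # 6.4 usable core-equivalents out of 8
```

A value computed this way could then be compared against a threshold before reporting a degradation to the management system 140.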

Abstract

Contention for shared resources in a shared resource environment may be determined based on measurements from a probe running in the shared resource environment. The measurements can be compared to benchmarks, and a contention value may be determined based on the comparison.

Description

SHARED RESOURCE CONTENTION
BACKGROUND
[001] Large scale shared resource pools such as private and public clouds are being used to host many kinds of applications. In many instances, virtualization is employed in the resource pools. Virtual machines (VMs) are created and run on a physical machine to host applications. A VM is a software implementation of a machine that executes programs like a physical machine. VMs may be used to run multiple OS environments on the same computer in isolation from each other and each VM may host one or more applications. It is common for modern processors to run virtual machines, and the VMs may share caches, memory, and other resources.
BRIEF DESCRIPTION OF THE DRAWINGS
[002] Embodiments are described in detail in the following description with reference to the following figures. The figures show examples of the embodiments and like reference numerals indicate similar elements in the accompanying figures.
[003] Figure 1 illustrates a system.
[004] Figure 2 illustrates a system including sockets sharing resources.
[005] Figure 3 illustrates a flow chart of a method.
[006] Figure 4 illustrates a computer system that is operable to be used for the system in figures 1 or 2.
DETAILED DESCRIPTION
[007] For simplicity and illustrative purposes, the principles of the embodiments are described by referring mainly to examples thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the embodiments. It is apparent however, to one of ordinary skill in the art, that the embodiments may be practiced without limitation to these specific details. In some instances, well known methods and structures have not been described in detail so as not to unnecessarily obscure the description of the embodiments.
[008] According to an embodiment, a probe is used to determine contention amongst resource consumers in a contention region in a shared resource environment. The contention region is comprised of shared resources in the shared resource environment. Resource consumers include any item utilizing the shared resources, such as applications running on a server or VMs and applications running on the VMs. The probe for example comprises code running in the contention region simulating a resource consumer. The probe utilizes the shared resources concurrently with the resource consumers to take measurements used to determine contention amongst the resource consumers for the shared resources. Contention for example is the interference between the resource consumers concurrently utilizing the shared resources that causes the resource consumers to behave differently together than in isolation. Contention can cause performance or quality of service (QoS) of the resource consumers to degrade below a benchmark (e.g., threshold).
[009] Resource consumers may include VMs. Shared resources, such as an L3 cache, a virtual machine monitor, memory or a network interface, typically are not provisioned for each VM but are still used by all the VMs. The resource sharing may cause degradation of VM QoS. For example, if multiple VMs share an L3 cache or share a memory and one VM is dominating the cache and/or memory access, the performance of applications executed by the other VMs may suffer. Probe measurements may be used to determine contention, and VM provisioning decisions may be made based on the performance measured by the probe. For example, a VM may be migrated outside the contention region, such as to another socket or machine, to improve QoS. In a cloud system, provisioning based on the probe measurements can be used to improve QoS and to meet service level agreement performance benchmarks.
[0010] Figure 1 illustrates a shared resource environment contention determination system 100. The system 100 comprises a shared resource environment including a contention region 110. The contention region 110 includes shared resources 130 that are shared by resource consumers 108, which in this example include VMs 101a-n and the probe 102. In other examples, the contention region does not include VMs and may comprise a single machine or multiple machines where multiple workloads are being consolidated. A resource consumer may include an application utilizing the shared resources but not running in a VM. The shared resources 130 comprise shared resources 131a-f, such as memory, hierarchical caches (e.g., L3 cache and/or L1-L2 caches), cores, virtual machine monitor, network interface (e.g., NIC), servers, paths through a hypervisor or any resource that is shared. A probe 102 also shares the shared resources 130 with the VMs 101a-n. The VMs 101a-n may each run one or more applications, such as applications 102a-n. The contention region 110 may be provided in a distributed computing system, such as a cloud system. The VMs may be provisioned to run workloads of users of the distributed computing system. For example, users may need to run the applications 102a-n and VMs are provisioned on the shared resources 130 to run the applications to accommodate the demand of the users.
[0011] The probe 102 for example comprises machine readable instructions executed to simulate a workload on the shared resources 130. For example, the probe 102 is computer code that simulates operations of an application running on a VM in the contention region 110. The probe 102 may contend for the shared resources 130 with the VMs 101a-n. As the contention for the shared resources 130 increases, it may cause degradation in performance of the applications 102a-n and the probe 102. The measurements performed by the probe 102 may indicate an amount of contention and the degradation in performance.
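As a rough illustration of the probe as machine readable instructions simulating a workload, the sketch below shows an outer loop that alternates between measurement phases and reports each result. The phase functions and the reporting callback are hypothetical stand-ins, not the patent's code.

```python
def run_probe(phases, report, cycles=3):
    """Illustrative outer loop of a probe: alternate through the given
    measurement phases and report each measurement as it is taken."""
    results = []
    for _ in range(cycles):
        for name, measure in phases:
            value = measure()          # e.g., a response time for this phase
            report(name, value)        # e.g., send to the load generator
            results.append((name, value))
    return results

# Stub phases standing in for connection and memory measurements:
collected = []
results = run_probe(
    phases=[("connection", lambda: 0.004), ("memory", lambda: 0.012)],
    report=lambda name, value: collected.append((name, value)),
)
print(len(results))  # 3 cycles x 2 phases = 6 measurements
```

In this sketch the phase schedule is fixed by the caller, matching the document's note that alternation may be controlled externally or by the probe itself.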
[0012] The probe 102 may be tuned to make the measurements depending on the shared resources being measured. The probe may be tuned according to the type of resource and attributes of the resource. For example, the probe 102 may make measurements for accessing an L3 cache. The probe 102 is tuned for the cache size attribute. If the L3 cache is 8MB and the probe is tuned for an incorrect size, such as 16MB, then the measurements may not be accurate. The probe 102 may be tuned through an automated process or a process involving user input to vary parameters during tuning.
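The tuning idea in the paragraph above, adjusting a cache-related parameter until measurements are meaningful, can be sketched as a one-parameter search. The 0.1 target miss rate appears elsewhere in the document; the binary-search procedure and the stub miss-rate model are assumptions for illustration only.

```python
def tune_array_size(measure_miss_rate, l3_size_ints, target=0.1, tol=0.02):
    """Illustrative tuner: binary-search the array size n so that the
    probe's observed L3 miss rate lands near the target.

    measure_miss_rate(n) is assumed to return the miss rate observed
    when the probe traverses an array of n integers."""
    lo, hi = 1, 4 * l3_size_ints        # search window around the cache size
    n = hi
    for _ in range(60):
        n = (lo + hi) // 2
        rate = measure_miss_rate(n)
        if abs(rate - target) <= tol:
            break                        # close enough to the target rate
        if rate < target:                # array too small: too few misses
            lo = n + 1
        else:                            # array too large: too many misses
            hi = n - 1
    return n

# Stub model: miss rate grows once the array outgrows the cache.
L3_INTS = 1_000_000
model = lambda n: min(1.0, max(0.0, (n - L3_INTS) / (2 * L3_INTS)))
n = tune_array_size(model, L3_INTS)
print(model(n))   # within 0.02 of the 0.1 target
```

A real tuner would measure miss rates with hardware counters rather than a model, and would repeat the search per server type as the document describes.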
[0013] A load generator 120 determines measurements from or using the probe 102, compares the measurements to benchmarks, and can determine a contention value indicative of an amount of contention among the resource consumers 108 running in the contention region 110. The benchmarks may be baseline measurements that the probe 102 would make if it were not in contention with the VMs 101a-n. For example, the probe 102 is executed without the resource consumers 108 and makes performance measurements of the shared resources 130 to determine the benchmarks. The load generator 120 may compare the measurements made by the probe 102 when sharing the resources 130 with the benchmarks. Any statistically significant deviation between these sets of measurements can be reported to a management system 140 along with information on the type of resource contention that is present, allowing the management system 140 to initiate actions to remedy the problem, such as migrating a VM to another machine. In one example, the load generator 120 controls the probe 102 to execute at specific times to measure performance of the shared resources. The load generator 120, for example, is deployed on a separate host distinct from the contention region 110. The load generator 120 may support multiple probes. A contention region may have more than one probe; however, the probe 102 may perform multiple different measurements to characterize a number of shared resources. Also, the load generator 120 may support probes in different contention regions. Some examples of the measurements determined using the probe 102 include response time and throughput. The probe 102 can alternate between phases of measuring different metrics, and the alternating may be controlled by the load generator 120 instructing the probe 102 or may be determined by the probe itself. For example, one phase may be a connection phase that includes measuring probe response times.
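For illustration only, the comparison of probe measurements against no-contention benchmarks described above can be sketched as follows. This is a minimal sketch, not the claimed implementation: the function name `detect_contention`, the z-score heuristic, and the numeric values are all assumptions chosen to illustrate "statistically significant deviation".

```python
import statistics

def detect_contention(samples, baseline_mean, baseline_stdev, z_threshold=3.0):
    """Flag contention when the mean of recent probe measurements deviates
    significantly from the no-contention baseline (illustrative z-score test)."""
    mean = statistics.mean(samples)
    if baseline_stdev == 0:
        # Degenerate baseline: any increase over the baseline mean is flagged.
        return mean > baseline_mean, mean
    z = (mean - baseline_mean) / baseline_stdev
    return z > z_threshold, mean

# Baseline response time: 10 ms mean, 1 ms standard deviation (hypothetical).
flagged, observed = detect_contention([14.0, 15.5, 13.8], 10.0, 1.0)
```

A load generator following this sketch would forward `flagged` results, together with the resource type being probed, to a management system.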
For example, for the connection phase, the load generator 120 submits requests to the probe 102 and measures the probe's response time. In this example, contention is determined from this external measure of the probe's response time. The connection phase may include sensing contention at a virtual machine monitor related to TCP/IP connections. One example of a virtual machine monitor is the Kernel-based Virtual Machine (KVM) in LINUX. The virtual machine monitor is one of the shared resources 130. A memory phase may measure contention for a memory or a cache (e.g., an L3 cache). In one example, the response time to retrieve requested data from a memory or a cache is determined. The response time is measured by the probe 102. To test an L3 cache, for example, the probe 102 may execute a script in a loop that accesses memory in a way that heavily exercises the L3 cache. The higher the response time, the greater the number of cache misses that is inferred. The probe 102 can report its response times to the load generator 120. The load generator 120 can control which phase the probe 102 is in and can communicate any inferred performance degradations to the management system 140.
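For illustration only, the memory-phase loop described above — repeatedly traversing memory and inferring cache misses from rising access times — might be sketched as below. The function name, the stride, and the array size are assumptions; a real probe would size the array relative to the socket's L3 cache and use a lower-overhead language.

```python
import time

def memory_phase_probe(n, t2):
    """Traverse an integer array for t2 seconds and report the mean
    per-pass traversal time; under contention for the cache, higher
    mean times suggest more cache misses (illustrative sketch)."""
    data = list(range(n))
    deadline = time.perf_counter() + t2
    pass_times, checksum = [], 0
    while time.perf_counter() < deadline:
        start = time.perf_counter()
        # Strided accesses reduce the benefit of simple prefetching.
        for i in range(0, n, 16):
            checksum += data[i]
        pass_times.append(time.perf_counter() - start)
    return sum(pass_times) / len(pass_times), checksum

mean_time, _ = memory_phase_probe(n=100_000, t2=0.05)
```

The probe would report `mean_time` for each phase to the load generator, which compares it to the no-contention benchmark.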
[0014] Modern processors have adopted multi-socket architectures with non-uniform memory access (NUMA) and level 3 (L3) caches shared by the CPU cores of each socket. Figure 2 shows an example of the system 100 including a socket or multiple sockets comprising the shared resources 130 in the contention region 110. The contention region 110 in other examples may comprise multiple servers or another set of shared resources. Referring to Figure 2, sockets 201 and 202 are shown. The shared resources 130 include CPU cores 210a-h and 220a-h. Other shared resources include the L3 caches 211 and 221 and memories 212 and 222. A socket may comprise a physical package or chip including the CPU cores and L3 cache. Sockets may participate in inter-socket communications. In one example, each of the sockets 201 and 202 is a separate contention region; in another example, the sockets 201 and 202 together represent the contention region 110.
[0015] Multiple VMs may be assigned per socket and each socket may include a probe. For example, VMs 230a-h run on cores 210a-h and probe 240 runs on core 210a. VMs 231a-h run on cores 220a-h and probe 241 runs on core 220a. VMs may be pinned to near memory to improve performance. The probes 240 and 241 may report measures of their own internal performance to the load generator 120. Also, the load generator 120 may measure response times or throughputs of the probes 240 and 241. The load generator 120 may send contention values to the management system 140, and the management system 140 may make VM provisioning decisions based on the contention values.
[0016] Tuning of a probe, such as probes 102, 240, or 241, is now described according to an example with respect to the connection phase and the memory phase for probe measurements. Other techniques may be used to tune the probe, and the probe may be tuned for determining measurements other than connection rates and memory access time/cache misses. The probe may comprise a low overhead application capable of measuring metrics indicating contention among unmanaged shared resources. Parameters for the probe, e.g., c, t1, and t2 described below, may be optimized in a one-factor-at-a-time manner until the probe has low resource utilization yet correctly reports the degradations incurred by the micro-benchmark VMs. The micro-benchmarks are applications, each designed to stress a particular shared resource. To tune one of the probe's phases, the load of the corresponding micro-benchmark is increased until performance degradation is observed. This process is then repeated with the probe in place.
[0017] One example of tuning the probe for the connection phase is now described. In the connection phase, a web server instance within the probe is subjected to a burst of c Web connections per second (cps) over a period of t1 seconds. The web server instance services a connection request by executing a CPU-bound PHP script. One example of tuning the probe in the memory phase comprises the probe executing a script that traverses an array of integers of size n for t2 seconds, where n is related to the L3 cache size for the socket. In this example, the value of n is chosen so as to cause an L3 miss rate of 0.1. The values of c, t1, n, and t2 are carefully tuned for each type of server in a resource pool such that the probe imposes low overhead while having the ability to identify contention among VMs.
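For illustration only, pacing a burst of c connection requests per second for t1 seconds, as in the connection phase above, might look like the following sketch. The `send_request` callable is a hypothetical stand-in for issuing one HTTP connection to the probe's web server; the name and interface are assumptions, not the patented implementation.

```python
import time

def connection_phase(c, t1, send_request):
    """Issue approximately c requests per second for t1 seconds and
    collect per-request response times (illustrative sketch)."""
    interval = 1.0 / c
    deadline = time.perf_counter() + t1
    response_times = []
    while time.perf_counter() < deadline:
        start = time.perf_counter()
        send_request()  # hypothetical: e.g., fetch the CPU-bound PHP script
        elapsed = time.perf_counter() - start
        response_times.append(elapsed)
        # Sleep off the remainder of the interval to hold the arrival rate.
        if interval - elapsed > 0:
            time.sleep(interval - elapsed)
    return response_times

# Stub request function stands in for a real HTTP client call.
rts = connection_phase(c=50, t1=0.1, send_request=lambda: None)
```

A rise in the collected response times relative to the benchmark run would indicate contention at the virtual machine monitor.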
[0018] In one example, the parameters of the probe are automatically tuned so that the probe 102 alternates between its connection and memory phases. Specifically, for tuning c and t1 for the connection phase, a controlled experiment is set up involving the application-intensive web micro-benchmark. Without using the probe, two scenarios are identified with one VM per core: one with an arrival rate per VM that does not cause a performance discontinuity and one with an arrival rate that does. For example, based on an experiment, up to a load of 160 cps per VM there is no significant change in the mean response time of the VMs. The per-core utilization at this load level is 0.75. However, there is a significant increase in mean response time at a load of 165 cps per VM. A large c value is a cps that, when added to the first scenario, causes the discontinuity. t1 is set to be the entire duration of the micro-benchmark's execution. The probe is executed for the scenario without the discontinuity. The c value is progressively decreased until the probe does not introduce any performance degradation for the VMs. Next, the probe is run with this setting for the other scenario, with the performance discontinuity. The value of t1 is now progressively decreased up to the point at which the probe is still able to identify the problem. A similar approach of using a caching or memory micro-benchmark, selecting n first followed by t2, is adopted for tuning the memory phase of the probe.
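The progressive one-factor-at-a-time decrease described above can be sketched generically; this is an assumption-laden illustration, where `acceptable` is a hypothetical callback that would run the probe at the candidate parameter value and check for the desired outcome (no added degradation when tuning c, or continued problem detection when tuning t1).

```python
def tune_parameter(start, step, acceptable, minimum=1):
    """Progressively decrease a probe parameter from a known-large value
    until the acceptance check passes (one-factor-at-a-time sketch)."""
    value = start
    while value > minimum and not acceptable(value):
        value -= step
    return value

# Illustration using the example figures above: loads of 160 cps or less
# introduce no degradation, so tuning from 200 cps settles at 160.
c = tune_parameter(start=200, step=5, acceptable=lambda v: v <= 160)
```

The same loop would then be reused with the discontinuity scenario to shrink t1, and again with the memory micro-benchmark to pick n and t2.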
[0019] The probes may be used to determine measurements, for example, in a connection phase and a memory phase. For example, the load generator 120 determines the measurements and determines a contention value from the measurements. The contention value, for example, is indicative of an amount of contention for a shared resource amongst the resource consumers and can be used to determine an actual capacity of the shared resources. Contention values can be sent to the management system 140. The contention values may indicate whether the contention exceeds a threshold. For example, the amount of contention may be used to report degradations beyond a threshold to the management system 140. The management system 140 may include a scheduler that provisions VMs. Provisioning decisions may be based on the measurements.
[0020] A contention value determined from the measurements may include the actual capacity, which can be used to determine whether another VM can be provisioned in the contention region and still maintain a particular QoS. For example, f is a contention value; f = 1 means no contention and the resource consumers are behaving as if running in isolation; and an f greater than 1 means there is contention for the shared resources. No contention means a resource consumer gets access to a shared resource as if no other resource consumer were using it. Contention may include some degradation in performance relative to no-contention operation. For example, instead of completing 1 job per hour, an application completes 0.8 of a job per hour.
[0021] In one example, the management system 140 may characterize CPU capacity requirements using numerical values referred to as shares. Shares are treated as "additive," so the total number of shares represents the total capacity. Assume C is the number of host CPU shares, P is the number of shares already provisioned, and N is the number of shares required for an additional VM. Then, after provisioning, the host's available capacity A is A = C - f(P + N), where f is the contention value. So if f = 1, e.g., no contention, then A = C - (P + N). If f is greater than one, then there is contention and the actual available capacity is measured as a function of contention. If f = 2, the actual available capacity is half of what it would be in a no-contention scenario. The f value may be determined from the measurements from the probe and the comparison to a benchmark. For example, if a measured response time is twice as long as a benchmark, then f may be set to 2.
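The share arithmetic above translates directly into code. The following sketch uses the A = C - f(P + N) formula from the text; the function names and the sample share counts are illustrative assumptions.

```python
def available_capacity(C, P, N, f):
    """Available CPU shares after provisioning an additional VM of N
    shares, with effective demand scaled by the contention value f."""
    return C - f * (P + N)

def fits(C, P, N, f):
    """Can the additional VM be provisioned without exhausting capacity?"""
    return available_capacity(C, P, N, f) >= 0

# Hypothetical host: 16 shares total, 10 provisioned, 4 requested.
a_no_contention = available_capacity(16, 10, 4, f=1.0)  # 16 - 14 = 2
a_contended = available_capacity(16, 10, 4, f=2.0)      # 16 - 28 = -12
```

With f = 1 the VM fits with 2 shares to spare; with f = 2 the same placement would overcommit the host, which is the kind of result the management system could use in provisioning decisions.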
[0022] A method 300 is described with respect to the system 100 shown in Figure 1 or 2 by way of example. The method may be performed by other systems. Figure 3 shows the method 300 of estimating contention among resource consumers in a contention region in a shared resource environment. At 301, the load generator 120 determines measurements of performance of the shared resources 130 in the contention region 110 from the probe 102. Examples of performance measurements include the response time to establish a new TCP/IP connection and serve a web page, or estimated cache misses. At 302, the load generator compares the measurements to benchmarks. The benchmarks may include values representing a desired performance. The benchmarks may be determined from measurements of the probe 102 running in isolation in the contention region 110 where there is no contention with other resource consumers. At 303, the load generator 120 determines a contention value representative of an amount of contention among the resource consumers running in the contention region based on the comparison. The contention value may be the f value described above or an amount of contention above a benchmark.
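The three steps of method 300 can be sketched end to end as follows. This is an illustrative sketch only: `measure` is a hypothetical callable standing in for step 301, and deriving f as the ratio of measured to baseline response time follows the "twice as long gives f = 2" example in paragraph [0021].

```python
def estimate_contention(measure, baseline):
    """Method 300 sketch: (301) take probe measurements, (302) compare
    them to the benchmark, (303) derive the contention value f."""
    samples = measure()                  # step 301: probe measurements
    mean = sum(samples) / len(samples)
    f = mean / baseline                  # steps 302-303: ratio to benchmark
    return max(f, 1.0)                   # f = 1 denotes no contention

# Hypothetical probe response times twice the 10 ms benchmark yield f = 2.
f = estimate_contention(lambda: [20.0, 19.0, 21.0], baseline=10.0)
```

The resulting f could then be reported to the management system or fed into the share-based capacity calculation of paragraph [0021].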
[0023] Some or all of the methods, operations, and functions described above may be provided as machine readable instructions executable by a processor and stored on a non-transitory computer readable storage medium. For example, they may exist as program(s) comprised of program instructions in source code, object code, executable code or other formats.

[0024] Referring to Figure 4, there is shown a computer platform 400 for the load generator 120. It is understood that the illustration of the platform 400 is a generalized illustration and that the platform 400 may include additional components and that some of the components described may be removed and/or modified without departing from a scope of the platform 400.
[0025] The platform 400 includes processor(s) 401, such as a central processing unit, ASIC or other type of processing circuit; a display 402 and/or other input/output devices; an interface 403, such as a network interface to a Local Area Network (LAN), a wireless 802.11x LAN, a 3G or 4G mobile WAN or a WiMax WAN; and a computer-readable medium 404. Each of these components may be operatively coupled to a bus 408. A non-transitory computer readable medium (CRM), such as CRM 404, may be any suitable medium that stores instructions for execution by the processor(s) 401. For example, the CRM 404 may be non-volatile media, such as a magnetic disk or solid-state non-volatile memory, or volatile media. The CRM 404 may include machine instructions 405 for the load generator 120.
[0026] While embodiments have been described with reference to the disclosure above, those skilled in the art are able to make various modifications to the described embodiments without departing from the scope of the embodiments as described in the following claims, and their equivalents.

Claims

What is claimed is:
1. A shared resource environment contention determination system comprising: a load generator executed by a processor to determine measurements from a probe running with virtual machines (VMs) in a contention region in a shared resource environment, wherein the measurements measure performance of shared resources utilized by the probe and the VMs in the contention region, compare the measurements to benchmarks, and determine a contention value representative of an amount of contention among the VMs running in the contention region based on the comparison, wherein the contention comprises interference among the VMs utilizing the shared resources.
2. The system of claim 1, wherein the shared resources in the contention region include a virtual machine monitor and an L3 cache.
3. The system of claim 2, wherein the measurements for the shared resources comprise connection arrival rates measured for the virtual machine monitor and cache misses measured for the L3 cache.
4. The system of claim 1, wherein the contention causes degradation in performance of applications running on the VMs that is determined from the contention value.
5. The system of claim 1, wherein the probe comprises code executed in the contention region simulating operation of an application running on a VM in the contention region.
6. The system of claim 1, wherein the measurements are for metrics measuring the performance of the shared resources in the contention region, and the benchmarks comprise measurements for the metrics measured by the probe running in the contention region in isolation without sharing the shared resources with the VMs.
7. The system of claim 1, wherein the probe is tuned based on a type of the shared resource being measured and attributes of the shared resource.
8. The system of claim 1, wherein the load generator is to determine if the amount of contention exceeds a threshold and to report the contention value to a management system to control the VMs if the amount of contention exceeds the threshold.
9. The system of claim 8, wherein the management system is to migrate a VM to reduce the impact of contention on the VMs in response to the amount of contention exceeding the threshold.
10. The system of claim 1, wherein an available capacity of a shared resource in the contention region is calculated based on the contention value.
11. The system of claim 10, wherein the available capacity comprises a number of shares of one of the shared resources that are available for allocation to a VM in the contention region, and the available capacity A is equal to C - f(P+N), wherein C is total capacity in terms of shares of the shared resource, f is the contention value, P is a number of shares already provisioned, and N is a number of shares required for an additional VM in the contention region.
12. A method of estimating contention among resource consumers in a contention region in a shared resource environment, the method comprising:
determining measurements of performance of shared resources in the contention region from a probe running with the resource consumers;
comparing the measurements to benchmarks; and
determining, by a processor, a contention value representative of an amount of contention among the resource consumers running in the contention region based on the comparison, wherein the contention comprises interference among the resource consumers and the probe utilizing shared resources in the contention region and the contention causes a degradation in performance of the resource consumers.
13. The method of claim 12, comprising:
determining if the amount of contention exceeds a threshold; and sending the contention value to a management system if the amount of contention exceeds the threshold to control consumption of the shared resources by the resource consumers based on the contention value.
14. The method of claim 12, wherein the resource consumers comprise VMs running applications or applications not running on a VM.
15. A non-transitory computer readable medium including machine readable instructions executable by at least one processor to:
determine measurements of performance of a virtual machine monitor and an L3 cache in a contention region from a probe running with VMs in the contention region, wherein the probe simulates a VM running in the contention region;
compare the measurements to benchmarks; and
determine a contention value representative of an amount of contention among the VMs running in the contention region based on the comparison, wherein the contention comprises interference among the VMs using the virtual machine monitor and the L3 cache.