WO2018196296A1 - Apparatus and method for virtual machine scheduling in non-uniform memory access architecture - Google Patents

Apparatus and method for virtual machine scheduling in non-uniform memory access architecture

Info

Publication number
WO2018196296A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual machine
scheduling
algorithm
performance
module
Prior art date
Application number
PCT/CN2017/106748
Other languages
English (en)
French (fr)
Inventor
管海兵
马汝辉
李健
戚正伟
谭钧升
Original Assignee
上海交通大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海交通大学
Priority to US16/466,184 (patent US11204798B2)
Publication of WO2018196296A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0284 Multiple user address space allocation, e.g. using different base addresses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F9/4856 Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45579 I/O management, e.g. providing access to device drivers or storage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45583 Memory management, e.g. access or allocation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45591 Monitoring or debugging support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/25 Using a specific main memory architecture
    • G06F2212/254 Distributed memory
    • G06F2212/2542 Non-uniform memory access [NUMA] architecture

Definitions

  • The present invention relates to the field of computer virtualization technologies, and in particular to a virtual machine scheduling apparatus and method in a non-uniform memory access architecture.
  • Virtualization is one of the key technologies of cloud computing. Virtualization technology can virtualize one physical computer system into one or more virtual computer systems. Each virtual computer system (referred to as a guest or virtual machine) has its own virtual hardware (such as CPU, memory, and devices) to provide an independent virtual machine execution environment. The real physical computer system running the virtual machines is called the host. Virtualization technology is widely used in cloud computing and high-performance computing because of its fault tolerance and high resource utilization. Representative cloud computing providers currently include Alibaba Cloud and Amazon's cloud.
  • VMM: virtual machine monitor (expanded in the source as "Virtual Machine Management")
  • The VMM manages the real physical resources, such as CPU, memory, and I/O devices, and abstracts the underlying hardware resources into corresponding virtual device interfaces for use by virtual machines.
  • NUMA: Non-Uniform Memory Access
  • The basic feature of the NUMA architecture is that it has multiple CPU modules.
  • Each CPU module consists of multiple CPU cores (such as eight) and has independent local memory, I/O slots, and so on.
  • The nodes are connected and exchange information through interconnect modules such as Intel's Quick Path Interconnect, so each CPU can access the entire system's memory.
  • Accessing local memory is much faster than accessing remote memory (memory of other nodes in the system).
  • The NUMA architecture poses significant challenges for virtual machine performance optimization because the host's NUMA topology is often invisible to virtual machines.
  • The technical problem to be solved by the present invention is to develop a virtual machine scheduling apparatus and method under a non-uniform memory access architecture.
  • With this apparatus, researchers only need to focus on implementing the NUMA scheduling optimization algorithm, without worrying about details such as collecting virtual machine information and performance data or carrying out the actual scheduling of virtual machines, which greatly improves their research efficiency.
  • The present invention provides a virtual machine scheduling apparatus under a non-uniform memory access architecture, including a performance monitoring module, an algorithm implementation interface module, and a virtual machine scheduling module;
  • The performance monitoring module is configured to monitor specific performance events by using the performance monitoring unit (PMU) of the operating system kernel;
  • The algorithm implementation interface module is configured to expose a virtual machine scheduling function interface for researchers to implement, and to pass the information from the performance monitoring module to the algorithm implementer, who returns the scheduling decision through the function;
  • The virtual machine scheduling module is configured to schedule the corresponding virtual machine's VCPUs and memory according to the scheduling decision returned by the algorithm implementation interface module.
  • Performance events monitored by the performance monitoring module include the virtual machine's CPU usage, memory usage, cache miss rate, and I/O performance data.
  • The invention also provides a method using the virtual machine scheduling apparatus under a non-uniform memory access architecture, comprising the following steps:
  • Step 1: The performance monitoring module obtains the NUMA topology information of the host and monitors virtual machine performance events through the kernel PMU;
  • Step 2: The host NUMA topology information and the virtual machine performance events are passed to the algorithm implementation interface module;
  • Step 3: The algorithm implementation interface module calls the algorithm and, after the scheduling algorithm has finished executing, passes the resulting scheduling decision to the virtual machine scheduling module;
  • Step 4: The virtual machine scheduling module schedules the virtual machine's VCPUs and memory according to the scheduling decision passed by the algorithm implementation interface module;
  • Step 5: After the virtual machine scheduling is completed, return to step 1 and continue performance monitoring of the virtual machine.
  • The scheduling algorithm includes a greedy algorithm.
  • The host NUMA topology information includes the number of NUMA nodes, the distances between the NUMA nodes, and the NUMA node to which each I/O device is connected.
  • Step 1 specifically includes monitoring, in real time through the virtual machine monitor (VMM), performance events such as the virtual machine's CPU usage, memory usage, and I/O usage; monitoring, in real time through the performance monitoring unit of the operating system kernel, performance events such as the operating system's cache miss rate and the number of cycles per second of virtual machine instruction execution; and obtaining the topology of the host's non-uniform memory access architecture.
  • FIG. 1 is a schematic diagram of the non-uniform memory access architecture for a virtual machine scheduling apparatus in a non-uniform memory access architecture according to a preferred embodiment of the present invention.
  • FIG. 2 is a schematic diagram of the system architecture of a virtual machine scheduling apparatus in a non-uniform memory access architecture according to a preferred embodiment of the present invention.
  • FIG. 3 is a schematic diagram of the operation flow of a virtual machine scheduling method in a non-uniform memory access architecture according to a preferred embodiment of the present invention.
  • A preferred embodiment of the present invention provides a virtual machine scheduling apparatus in a non-uniform memory access architecture, which includes a performance monitoring module, an algorithm implementation interface module, and a virtual machine scheduling module.
  • The performance monitoring module is configured to monitor specific performance events by using the performance monitoring unit of the operating system kernel;
  • The algorithm implementation interface module is configured to expose a virtual machine scheduling function interface for researchers to implement, and to pass the information from the performance monitoring module to the algorithm implementer, who returns the scheduling decision through the function;
  • The virtual machine scheduling module is configured to schedule the corresponding virtual machine's VCPUs and memory according to the scheduling decision returned by the algorithm implementation interface module.
  • Performance events monitored by the performance monitoring module include the virtual machine's CPU usage, memory usage, cache miss rate, and I/O performance data.
  • A preferred embodiment of the present invention provides a method for scheduling a virtual machine in a non-uniform memory access architecture, which includes the following steps:
  • Step 1: The performance monitoring module obtains the NUMA topology information of the host and monitors virtual machine performance events through the kernel PMU;
  • Step 2: The host NUMA topology information and the virtual machine performance events are passed to the algorithm implementation interface module; the host NUMA topology information includes the number of NUMA nodes, the distances between the NUMA nodes, and the NUMA node to which each I/O device is connected.
  • Step 3: The algorithm implementation interface module calls the algorithm and, after the scheduling algorithm has finished executing, passes the resulting scheduling decision to the virtual machine scheduling module;
  • Step 4: The virtual machine scheduling module schedules the virtual machine's VCPUs and memory according to the scheduling decision passed by the algorithm implementation interface module;
  • Step 5: After the virtual machine scheduling is completed, return to step 1 and continue performance monitoring of the virtual machine.
  • The scheduling algorithm is a greedy algorithm, and the algorithm flow includes:
  • The algorithm takes as input the host NUMA topology information and the virtual machine performance events delivered by the performance monitoring module;
  • PacketsPerSecondVM is the number of network packets sent and received per second by the virtual machine, as monitored by the performance monitoring module, and threshold is a predefined threshold.
  • n represents a given NUMA node
  • N represents the number of NUMA nodes provided by the performance monitoring module
  • Mem[n] represents the number of memory pages of the virtual machine located on NUMA node n; this data is also provided by the performance monitoring module.
  • ANMMatrix(n) is the distance between the NUMA nodes provided by the performance monitoring module. For the N NUMA nodes, the algorithm selects the node with the largest value calculated by the formula and schedules the virtual machine to that node.
  • N is the number of NUMA nodes of the host
  • CPU[c] is the CPU usage of the virtual machine on node c
  • Mem[n] is the number of memory pages of the virtual machine located on NUMA node n
  • ANMMatrix(n) is the distance between the NUMA nodes provided by the performance monitoring module.
  • The algorithm calculates the value of each node by the formula, then selects the node with the largest value and schedules the virtual machine to that node.
  • Step 1 specifically includes monitoring, in real time through the virtual machine monitor (VMM), performance events such as virtual machine CPU usage, memory usage, and I/O usage; monitoring, in real time through the performance monitoring unit of the operating system kernel, performance events such as the operating system's cache miss rate and the number of cycles per second of virtual machine instruction execution; and obtaining the topology information of the host's non-uniform memory access architecture.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

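The present invention discloses a virtual machine scheduling apparatus and method under a non-uniform memory access architecture. The method includes the following steps: the performance monitoring module obtains the host's NUMA topology information and monitors virtual machine performance events through the kernel PMU; the host NUMA topology information and the virtual machine performance events are passed to the algorithm implementation interface module; the algorithm implementation interface module calls the algorithm and, after the scheduling algorithm has finished executing, passes the scheduling decision it produced to the virtual machine scheduling module; the virtual machine scheduling module schedules the virtual machine's VCPUs and memory according to the scheduling decision passed by the algorithm implementation interface module; after the virtual machine scheduling is completed, the method returns to performance monitoring of the virtual machine. With the method of the present invention, researchers only need to focus on implementing the NUMA scheduling optimization algorithm, without worrying about details such as collecting virtual machine information and performance data or carrying out the actual scheduling of virtual machines, which greatly improves their research efficiency.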

Description

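Apparatus and method for virtual machine scheduling in non-uniform memory access architecture
Technical Field
The present invention relates to the field of computer virtualization technology, and in particular to a virtual machine scheduling apparatus and method under a non-uniform memory access architecture.
Background
Virtualization is one of the key technologies of cloud computing. Virtualization technology can virtualize one physical computer system into one or more virtual computer systems. Each virtual computer system (referred to as a guest or virtual machine) has its own virtual hardware (such as CPU, memory, and devices) and thereby provides an independent virtual machine execution environment. The real physical computer system that runs the virtual machines is called the host. Because of its fault tolerance and high resource utilization, virtualization technology is widely used in fields such as cloud computing and high-performance computing. Representative cloud computing providers currently include Alibaba Cloud and Amazon's cloud.
In a virtualized environment, the virtual machine monitor (Virtual Machine Management, VMM) is a software management layer that sits between the hardware and the operating system. It is mainly responsible for managing the real physical resources, such as the CPU, memory, and I/O devices, and abstracts the underlying hardware resources into corresponding virtual device interfaces for use by virtual machines.
At the same time, the non-uniform memory access (Non-Uniform Memory Access, NUMA) architecture has become the mainstream architecture of modern servers because of its scalability. As shown in FIG. 1, the basic feature of the NUMA architecture is that it has multiple CPU modules; each CPU module consists of multiple CPU cores (for example, 8) and has independent local memory, I/O slots, and so on. Because the nodes are connected and exchange information through interconnect modules (such as Intel's Quick Path Interconnect), each CPU can access the memory of the entire system. Accessing local memory is, of course, much faster than accessing remote memory (the memory of other nodes in the system). The NUMA architecture poses a clear challenge for virtual machine performance optimization, because the host's NUMA topology is usually transparent to, that is, invisible from, the virtual machine.
Essentially all current virtual machine monitors, including Xen, KVM, and VMware ESXi, try to schedule a virtual machine's virtual CPUs (VCPUs) and all of its memory onto a single node in order to keep accesses local. This approach has a serious drawback, however: the system's load balancing and other mechanisms dynamically rebalance CPU and memory load, which disturbs the original placement and eventually renders the policy ineffective. How to optimize the scheduling of virtual machine VCPUs and memory under the NUMA architecture has therefore become an active research area.
However, when studying NUMA scheduling optimization algorithms, researchers must, in addition to implementing the algorithm itself, consider concrete details such as how to collect virtual machine performance information and system NUMA topology information on a specific platform, and how to schedule virtual machine VCPUs and memory. Moreover, when implementing a NUMA scheduling optimization algorithm they must also account for the differing interfaces of different VMMs, such as Xen and KVM. This is a heavy burden for research on NUMA scheduling optimization algorithms and seriously reduces researchers' efficiency.
Those skilled in the art have therefore sought to develop a virtual machine scheduling apparatus and method under a non-uniform memory access architecture. With the disclosed virtual machine scheduling apparatus under the NUMA architecture, researchers only need to focus on implementing the NUMA scheduling optimization algorithm, without worrying about details such as collecting virtual machine information and performance data or carrying out the actual scheduling of virtual machines, which greatly improves their research efficiency.
Summary of the Invention
In view of the above drawbacks of the prior art, the technical problem to be solved by the present invention is to develop a virtual machine scheduling apparatus and method under a non-uniform memory access architecture. With the disclosed virtual machine scheduling apparatus under the NUMA architecture, researchers only need to focus on implementing the NUMA scheduling optimization algorithm, without worrying about details such as collecting virtual machine information and performance data or carrying out the actual scheduling of virtual machines, which greatly improves their research efficiency.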
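To achieve the above object, the present invention provides a virtual machine scheduling apparatus under a non-uniform memory access architecture, including a performance monitoring module, an algorithm implementation interface module, and a virtual machine scheduling module, wherein:
the performance monitoring module is configured to monitor specific performance events by using the performance monitoring unit (PMU) of the operating system kernel;
the algorithm implementation interface module is configured to expose a virtual machine scheduling function interface for researchers to implement, and to pass the information from the performance monitoring module to the algorithm implementer, who returns the scheduling decision through the function;
the virtual machine scheduling module is configured to schedule the corresponding virtual machine's VCPUs and memory according to the scheduling decision returned by the algorithm implementation interface module.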
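The division of labour among the three modules can be pictured as a small set of data types plus one function signature that a researcher fills in. The following is a minimal sketch only: the type names (`HostTopology`, `VmPerf`, `SchedulingDecision`), their fields, and the toy placement rule are all hypothetical and are not taken from the patent, which does not specify a concrete interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HostTopology:
    """Host NUMA topology handed over by the monitoring module (hypothetical type)."""
    num_nodes: int
    distance: List[List[int]]        # distance[i][j] between NUMA nodes i and j
    io_device_node: Dict[str, int]   # I/O device name -> NUMA node it is attached to


@dataclass
class VmPerf:
    """One performance sample for one virtual machine (hypothetical type)."""
    vm_id: str
    cpu_usage: List[float]           # CPU usage of the VM on each node
    mem_pages: List[int]             # memory pages of the VM located on each node
    cache_miss_rate: float
    packets_per_second: float


@dataclass
class SchedulingDecision:
    """What the algorithm returns to the scheduling module (hypothetical type)."""
    vm_id: str
    target_node: int


# Signature of the function a researcher would implement (an assumption).
SchedulingAlgorithm = Callable[[HostTopology, List[VmPerf]], List[SchedulingDecision]]


def keep_on_largest_memory_node(topo: HostTopology,
                                vms: List[VmPerf]) -> List[SchedulingDecision]:
    """Toy algorithm: send each VM to the node that already holds most of its pages."""
    decisions = []
    for vm in vms:
        best = max(range(topo.num_nodes), key=lambda n: vm.mem_pages[n])
        decisions.append(SchedulingDecision(vm.vm_id, best))
    return decisions
```

The point of the split is that only `keep_on_largest_memory_node` (or any other function with the same signature) would change between experiments; data collection and the actual migration stay behind the other two modules.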
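Further, the performance events monitored by the performance monitoring module include the virtual machine's CPU usage, memory usage, cache miss rate, and I/O performance data.
The present invention also provides a method using the virtual machine scheduling apparatus under a non-uniform memory access architecture, including the following steps:
Step 1: The performance monitoring module obtains the host's NUMA topology information and monitors virtual machine performance events through the kernel PMU;
Step 2: The host NUMA topology information and the virtual machine performance events are passed to the algorithm implementation interface module;
Step 3: The algorithm implementation interface module calls the algorithm and, after the scheduling algorithm has finished executing, passes the resulting scheduling decision to the virtual machine scheduling module;
Step 4: The virtual machine scheduling module schedules the virtual machine's VCPUs and memory according to the scheduling decision passed by the algorithm implementation interface module;
Step 5: After the virtual machine scheduling is completed, return to step 1 and continue performance monitoring of the virtual machine.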
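Steps 1 to 5 form a simple monitor, decide, apply loop. The sketch below illustrates that control flow under stated assumptions: the three callables stand in for the three modules, their names are hypothetical, and the `rounds` parameter is added here only so the example terminates (the described method loops back to step 1 indefinitely).

```python
from typing import Any, Callable, Tuple


def run_scheduling_loop(
    monitor: Callable[[], Tuple[Any, Any]],
    algorithm: Callable[[Any, Any], Any],
    apply_decision: Callable[[Any], None],
    rounds: int,
) -> None:
    """Run the five-step cycle a fixed number of times (rounds is an assumption)."""
    for _ in range(rounds):
        # Step 1: collect host NUMA topology and VM performance events.
        topology, perf_events = monitor()
        # Steps 2-3: hand both to the user-supplied algorithm and wait for its decision.
        decision = algorithm(topology, perf_events)
        # Step 4: let the scheduling module act on the decision.
        apply_decision(decision)
        # Step 5: fall through to the next iteration, i.e. back to step 1.
```

Because the algorithm is just a callable, a different placement policy can be swapped in without touching the monitoring or scheduling code.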
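Further, the scheduling algorithm includes a greedy algorithm.
Further, the host NUMA topology information includes the number of NUMA nodes, the distances between the NUMA nodes, and the NUMA node to which each I/O device is connected.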
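As an illustration of how the three pieces of topology information might be gathered, the sketch below assumes a Linux host, where each node's distance row is exposed in `/sys/devices/system/node/node<N>/distance` and a PCI device's node in `/sys/bus/pci/devices/<address>/numa_node`. The patent does not say how the monitoring module obtains this data, so the approach, the function names, and the sample values are assumptions; the functions take file contents as strings so the example stays self-contained.

```python
from typing import Dict, List


def parse_distance_rows(rows: List[str]) -> List[List[int]]:
    """rows[i] is the text of node i's sysfs distance file, e.g. "10 21"."""
    return [[int(tok) for tok in row.split()] for row in rows]


def build_topology(distance_rows: List[str],
                   device_nodes: Dict[str, int]) -> dict:
    """Bundle node count, distance matrix, and I/O device placement."""
    matrix = parse_distance_rows(distance_rows)
    # A valid distance matrix is square: one row and one column per node.
    if any(len(row) != len(matrix) for row in matrix):
        raise ValueError("distance matrix is not square")
    return {
        "num_nodes": len(matrix),
        "distance": matrix,
        "io_device_node": dict(device_nodes),
    }


# Made-up sample values for a two-node host with one network card on node 1.
example = build_topology(["10 21\n", "21 10\n"], {"0000:03:00.0": 1})
```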
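Further, step 1 specifically includes monitoring, in real time through the virtual machine monitor (VMM), performance events such as the virtual machine's CPU usage, memory usage, and I/O usage; monitoring, in real time through the performance monitoring unit of the operating system kernel, performance events such as the operating system's cache miss rate and the number of cycles per second of virtual machine instruction execution; and obtaining the topology of the host's non-uniform memory access architecture.
The present invention has the following technical effects:
(1) The affinity between I/O devices and processor nodes is taken into account, which adds a dimension to traditional modeling approaches, so that the system better reflects the importance of I/O devices in today's high-performance I/O environments;
(2) Using the kernel PMU to monitor virtual machine performance events greatly reduces the overhead of virtual machine performance monitoring;
(3) Dividing the virtual machine scheduling apparatus into three modules reduces the coupling between modules; each module can be designed and developed independently, which improves researchers' research and development efficiency.
The concept, specific structure, and technical effects of the present invention are further described below with reference to the accompanying drawings, so that the objects, features, and effects of the present invention can be fully understood.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the non-uniform memory access architecture for a virtual machine scheduling apparatus under a non-uniform memory access architecture according to a preferred embodiment of the present invention.
FIG. 2 is a schematic diagram of the system architecture of a virtual machine scheduling apparatus under a non-uniform memory access architecture according to a preferred embodiment of the present invention.
FIG. 3 is a schematic diagram of the operation flow of a virtual machine scheduling method under a non-uniform memory access architecture according to a preferred embodiment of the present invention.
Detailed Description
As shown in FIG. 2, a preferred embodiment of the present invention provides a virtual machine scheduling apparatus under a non-uniform memory access architecture, characterized in that it includes a performance monitoring module, an algorithm implementation interface module, and a virtual machine scheduling module, wherein the performance monitoring module is configured to monitor specific performance events by using the performance monitoring unit of the operating system kernel; the algorithm implementation interface module is configured to expose a virtual machine scheduling function interface for researchers to implement, and to pass the information from the performance monitoring module to the algorithm implementer, who returns the scheduling decision through the function; and the virtual machine scheduling module is configured to schedule the corresponding virtual machine's VCPUs and memory according to the scheduling decision returned by the algorithm implementation interface module.
The performance events monitored by the performance monitoring module include the virtual machine's CPU usage, memory usage, cache miss rate, and I/O performance data.
As shown in FIG. 3, a preferred embodiment of the present invention provides a method using the virtual machine scheduling apparatus under a non-uniform memory access architecture, characterized in that it includes the following steps:
Step 1: The performance monitoring module obtains the host's NUMA topology information and monitors virtual machine performance events through the kernel PMU;
Step 2: The host NUMA topology information and the virtual machine performance events are passed to the algorithm implementation interface module; the host NUMA topology information includes the number of NUMA nodes, the distances between the NUMA nodes, and the NUMA node to which each I/O device is connected.
Step 3: The algorithm implementation interface module calls the algorithm and, after the scheduling algorithm has finished executing, passes the resulting scheduling decision to the virtual machine scheduling module;
Step 4: The virtual machine scheduling module schedules the virtual machine's VCPUs and memory according to the scheduling decision passed by the algorithm implementation interface module;
Step 5: After the virtual machine scheduling is completed, return to step 1 and continue performance monitoring of the virtual machine.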
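For step 4, a scheduling module needs to turn "move this virtual machine to node n" into a concrete set of host CPUs. The sketch below shows one way this could look on a Linux host, where each node lists its CPUs in `/sys/devices/system/node/node<N>/cpulist` using a format such as `0-7,16-23`. This is an assumption about the platform rather than something the patent specifies, and `parse_cpulist` is a hypothetical helper name.

```python
from typing import Set


def parse_cpulist(text: str) -> Set[int]:
    """Expand a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or stray comma
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


# On a host where each VCPU runs as an ordinary thread (as with QEMU/KVM),
# the resulting set could then be passed to os.sched_setaffinity(thread_id, cpus)
# to pin that VCPU to the target node. Memory migration is not shown here.
target_cpus = parse_cpulist("0-3,8-11")
```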
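The scheduling algorithm is a greedy algorithm, and its flow includes:
(1) The input of the algorithm is the host NUMA topology information and the virtual machine performance events passed by the performance monitoring module;
(2) The following condition is used to determine whether the virtual machine is an I/O-intensive virtual machine:
if PacketsPerSecondVM > threshold
where PacketsPerSecondVM is the number of network packets sent and received per second by the virtual machine, as monitored by the performance monitoring module, and threshold is a predefined threshold.
(3) If the virtual machine is determined to be I/O-intensive, the following formula is used to decide which NUMA node the virtual machine should be scheduled to:
[Formula image: PCTCN2017106748-appb-000001]
where n denotes a NUMA node, N denotes the number of NUMA nodes provided by the performance monitoring module, and Mem[n] denotes the number of memory pages of the virtual machine located on NUMA node n; this data is also provided by the performance monitoring module. ANMMatrix(n) is the distance between the NUMA nodes provided by the performance monitoring module. For the N NUMA nodes, the algorithm selects the node with the largest value calculated by the above formula and schedules the virtual machine to that node.
(4) If the virtual machine is not I/O-intensive, the following formula is used to decide which NUMA node the virtual machine should be scheduled to:
[Formula image: PCTCN2017106748-appb-000002]
where N denotes the number of NUMA nodes of the host, CPU[c] denotes the CPU usage of the virtual machine on node c, Mem[n] denotes the number of memory pages of the virtual machine located on NUMA node n, and ANMMatrix(n) is the distance between the NUMA nodes provided by the performance monitoring module. For the N NUMA nodes, the algorithm calculates the value of each node by the above formula, then selects the node with the largest value and schedules the virtual machine to that node.
(5) The algorithm returns the virtual machine scheduling decision obtained from the above flow.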
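The overall shape of this flow (classify the virtual machine, score every node, pick the maximum) can be sketched as follows. The two scoring formulas exist only as images in this record and are not reproduced, so the scoring functions below are hypothetical stand-ins chosen for illustration; they are not the patent's formulas. Only the threshold test and the "choose the node with the largest value" step follow the text.

```python
from typing import Callable, List


def is_io_intensive(packets_per_second: float, threshold: float) -> bool:
    """Step (2): the VM counts as I/O-intensive above a predefined packet rate."""
    return packets_per_second > threshold


def greedy_pick(num_nodes: int, score: Callable[[int], float]) -> int:
    """Steps (3)/(4): evaluate every node and return the one with the largest value."""
    return max(range(num_nodes), key=score)


def memory_affinity(n: int, mem_pages: List[int], distance: List[List[int]]) -> float:
    """HYPOTHETICAL score: pages weighted by inverse distance from node n."""
    return sum(mem_pages[m] / distance[n][m] for m in range(len(mem_pages)))


def schedule_vm(packets_per_second: float, threshold: float,
                cpu_usage: List[float], mem_pages: List[int],
                distance: List[List[int]], nic_node: int) -> int:
    """Return the target NUMA node for one VM under the placeholder scores."""
    num_nodes = len(distance)
    if is_io_intensive(packets_per_second, threshold):
        # HYPOTHETICAL: favour nodes close to the network device as well.
        def score(n: int) -> float:
            return memory_affinity(n, mem_pages, distance) / distance[n][nic_node]
    else:
        # HYPOTHETICAL: favour nodes where the VM already uses CPU and memory.
        def score(n: int) -> float:
            return memory_affinity(n, mem_pages, distance) * (1.0 + cpu_usage[n])
    return greedy_pick(num_nodes, score)
```

Keeping the scoring function separate from `greedy_pick` means the placeholder scores can be replaced by the published formulas without changing the selection logic.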
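Step 1 specifically includes monitoring, in real time through the virtual machine monitor (VMM), performance events such as the virtual machine's CPU usage, memory usage, and I/O usage; monitoring, in real time through the performance monitoring unit of the operating system kernel, performance events such as the operating system's cache miss rate and the number of cycles per second of virtual machine instruction execution; and obtaining the topology information of the host's non-uniform memory access architecture.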
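Hardware counters are cumulative, so rates such as the cache miss rate and cycles per second are normally derived from the difference between two samples. The sketch below shows that arithmetic only; the `CounterSample` type and its field names are hypothetical, and how the samples are read from the kernel PMU is outside its scope.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CounterSample:
    """Cumulative counter values at one point in time (hypothetical type)."""
    timestamp_s: float
    cache_references: int
    cache_misses: int
    cycles: int


def rates(prev: CounterSample, curr: CounterSample) -> Tuple[float, float]:
    """Return (cache miss rate, cycles per second) over the sampling interval."""
    dt = curr.timestamp_s - prev.timestamp_s
    refs = curr.cache_references - prev.cache_references
    misses = curr.cache_misses - prev.cache_misses
    # Guard against empty intervals so the caller never divides by zero.
    miss_rate = misses / refs if refs > 0 else 0.0
    cycles_per_second = (curr.cycles - prev.cycles) / dt if dt > 0 else 0.0
    return miss_rate, cycles_per_second
```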
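The preferred embodiments of the present invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experimentation in accordance with the concept of the present invention shall fall within the scope of protection defined by the claims.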

Claims (7)

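  1. A virtual machine scheduling apparatus under a non-uniform memory access architecture, characterized in that it includes a performance monitoring module, an algorithm implementation interface module, and a virtual machine scheduling module, wherein:
    the performance monitoring module is configured to monitor specific performance events by using the performance monitoring unit of the operating system kernel;
    the algorithm implementation interface module is configured to expose a virtual machine scheduling function interface for researchers to implement, and to pass the information from the performance monitoring module to the algorithm implementer, the algorithm implementer returning the scheduling decision through the function;
    the virtual machine scheduling module is configured to schedule the corresponding virtual machine's VCPUs and memory according to the scheduling decision returned by the algorithm implementation interface module.
  2. The virtual machine scheduling apparatus under a non-uniform memory access architecture according to claim 1, characterized in that the performance events monitored by the performance monitoring module include the virtual machine's CPU usage, memory usage, cache miss rate, and I/O performance data.
  3. The virtual machine scheduling apparatus under a non-uniform memory access architecture according to claim 1, characterized in that the performance monitoring module passes the monitored performance events, including the virtual machine's CPU usage, memory usage, cache miss rate, and I/O performance data, to the algorithm implementation interface module.
  4. A method using the virtual machine scheduling apparatus under a non-uniform memory access architecture according to claims 1-3, characterized in that it includes the following steps:
    Step 1: the performance monitoring module obtains the host's NUMA topology information and monitors virtual machine performance events through the kernel PMU;
    Step 2: the host NUMA topology information and the virtual machine performance events are passed to the algorithm implementation interface module;
    Step 3: the algorithm implementation interface module calls the algorithm and, after the scheduling algorithm has finished executing, passes the resulting scheduling decision to the virtual machine scheduling module;
    Step 4: the virtual machine scheduling module schedules the virtual machine's VCPUs and memory according to the scheduling decision passed by the algorithm implementation interface module;
    Step 5: after the virtual machine scheduling is completed, return to step 1 and continue performance monitoring of the virtual machine.
  5. The virtual machine scheduling method under a non-uniform memory access architecture according to claim 4, characterized in that the scheduling algorithm includes a greedy algorithm.
  6. The virtual machine scheduling method under a non-uniform memory access architecture according to claim 4, characterized in that the host NUMA topology information includes the number of NUMA nodes, the distances between the NUMA nodes, and the NUMA node to which each I/O device is connected.
  7. The virtual machine scheduling method under a non-uniform memory access architecture according to claim 4, characterized in that step 1 specifically includes monitoring, in real time through the virtual machine monitor (VMM), performance events such as the virtual machine's CPU usage, memory usage, and I/O usage; monitoring, in real time through the performance monitoring unit of the operating system kernel, performance events such as the operating system's cache miss rate and the number of cycles per second of virtual machine instruction execution; and obtaining the topology of the host's non-uniform memory access architecture.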
PCT/CN2017/106748 2017-04-24 2017-10-18 Apparatus and method for virtual machine scheduling in non-uniform memory access architecture WO2018196296A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/466,184 US11204798B2 (en) 2017-04-24 2017-10-18 Apparatus and method for virtual machine scheduling in non-uniform memory access architecture

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710272053.2A CN107168771A (zh) 2017-04-24 2017-04-24 Apparatus and method for virtual machine scheduling in non-uniform memory access architecture
CN201710272053.2 2017-04-24

Publications (1)

Publication Number Publication Date
WO2018196296A1 (zh) 2018-11-01

Family

ID=59812369

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/106748 WO2018196296A1 (zh) 2017-04-24 2017-10-18 Apparatus and method for virtual machine scheduling in non-uniform memory access architecture

Country Status (3)

Country Link
US (1) US11204798B2 (zh)
CN (1) CN107168771A (zh)
WO (1) WO2018196296A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127146A (zh) * 2020-01-16 2021-07-16 上海盛霄云计算技术有限公司 Heterogeneous dynamic random scheduling method and system

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168771A (zh) 2017-04-24 2017-09-15 上海交通大学 Apparatus and method for virtual machine scheduling in non-uniform memory access architecture
US20190065333A1 (en) * 2017-08-23 2019-02-28 Unisys Corporation Computing systems and methods with functionalities of performance monitoring of the underlying infrastructure in large emulated system
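CN107967180B (zh) * 2017-12-19 2019-09-10 上海交通大学 Network optimization method and system based on global resource affinity in a NUMA virtualization environment
CN109117247B (zh) * 2018-07-18 2021-12-07 上海交通大学 Virtual resource management system and method based on heterogeneous multi-core topology awareness
CN109918132B (zh) * 2019-03-26 2021-04-16 龙芯中科技术股份有限公司 Instruction installation method and apparatus, electronic device, and storage medium
CN114090223A (zh) * 2020-08-24 2022-02-25 北京百度网讯科技有限公司 Memory access request scheduling method, apparatus, device, and storage medium
CN113835826A (zh) * 2021-08-13 2021-12-24 奇安信科技集团股份有限公司 Virtual machine processing method and apparatus, electronic device, program product, and medium
CN113434371B (zh) * 2021-08-26 2022-01-25 阿里云计算有限公司 Method for collecting memory access information, computing device, and storage medium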

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
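CN106293881A (zh) * 2016-08-11 2017-01-04 上海交通大学 Performance monitor based on non-uniform I/O access architecture and monitoring method thereof
CN106293944A (zh) * 2016-08-11 2017-01-04 上海交通大学 Non-uniform I/O access based system and optimization method in a virtualized multi-core environment
CN106354543A (zh) * 2016-08-11 2017-01-25 上海交通大学 NUMA memory page migration method based on memory address translation between virtual machine and host
CN107168771A (zh) * 2017-04-24 2017-09-15 上海交通大学 Apparatus and method for virtual machine scheduling in non-uniform memory access architecture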

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090077550A1 (en) * 2007-09-13 2009-03-19 Scott Rhine Virtual machine schedular with memory access control
US9535767B2 (en) * 2009-03-26 2017-01-03 Microsoft Technology Licensing, Llc Instantiating a virtual machine with a virtual non-uniform memory architecture
EP2290562A1 (en) * 2009-08-24 2011-03-02 Amadeus S.A.S. Segmented main-memory stored relational database table system with improved collaborative scan algorithm
US8443376B2 (en) * 2010-06-01 2013-05-14 Microsoft Corporation Hypervisor scheduler
US9465669B2 (en) * 2013-08-13 2016-10-11 Vmware, Inc. NUMA scheduling using inter-vCPU memory access estimation
US9800523B2 (en) * 2014-08-22 2017-10-24 Shanghai Jiao Tong University Scheduling method for virtual processors based on the affinity of NUMA high-performance network buffer resources
US10255091B2 (en) * 2014-09-21 2019-04-09 Vmware, Inc. Adaptive CPU NUMA scheduling

Also Published As

Publication number Publication date
US20200073703A1 (en) 2020-03-05
CN107168771A (zh) 2017-09-15
US11204798B2 (en) 2021-12-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17907063

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07/02/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17907063

Country of ref document: EP

Kind code of ref document: A1