CN106293944B - non-consistency-based I/O access system and optimization method under virtualized multi-core environment - Google Patents


Info

Publication number
CN106293944B
CN106293944B
Authority
CN
China
Prior art keywords
load
node
performance
nodes
module
Prior art date
Legal status
Active
Application number
CN201610657524.7A
Other languages
Chinese (zh)
Other versions
CN106293944A (en)
Inventor
管海兵
钱建民
李阳德
马汝辉
李健
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201610657524.7A
Publication of CN106293944A
Application granted
Publication of CN106293944B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/10Program control for peripheral devices
    • G06F13/105Program control for peripheral devices where the programme performs an input/output emulation function

Abstract

The invention discloses a non-uniform I/O (input/output) access system and an optimization method for a virtualized multi-core environment, relating to the field of computer virtualization. The system comprises a performance detection module, a thread binding module and a memory migration module. The performance detection module monitors hardware information of the virtual machine and the physical host in real time through a modified performance monitoring tool. The thread binding module judges whether the current system is under low load or high load according to the hardware information collected by the performance detection module, and if the system is under high load, binds the virtual machine threads on the node with the higher load to another node with a lower load. If the load of the current system is low, the memory migration module migrates the related threads to the node closest to the network adapter. The system establishes an affinity optimization model based on I/O performance in the virtualized multi-core environment and provides a real-time, dynamic, high-throughput, low-latency optimal placement strategy, so that multi-core resources and a high-performance network adapter are used efficiently and the load of the system is effectively reduced.

Description

Non-consistency-based I/O access system and optimization method under virtualized multi-core environment
Technical Field
The invention relates to the field of computer virtualization, and in particular to a system based on non-uniform I/O access in a virtualized multi-core environment and an optimization method thereof.
Background
Virtualization is a key technology in cloud computing. Virtualization allows multiple computer systems to run on one physical computer by abstracting the hardware resources (CPU, memory, I/O devices, etc.) of the physical machine into on-demand resources, similar to electric power, which are made available to customers. Virtualization technology greatly reduces the investment of small enterprises in server purchases and greatly improves the utilization of idle hosts, so it is now widely deployed on high-performance servers; representative virtualized cloud computing services include Amazon EC2 and Alibaba Cloud.
One key component in virtualization technology is the Virtual Machine Monitor (VMM). The virtual machine monitor is responsible for abstracting host hardware resources for use by virtual machines, as well as for managing virtual machines and the communication between them. Traditional hardware resources include CPU resources, memory resources, I/O resources, and the like, and under a Non-Uniform Memory Access (NUMA) architecture, virtualization technology mainly focuses on improving the performance of these hardware resources after virtualization. However, with the development of high-performance network technology and multi-core CPUs, raw hardware performance is no longer the bottleneck; instead, how to efficiently process high-performance network I/O requests in a multi-core environment has become the bottleneck, because even a very small host-side processing delay causes a huge performance degradation for virtualized network applications.
I/O virtualization is one of the important components of virtualization. It mainly targets the virtualization of the PCIe functions of a network card and aims to scale the number of virtual machines as far as possible without losing I/O performance. However, as the number of cores in a high-performance physical machine keeps increasing, the number of nodes on which physical cores are placed also keeps increasing, and how multiple physical cores efficiently access I/O resources becomes more and more important. For this reason, Non-Uniform I/O Access (NUIOA) has been proposed on top of the NUMA architecture, as shown in Fig. 1. Each device is directly attached to the node that groups a set of physical cores, so the devices are asymmetric: a remote node can only reach a given device through the interconnect between nodes, and such access is therefore slower than access from the local node. This asymmetric access increases the delay of the system and ultimately degrades virtual machine performance. Fig. 2 shows a conventional remote memory access: because the existing system does not take the presence of a high-performance network device into account, its optimization strategy ignores the network card, and in a high-performance network environment the performance loss caused by this omission is huge. The CPU on node 2 has to transmit data to the network card through the interconnect between node 2 and node 1, and between node 1 and node 0, which not only increases the bandwidth occupation of the interconnect but also increases the access delay of the data.
Existing general-purpose applications are basically deployed in the cloud and are networked and distributed by nature, so high-performance, reliable network transmission is critical to their effective operation. Therefore, the importance of the I/O device must be considered in affinity modeling under the current multi-core architecture.
At present, virtually all virtual machine monitors, such as Xen, KVM and VMware ESXi, schedule the VCPUs and all the memory of one virtual machine onto one node to keep accesses local. However, this method has a significant defect: the load-balancing mechanisms of the system dynamically rebalance load between CPUs and memory, which disturbs the original placement policy and finally causes the policy to fail. Existing dynamic placement models are based on NUMA, and their modeling focuses on memory locality, or on memory locality together with cache hit rate; they do not consider the importance of network I/O devices. Moreover, such modeling only considers the affinity between threads and hardware and does not consider the affinity among the threads themselves, so the accuracy of the modeling is questionable.
Existing I/O performance tuning methods under a virtualized multi-core architecture mainly comprise thread binding and memory migration. Thread binding refers to binding the VCPU thread of a virtual machine running an application program to a specific node; memory migration refers to migrating the memory of the virtual machine running the application program to a specific node. If the VCPU threads and the memory of the virtual machine are bound to the same node, the affinity between the CPU and the memory is maximized and the performance of the system is improved. However, existing research mainly focuses on the two dimensions of CPU and memory and ignores the network card. New affinity modeling that takes the I/O device into account is therefore needed to achieve optimal system performance.
Therefore, the invention aims to develop a non-uniform I/O access system and an optimization method for a virtualized multi-core environment under the NUIOA architecture, and establishes affinity optimization modeling based on I/O performance in the virtualized multi-core environment, so that multi-core resources and a high-performance network adapter are used efficiently, the load of the system is effectively reduced, and the system is suitable for applications in the current high-performance network environment.
Disclosure of Invention
In view of the above defects in the prior art, the technical problem to be solved by the present invention is how to perform affinity modeling that includes the I/O device in addition to the two dimensions of CPU and memory in a virtualized multi-core architecture, and to provide a real-time, dynamic, high-throughput, low-latency optimal placement strategy for the system, so as to achieve optimal overall system performance.
In order to achieve the above object, the present invention provides a system based on non-uniform I/O access in a virtualized multi-core environment, comprising a performance monitoring module, a thread binding module and a memory migration module. The performance monitoring module is configured to monitor hardware information of the virtual machine and the physical host in real time through a modified performance monitoring tool; the thread binding module is configured to judge whether the current system is under low load or high load according to the hardware information collected by the performance monitoring module, and to bind the virtual machine threads on the node with the higher load to another node with a lower load if the current system is under high load; the memory migration module is configured to migrate the related threads to the node closest to the network adapter if the current system load is low.
Further, the hardware information includes the number of page accesses and the number of I/O requests made by the application program in the virtual machine, and the real-time CPU load and memory load of the physical host.
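This hardware information can be grouped per node. The following C sketch is only an illustration and not part of the claimed invention; the type and field names (node_stat_t, page_acc, io_requests, nic_node) and the array bounds are assumptions introduced for clarity.

/* Hypothetical per-node statistics gathered by the performance monitoring
 * module; names and sizes are illustrative only. */
#include <stdint.h>

#define MAX_NODES 8       /* assumed upper bound on NUMA nodes          */
#define MAX_PAGES 4096    /* assumed number of tracked page slots       */

typedef struct {
    double   cpu_load;                 /* real-time CPU load of the node             */
    double   mem_load;                 /* real-time memory load of the node          */
    uint64_t io_requests;              /* I/O requests observed per unit time        */
    uint64_t page_acc[MAX_PAGES];      /* NodeAcc[n][i]: accesses by node n to page i */
} node_stat_t;

typedef struct {
    node_stat_t node[MAX_NODES];       /* one entry per NUMA node                    */
    int         nic_node;              /* node to which the network adapter is attached */
} host_stat_t;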
The invention also provides an optimization method based on the non-uniform I/O access system in the virtualized multi-core environment, which comprises the following steps:
(1) providing a performance monitoring module, a thread binding module and a memory migration module;
(2) simultaneously monitoring, through the performance monitoring module, the number of accesses to pages inside the virtual machine and the number of I/O requests per unit time;
(3) monitoring the CPU load and the memory load of the physical host in real time through the performance monitoring module;
(4) when the load of a certain node of the physical host is higher than a threshold value, performing thread migration for that node, where the judgment condition is defined on a TMT matrix, which represents the memory distribution of the threads, and a DT matrix, which represents the access delay between nodes: a thread T is migrated to node K instead of node P when the average access delay to node K is smaller than the average access delay to node P;
(5) when the load of a certain node of the physical machine is within the normal range, migrating the hot pages that a thread accesses but that are distributed on a remote node to the local node, where the judgment formula for a hot page is:
if NodeAcc[n][i] > 2*NodeAcc[n][j]
where NodeAcc[n][i] represents the number of times that node n accesses page i;
(6) after the address of the hot page inside the virtual machine is translated into the physical address of the physical machine, calling a page migration function to migrate the hot page of the application program to the target node;
(7) after the hot page migration is finished, returning to the performance monitoring module to continue monitoring the performance of the system.
In view of the above defects of existing multi-core systems, the invention provides a high-throughput, low-latency, real-time placement scheduling strategy for a virtualized environment under the NUIOA architecture, so as to improve the performance of application programs running in virtual machines. Further elaboration is as follows:
The performance detection module monitors hardware information of the virtual machine and the physical host in real time through the modified performance monitoring tool; the hardware information mainly comprises the number of page accesses and I/O requests made by the application program in the virtual machine, the real-time load condition of the physical host, and the like.
The thread binding module judges whether the current system is under low load or high load according to the hardware information collected by the performance detection module; if the current system is under high load, it binds the virtual machine threads on the node with the higher load to another node, thereby reducing the load.
If the current system is under low load, the memory migration module migrates the related threads to the node closest to the network adapter, so that excessive remote accesses are avoided, the bandwidth occupation of the interconnect between nodes is reduced, and the throughput of the system is improved.
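The behavior of these two modules can be summarized as a simple placement decision. The following C sketch is only an illustration of that decision and not part of the claimed invention; it assumes libnuma is available, and the threshold value and the helpers node_load(), pick_least_loaded_node() and nic_node() are hypothetical stand-ins for values supplied by the performance monitoring module.

#include <numa.h>          /* libnuma: numa_available(), numa_run_on_node() */

#define LOAD_THRESHOLD 0.8 /* assumed value; in practice tied to the system configuration */

/* Placeholder helpers standing in for the performance monitoring module;
 * in the real system these values come from the collected hardware information. */
static double node_load(int node)           { (void)node; return 0.5; }
static int    pick_least_loaded_node(void)  { return 1; }
static int    nic_node(void)                { return 0; }

/* Place the calling VCPU thread according to the current load level. */
int place_current_thread(int current_node)
{
    if (numa_available() < 0)
        return -1;                          /* host has no NUMA support */

    if (node_load(current_node) > LOAD_THRESHOLD)
        /* High load: the thread binding module moves the thread to a lighter node. */
        return numa_run_on_node(pick_least_loaded_node());

    /* Low load: the memory migration module keeps the thread next to the NIC node. */
    return numa_run_on_node(nic_node());
}

numa_run_on_node() restricts the calling thread to the CPUs of one node, which matches the binding behavior described above; the actual system applies the same decision to the VCPU threads of the virtual machine.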
Compared with existing modeling methods based on the NUMA architecture, the access system and the optimization method have the following advantages:
(1) The affinity between the network adapter and the processor nodes is considered, adding a dimension to traditional modeling methods, so that the system better reflects the importance of the network device in the current high-performance network environment;
(2) The affinity among the multiple VCPU threads of a virtual machine is also considered, and threads with higher mutual affinity are placed on the same node to reduce inter-node data communication;
(3) A finer-grained modeling matrix is adopted: an access delay matrix and a thread-memory mapping matrix are established to make the final placement scheduling decision, and this finer granularity more comprehensively improves the accuracy of the modeling.
The conception, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings, so that the objects, features and effects of the invention can be fully understood.
Drawings
FIG. 1 is a schematic diagram of a conventional non-uniform I/O access architecture;
FIG. 2 is a diagram of a conventional remote access that does not take the network card into consideration;
FIG. 3 is a system architecture diagram of a preferred embodiment of the present invention;
FIG. 4 is a flow chart of an optimization method according to a preferred embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail with reference to the drawings. The embodiments are implemented on the premise of the technical solution of the present invention; detailed implementations and specific operation procedures are given below, but the protection scope of the present invention is not limited to the following embodiments.
As shown in FIG. 4, the method for optimizing a system based on non-uniform I/O access in a virtualized multi-core environment according to the present invention includes the following steps:
Step 1: monitor, through the performance monitoring module, the number of accesses to pages inside the virtual machine and the number of I/O requests per unit time. The number of times each node accesses a page is recorded in an array, and the number of I/O requests per unit time can be obtained directly. At the same time, the load inside the physical host, mainly the CPU load and the memory load, is monitored in real time.
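As one possible way to obtain the per-node load in Step 1, the sketch below polls the free memory of every NUMA node with libnuma. It is only an illustration of the kind of monitoring the module performs, not the patented tool itself; it assumes libnuma 2.x, and per-node CPU load as well as the guest-side page-access and I/O-request counters (collected by the modified performance monitoring tool) are not shown.

#include <numa.h>      /* numa_available(), numa_max_node(), numa_node_size64() */
#include <stdio.h>

/* Print the memory utilization of every NUMA node; a stand-in for the
 * physical-host part of the performance monitoring module. */
int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this host\n");
        return 1;
    }

    int max_node = numa_max_node();
    for (int n = 0; n <= max_node; n++) {
        long long free_bytes = 0;
        long long total_bytes = numa_node_size64(n, &free_bytes);
        if (total_bytes <= 0)
            continue;                        /* node without memory */
        double mem_load = 1.0 - (double)free_bytes / (double)total_bytes;
        printf("node %d: memory load %.2f\n", n, mem_load);
    }
    return 0;
}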
Step 2: when the load of a certain node of the physical machine is higher than a certain threshold (the threshold is related to the configuration of the system), the node needs to perform thread migration. The migration decision is made as follows: the TMT matrix represents the memory distribution of the threads and the DT matrix represents the access delay between nodes, and the criterion for migrating a thread T onto node K instead of node P is that the average access delay to node K is smaller than the average access delay to node P.
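The original publication expresses this criterion with a formula that is not reproduced here; the following LaTeX rendering is a plausible reconstruction from the surrounding description, assuming TMT[T][n] denotes the amount of thread T's memory resident on node n and DT[a][b] the access delay from node a to node b:

\sum_{n} \mathrm{TMT}[T][n]\cdot \mathrm{DT}[K][n] \;<\; \sum_{n} \mathrm{TMT}[T][n]\cdot \mathrm{DT}[P][n]

Dividing both sides by \sum_{n}\mathrm{TMT}[T][n] turns each side into the memory-weighted average access delay experienced by thread T when it runs on node K or node P, which is the comparison described above.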
Step 3: when the load of a certain node of the physical machine is within the normal range (the threshold is related to the configuration of the system), the hot pages accessed by a thread but distributed on a remote node are migrated to the local node as far as possible, using the numactl API. The determination of a hot page adopts the following formula:
if NodeAcc[n][i] > 2*NodeAcc[n][j]
where NodeAcc[n][i] represents the number of times that node n accesses page i. The formula means that when the most-accessed page of a node is accessed more than twice as often as the second most-accessed page, the pages ranked before that break point are all considered hot pages. After the addresses of the hot pages inside the virtual machine are translated into physical addresses of the physical machine, the page migration module calls the function move_pages to migrate the hot pages of the application program to the target node. After the hot page migration is finished, the method returns to the performance detection module to continue monitoring the performance of the system.
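As an illustration of Step 3, the sketch below applies the twice-the-runner-up rule to the per-node access counters and hands the selected pages to move_pages(2). It is a minimal sketch under stated assumptions and not the patented implementation: the node_acc[] counters, the page_addr[] list (host addresses obtained after the guest-to-host translation described above) and the target node are hypothetical inputs, and error handling is reduced to the bare minimum.

#define _GNU_SOURCE
#include <numaif.h>     /* move_pages(2); link with -lnuma */
#include <stdio.h>

#define NPAGES 1024

/* Hypothetical inputs supplied by the monitoring and translation steps. */
static unsigned long node_acc[NPAGES];    /* NodeAcc[n][i] for the node under consideration */
static void         *page_addr[NPAGES];   /* host addresses of the tracked pages            */

/* Migrate the hot pages of the considered node to target_node. */
int migrate_hot_pages(int target_node)
{
    void *pages[NPAGES];
    int   nodes[NPAGES];
    int   status[NPAGES];
    int   count = 0;

    /* Find the largest and second-largest access counts. */
    unsigned long max1 = 0, max2 = 0;
    for (int i = 0; i < NPAGES; i++) {
        if (node_acc[i] > max1)      { max2 = max1; max1 = node_acc[i]; }
        else if (node_acc[i] > max2) { max2 = node_acc[i]; }
    }
    if (max1 <= 2 * max2)
        return 0;                    /* no clear hot pages this round */

    /* Pages above the break point (accessed more than twice the runner-up)
     * are treated as hot pages and queued for migration. */
    for (int i = 0; i < NPAGES; i++) {
        if (node_acc[i] > 2 * max2) {
            pages[count] = page_addr[i];
            nodes[count] = target_node;
            count++;
        }
    }

    long rc = move_pages(0 /* current process */, count, pages, nodes,
                         status, MPOL_MF_MOVE);
    if (rc < 0)
        perror("move_pages");
    return (int)rc;
}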
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (2)

1. An optimization method based on a non-uniform I/O access system in a virtualized multi-core environment is characterized by comprising the following steps:
(1) providing a performance monitoring module, a thread binding module and a memory migration module;
(2) simultaneously monitoring, through the performance monitoring module, the number of accesses to pages inside the virtual machine and the number of I/O requests per unit time;
(3) monitoring the CPU load and the memory load of the physical host in real time through the performance monitoring module;
(4) when the load of a certain node of the physical host is higher than a threshold value, performing thread migration for that node, where the judgment condition is defined on a TMT matrix, which represents the memory distribution of the threads, and a DT matrix, which represents the access delay between nodes: a thread T is migrated to node k instead of node p when the average access delay to node k is smaller than the average access delay to node p;
(5) when the load of a certain node of the physical machine is within the normal range, migrating the hot pages that a thread accesses but that are distributed on a remote node to the local node, where the judgment formula for a hot page is:
if NodeAcc[n][i] > 2*NodeAcc[n][j]
where NodeAcc[n][i] represents the number of times that node n accesses page i, and the formula means that if the maximum access count of a node's pages is more than twice the second largest, the pages before that break point are considered hot pages;
(6) after the address of the hot page inside the virtual machine is translated into the physical address of the physical machine, calling a page migration function to migrate the hot page of the application program to the target node;
(7) after the hot page migration is finished, returning to the performance monitoring module to continue monitoring the performance of the system.
2. The non-uniform I/O access system in a virtualized multi-core environment using the optimization method as claimed in claim 1, comprising a performance monitoring module, a thread binding module and a memory migration module, wherein
the performance monitoring module is configured to monitor hardware information of the virtual machine and the physical host in real time through a modified performance monitoring tool;
the thread binding module is configured to judge whether the current system is under low load or high load according to the hardware information collected by the performance monitoring module, and to bind the virtual machine threads on the node with the higher load to another node with a lower load if the current system is under high load;
the memory migration module is configured to migrate the related threads to the node closest to the network adapter if the load of the current system is low, so that excessive remote accesses are avoided, the bandwidth occupation of the interconnect between nodes is reduced, and the throughput of the system is improved;
the hardware information comprises the number of page accesses and the number of I/O requests made by the application program in the virtual machine, and the real-time CPU load and memory load of the physical host;
the judgment condition for judging whether the current system is under low load or high load is defined on a TMT matrix, which represents the memory distribution of the threads, and a DT matrix, which represents the access delay between nodes: a thread T is migrated to node k instead of node p when the average access delay to node k is smaller than the average access delay to node p;
migrating the related threads to the node closest to the network adapter refers to migrating the hot pages accessed by a thread but distributed on a remote node to the local node, where the judgment formula for a hot page is:
if NodeAcc[n][i] > 2*NodeAcc[n][j]
and the judgment formula means that if the maximum access count of a node's pages is more than twice the second largest, the pages before that break point are all considered hot pages.
CN201610657524.7A 2016-08-11 2016-08-11 non-consistency-based I/O access system and optimization method under virtualized multi-core environment Active CN106293944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610657524.7A CN106293944B (en) 2016-08-11 2016-08-11 non-consistency-based I/O access system and optimization method under virtualized multi-core environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610657524.7A CN106293944B (en) 2016-08-11 2016-08-11 non-consistency-based I/O access system and optimization method under virtualized multi-core environment

Publications (2)

Publication Number Publication Date
CN106293944A CN106293944A (en) 2017-01-04
CN106293944B (en) 2019-12-10

Family

ID=57670064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610657524.7A Active CN106293944B (en) 2016-08-11 2016-08-11 non-consistency-based I/O access system and optimization method under virtualized multi-core environment

Country Status (1)

Country Link
CN (1) CN106293944B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107070709B (en) * 2017-03-31 2020-06-26 上海交通大学 NFV (network function virtualization) implementation method based on bottom NUMA (non uniform memory Access) perception
CN107168771A (en) 2017-04-24 2017-09-15 上海交通大学 A kind of scheduling virtual machine device and method under Non Uniform Memory Access access architectures
CN107832213A (en) * 2017-11-03 2018-03-23 郑州云海信息技术有限公司 A kind of hpl test optimization methods based on internal memory compatibility
CN107967180B (en) * 2017-12-19 2019-09-10 上海交通大学 Based on resource overall situation affinity network optimized approach and system under NUMA virtualized environment
CN108259583B (en) * 2017-12-29 2020-05-26 广州云达信息技术有限公司 Data dynamic migration method and device
CN109039831A (en) * 2018-09-21 2018-12-18 浪潮电子信息产业股份有限公司 A kind of load detection method and device
CN109639531B (en) * 2018-12-28 2022-07-19 天津卓朗科技发展有限公司 Virtual machine network self-adaptive switching method and system
CN109947569B (en) * 2019-03-15 2021-04-06 Oppo广东移动通信有限公司 Method, device, terminal and storage medium for binding core
CN110673928B (en) * 2019-09-29 2021-12-14 天津卓朗科技发展有限公司 Thread binding method, thread binding device, storage medium and server
US20230289303A1 (en) * 2020-09-18 2023-09-14 Intel Corporation Improving remote traffic performance on cluster-aware processors
CN115348157B (en) * 2021-05-14 2023-09-05 中国移动通信集团浙江有限公司 Fault positioning method, device and equipment of distributed storage cluster and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103916438A (en) * 2013-01-06 2014-07-09 上海计算机软件技术开发中心 Cloud testing environment scheduling method and system based on load forecast
CN104836819A (en) * 2014-02-10 2015-08-12 阿里巴巴集团控股有限公司 Dynamic load balancing method and system, and monitoring and dispatching device
CN104166594A (en) * 2014-08-19 2014-11-26 杭州华为数字技术有限公司 Load balancing control method and related devices

Also Published As

Publication number Publication date
CN106293944A (en) 2017-01-04

Similar Documents

Publication Publication Date Title
CN106293944B (en) non-consistency-based I/O access system and optimization method under virtualized multi-core environment
Guo et al. Clio: A hardware-software co-designed disaggregated memory system
TWI625674B (en) Systems and methods for nvme controller virtualization to support multiple virtual machines running on a host
US9501245B2 (en) Systems and methods for NVMe controller virtualization to support multiple virtual machines running on a host
US9648081B2 (en) Network-attached memory
US20160132541A1 (en) Efficient implementations for mapreduce systems
US9733980B1 (en) Virtual machine management using I/O device logging
CN107436809B (en) data processor
CN107967180B (en) Based on resource overall situation affinity network optimized approach and system under NUMA virtualized environment
WO2013044829A1 (en) Data readahead method and device for non-uniform memory access
US20220050722A1 (en) Memory pool management
KR20210001886A (en) Data accessing method and apparatus, device and medium
Tafa et al. The evaluation of transfer time, cpu consumption and memory utilization in xen-pv, xen-hvm, openvz, kvm-fv and kvm-pv hypervisors using ftp and http approaches
CN105681402A (en) Distributed high speed database integration system based on PCIe flash memory card
US20190007483A1 (en) Server architecture having dedicated compute resources for processing infrastructure-related workloads
JP2019185764A (en) Data-centric computing architecture based on storage server in ndp server data center
US11914903B2 (en) Systems, methods, and devices for accelerators with virtualization and tiered memory
Sato et al. A model-based algorithm for optimizing i/o intensive applications in clouds using vm-based migration
CN104461941B (en) A kind of memory system framework and management method
US20150074351A1 (en) Write-behind caching in distributed file systems
CN103955397A (en) Virtual machine scheduling multi-strategy selection method based on micro-architecture perception
CN109117247B (en) Virtual resource management system and method based on heterogeneous multi-core topology perception
EP4123649A1 (en) Memory module, system including the same, and operation method of memory module
TWI824392B (en) On-demand shared data caching method, computer program, and computer readable medium applicable for distributed deep learning computing
Blagodurov et al. Towards the contention aware scheduling in hpc cluster environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant