CN104615495B - Method for optimizing network throughput in a virtualized embedded network environment - Google Patents

Method for optimizing network throughput in a virtualized embedded network environment

Info

Publication number
CN104615495B
Authority
CN
China
Prior art keywords
aggregation
timer
time
hypercalls
hypercall
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510044195.4A
Other languages
Chinese (zh)
Other versions
CN104615495A (en)
Inventor
姚建国
程书欣
邓婷
管海兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201510044195.4A priority Critical patent/CN104615495B/en
Publication of CN104615495A publication Critical patent/CN104615495A/en
Application granted granted Critical
Publication of CN104615495B publication Critical patent/CN104615495B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The present invention provides a method for optimizing network throughput in a virtualized embedded network environment. Multiple hypercalls are aggregated into a single hypercall by means of an aggregation timer, and the aggregation interval can be adjusted autonomously according to the predicted arrival time of the next I/O request. This significantly reduces the number of context switches between the guest and the host, saving substantial CPU resources to handle more network I/O, while also reducing data-transfer latency under low network traffic. Experiments by the inventors show that, compared with the conventional method, the AHC method improves the throughput of Netperf, Apache, and Memcached by 221.14%, 146.23%, and 257.42%, respectively.

Description

Method for optimizing network throughput in a virtualized embedded network environment
Technical field
The present invention relates to embedded systems, and in particular to a method for optimizing network throughput in a virtualized embedded network environment and to an autonomous hypercall aggregation method.
Background technology
An embedded system is a dedicated system that performs a small number of tasks. Unlike the traditional x86 platform, the embedded environment features low power consumption and weaker computing capability, and thus has a different performance model. Embedded systems need to interact frequently with the physical world, but because of their resource constraints, the performance of I/O virtualization becomes an important technical indicator of embedded virtualization technology. In an embedded virtualization environment, I/O virtualization must keep the overall performance of the hardware device stable within a certain scaling range and satisfy the resource demands of each virtual machine.
In a traditional system architecture, I/O devices are attached to the PCI bus; a PCI device interacts with the operating system through the PCI configuration space and the device address space, and notifies the operating system through the interrupt mechanism. In a virtualized environment, software intercepts the guest's PIO and MMIO requests and passes them to the upper layer through the hardware virtualization extensions; after emulation, the results are fed back to the guest through interrupts.
There are currently three mainstream schemes for virtualizing I/O devices. First, full virtualization: I/O devices are emulated entirely by the virtual machine platform, and their drivers need not be modified. The platform is fully transparent to the virtual machine, which only needs to call the device drivers of its native system and operate the underlying virtual hardware directly; the relevant instructions are trapped, interpreted, and executed by the VMM. The biggest drawback of full I/O virtualization is the processor privilege-level switching, i.e. from the guest OS to the VMM and between the VMM and the I/O processes that emulate user programs, which severely degrades virtualization efficiency and consumes a large number of CPU instruction cycles. Second, para-virtualization: a dedicated virtual device driver is integrated into the virtual machine system, and I/O devices must be accessed through the virtual devices provided in the virtual machine. Because the driver is specific to the virtual device, the guest system must be modified. When an application issues an access request, the front-end and back-end drivers cooperate to complete the access. Para-virtualized I/O achieves good performance, but its disadvantage is that the virtual machine must modify its system drivers to implement the front-end/back-end correspondence. Third, hardware-assisted virtualization: with hardware support, the virtual machine system, once authorized by the VMM, can access hardware devices directly without forwarding requests through the VMM. Hardware-assisted virtualization is one direction of future development, but its wide application still depends on the exploration and development of hardware devices.
1.1 KVM/ARM
The ARM architecture was long considered non-virtualizable at the instruction-set level (G. J. Popek and R. P. Goldberg. Formal Requirements for Virtualizable Third Generation Architectures. Communications of the ACM, 17(7):412-421, July 1974). ARM introduced hardware virtualization extensions in its recent ARMv7 and ARMv8 architectures, including hardware support for virtualizing the CPU, memory, interrupts, and timers. Christoffer Dall and Jason Nieh carried out a systematic study of embedded-system virtualization and implemented KVM/ARM, the first virtualization solution for the ARM architecture, which has been merged into the Linux 3.9 kernel.
KVM/ARM is based on KVM, the existing solution on x86 platforms. Compared with x86, ARM platforms lack a standard BIOS or PCI-bus hardware auto-discovery, and Linux is designed to run on almost every ARM platform. By building on KVM, KVM/ARM can run unmodified on each ARM platform. This contrasts with Xen ARM, which runs directly on the hardware layer and requires different platform code to be written for every different ARM platform.
The ARM virtualization hardware extensions were designed for a standalone hypervisor running directly on the hardware, so KVM/ARM adopts a split-mode design. This design lets KVM reuse the existing KVM code while making full use of the ARM virtualization hardware extensions; the architecture is shown in Fig. 1.
KVM/ARM is split into two modules, the Highvisor and the Lowvisor. The Highvisor runs at the PL1 level, i.e. in the host kernel, and is responsible for integrating and reusing existing KVM code; for example, it can directly use the mature scheduler module of the Linux kernel as well as the standard Linux kernel data structures. The Lowvisor runs at the CPU's PL2 (Hyp) level and makes full use of the ARM virtualization hardware extensions to configure the correct execution environment. The split mode allows KVM/ARM to run in an unmodified kernel.
1.2 VirtIO PV Driver
VirtIO is the device layer of the para-virtualization solution, an abstraction over a set of common emulated devices; it connects the guest OS and the hypervisor through an application programming interface.
The VirtIO driver framework is abstracted as shown in Fig. 2. The VirtIO framework consists of front-end and back-end drivers: the front-end drivers include the block device, network device, PCI device, and balloon drivers, and each front-end driver has a corresponding back-end program. The software layering of VirtIO is shown in Fig. 3, where Virtio-net provides the shared network-device interface, establishing an efficient I/O channel between the guest and QEMU and delivering efficient network I/O performance to the guest through shared-memory data transfer.
In VirtIO network virtualization, the virtual network device maintains a vring data structure for sending and receiving data and for configuration-space accesses. When performing network I/O, the virtio-net front-end driver writes the guest's data into the vring queue and then notifies the host through a virtio-pci hardware register; this operation is known as a kick. The host receives the notification, fetches the read/write requests from the vring queue, appends the results to the vring queue when processing completes, and sends an interrupt to the guest. The workflow is shown in Fig. 4. It can be seen that every time the guest sends data, it must perform a kick operation, switching from the guest to the virtual machine monitor (KVM), then switch back to the guest and trigger an interrupt.
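The per-packet kick path described above can be sketched as a toy model. This is illustrative only: the names here (vring_model, kick_host, virtio_net_send) are hypothetical stand-ins, not the real VirtIO driver API, and the kick is reduced to a counter in place of the actual guest-to-host switch.

```c
#include <stddef.h>

#define VRING_SIZE 256

struct vring_model {
    const void *desc[VRING_SIZE]; /* simplified descriptor slots */
    size_t avail_idx;             /* index of the next free slot */
};

static int kick_count; /* number of guest-to-host notifications */

/* Stands in for the write to the virtio-pci notify register that
 * transfers control from the guest to the virtual machine monitor. */
static void kick_host(void)
{
    kick_count++;
}

static void virtio_net_send(struct vring_model *vr, const void *pkt)
{
    vr->desc[vr->avail_idx % VRING_SIZE] = pkt; /* write request to vring */
    vr->avail_idx++;
    kick_host(); /* conventional path: one kick per packet */
}
```

Under this conventional path, N sends cost N guest/host switches, which is exactly the overhead the method described later targets.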
1.3 Network performance evaluation
An important metric in network performance evaluation is network throughput, i.e. how many bytes can be sent or transferred per second. The inventors measured network throughput for different numbers of virtual machines. The specific method is to run a given number of virtual machines and perform a bulk data-transfer test with the network benchmarking tool netperf; the test result reflects how fast one system can send data and how fast another system can receive it. During the experiment, the inventors recorded the relationship between the number of guests, network throughput, and CPU utilization; the results are shown in Fig. 5.
The experimental results show that as the number of virtual machines increases, CPU utilization approaches saturation while overall network throughput stays stable. The bottleneck appears as fully saturated CPU utilization once the number of guests reaches 2 or more: the system cannot process more packets promptly, because most CPU cycles are spent handling network data transmission and reception, and in this process the overhead is dominated by guest/host switches (VM Exit).
Summary of the invention
To address the above defects of the prior art, the object of the present invention is to provide a method for optimizing network throughput in a virtualized embedded network environment. The method borrows from NIC interrupt coalescing, a technique at the hardware NIC level. When the network is busy, the CPU context-switch overhead caused by interrupts is large; the CPU easily reaches its interrupt capacity, producing an interrupt flood that makes the CPU the bottleneck. The strategy taken by NIC interrupt coalescing is not to raise an interrupt immediately after data are received, but to raise the interrupt request after a certain period of time. This avoids frequent interrupts occupying CPU resources, significantly reduces CPU utilization, and optimizes network throughput.
The present invention provides an autonomous hypercall aggregation method, in which multiple hypercalls are aggregated into a single hypercall by means of an aggregation timer, so as to reduce the number of hypercall executions.
Preferably, the method includes the following steps:
Step A: set up an aggregation timer in the VirtIO front-end driver, where the time at which the aggregation timer starts counting is T0 and the aggregation interval is T;
Step B: set t0 to the current time, and predict the arrival time of the next request with the following formula:
t_next = 2t_cur - t_pre
where t_next denotes the arrival time of the next request, t_cur the arrival time of the current request, and t_pre the arrival time of the previous request;
Step C: judge whether the aggregation timer has been started:
- if the aggregation timer has been started: if t_next > T + T0, stop the aggregation timer; otherwise do not call the hypercall;
- if the aggregation timer has not been started: if t_next > T, call the hypercall, switch from the guest to the virtual machine monitor, and end this invocation; otherwise set T0 = t0 and start the aggregation timer without calling the hypercall; further, if t_next > T + T0, stop the aggregation timer, otherwise the hypercall need not be called;
Step D: when the aggregation timer expires or is stopped, call the hypercall.
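The decision procedure of these steps can be sketched as follows. This is a minimal illustration under stated assumptions: timestamps are plain doubles, do_hypercall() and timer_fired() are stubs standing in for the real guest notification and kernel-timer callback, and all names are hypothetical rather than taken from the actual driver.

```c
#include <stdbool.h>

static double T = 5.0;      /* aggregation interval (step A) */
static double T0;           /* time the aggregation timer started counting */
static bool timer_running;
static int hypercall_count;

static void do_hypercall(void) { hypercall_count++; }

/* Called when the aggregation timer expires or is stopped (step D). */
static void timer_fired(void)
{
    timer_running = false;
    do_hypercall();
}

/* Called for each I/O request; t_pre/t_cur are the previous and current
 * arrival times. Returns true if a hypercall was issued immediately. */
static bool ahc_on_request(double t_pre, double t_cur)
{
    double t_next = 2.0 * t_cur - t_pre; /* step B: linear prediction */
    if (timer_running) {                 /* step C, timer already started */
        if (t_next > T + T0) {           /* next request beyond the window: */
            timer_fired();               /* stop the timer and flush now */
            return true;
        }
        return false;                    /* keep aggregating */
    }
    if (t_next > T) {                    /* next request predicted far away: */
        do_hypercall();                  /* notify the host immediately */
        return true;
    }
    T0 = t_cur;                          /* start the aggregation timer */
    timer_running = true;
    return false;                        /* hypercall deferred to the timer */
}
```

Note that, following the steps literally, t_next (an absolute time) is compared against the interval T when the timer is not running, and against T + T0 when it is.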
According to the present invention there is also provided a method for optimizing network throughput in a virtualized embedded network environment, including the following steps:
Step 1: when the guest sends a data packet, the guest kernel calls the VirtIO front-end driver to write the data into the vring queue;
Step 2: using the autonomous hypercall aggregation method described in claim 1 or 2, judge from the arrival time of the previous request and the arrival time of the current I/O request whether to perform a context switch immediately and hand control from the guest to the host; if the context switch is to be performed immediately, continue with step 3; if the context switch cannot be performed immediately, wait one aggregation interval and then perform step 3;
Step 3: the guest executes a hypercall to notify the host and switches to the virtual machine monitor; the virtual machine monitor takes the data out of the vring queue, appends the result to the vring queue when processing completes, and sends an interrupt to the guest;
Step 4: this invocation ends.
Compared with the prior art, the present invention has the following beneficial effects:
The main inventive point of the present invention is to borrow NIC interrupt coalescing and introduce an autonomous hypercall aggregation method, the AHC algorithm, into the VirtIO front-end driver. Through this inventive point, multiple hypercalls are aggregated into one hypercall by means of an aggregation timer, and the aggregation interval is adjusted autonomously according to the predicted arrival time of the next I/O request. This significantly reduces the number of guest/host context switches, saving substantial CPU resources to handle more network I/O, while also reducing data-transfer latency under low network traffic. Experiments by the inventors show that, compared with the conventional method, the AHC method improves the throughput of Netperf, Apache, and Memcached by 221.14%, 146.23%, and 257.42%, respectively.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is the KVM/ARM architecture diagram.
Fig. 2 is the abstract diagram of the VirtIO driver framework.
Fig. 3 is the software layering diagram of VirtIO.
Fig. 4 is the workflow diagram of VirtIO.
Fig. 5 is a graph of the relationship between the number of guests and network throughput.
Fig. 6 is the Virtio-net workflow after the AHC algorithm is introduced.
Fig. 7 is a comparison of the hypercall rates of the conventional method and the AHC method.
Fig. 8 shows the performance improvement of the AHC algorithm on Netperf, Apache, and Memcached.
Figs. 9, 10, and 11 are comparisons of the throughput of the conventional method and the AHC method when running the Netperf, Apache, and Memcached benchmarks, respectively, with different numbers of virtual machines.
Embodiment
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art further understand the present invention, but do not limit the invention in any way. It should be pointed out that, to those of ordinary skill in the art, various modifications and improvements can be made without departing from the inventive concept. These all belong to the protection scope of the present invention.
In an embedded virtualization environment, CPU resources are more precious than under x86, and when network I/O is frequent, guest/host context switches consume substantial CPU resources. The inventors therefore borrowed NIC interrupt coalescing and, also considering the need to reduce data-transfer latency under low network traffic, designed the AHC (autonomous hypercall aggregation) algorithm and introduced it into the VirtIO front-end driver. When performing network data I/O, the guest buffers the data into the queue, and the AHC algorithm then judges, from the arrival times of previous I/O requests and the arrival time of the current I/O request, whether to perform the context switch immediately and hand control to the host, or to perform the context switch after a certain period of time and then notify the host to fetch the data from the buffer queue. The working principle of VirtIO network virtualization with the AHC algorithm is shown in Fig. 6.
The working steps of VirtIO network virtualization with the AHC (autonomous hypercall aggregation) algorithm are as follows:
1) when the guest sends a data packet, the guest kernel calls the VirtIO front-end driver to write the data into the vring queue;
2) the AHC algorithm judges, from the arrival times of previous I/O requests and the arrival time of the current I/O request, whether to perform the context switch immediately and hand control to the host; if it can switch immediately, it continues with the steps below; if it cannot, it waits one aggregation interval and then performs the steps below;
3) the guest executes a hypercall to notify the host and switches to the virtual machine monitor; the virtual machine monitor takes the data out of the vring queue, appends the result to the vring queue when processing completes, and sends an interrupt to the guest;
4) this invocation ends.
Through extensive observation, the inventors found that in network applications the inter-arrival times of most requests are almost equal, and sudden changes in the request arrival rate are rare. Accordingly, the inventors propose a basic assumption of the AHC algorithm: request inter-arrival times are equal, so the arrival time of the next request can be predicted from the current arrival time; assuming t_next - t_cur = t_cur - t_pre gives t_next = 2t_cur - t_pre. Since bursts of requests are rare, a misprediction of the next arrival time does not affect overall performance. The principle of the AHC algorithm is as follows (parameters are listed in Table 1):
Table 1
Variable | Meaning
t_pre    | arrival time of the previous request
t_cur    | arrival time of the current request
t_next   | arrival time of the next request
t0       | current time
T        | aggregation interval
T0       | time at which the timer starts counting
Input: t_pre, t_cur
Output: whether to call the hypercall
The execution steps are as follows:
1) set t0 to the current time, and predict the arrival time of the next request with the formula t_next = 2t_cur - t_pre;
2) if the aggregation timer has been started, jump to step 4); if it has not yet started, perform step 3);
3) if t_next > T, call the hypercall (hypercalls can only be issued by the kernel; application programs cannot invoke them directly), switch from the guest to the virtual machine monitor, and end this invocation; otherwise set T0 = t0 and start the aggregation-interval timer without calling the hypercall;
4) if t_next > T + T0, stop the timer; otherwise the hypercall need not be called;
5) when the timer expires or is stopped, call the hypercall.
With this algorithm, multiple hypercalls are aggregated into one hypercall by the aggregation timer, which reduces the number of hypercall executions and improves network performance under high workloads, where the CPU is fully utilized. Meanwhile, the aggregation interval is adjusted dynamically according to the prediction of the next request's arrival time, which reduces data-transfer latency under low network traffic. Specifically, the aggregation interval can be adjusted according to the prediction of the next arrival time: if the predicted arrival time of the next request falls within the aggregation interval, the hypercall is not executed immediately but waits, so that multiple hypercalls are aggregated into one; if it falls outside the aggregation interval, the hypercall is executed immediately. The aggregation interval, and thus how many hypercalls are aggregated, can be adjusted according to the relationship between the current aggregation interval and the prediction of the next arrival time.
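The effect of aggregation on the hypercall rate can be sketched with a toy simulation. This is illustrative only: it uses a fixed batching window T rather than the dynamically adjusted interval of the AHC algorithm, and both function names are hypothetical.

```c
/* One hypercall per request: the conventional virtio-net path. */
static int conventional_hypercalls(int n_requests)
{
    return n_requests; /* one kick per packet */
}

/* Requests that arrive inside one window of length T after the first
 * request of a batch share a single hypercall. 'arrival' must be a
 * sorted array of n arrival times. */
static int aggregated_hypercalls(const double *arrival, int n, double T)
{
    int calls = 0;
    int i = 0;
    while (i < n) {
        double window_end = arrival[i] + T;
        while (i < n && arrival[i] <= window_end)
            i++;          /* all requests inside the window are batched */
        calls++;          /* one hypercall flushes the whole batch */
    }
    return calls;
}
```

For closely spaced requests the batched count drops well below the per-request count, which matches the roughly 10x reduction in hypercall rate the inventors report in Fig. 7.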
The inventors further describe the invention through a specific embodiment below. To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
Figs. 6-11 relate to the specific embodiment of the present invention, in which:
In this embodiment, the inventors used a Samsung Chromebook with an Exynos 5250 processor (dual-core 1.7 GHz Cortex-A15 CPU) and 2 GB of 800 MHz DDR3 memory. The system runs Ubuntu 12.04, the network benchmarking tools are netperf, Apache, and Memcached, and 6 virtual machines are run for the data-transfer tests. Owing to the memory limitation, when the number of virtual machines exceeds 6 the host automatically shuts down the excess virtual machines, so the inventors ran only 6 virtual machines here.
In this specific embodiment, the working steps of Virtio-net with autonomous hypercall aggregation (the AHC method) are as follows:
Step 1: set, in the VirtIO front-end driver, the time T0 at which the aggregation timer starts counting, so that subsequent steps can judge whether to stop the timer according to the condition t_next > T + T0;
Step 2: each of the 6 virtual machines sends TCP (UDP) packets;
Step 3: when a network I/O request reaches the virtio-net driver layer, the virtual machine kernel calls the VirtIO front-end driver to write the data into the vring queue;
Step 4: judge whether the aggregation timer has started:
- if the aggregation timer has not started, check whether the predicted arrival time of the next request falls outside the aggregation interval: if it does, execute the hypercall immediately without starting the aggregation timer; otherwise start the aggregation timer and wait for it to expire or be stopped;
- if the aggregation timer has started, check whether the predicted arrival time of the next request falls within the interval: if so, wait for the timer to expire or be stopped; otherwise stop the timer.
Step 5: after the timer expires or is stopped, the virtual machine executes a hypercall to notify the host;
Step 6: the virtual machine switches to the virtual machine monitor; the virtual machine monitor takes the data out of the vring queue, appends the result to the vring queue when processing completes, and sends an interrupt to the guest;
Step 7: this invocation ends.
As can be seen from Fig. 7, the AHC method greatly reduces the number of hypercalls per second, to approximately 10% of the conventional method, thereby reducing CPU load, saving CPU clock cycles, and improving performance. As can be seen from Fig. 8, the AHC method improves the performance of Netperf, Apache, and Memcached by 221.14%, 146.23%, and 257.42%, respectively. In Figs. 9, 10, and 11, the horizontal axis is the number of virtual machines and the vertical axis is network throughput (MB/s). It can be seen that with only one virtual machine, the conventional method performs better than AHC: with a single virtual machine the AHC method cannot fully utilize CPU resources, and while it reduces CPU load it incurs a performance cost. As the number of virtual machines increases, the network throughput of AHC improves greatly over the conventional method.
The specific embodiments of the present invention have been described above. It is to be understood that the present invention is not limited to the above specific embodiments; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substantive content of the present invention.

Claims (2)

1. An autonomous hypercall aggregation method, characterized in that multiple hypercalls are aggregated into a single hypercall by means of an aggregation timer, so as to reduce the number of hypercall executions;
the autonomous hypercall aggregation method includes the following steps:
Step A: set up an aggregation timer in the VirtIO front-end driver, where the time at which the aggregation timer starts counting is T0 and the aggregation interval is T;
Step B: set t0 to the current time, and predict the arrival time of the next request with the following formula:
t_next = 2t_cur - t_pre
where t_next denotes the arrival time of the next request, t_cur the arrival time of the current request, and t_pre the arrival time of the previous request;
Step C: judge whether the aggregation timer has been started:
- if the aggregation timer has been started: if t_next > T + T0, stop the aggregation timer; otherwise wait and do not execute the hypercall immediately;
- if the aggregation timer has not been started: if t_next > T, call the hypercall, switch from the guest to the virtual machine monitor, and end this invocation; otherwise set T0 = t0 and start the aggregation timer without calling the hypercall; further, if t_next > T + T0, stop the aggregation timer, otherwise wait and do not execute the hypercall immediately;
Step D: when the aggregation timer expires or is stopped, call the hypercall.
2. A method for optimizing network throughput in a virtualized embedded network environment, characterized by including the following steps:
Step 1: when the guest sends a data packet, the guest kernel calls the VirtIO front-end driver to write the data into the vring queue;
Step 2: using the autonomous hypercall aggregation method according to claim 1, judge from the arrival time of the previous request and the arrival time of the current I/O request whether to perform a context switch immediately and hand control from the guest to the host; if the context switch is to be performed immediately, continue with step 3; if the context switch cannot be performed immediately, wait one aggregation interval and then perform step 3;
Step 3: the guest executes a hypercall to notify the host and switches to the virtual machine monitor; the virtual machine monitor takes the data out of the vring queue, appends the result to the vring queue when processing completes, and sends an interrupt to the guest;
Step 4: this invocation ends.
CN201510044195.4A 2015-01-28 2015-01-28 Method for optimizing network throughput in a virtualized embedded network environment Active CN104615495B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510044195.4A CN104615495B (en) 2015-01-28 2015-01-28 Method for optimizing network throughput in a virtualized embedded network environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510044195.4A CN104615495B (en) 2015-01-28 2015-01-28 Method for optimizing network throughput in a virtualized embedded network environment

Publications (2)

Publication Number Publication Date
CN104615495A CN104615495A (en) 2015-05-13
CN104615495B true CN104615495B (en) 2018-05-01

Family

ID=53149951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510044195.4A Active CN104615495B (en) 2015-01-28 2015-01-28 Method for optimizing network throughput in a virtualized embedded network environment

Country Status (1)

Country Link
CN (1) CN104615495B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107515775B (en) 2016-06-15 2021-11-19 华为技术有限公司 Data transmission method and device
CN109976877B (en) * 2019-03-22 2021-05-04 优刻得科技股份有限公司 Method, device and storage medium for realizing request by using virtio driver
CN110377106B (en) * 2019-06-21 2021-01-19 湖南麒麟信安科技股份有限公司 Time system card virtualization method, system and medium
CN110716791B (en) * 2019-09-25 2023-01-20 北京直客通科技有限公司 Real-time virtualization system and execution method thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8291135B2 (en) * 2010-01-15 2012-10-16 Vmware, Inc. Guest/hypervisor interrupt coalescing for storage adapter virtual function in guest passthrough mode
CN103414535B (en) * 2013-07-31 2017-04-19 华为技术有限公司 Data sending method, data receiving method and relevant devices
CN103955394B (en) * 2014-04-03 2017-05-17 北京大学 GPU (Graphic Processing Unit) virtualization optimization method based on delayed submitting

Also Published As

Publication number Publication date
CN104615495A (en) 2015-05-13

Similar Documents

Publication Publication Date Title
CN104618158B (en) VirtIO network virtualizations method of work in virtual embedded network environment
Suzuki et al. {GPUvm}: Why Not Virtualizing {GPUs} at the Hypervisor?
US10162658B2 (en) Virtual processor allocation techniques
US9921864B2 (en) Dynamic host performance tuning of a network stack
US10185514B2 (en) Virtual machine trigger
Sangorrin et al. Dual operating system architecture for real-time embedded systems
CN104615495B (en) Optimize the method for network throughput in virtual embedded network environment
JP6273034B2 (en) Virtualization computing apparatus and method
Soriga et al. A comparison of the performance and scalability of Xen and KVM hypervisors
Zhang et al. A survey on i/o virtualization and optimization
US20120272235A1 (en) Consolidation of idle virtual machines
Zhang et al. Evaluating and optimizing I/O virtualization in kernel-based virtual machine (KVM)
Ahn et al. Micro-sliced virtual processors to hide the effect of discontinuous cpu availability for consolidated systems
KR20160033517A (en) Hybrid virtualization scheme for interrupt controller
WO2007024444A1 (en) Method and apparatus for supporting universal serial bus devices in a virtualized environment
US10846088B2 (en) Control of instruction execution in a data processor
US20150339142A1 (en) Memory Monitor Emulation
WO2017160427A1 (en) Wireless component state based power management
Serebrin et al. Virtualizing performance counters
US11249777B2 (en) Virtual machine context management
Liu et al. Understanding the virtualization" Tax" of scale-out pass-through GPUs in GaaS clouds: An empirical study
Ye et al. PERFORMANCE COMBINATIVE EVALUATION FROM SINGLE VIRTUAL MACHINE TO MULTIPLE VIRTUAL MACHINE SYSTEMS.
Grinberg et al. Architectural virtualization extensions: A systems perspective
US10740131B2 (en) Input-output based virtual CPU halt
Liu et al. CASL hypervisor and its virtualization platform

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant