CN101808141B

CN101808141B - Host and client cooperated page swapping method based on virtualized platform

Info

Publication number: CN101808141B
Application number: CN2010101505594A
Authority: CN
Inventors: 陈文智; 陈慧君; 陈小琴; 黄炜
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2010-04-19
Filing date: 2010-04-19
Publication date: 2011-09-21
Anticipated expiration: 2030-04-19
Also published as: CN101808141A

Abstract

The invention discloses a host and client cooperated page swapping method based on a virtualized platform. The method comprises the following steps: two buffers are marked out in a shared data area allocated to a client by the host; when a page belonging to the client's address space is swapped out, the host writes the corresponding client page number into the swap-out buffer; when a page fault occurs and a page needs to be swapped in, the exit handler in the host first generates a triple marker; and when the virtual address handled by the page fault handler is the same as the faulting virtual address in some triple marker and the corresponding page contents are in swap space, the client page number in that triple marker is put into the swap-in buffer. The invention provides a method by which the client dynamically learns which of its pages have been swapped in and out; through cooperation between host and client, the double paging problem is avoided and system performance is improved.

Description

A host and client cooperated page swapping method based on a virtualized platform

Technical field

The present invention relates to an information synchronization method, and in particular to synchronizing memory information and the information needed for page swapping between the host and its clients on a virtualized platform.

Background technology

In a virtualized environment, the host operating system (Host Operating System) manages all hardware resources and provides a virtual hardware environment in which client operating systems (Guest Operating Systems) run. With hardware-assisted (Hardware Assisted) virtualization, a client can run in the virtual environment unmodified; otherwise, some sensitive instructions of the client must be modified to ensure that the client runs without errors.

With Intel Virtualization Technology, several clients can run simultaneously on one set of hardware, which improves hardware utilization and reduces the demand for resources such as power and space. All clients are managed uniformly by the host operating system, which guarantees isolation between them. Because the host and the clients do not interfere with one another at run time, the consistency and correctness of the whole system are guaranteed. This isolation also causes certain performance problems, however, one of which is the opacity of memory information and the double paging problem that results from it.

In a traditional single-operating-system environment, when memory pressure is too high, the operating system can freely select memory pages that have not been accessed for some time, swap their contents out (Swap Out) to a dedicated swap space (Swap Space), and then reclaim the pages for use by other programs, improving memory utilization.

In modern operating systems such as Linux, the pages that can be swapped out are maintained on least-recently-used lists (Least Recently Used Lists, LRU). When memory pressure is too high, the kernel takes the least recently used pages off these lists for reclaim. When the contents of swapped-out pages need to be accessed or modified, the operating system swaps the contents back in (Swap In) from swap space to satisfy the application. This process is completely transparent to user applications.

In a virtualized environment, however, the host and the client are fully independent operating systems, and each can freely select pages and swap them out to its own swap space to relieve memory pressure. From each one's own perspective the choice is optimal, but for the system as a whole this local optimum does not necessarily produce a global optimum and may even degrade overall performance. The inconsistency stems from the way resources are allocated under virtualization. Only the host has full control of memory; the memory a client owns is in fact allocated by the host, and from the host's point of view the memory used by a client is no different from that of an ordinary user process. The host may therefore swap out part of the memory allocated to a client while remaining transparent to the client. The client, in turn, may choose exactly those pages that the host has already swapped out and try to swap them out to its own swap space to reclaim memory. The host then has to swap those pages in first, after which the client swaps them out. This process does nothing to relieve the memory pressure of the whole system and adds two extra disk operations, so system performance drops. This performance bottleneck is called the double paging problem (Double Paging Anomaly).

An example of the process is shown in Fig. 1. Initially, the client page with GFN 1 is mapped to the physical page with MFN 5. Because memory is scarce, the host chooses to swap this physical page out to its swap partition. The client cannot perceive this operation and then chooses to swap the page with GFN 1 out to its own swap partition. The host must first swap the physical page back into memory from its swap partition and re-establish the mapping; only then can the client's swap operation actually be carried out. All that really happens in this process is that the contents of one physical page are written from the host swap partition to the client swap partition, which does nothing to relieve system memory pressure.

When double paging occurs continuously, overall performance drops sharply. An effective method is therefore needed to avoid the double paging problem while ensuring that the modifications made do not affect the performance of the original system.

Summary of the invention

The invention provides a page swapping method that solves the double paging problem and improves the performance of a virtualized system.

The method is based on the producer-consumer (Producer-Consumer) model. The host monitors the page swapping operations of the system and injects the corresponding information into the client, guiding the client not to perform swap operations on pages that have already been swapped out, which effectively avoids double paging. In this model the host acts as the producer and the client as the consumer. What is special is that there are two producer-consumer buffers, namely the swap-in buffer and the swap-out buffer.

A host and client cooperated page swapping method based on a virtualized platform comprises the following steps:

Step a) The client allocates a block of memory as a shared data area (shared memory). Two buffers are marked out in this shared data area, which lies in memory the host has allocated to the client: one is the swap-out buffer and the other is the swap-in buffer.

Both buffers are ring buffers. They hold the sequences of client page numbers (Guest Frame Number, GFN) that the host has swapped out and swapped in.

To track what is recorded in the swap-out and swap-in buffers, a control header (Control Header) is generated in the shared data area from the current state of the area, including the size of each ring buffer and the current positions of each buffer's head and tail pointers.

Establishing the shared data area: memory allocated by the host may lie outside the memory range allocated to the client, and the memory allocated to the client is very likely not physically contiguous, so the shared data area is established starting from the client side. The client first allocates a contiguous block of memory (contiguous from the client's point of view), fills in the corresponding fields of the control header, and then informs the host of the starting GFN of the block through a hypercall. The host uses its mapping to find the machine frame number (Machine Frame Number, MFN) that each GFN actually corresponds to and maps the frames as a whole onto one contiguous range of logical addresses, so that the shared data is convenient to access. From then on, host and client can both access the same shared data.
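The client-side part of this setup, filling in the control header before the hypercall, might look roughly like the sketch below. It is only an illustration under stated assumptions: the page size, the magic value, the 64-bit slot width for a GFN, the even split between the two rings, and the function name `rw_control_init` are not taken from the patent. The hypercall itself and the host-side remapping are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u          /* assumed page size */
#define RW_MAGIC  0x52574354ull  /* hypothetical magic value */

/* Control header fields as listed in the embodiment, with fixed-width types. */
struct rw_control {
    uint8_t  lock;
    uint32_t total_pages;
    uint64_t magic;
    uint32_t out_off, out_size, out_head, out_tail;
    uint32_t in_off, in_size, in_head, in_tail;
};

/*
 * Fill in the control header at the start of a shared area of total_pages
 * pages. Offsets are in bytes from the start of the area; sizes are counted
 * in GFN slots. Splitting the remaining space evenly is an assumption.
 */
static void rw_control_init(void *area, uint32_t total_pages)
{
    struct rw_control *c = area;
    uint32_t bytes, hdr, slots;

    assert(area != NULL);
    assert(total_pages > 0 && total_pages <= UINT32_MAX / PAGE_SIZE);

    bytes = total_pages * PAGE_SIZE;
    /* Round the header up to 8 bytes so that 64-bit GFN slots are aligned. */
    hdr = (uint32_t)((sizeof(*c) + 7u) & ~(size_t)7u);
    slots = (bytes - hdr) / (uint32_t)sizeof(uint64_t) / 2u;

    c->lock = 0;
    c->total_pages = total_pages;
    c->magic = RW_MAGIC;
    c->out_off = hdr;
    c->out_size = slots;
    c->out_head = c->out_tail = 0;   /* empty swap-out ring */
    c->in_off = hdr + slots * (uint32_t)sizeof(uint64_t);
    c->in_size = slots;
    c->in_head = c->in_tail = 0;     /* empty swap-in ring */
}
```

Because the header carries both offsets and sizes, the host can locate the two rings after mapping the area without any further information from the client.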

Step b) The host monitors swap operations on the client's pages. When a page belonging to the client address space is swapped out, the host writes the corresponding client page number (GFN) into the swap-out buffer;

What is recorded in the swap-out buffer is also reflected in the control header;

The client monitors the swap-out buffer. When the swap-out buffer contains recorded page numbers, the client hides the corresponding pages from the LRU list used to maintain swappable pages, so that the client's page reclaim function cannot select them for swap-out. At the same time it releases those page numbers from the swap-out buffer.

Before actually swapping the contents of a page out to swap space, the host must first remove the mappings of all virtual addresses (Virtual Address) that map to the page. By checking whether any of these virtual addresses belongs to a client address space, the host can tell whether a client page is mapped to this physical page. If so, the host obtains the GFN of the client page from the corresponding mapping table and puts it in the swap-out buffer.
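The address check described here can be sketched as a lookup over a table of client memory regions. The `struct mem_slot` layout and the function name below are hypothetical simplifications introduced for illustration (KVM keeps comparable per-region information in its memory slots); the patent itself only says that the GFN is obtained from the corresponding mapping table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumed 4 KiB pages */

/* Hypothetical description of one region of client memory as seen by the host. */
struct mem_slot {
    uint64_t host_va;   /* host virtual address where the region starts */
    uint64_t base_gfn;  /* client frame number of the first page */
    uint64_t npages;    /* length of the region in pages */
};

/*
 * If the host virtual address va falls inside one of the client's regions,
 * store the matching client frame number in *gfn and return true.
 * Otherwise the address does not belong to this client; return false.
 */
static bool host_va_to_gfn(const struct mem_slot *slots, size_t n,
                           uint64_t va, uint64_t *gfn)
{
    assert(slots != NULL && gfn != NULL);

    for (size_t i = 0; i < n; i++) {
        uint64_t start = slots[i].host_va;
        uint64_t len = slots[i].npages << PAGE_SHIFT;

        if (va >= start && va - start < len) {
            *gfn = slots[i].base_gfn + ((va - start) >> PAGE_SHIFT);
            return true;
        }
    }
    return false;
}
```

A `true` result is the case in which the host would append the returned GFN to the swap-out buffer.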

Once the shared data area has been set up, the host begins monitoring swap operations on the client. Whenever a page belonging to the client address space is swapped out, the host writes the corresponding GFN into the swap-out buffer. The client, for its part, periodically checks the shared data. When it finds recorded GFNs in the swap-out buffer, it hides the corresponding pages from the LRU list and releases those page numbers from the swap-out buffer, so that the GFNs are no longer recorded in the buffer.

Because both the recording of page numbers in a buffer and their release are reflected in the control header, the client can monitor the state of the shared data by checking the control header.

Once the page for a GFN has been hidden from the LRU list, the client's page reclaim function can no longer see it and therefore cannot select it for swap-out to reclaim memory, which fundamentally avoids double paging. When the consumer process finds consumable GFNs in the swap-in buffer, it puts the corresponding pages back on the original LRU list, so that they can again be reclaimed by the page reclaim process. This avoids a memory leak.

In the prior art, when a host or an operating system selects a page to swap out, it first removes all mappings to that physical page. In practice this means setting the corresponding page table entries (Page Table Entry, PTE) in the affected address spaces: the present bit (bit 0, the P bit) is cleared, and the other bits are used to record the location of the page contents in swap space. An access to the page then triggers a page fault (Page Fault). In a traditional operating system this page fault is handed directly to the page fault handler. In a virtualized environment, whenever control exits from the client to the host, whether because of a page fault or for any other reason, the first code to run is an exit handler (Exit Handler) in the host. When the exit handler determines that the reason for the exit is a page fault, it hands the fault to the page fault handler.
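As a rough illustration of the page table manipulation just described, the sketch below encodes a swap location in a non-present entry. The bit layout is deliberately simplified and hypothetical (bit 0 as the present bit, all remaining bits as a swap slot number); real architectures and kernels use more elaborate formats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x1ull  /* bit 0, the P bit */

/* Build a non-present entry that remembers where the page lives in swap.
 * Slot 0 is reserved so that an all-zero entry still means "no page". */
static uint64_t pte_make_swap(uint64_t swap_slot)
{
    assert(swap_slot > 0 && swap_slot < (1ull << 63));
    return swap_slot << 1;  /* the P bit stays clear */
}

static bool pte_present(uint64_t pte)
{
    return (pte & PTE_PRESENT) != 0;
}

/* True for an entry that is not present but carries a swap location. */
static bool pte_is_swap(uint64_t pte)
{
    return !pte_present(pte) && pte != 0;
}

static uint64_t pte_swap_slot(uint64_t pte)
{
    assert(pte_is_swap(pte));
    return pte >> 1;
}
```

An access through an entry for which `pte_present` is false is what raises the page fault, and `pte_is_swap` corresponds to the later check that the page contents are in swap space.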

Taking advantage of this behavior, and allowing for scheduling that may take place before the fault is handed to the page fault handler and for the extra page faults that may result, the method first generates a triple marker in the exit handler (that is, at the moment a page fault causes an exit to the exit handler, not at the moment the page is swapped out) and only then hands the fault to the page fault handler.

Step c) When a page fault occurs and a page needs to be swapped in, control returns to the host. The exit handler in the host first generates a triple marker (va, gfn, guest), where va is the faulting virtual address, gfn is the corresponding client page number, and guest is the client identifier;

In a real running environment there are many other reasons for a client to exit, such as performing I/O operations. The exit handler records the triple marker only after determining that the reason for the exit from the client is a page fault.

The page fault handler then handles the page fault. When the virtual address being handled by the page fault handler is the same as the faulting virtual address in some triple marker and the corresponding page contents are in swap space, the client page number in that triple marker is put into the swap-in buffer;

After a page has been swapped out, its data is stored in the host's swap space. If a virtual address appears in some triple marker and the corresponding page contents are in swap space, the fault is a page swap-in operation triggered by the client.
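The two halves of step c), recording a marker in the exit handler and matching it in the page fault handler, can be sketched as follows. The fixed-size table, the function names, and the choice to discard a marker as soon as its address has been looked up are assumptions made for this example; the patent does not say how markers are stored or retired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MARKERS 16  /* assumption: a small fixed table of pending markers */

/* The (va, gfn, guest) triple, plus a flag marking the table slot as in use. */
struct fault_marker {
    uint64_t va;
    uint64_t gfn;
    int guest;
    bool used;
};

struct marker_table {
    struct fault_marker m[MAX_MARKERS];
};

/* Exit handler side: remember a page fault that came from a client. */
static bool marker_record(struct marker_table *t, uint64_t va,
                          uint64_t gfn, int guest)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_MARKERS; i++) {
        if (!t->m[i].used) {
            t->m[i] = (struct fault_marker){ va, gfn, guest, true };
            return true;
        }
    }
    return false;  /* table full; the fault is simply not tracked */
}

/*
 * Page fault handler side: look up the address being handled. The marker is
 * discarded either way (an assumption); the caller should put *gfn into the
 * swap-in buffer of client *guest only when true is returned, that is, when
 * the address matched and the page contents were in swap space.
 */
static bool marker_match(struct marker_table *t, uint64_t va, bool in_swap,
                         uint64_t *gfn, int *guest)
{
    assert(t != NULL && gfn != NULL && guest != NULL);
    for (size_t i = 0; i < MAX_MARKERS; i++) {
        if (t->m[i].used && t->m[i].va == va) {
            *gfn = t->m[i].gfn;
            *guest = t->m[i].guest;
            t->m[i].used = false;
            return in_swap;
        }
    }
    return false;
}
```

Keeping the client identifier in the marker lets the host choose the right client's swap-in buffer when several clients are running.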

The client monitors the swap-in buffer. When it finds consumable GFNs in the swap-in buffer, it puts the corresponding pages back on the original LRU list, so that they can again be reclaimed by the page reclaim process. This avoids a memory leak.

Note that another client might produce a page fault at the same address, causing the host to write the gfn into that client's swap-in buffer as well, but this does no harm. First, the probability is very low. Second, even if it does happen, the client's consumer process checks whether each swapped-in GFN was previously seen as swapped out. If not, it simply ignores the GFN; if so, it merely puts the page back on the LRU list early, which has no effect on overall correctness.

The task of the client's consumer process is to handle the GFNs in the two ring buffers. For a GFN in the swap-out buffer, the consumer process first checks whether the corresponding page is on the LRU list; if so, it removes the page from the LRU list and adds it to a separately maintained queue. Pages that are not on the LRU list are simply ignored, because they are mostly kernel pages, which the client itself never swaps. For a GFN in the swap-in buffer, the consumer process likewise checks whether the corresponding page is in the maintained queue; if so, it puts the page back on the original LRU list, and otherwise it does nothing.
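The consumer's decision rules can be modeled with a per-page state instead of real kernel lists. This is a toy model, not the patent's implementation: the enum, the `struct client_pages` array indexed by GFN, and the function names are all hypothetical, and "hidden" stands for membership in the separately maintained queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum page_state {
    PAGE_NOT_ON_LRU,  /* for example a kernel page, never swapped by the client */
    PAGE_ON_LRU,      /* visible to the client's page reclaim function */
    PAGE_HIDDEN       /* moved to the separately maintained queue */
};

/* Toy model of client memory: one state per GFN. */
struct client_pages {
    enum page_state *state;
    uint64_t npages;
};

/* GFN taken from the swap-out buffer: hide the page if it is on the LRU list. */
static void consume_swapped_out(struct client_pages *p, uint64_t gfn)
{
    assert(p != NULL && p->state != NULL);
    if (gfn < p->npages && p->state[gfn] == PAGE_ON_LRU)
        p->state[gfn] = PAGE_HIDDEN;
    /* Pages not on the LRU list are ignored. */
}

/* GFN taken from the swap-in buffer: restore the page only if it was hidden. */
static void consume_swapped_in(struct client_pages *p, uint64_t gfn)
{
    assert(p != NULL && p->state != NULL);
    if (gfn < p->npages && p->state[gfn] == PAGE_HIDDEN)
        p->state[gfn] = PAGE_ON_LRU;
    /* A GFN never seen as swapped out is ignored. */
}

/* What the client's reclaim function is allowed to pick. */
static bool reclaim_may_select(const struct client_pages *p, uint64_t gfn)
{
    assert(p != NULL && p->state != NULL);
    return gfn < p->npages && p->state[gfn] == PAGE_ON_LRU;
}
```

The two ignore branches are what make a spurious entry in the swap-in buffer harmless.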

The invention uses a producer-consumer model for the one-way injection of information from host to client and provides a method by which the client dynamically learns which of its pages have been swapped in and out. Through cooperation between host and client, the double paging problem is avoided and system performance is improved.

Description of drawings

Fig. 1 is a schematic diagram of double paging in the prior art;

Fig. 2 is a model diagram of the method of the invention.

Embodiment

The invention is implemented on KVM. Because KVM is a kernel-based virtual machine, the mechanisms provided by the kernel can be used easily; here we mainly use the MMU Notifier mechanism provided by the kernel. Through the MMU Notifier, the host can easily obtain the GFN of a client page that is being swapped out.

The host runs KVM with kernel version 2.6.31, and the clients run FC9 with the kernel upgraded to 2.6.31. KVM is started first, with 512MB of memory assigned to the host. Two clients are then run, each with 300MB of memory, and both run Sysbench (a performance testing tool) at the same time, with the Sysbench block size parameter set to 500MB.

Two buffers are marked out in the shared data area allocated by the client: one is the swap-out buffer and the other is the swap-in buffer. A control header (Control Header) is generated in the shared data area from the current state of the area.

The composition of the control header is given by the following code:

#include <linux/types.h>	// u8, u32, u64

struct rw_control {
	u8  lock;		// lock, used for synchronization
	u32 total_pages;	// number of pages occupied by the whole shared data area
	u64 magic;		// magic number
	u32 out_off;		// offset of the swap-out buffer from the start of the shared data
	u32 out_size;		// size of the swap-out buffer
	u32 out_head;		// current head pointer of the swap-out buffer
	u32 out_tail;		// current tail pointer of the swap-out buffer
	u32 in_off;		// offset of the swap-in buffer from the start of the shared data
	u32 in_size;		// size of the swap-in buffer
	u32 in_head;		// current head pointer of the swap-in buffer
	u32 in_tail;		// current tail pointer of the swap-in buffer
};

Once the shared data area has been set up, the host begins monitoring swap operations on the client. Whenever a page belonging to the client address space is swapped out, the host writes the corresponding GFN into the swap-out buffer and increments out_head by 1, that is, it produces one GFN.

The client periodically checks the shared data. When it finds consumable GFNs in the swap-out buffer, that is, when out_head and out_tail are not equal, it hides the pages for those GFNs from the LRU list and advances out_tail accordingly until it equals out_head. The client's page reclaim function can then no longer see these pages and cannot select them for swap-out to reclaim memory, which fundamentally avoids double paging.
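The head and tail handling described for out_head and out_tail can be sketched as a single-producer, single-consumer ring. The text only states that the host advances the head when it produces a GFN and that the client advances the tail until the two are equal; the wrap-around arithmetic, the rule of keeping one slot empty to tell full from empty, and the `struct gfn_ring` wrapper are assumptions of this example. Locking and memory ordering between host and client are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One ring of GFN slots; head is advanced by the host, tail by the client. */
struct gfn_ring {
    uint64_t *slots;
    uint32_t size;  /* number of slots */
    uint32_t head;  /* next slot the producer writes */
    uint32_t tail;  /* next slot the consumer reads */
};

/* Host side: record one GFN. Returns false when the ring is full. */
static bool ring_produce(struct gfn_ring *r, uint64_t gfn)
{
    uint32_t next;

    assert(r != NULL && r->slots != NULL && r->size > 1);
    next = (r->head + 1) % r->size;
    if (next == r->tail)
        return false;  /* full: one slot is always left empty */
    r->slots[r->head] = gfn;
    r->head = next;
    return true;
}

/* Client side: take one GFN. Returns false when head equals tail (empty). */
static bool ring_consume(struct gfn_ring *r, uint64_t *gfn)
{
    assert(r != NULL && r->slots != NULL && gfn != NULL && r->size > 1);
    if (r->head == r->tail)
        return false;
    *gfn = r->slots[r->tail];
    r->tail = (r->tail + 1) % r->size;
    return true;
}
```

The client's periodic check then amounts to calling `ring_consume` in a loop until it returns false and hiding the page for each GFN it yields.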

When the client accesses a swapped-out page again, a page fault occurs and control exits to the host, where it is handled by the exit handler. The exit handler first generates a triple marker and then hands the fault to the page fault handler. The triple marker is written (va, gfn, guest), where va is the faulting virtual address, gfn is the corresponding client page number, and guest is the client identifier.

The page fault handler then handles the page fault. When the virtual address being handled by the page fault handler is the same as the faulting virtual address (va) in some triple marker and the corresponding page contents are in swap space, the fault can be identified as a page swap-in operation triggered by the client guest, and the client page number (gfn) in that triple marker is put into the swap-in buffer. When the client finds consumable GFNs in the swap-in buffer, it puts the corresponding pages back on the original LRU list, so that they can again be reclaimed by the page reclaim process. This avoids a memory leak.

As Table-1 shows, with host and client cooperating, the Sysbench running time is roughly halved compared with the run without the invention, that is, performance roughly doubles. Table-2 shows that the amount of data transferred to and from disk falls by more than 50% in the process. Most importantly, the disk traffic caused by swap-ins falls by more than 80%. Considering that the host still has to perform some swapping for its own needs during the run, it can be inferred that the cooperated page swapping method largely resolves the double paging problem.

Table-1 Sysbench running time comparison (unit: seconds)

                                         Client 1    Client 2
Before using cooperated page swapping    99.3        96.7
After using cooperated page swapping     49.4        47.7

Table-2 Disk operations during the Sysbench run (unit: MB)

                                         Total disk writes    Total swapped out    Total disk reads    Total swapped in
Before using cooperated page swapping    1670                 1080                 920                 800
After using cooperated page swapping     1000                 420                  240                 130
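The percentages quoted in the text can be checked against the two tables with a few lines of arithmetic. The helper names below are hypothetical, and reading "disk transfer" as the sum of total disk writes and total disk reads is an assumption about what the more-than-50% figure refers to.

```c
#include <assert.h>

/* Percentage by which a quantity fell from before to after. */
static double reduction_pct(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

/* Ratio of the old running time to the new one. */
static double speedup(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}
```

With the table values, `speedup(99.3, 49.4)` and `speedup(96.7, 47.7)` are about 2.01 and 2.03, `reduction_pct(1670 + 920, 1000 + 240)` is about 52%, and `reduction_pct(800, 130)` is 83.75%, in line with the doubling, more-than-50%, and more-than-80% figures stated above.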

Claims

1. A host and client cooperated page swapping method based on a virtualized platform, characterized in that it comprises the following steps:

Step a) the client allocates a block of memory as a shared data area, and two buffers are marked out in the shared data area, one being the swap-out buffer and the other the swap-in buffer;

Step b) when a page belonging to the client address space is swapped out, the host writes the corresponding client page number into the swap-out buffer;

The client monitors the swap-out buffer; when the swap-out buffer contains a recorded page number, the client hides the corresponding page from the LRU list used to maintain swappable pages and at the same time releases that page number from the swap-out buffer;

Step c) when a page fault occurs and a page needs to be swapped in, the exit handler in the host first generates a triple marker (va, gfn, guest), where va is the faulting virtual address, gfn is the corresponding client page number, and guest is the client identifier;

The client monitors the swap-in buffer; when it finds that the swap-in buffer contains a recorded page number, it puts the corresponding page back on the LRU list used to maintain swappable pages.

2. The method of claim 1, characterized in that the two buffers are ring buffers.

3. The method of claim 2, characterized in that a control header is generated in the shared data area, and the control header contains the descriptive information of the swap-out buffer and the swap-in buffer.