KR20150007666A

KR20150007666A - Virtualization device and method for merging memory page thereof

Info

Publication number: KR20150007666A
Application number: KR1020130081957A
Authority: KR
Inventors: 엄영익; 이세호; 김인혁
Original assignee: 성균관대학교산학협력단
Priority date: 2013-07-12
Filing date: 2013-07-12
Publication date: 2015-01-21

Abstract

The present invention provides a virtualization apparatus. The virtualization apparatus includes a memory in which at least one guest and a host are loaded, and each guest is loaded from a disk image in page units. When a duplicate page that may be shared by the guests exists among the clean pages stored in the cache of the host, the host leaves only one shared page and deduplicates the duplicate pages. The host searches for duplicate pages using a merge tree and a candidate table, wherein the merge tree is a red-black tree (RB tree) and the candidate table is a hash table.

Description

TECHNICAL FIELD

The present invention relates to a virtualization apparatus and a method for merging a memory page thereof.

Virtualization technology is widely applied in fields such as server virtualization, desktop virtualization, and cloud computing. However, because multiple virtual machines share a single set of physical computing resources, resource management is becoming an important issue. In particular, techniques for using memory resources effectively have been studied.

For example, Kernel-based Virtual Machine (KVM), which uses Linux as the host operating system, uses a memory de-duplication technique called KSM (Kernel Shared Memory or Kernel Samepage Merging).

This is because, owing to the nature of operating systems, multiple guests often hold memory pages with identical contents. De-duplication, or merging, lets the guests share a single copy of such pages and thereby reduces the memory space required.

KSM mainly performs deduplication by scanning pages across the entire anonymous memory area. Scanning every anonymous page means that many pages with little chance of being duplicates are scanned unnecessarily, which lowers efficiency and results in a low deduplication rate.

In fact, in environments where multiple virtual machines are running, identical pages are known to be pages loaded from disk, that is, pages stored in the host's page cache.

Therefore, to increase the efficiency of memory de-duplication, a memory merging method is required that scans the host's page cache to find the pages to be deduplicated.

As related art, Korean Patent Laid-Open No. 10-2013-0070501 ("Technology for Removing Memory Deduplication in a Virtual System") discloses performing a sequential memory deduplication operation on a first processor circuit and performing a parallel memory deduplication operation.

Korean Patent No. 10-1178752 ("Server-Based Desktop Virtual Machine Architecture Extension to Client Machines") discloses a configuration for extending a server-based desktop virtual machine to a client machine.

SUMMARY OF THE INVENTION The present invention has been made to solve the above-mentioned problems, and it is an object of the present invention to provide a virtualization apparatus and a method of merging a memory page thereof with high efficiency of memory de-duplication.

According to a first aspect of the present invention, there is provided a virtualization apparatus including a memory loaded with: one or more guests loaded page by page from a disk image; and a host which, when a duplicate page that can be shared by the one or more guests exists among the clean pages stored in a cache of the host, leaves only one shared page and de-duplicates the duplicate pages, wherein the host searches for the duplicate page using a merge tree and a candidate table, the merge tree is a red-black tree (RB tree), and the candidate table is a hash table.

According to a second aspect of the present invention, there is provided a memory merging method for a virtualization apparatus, comprising: (a) loading one or more guests from a disk image into a memory in units of pages; (b) when a duplicate page that can be shared by the one or more guests exists among the clean pages stored in the cache of the host, leaving only one shared page and performing de-duplication; and (c) when a guest sharing the shared page attempts to modify the shared page, generating a duplicate page by copy-on-write (CoW).

The present invention achieves a high efficiency of memory de-duplication in a virtualization apparatus and a method of merging a memory page thereof.

Memory space is saved by deduplicating redundant pages across the virtual machines, that is, pages in the host page cache used by the guests, so that only one copy of pages with identical content remains in memory.

Because the red-black tree and the hash table used for this purpose search efficiently, memory de-duplication is also efficient.

By freeing memory space through deduplication, the problems of limited physical memory and memory depletion caused by running multiple virtual machines can be alleviated.

FIG. 1 illustrates a structure of a virtualization apparatus according to an embodiment of the present invention.
FIG. 2 illustrates a concept of a memory page merging method of a virtualization apparatus according to an embodiment of the present invention.
FIG. 3 shows a flow of a method of merging a memory page of a virtualization apparatus according to an embodiment of the present invention.
FIG. 4 illustrates an embodiment of a merge tree according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can readily practice the invention. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In order to clearly illustrate the present invention, parts not related to the description are omitted, and similar parts are denoted by like reference characters throughout the specification.

Throughout the specification, when a part is referred to as being "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. Also, when a part is said to "comprise" an element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise.

FIG. 1 illustrates a structure of a virtualization apparatus according to an embodiment of the present invention.

The virtualization device 10 may be a computing device that includes one or more memories 20, one or more storage devices 30, and one or more CPUs 40, and may further include one or more peripheral devices 50. Two or more guest operating systems may be loaded in the memory 20, and each operating system loaded in the memory 20 can be executed by the CPU 40.

Each guest operating system can serve its own applications with the help of a host operating system, for example the kernel itself or a hypervisor. To its user, each guest appears to be a complete computing device that it has entirely to itself, so each guest is also called a virtual machine.

For efficiency, each guest may be stored in the storage device 30 in the form of a virtual disk image (VDI). The figure shows two guests, a first guest 100 and a second guest 100', loaded in the memory 20, each from its own disk image, namely the first guest disk image 300 and the second guest disk image 300'. The unit of loading may be a page.

The host 200 loaded in the memory 20 periodically scans the host page cache 202, in which pages loaded from the disk images 300 and 300' into the memory 20 are cached, and merges, that is, de-duplicates, redundant pages having the same contents.

The host 200 includes a merge tree 204 and a candidate table 206 for this purpose. In one embodiment, the merge tree 204 is implemented as a red-black tree (RB tree) and the candidate table 206 is implemented as a hash table.

That is, the virtualization device 10 includes a memory 20 loaded with one or more guests 100 and 100', which are loaded page by page from the disk images, and a host 200 which, when a duplicate page that can be shared by the guests exists among the clean pages stored in the host page cache 202, leaves only one shared page and de-duplicates the duplicate pages. The host 200 searches for duplicate pages using the merge tree 204 and the candidate table 206.

Because a deduplicated page is shared by the plurality of guests 100 and 100', it must not be modified. Deduplicated pages are therefore write-protected.

When a deduplicated page needs to be modified, the host 200 creates a duplicate page by copy-on-write (CoW). That is, when one of the guests sharing the deduplicated page, for example the first guest 100, attempts to modify it, a duplicate page created by copying the deduplicated page is given to the first guest 100. The copied page is released from write protection, so it can be modified freely.

In other words, when a guest 100 or 100' sharing the deduplicated page attempts to modify it, the host 200 creates a duplicate page by copy-on-write; the deduplicated page remains write-protected, while the page copied at write time is released from write protection.
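The write-protect and copy-on-write behaviour described above can be sketched in Python as follows. This is a minimal, hypothetical model, not the host's actual implementation: `Frame`, `Guest`, `share`, and `guest_write` are illustrative names, and a guest's page table is reduced to a dictionary from guest page number to physical frame.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A physical page frame (hypothetical model, not a kernel structure)."""
    data: bytearray
    write_protected: bool = False


@dataclass
class Guest:
    name: str
    # guest page number -> physical frame (stands in for a page-table entry)
    mapping: dict = field(default_factory=dict)


def share(frame: Frame, guests: list, gpn: int) -> None:
    """Point page `gpn` of every guest at one frame and write-protect it."""
    frame.write_protected = True
    for g in guests:
        g.mapping[gpn] = frame


def guest_write(guest: Guest, gpn: int, offset: int, value: int) -> None:
    """Write one byte, breaking sharing via copy-on-write if the frame is protected."""
    frame = guest.mapping[gpn]
    if frame.write_protected:
        # CoW: give this guest a private, writable copy; the shared frame
        # stays write-protected for the guests that still reference it.
        frame = Frame(bytearray(frame.data), write_protected=False)
        guest.mapping[gpn] = frame
    frame.data[offset] = value


g1, g2 = Guest("guest1"), Guest("guest2")
shared = Frame(bytearray(b"libc"))
share(shared, [g1, g2], gpn=0)
guest_write(g1, 0, 0, ord("L"))
print(bytes(g1.mapping[0].data), bytes(g2.mapping[0].data))  # b'Libc' b'libc'
```

Only the writing guest receives the new frame; the other guest keeps referencing the write-protected shared frame.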

Because deduplicated pages must be shared by the multiple guests 100 and 100', it is preferable to scan only pages in the clean, that is, up-to-date (uptodate), state. Also, since a page in the write-back state is being synchronized with the storage device 30, it is preferable to perform no operation on it.

The host 200 flushes dirty pages, whose data has been modified and which are therefore not up to date, to synchronize them with the storage device 30. This way, more pages reach the up-to-date state, so more pages can be deduplicated and the deduplication rate increases.

FIG. 2 illustrates a concept of a memory page merging method of a virtualization apparatus according to an embodiment of the present invention.

The figure shows a summary of the above.

The first drawing shows duplicate pages 210 and 210' in the host page cache 202, referenced by the first guest 100 and the second guest 100', respectively. Because the duplicate pages 210 and 210' have the same contents, they are de-duplicated (S100) by the host 200.

The second drawing shows a shared page 220 that has been deduplicated and write-protected. The shared page 220 has the same content as the two duplicate pages 210 and 210' in the first drawing. Both the first guest 100 and the second guest 100' refer to the shared page 220, that is, both hold the address of that page.

Because the shared page 220 is shared by the first guest 100 and the second guest 100', it is write-protected. As described above, when the first guest 100 or the second guest 100' attempts to modify the shared page 220, it is split back into the duplicate pages 210 and 210' by copy-on-write (S200), and the duplicate pages 210 and 210' are released from write protection.

FIG. 3 illustrates a flow of a method of merging a memory page of a virtualization apparatus according to an embodiment of the present invention.

The de-duplication (S100) of FIG. 2 is performed as follows.

As described above, the host 200 periodically scans the host page cache 202, but searches only the clean pages among the pages stored in the host page cache 202 (S110). The remaining pages, which are dirty or in the write-back state, are not searched (not shown).

If an item indicating a duplicate exists in the merge tree 204 (S120), the page is merged (S130); if no such item exists in the merge tree 204, it is checked whether an item exists in the candidate table 206 (S140).

If no item indicating a duplicate exists in the candidate table 206, an item is added to the candidate table 206 (S150). If the item exists, the page is merged (S160) and the item is moved to the merge tree 204 (S170).

That is, for each clean page, the host 200 checks whether an item indicating a duplicate page exists in the merge tree 204 or the candidate table 206. The merge tree 204 is searched first, and if the item exists there, the clean page is deduplicated. If there is no entry in the merge tree 204, the candidate table 206 is searched; if the entry exists in the candidate table 206, the clean page is deduplicated and the item is moved to the merge tree 204. If there is no item in the candidate table 206, the item is added to the candidate table 206.

Here, the merge tree 204 maintains information on pages that have already been merged, that is, de-duplicated, and the candidate table 206 maintains information on pages that can be merged, that is, merge candidates. The merge tree 204 is therefore searched before the candidate table 206.
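As an illustration only, the flow of FIG. 3 (S120 to S170) might be sketched in Python as below. The class and method names are hypothetical. Python has no built-in red-black tree, so the merge tree is stood in for by a sorted list searched with `bisect`, which gives the same ordered lookup but not the O(log n) insertion of a real red-black tree; the candidate table is a dictionary keyed by a SHA-256 hash of the page contents, with a full content comparison on a hit, which is an assumed choice of hash.

```python
import bisect
import hashlib


class Deduplicator:
    """Sketch of steps S120 to S170; the merge tree is modeled by a sorted list."""

    def __init__(self):
        self.merge_keys = []    # sorted page contents (stand-in for merge tree 204)
        self.merge_pages = {}   # content -> id of the shared page kept in memory
        self.candidates = {}    # content hash -> (page id, content) (candidate table 206)
        self.merged = []        # (duplicate page id, shared page id) pairs

    def _in_merge_tree(self, content: bytes) -> bool:
        i = bisect.bisect_left(self.merge_keys, content)
        return i < len(self.merge_keys) and self.merge_keys[i] == content

    def scan_clean_page(self, page_id: str, content: bytes) -> str:
        # S120: look for an already shared page first.
        if self._in_merge_tree(content):
            self.merged.append((page_id, self.merge_pages[content]))  # S130
            return "merged"
        # S140: fall back to the candidate table.
        key = hashlib.sha256(content).digest()
        hit = self.candidates.get(key)
        if hit is not None and hit[1] == content:
            shared_id = hit[0]
            self.merged.append((page_id, shared_id))   # S160: merge the page
            del self.candidates[key]                   # S170: move the item ...
            bisect.insort(self.merge_keys, content)    # ... into the merge tree
            self.merge_pages[content] = shared_id
            return "merged"
        # S150: remember the page; on a (rare) hash collision the older
        # candidate is simply replaced in this sketch.
        self.candidates[key] = (page_id, content)
        return "candidate"


d = Deduplicator()
print(d.scan_clean_page("g1:p0", b"A" * 4096))  # candidate
print(d.scan_clean_page("g2:p0", b"A" * 4096))  # merged (found in candidate table)
print(d.scan_clean_page("g3:p0", b"A" * 4096))  # merged (found in merge tree)
```

Keeping full page contents as keys mirrors comparing pages byte for byte; a real implementation would store page references and compare page memory instead.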

As described above, the merge tree 204 is a red-black tree, and the candidate table 206 can be implemented as a hash table. A hash table is a data structure that stores data as key-value pairs and locates entries by hashing the key; a red-black tree is a self-balancing binary search tree.

The red-black tree was invented in 1972 by Rudolf Bayer, a co-inventor of the B-tree, under the name symmetric binary B-tree, and its present name appeared in a 1978 paper by Leonidas J. Guibas and Robert Sedgewick. When the tree holds n elements, insertion, deletion, and search all run in O(log n) time, and this bound holds in the worst case as well as on average. This makes the red-black tree especially useful where execution time matters, such as in real-time processing applications.

In addition, since only one extra bit per node is needed to represent its color, the space complexity remains O(n), so the structure consumes few resources.

FIG. 4 illustrates an embodiment of a merge tree 204 in accordance with an embodiment of the present invention.

Because a red-black tree is a kind of binary search tree, the key of every node is greater than the keys in its left subtree and smaller than the keys in its right subtree. Data can therefore be retrieved and listed in sorted order (left child -> parent -> right child) quickly. Because the tree is balanced, the lengths (heights) of the paths from the root node to the leaf nodes do not differ greatly (the longest such path is at most twice as long as the shortest), so search and in-order listing speeds are relatively constant.

The red-black tree has the further advantage that maintaining balance requires relatively little computation, while search and in-order listing do not slow down significantly even in the worst case.

The red-black tree satisfies the following conditions in addition to the general binary search tree.

1. Every node is either a red node (R) or a black node (B).

2. The root node (the starting point) is a black node (B).

3. All leaf nodes (BL) are black nodes (B); they hold no data and are null (NIL). In practice, the leaf nodes BL need not be explicitly stored in the memory 20.

4. Both children of a red node (R) are black nodes (B). Hence, only a black node B can be the parent of a red node R.

5. For any node, every path from that node down to a leaf node BL contains the same number of black nodes B, not counting the leaf node BL.
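The five conditions can be checked mechanically. The following Python sketch, with hypothetical `Node`, `black_height`, and `is_red_black_tree` names, represents NIL leaves by `None` and verifies conditions 1, 2, 4, and 5 together with the binary search tree ordering; the example tree is made up and is not the tree of FIG. 4.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "R", "B"


@dataclass
class Node:
    key: int
    color: str
    left: Optional["Node"] = None   # None plays the role of a NIL leaf (BL)
    right: Optional["Node"] = None


def black_height(node: Optional[Node], lo=None, hi=None) -> int:
    """Return the number of black nodes on every path from `node` down to a
    NIL leaf (leaf excluded); raise ValueError if a condition is violated."""
    if node is None:
        return 0  # condition 3: NIL leaf, black and empty, not counted (condition 5)
    if node.color not in (RED, BLACK):
        raise ValueError("condition 1: node must be red or black")
    if (lo is not None and node.key <= lo) or (hi is not None and node.key >= hi):
        raise ValueError("binary search tree ordering violated")
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("condition 4: red node has a red child")
    left = black_height(node.left, lo, node.key)
    right = black_height(node.right, node.key, hi)
    if left != right:
        raise ValueError("condition 5: unequal black-node counts")
    return left + (1 if node.color == BLACK else 0)


def is_red_black_tree(root: Optional[Node]) -> bool:
    if root is not None and root.color != BLACK:
        return False  # condition 2: the root must be black
    try:
        black_height(root)
        return True
    except ValueError:
        return False


tree = Node(11, BLACK, Node(7, RED, Node(3, BLACK), Node(9, BLACK)), Node(15, BLACK))
print(is_red_black_tree(tree))  # True
```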

The figure shows an embodiment of a tree that satisfies all of these conditions. Black nodes B are shown unshaded, red nodes R are shown hatched, and the leaf nodes BL are shown as null (NIL) entries separated from the internal nodes.

In the present invention, the red-black tree is used to implement the merge tree 204; for ease of explanation, the number in each node is a simplified representation of the content of a de-duplicated page.

For example, if the content of a clean page in the host page cache 202 is 11, a node with that content is found in the merge tree 204, so the page is deduplicated. On the other hand, a page with content 10 is not present in the merge tree 204, so the candidate table 206 must also be searched to determine whether that page is a deduplication target.

The above is now described in more detail.

First, consider the conventional KSM mentioned above.
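The lookup in this example is an ordinary binary-search-tree descent; node colors matter only when rebalancing, so they are omitted here. In the Python sketch below, the tree holding 13, 8, 17, 1, and 11 is a hypothetical example, and `search` and `classify` are illustrative names.

```python
from typing import Optional


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def search(node: Optional[Node], key) -> bool:
    """Descend from the root: go left for smaller keys, right for larger ones."""
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def classify(merge_tree_root: Optional[Node], content) -> str:
    """Decide the next step for a clean page, as in the example above."""
    if search(merge_tree_root, content):
        return "deduplicate against the shared page"
    return "look up the candidate table"


root = Node(13, Node(8, Node(1), Node(11)), Node(17))
print(classify(root, 11))  # deduplicate against the shared page
print(classify(root, 10))  # look up the candidate table
```

The search for 10 visits 13, 8, and 11 and then reaches an empty child, so the page falls through to the candidate table.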

KSM searches two red-black trees (the stable tree and the unstable tree) to find pages to deduplicate. The pages that are actually deduplicated are largely page cache areas used by the guests, such as the code, binaries, and libraries used by the virtual machines.

The present invention improves on this by selecting page deduplication targets based on the virtual disk images 300 and 300' used by each virtual machine, that is, each guest 100 or 100', which contain pages such as code, application binaries, and libraries. These are the pages read from the disk images 300 and 300' and loaded into the host page cache 202. Because page de-duplication is performed on the host page cache 202, where actual memory duplication is highly likely, the deduplication rate, and thus the amount of memory that can be reclaimed, increases.

In addition, since deduplication is performed on the page caches of files, namely the disk images 300 and 300' of the virtual machines 100 and 100', the memory merging method according to an embodiment of the present invention can also be applied to files other than the disk images 300 and 300' used by the virtual machines 100 and 100'.

Specifically, the page cache managed by the inode of each virtual disk image is scanned to find pages that are targets of memory de-duplication. As described above, the state of each page is checked first. A page in the write-back state is not processed because it is being synchronized with the disk. Pages in the dirty state are forcibly flushed to synchronize them with the disk. This quickly brings the page contents up to date and thereby increases the probability of deduplication. Finally, deduplication is performed on the up-to-date pages.
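The page-state handling above might look like the following Python sketch. `PageState`, `select_dedup_targets`, and the `flush` callback are hypothetical; the sketch assumes `flush` writes a dirty page back synchronously, so the page can be treated as clean in the same pass.

```python
from enum import Enum


class PageState(Enum):
    CLEAN = "clean"          # up to date with the disk image
    DIRTY = "dirty"          # modified in memory, not yet written back
    WRITEBACK = "writeback"  # currently being written to the disk


def select_dedup_targets(pages: dict, flush) -> list:
    """Return the ids of pages eligible for deduplication in this pass.

    `pages` maps page id -> PageState; `flush(page_id)` is a hypothetical
    callback that writes a dirty page back and returns once it is clean.
    """
    targets = []
    for page_id, state in pages.items():
        if state is PageState.WRITEBACK:
            continue                         # being synchronized: leave it alone
        if state is PageState.DIRTY:
            flush(page_id)                   # force synchronization with the disk
            pages[page_id] = PageState.CLEAN
        targets.append(page_id)              # clean, i.e. up to date
    return targets


flushed = []
pages = {"p0": PageState.CLEAN, "p1": PageState.DIRTY, "p2": PageState.WRITEBACK}
print(select_dedup_targets(pages, flushed.append))  # ['p0', 'p1']
```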

Two data structures are maintained for page deduplication: the merge tree 204, which manages pages that have already been deduplicated, and the candidate table 206, which manages pages that have not yet been deduplicated.

First, the merge tree 204 is checked for a page with the same content. If such a page exists, the reference to the scanned page is replaced with a reference to the existing page with the same content, thereby removing the duplicate.

If no such page exists in the merge tree 204, the candidate table 206 is searched for a page with the same content. If one is found, the duplicate is removed in the same manner as above, and the page is then added to the merge tree 204 and deleted from the candidate table 206.

If the candidate table 206 does not find a page having the same content, the page is added to the candidate table 206. This process is repeated to perform page cache de-duplication.

The reason for maintaining two data structures is that the page with the same content as the deduplicated page is more likely to be scanned again. Therefore, instead of directly searching the hash table, the search is first performed in the merge tree 204 managing the deduplicated pages.

The page cache of each disk image 300, 300' basically carries its own backing-storage information, which is used to write a page's contents back to the storage device 30 when they change.

However, because the memory merging method according to the present invention deduplicates the page cache, unlike the conventional Linux page cache management technique, a deduplicated page would have to carry several sets of storage device 30 information. Since this is very difficult to manage, the present invention simply removes all of the storage device 30 information from a deduplicated page; when a write is performed on the deduplicated page, a page is reallocated for the writing virtual machine 100 or 100' and used instead. Since the page cache is used mostly for reads rather than writes, the overhead of removing the storage device 30 information and reallocating pages is small.

When a registered virtual machine image file is large, the scanner tends to spend its time finding similar pages within the same virtual machine 100 or 100' instead of eliminating memory duplication between the virtual machines 100 and 100'.

To solve this problem, if no deduplication occurs after the cache of one disk image 300 or 300' has been scanned beyond a threshold ratio, the cache of another disk image 300 or 300' can be scanned instead.

For example, if no deduplication occurs while 5% of the pages of one disk image (VDI) are scanned, the page cache of the other disk image 300 or 300' may be scanned. Alternatively, the switch to the page cache of the other disk image 300 or 300' may be made after 10% of the total number of pages has been scanned.

The user can adjust the scan rate. The scan rate is determined by the number of pages scanned in a single run and the delay between runs. Shortening the delay and increasing the number of pages scanned lets more pages be deduplicated, but also raises the usage of the CPU 40, so the user can balance the deduplication rate against CPU load by selecting appropriate values.
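A sketch of the threshold rule in Python, assuming pages are scanned in order and that the scanner simply moves on to the next disk image when the first `threshold` fraction of an image's pages yields no deduplication. `scan_images` and the `is_duplicate` predicate are hypothetical stand-ins for the merge-tree and candidate-table lookup, and the 5% default only mirrors the example above.

```python
def scan_images(images: dict, is_duplicate, threshold: float = 0.05) -> dict:
    """Scan each disk image's cached pages, giving up on an image early when
    `threshold` of its pages have been scanned without any deduplication.

    `images` maps image name -> list of cached pages; `is_duplicate(page)` is a
    hypothetical predicate. Returns image name -> number of pages scanned.
    """
    scanned = {}
    for name, pages in images.items():
        limit = max(1, int(len(pages) * threshold))
        merged = 0
        count = 0
        for page in pages:
            count += 1
            if is_duplicate(page):
                merged += 1
            if merged == 0 and count >= limit:
                break  # unproductive image: move on to the next one
        scanned[name] = count
    return scanned


imgs = {"vdi_a": ["x"] * 100, "vdi_b": ["x", "x", "dup"] + ["x"] * 97}
print(scan_images(imgs, lambda p: p == "dup"))  # {'vdi_a': 5, 'vdi_b': 100}
```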
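The two scan-rate parameters, pages per run and delay between runs, can be sketched as a simple loop. `run_scanner` and `scan_one` are hypothetical names, not part of the described method. For comparison, Linux KSM exposes analogous tunables, `pages_to_scan` and `sleep_millisecs`, under `/sys/kernel/mm/ksm/`.

```python
import time


def run_scanner(scan_one, pages_per_run: int, delay_ms: int, runs: int) -> int:
    """Hypothetical scanner loop: each run scans up to `pages_per_run` pages,
    then sleeps `delay_ms` milliseconds. Returns the number of pages scanned.

    `scan_one()` scans the next page and returns False when nothing is left.
    """
    total = 0
    for _ in range(runs):
        for _ in range(pages_per_run):
            if not scan_one():
                return total
            total += 1
        time.sleep(delay_ms / 1000)  # yield the CPU between runs
    return total
```

Raising `pages_per_run` or lowering `delay_ms` scans more pages per unit of time at the cost of CPU usage, which is the trade-off described above.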

The foregoing description of the present invention is for illustrative purposes only, and those of ordinary skill in the art will readily understand that various changes and modifications may be made without departing from the spirit or essential characteristics of the present invention. It is therefore to be understood that the above-described embodiments are illustrative in all aspects and not restrictive. For example, each component described as a single entity may be distributed and implemented, and components described as being distributed may also be implemented in a combined form.

The scope of the present invention is defined by the appended claims rather than the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents are to be construed as being included within the scope of the present invention.

10: Virtualization device
20: Memory
30: Storage device
40: Central processing unit
50: Peripheral device
100: First guest
100': Second guest
200: Host
202: Host page cache
204: Merge tree
206: Candidate table
300: First guest disk image
300': Second guest disk image

Claims

1. A virtualization apparatus comprising a memory loaded with:
one or more guests loaded page by page from a disk image; and
a host which, when a duplicate page that can be shared by the one or more guests exists among clean pages stored in a cache of the host, leaves only one shared page and de-duplicates the duplicate pages,
wherein the host
searches for the duplicate page using a merge tree and a candidate table,
the merge tree is a red-black tree (RB tree), and
the candidate table is a hash table.

2. The virtualization apparatus according to claim 1,
wherein the host,
when a guest sharing the shared page attempts to modify the shared page, generates a duplicate page through copy-on-write (CoW).

3. The virtualization apparatus according to claim 2,
wherein the shared page is write-protected and its connection information with the storage device is removed, and
the write protection of the page copied at write time is released.

4. The virtualization apparatus according to claim 1,
wherein the host
searches, for each of the clean pages, whether an item indicating a duplicate page exists in the merge tree or the candidate table,
searches the merge tree first and, when the item exists in the merge tree, performs de-duplication on the clean page,
when the item does not exist in the merge tree, searches the candidate table and, when the item exists in the candidate table, performs de-duplication on the clean page and then moves the item to the merge tree, and
adds the item to the candidate table when the item does not exist in the candidate table.

5. The virtualization apparatus according to claim 1,
wherein the host
flushes a dirty page stored in the cache.

6. The virtualization apparatus according to claim 1,
wherein the host
does not perform de-duplication on a page in the write-back state stored in the cache.

7. The virtualization apparatus according to claim 1,
wherein the host,
if no de-duplication occurs while the cache for one disk image is scanned beyond a threshold ratio,
scans the cache for another disk image.

8. A memory merging method of a virtualization apparatus, comprising:
(a) loading one or more guests from a disk image into a memory in units of pages;
(b) when a duplicate page that can be shared by the one or more guests exists among clean pages stored in a cache of a host, leaving only one shared page and performing de-duplication; and
(c) generating a duplicate page through copy-on-write (CoW) when a guest sharing the shared page attempts to modify the shared page.

9. The method of claim 8,
wherein the step (b)
searches, for each of the clean pages stored in the cache, a merge tree or a candidate table for an item indicating a duplicate page,
the merge tree is a red-black tree (RB tree), and
the candidate table is a hash table.

10. The method of claim 8,
wherein the step (b) comprises:
searching the merge tree first and performing de-duplication on the clean page when the item exists in the merge tree;
searching the candidate table when the item does not exist in the merge tree and, when the item exists in the candidate table, performing de-duplication on the clean page and moving the item to the merge tree; and
adding the item to the candidate table when the item does not exist in the candidate table.

11. The method of claim 8,
wherein the step (b) includes write-protecting the shared page and removing its connection information with the storage device, and
the step (c) includes releasing the write protection of the page copied at write time.