CN113946286A - Cloud node block-level caching method, storage device and server - Google Patents



Publication number
CN113946286A
Authority
CN
China
Prior art keywords: block, cache, blocks, data, tail
Legal status
Pending
Application number
CN202110944182.8A
Other languages
Chinese (zh)
Inventor
梁雄伟
赵哲锋
徐琛
杨光
严明
Current Assignee
Silk Road Information Port Cloud Computing Technology Co ltd
Original Assignee
Silk Road Information Port Cloud Computing Technology Co ltd
Application filed by Silk Road Information Port Cloud Computing Technology Co ltd
Priority to CN202110944182.8A
Publication of CN113946286A

Classifications

    • G06F3/061 Improving I/O performance
    • G06F3/064 Management of blocks
    • G06F3/0656 Data buffering arrangements
    • G06F3/0664 Virtualisation aspects at device level, e.g. emulation of a storage device or system
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45579 I/O management, e.g. providing access to device drivers or storage

Abstract

The invention provides a cloud node block-level caching method comprising the following steps. S1: arranging the disk cache blocks; S2: maintaining cache integrity; S3: adaptive updating. The method uses the limited local storage space of the physical node as a block-level cache for remote storage blocks, reducing the network traffic bound for the storage server, and enables virtual machines to provide new virtual disks with storage space beyond the capacity of the local disk. The method shows notable robustness and reliability in cloud storage.

Description

Cloud node block-level caching method, storage device and server
Technical Field
The invention belongs to the field of cloud storage, and particularly relates to a block-level caching method for cloud nodes.
Background
The performance and scale of modern cloud service infrastructures have reached an unprecedented level, spanning multiple geographic regions to meet the rapidly growing demands of user groups. This scale is due in part to the proliferation of data-driven application workloads in big data and deep learning. The need to provide sufficient network storage and network bandwidth has therefore become a key design focus for cloud infrastructures being developed or improved. For example, in a cloud application scenario, a machine learning expert attempts to build an optimal deep learning model through iterative training over a large amount of data stored in the cloud. If the data happens to reside in a single network element of the hierarchy, whether at the rack level, the pod level, the cluster level, the region level, or even the data center level, it must be transmitted over the network again and again to wherever the training is performed. This exhausts data center network bandwidth. On the other hand, replicating large amounts of data across multiple levels of the network hierarchy to reduce remote access latency is costly and impractical. Some form of data caching scheme is therefore required in a computing storage system.
Assume that there are a large number of cloud nodes equipped with Hard Disk Drives (HDDs), on which hypervisors host multiple virtual machines. Several factors may prevent a virtual machine from achieving high performance and scalability. First, the current virtual machine co-location mode makes it difficult to provide all virtual block devices on local storage space alone. A physical cloud node carrying virtual machines needs to provide virtual disks to many virtual machines on the node, and the local storage space may not be sufficient to accommodate all of them. Moreover, since virtual machines may be dynamically migrated between sets of cloud nodes, it is difficult to predict upcoming demand. Second, while using remote network storage as the source of virtual block devices may alleviate these problems, the large amount of I/O traffic may overload the shared cluster network. Hierarchical network topologies are common in data center networks, and heavily aggregated I/O traffic can cause serious congestion at bottleneck links, degrading virtual machine I/O performance.
Therefore, it is necessary to provide a new and effective cloud node block level caching strategy to solve the above problems.
Disclosure of Invention
The present invention has been made to solve the above technical problems in the prior art. Its object is to propose a method that alleviates these problems by using the local storage of the virtual machine hosting node as a block-level cache for remote network storage. This cache allows a Hypervisor-based node to service its virtual machines' disk I/O requests from the node's local storage most of the time.
The invention provides a cloud node block-level caching method, which comprises the following steps,
S1: arranging the disk cache blocks;
S2: maintaining cache integrity;
S3: adaptive updating.
Further, the step S1 includes,
S11: if the tail hash value of an entry in the block band does not match the hash value of the data block, the entry with the mismatched hash is ignored during the recovery process;
S12: for a read operation, an entire entry containing a data block and its tail is read;
S13: the cache address is calculated as the sum of the virtual offset row and the block address; if the calculated cache address contains a dirty valid block, other available blocks are searched in the cache to replace it. When the replaced block is selected, the following six factors are used: (i) how recently the data block was used, (ii) whether the data block is dirty, (iii) the current data block order, (iv) the new data block order, (v) the current distance, and (vi) the new distance. The order factors consider the order of blocks in the cache before and after eviction. A score is obtained with the following formula,
Y = Σ α_n·x_n, n = 1..6
where x_n denotes the six factors above and the coefficient α_n denotes the weight assigned to each factor. The score Y is calculated for all entries in the set, and the block with the minimum Y is selected as the replaced block.
Further, the step S2 includes,
S21: during a read, after the data block is received from remote storage, the data is returned to the virtual machine first, and the metadata-update and cache-update management tasks are executed afterwards;
S22: during a write, the metadata and cache updates are completed before returning to the virtual machine; once the write operation has returned, the data is guaranteed to be stored in the cache.
Further, the step S3 includes,
S31: a staged caching operation is triggered according to a configured parameter; the amount of staged caching, called the window data size, is determined from the remote and local I/O delays; when staged caching starts, the number of blocks to be transferred within a given time window t is first determined; a parameter s_t indicates the number of bytes per millisecond allowed to be sent over the network, and s_t is adjusted dynamically as I/O and network conditions change;
S32: let M be the target network delay; an exponentially weighted moving average of the actual delay is calculated and maintained using a smoothing parameter α, M_t = α·M_{t-1} + (1 - α)·M, and s_t is calculated using a smoothing parameter γ, with the remote delay M_t and s_t in an inverse relationship:
[equation image not reproduced: s_t is updated using the smoothing parameter γ and varies inversely with the remote delay M_t]
S33: let N be the local I/O latency and r_t the number of bytes per millisecond of local disk I/O; N_t = α·N_{t-1} + (1 - α)·N is maintained in the same way, and r_t is calculated as:
[equation image not reproduced: r_t is updated analogously and varies inversely with the local delay N_t]
S34: the smaller of s_t and r_t is set as the window data size; the number of I/Os for staged caching at time t is u_t:
u_t = (min(s_t, r_t) × τ_t - P_t) / B;
where τ_t is the time between t and t-1, B is the block band size, and P_t is the pending I/O generated by remote access; staged caching starts when u_t is greater than zero.
Further, the block-level cache structure comprises a plurality of block label columns; a block band comprises contiguous blocks from a remote storage server and contains a plurality of 4KB blocks and a tail; the tail is 512 bytes in size and contains a hash field and metadata; the hash field records the hash values of the blocks and is written during a write operation; a block band entry consists of a plurality of 4KB block and tail pairs and is written to the cache in a single I/O operation;
the block-level cache structure is stored in a cache space; the cache space is a two-dimensional space of rows and columns of blocks, shared by blocks from multiple virtual disks, and the virtual offset row of a particular virtual disk represents the first row in the cache space where block address 0 is located.
The present invention also provides a storage device having stored therein a plurality of instructions, the instructions adapted to be loaded and executed by a processor to perform:
S1: arranging the disk cache blocks;
S2: maintaining cache integrity;
S3: adaptive updating.
Further, the step S1 includes,
S11: if the tail hash value of an entry in the block band does not match the hash value of the data block, the entry with the mismatched hash is ignored during the recovery process;
S12: for a read operation, an entire entry containing a data block and its tail is read;
S13: the cache address is calculated as the sum of the virtual offset row and the block address; if the calculated cache address contains a dirty valid block, other available blocks are searched in the cache to replace it; when the replaced block is selected, the following six factors are used: (i) how recently the data block was used, (ii) whether the data block is dirty, (iii) the current data block order, (iv) the new data block order, (v) the current distance, and (vi) the new distance, the order factors considering the order of blocks in the cache before and after eviction; a score is obtained with the following formula,
Y = Σ α_n·x_n, n = 1..6
where x_n denotes the six factors above and the coefficient α_n denotes the weight assigned to each factor; the score Y is calculated for all entries in the set, and the block with the minimum Y is selected as the replaced block;
the step S2 includes the steps of,
S21: during a read, after the data block is received from remote storage, the data is returned to the virtual machine first, and the metadata-update and cache-update management tasks are executed afterwards;
S22: during a write, the metadata and cache updates are completed before returning to the virtual machine; once the write operation has returned, the data is guaranteed to be stored in the cache;
the step S3 includes the steps of,
S31: a staged caching operation is triggered according to a configured parameter; the amount of staged caching, called the window data size, is determined from the remote and local I/O delays; when staged caching starts, the number of blocks to be transferred within a given time window t is first determined; a parameter s_t indicates the number of bytes per millisecond allowed to be sent over the network, and s_t is adjusted dynamically as I/O and network conditions change;
S32: let M be the target network delay; an exponentially weighted moving average of the actual delay is calculated and maintained using a smoothing parameter α, M_t = α·M_{t-1} + (1 - α)·M, and s_t is calculated using a smoothing parameter γ, with the remote delay M_t and s_t in an inverse relationship:
[equation image not reproduced: s_t is updated using the smoothing parameter γ and varies inversely with the remote delay M_t]
S33: let N be the local I/O latency and r_t the number of bytes per millisecond of local disk I/O; N_t = α·N_{t-1} + (1 - α)·N is maintained in the same way, and r_t is calculated as:
[equation image not reproduced: r_t is updated analogously and varies inversely with the local delay N_t]
S34: the smaller of s_t and r_t is set as the window data size; the number of I/Os for staged caching at time t is u_t:
u_t = (min(s_t, r_t) × τ_t - P_t) / B;
where τ_t is the time between t and t-1, B is the block band size, and P_t is the pending I/O generated by remote access; staged caching starts when u_t is greater than zero;
the block-level cache structure comprises a plurality of block label columns; a block band comprises contiguous blocks from a remote storage server and contains a plurality of 4KB blocks and a tail; the tail is 512 bytes in size and contains a hash field and metadata; the hash field records the hash values of the blocks and is written during a write operation; a block band entry consists of a plurality of 4KB block and tail pairs and is written to the cache in a single I/O operation;
the block-level cache structure is stored in a cache space; the cache space is a two-dimensional space of rows and columns of blocks, shared by blocks from multiple virtual disks, and the virtual offset row of a particular virtual disk represents the first row in the cache space where block address 0 is located.
The invention also provides a server, comprising:
a processor adapted to implement instructions; and
a storage device adapted to store a plurality of instructions, the instructions adapted to be loaded and executed by the processor to perform:
S1: arranging the disk cache blocks;
S2: maintaining cache integrity;
S3: adaptive updating.
Further, the step S1 includes the steps of,
S11: if the tail hash value of an entry in the block band does not match the hash value of the data block, the entry with the mismatched hash is ignored during the recovery process;
S12: for a read operation, an entire entry containing a data block and its tail is read;
S13: the cache address is calculated as the sum of the virtual offset row and the block address; if the calculated cache address contains a dirty valid block, other available blocks are searched in the cache to replace it; when the replaced block is selected, the following six factors are used: (i) how recently the data block was used, (ii) whether the data block is dirty, (iii) the current data block order, (iv) the new data block order, (v) the current distance, and (vi) the new distance, the order factors considering the order of blocks in the cache before and after eviction; a score is obtained with the following formula,
Y = Σ α_n·x_n, n = 1..6
where x_n denotes the six factors above and the coefficient α_n denotes the weight assigned to each factor; the score Y is calculated for all entries in the set, and the block with the minimum Y is selected as the replaced block;
the step S2 includes the steps of,
S21: during a read, after the data block is received from remote storage, the data is returned to the virtual machine first, and the metadata-update and cache-update management tasks are executed afterwards;
S22: during a write, the metadata and cache updates are completed before returning to the virtual machine; once the write operation has returned, the data is guaranteed to be stored in the cache;
the step S3 includes the steps of,
S31: a staged caching operation is triggered according to a configured parameter; the amount of staged caching, called the window data size, is determined from the remote and local I/O delays; when staged caching starts, the number of blocks to be transferred within a given time window t is first determined; a parameter s_t indicates the number of bytes per millisecond allowed to be sent over the network, and s_t is adjusted dynamically as I/O and network conditions change;
S32: let M be the target network delay; an exponentially weighted moving average of the actual delay is calculated and maintained using a smoothing parameter α, M_t = α·M_{t-1} + (1 - α)·M, and s_t is calculated using a smoothing parameter γ, with the remote delay M_t and s_t in an inverse relationship:
[equation image not reproduced: s_t is updated using the smoothing parameter γ and varies inversely with the remote delay M_t]
S33: let N be the local I/O latency and r_t the number of bytes per millisecond of local disk I/O; N_t = α·N_{t-1} + (1 - α)·N is maintained in the same way, and r_t is calculated as:
[equation image not reproduced: r_t is updated analogously and varies inversely with the local delay N_t]
S34: the smaller of s_t and r_t is set as the window data size; the number of I/Os for staged caching at time t is u_t:
u_t = (min(s_t, r_t) × τ_t - P_t) / B;
where τ_t is the time between t and t-1, B is the block band size, and P_t is the pending I/O generated by remote access; staged caching starts when u_t is greater than zero;
the block-level cache structure comprises a plurality of block label columns; a block band comprises contiguous blocks from a remote storage server and contains a plurality of 4KB blocks and a tail; the tail is 512 bytes in size and contains a hash field and metadata; the hash field records the hash values of the blocks and is written during a write operation; a block band entry consists of a plurality of 4KB block and tail pairs and is written to the cache in a single I/O operation;
the block-level cache structure is stored in a cache space; the cache space is a two-dimensional space of rows and columns of blocks, shared by blocks from multiple virtual disks, and the virtual offset row of a particular virtual disk represents the first row in the cache space where block address 0 is located.
The beneficial effects of the invention are as follows: a caching scheme is provided for virtual block devices in the Hypervisor. The invention uses the limited local storage space of the physical node as a block-level cache for remote storage blocks, reducing the network traffic bound for the storage server. It allows Hypervisor-based compute nodes to service as many virtual machine I/O requests as possible from local storage, while enabling virtual machines to provide new virtual disks with storage space beyond the capacity of the local disks. The strategy outperforms existing methods and shows notable robustness and reliability in cloud storage.
Drawings
FIG. 1 illustrates an example model of a cloud storage environment;
FIG. 2 illustrates a logical structure of a cache system according to an embodiment of the present invention;
FIG. 3 illustrates the block band structure;
FIG. 4 illustrates staged caching;
FIG. 5 illustrates an implementation of the present invention in a Xen environment;
FIG. 6 shows a comparison of performance at network saturation as the number of virtual machines is increased;
FIG. 7 illustrates a comparison of performance when a remote storage server becomes a bottleneck as the number of virtual machines is increased;
fig. 8 shows a flow chart of the present invention.
Detailed Description
Example 1
The invention provides a Hypervisor-based cloud node block-level caching strategy, which comprises the following steps,
S1: arranging the disk cache blocks: a cache block placement policy is established with higher sequentiality as the primary goal, and the layout of the disk cache blocks is designed on this basis so that the performance impact is minimized;
S2: maintaining cache integrity: when the host fails or crashes, write I/O operations issued by the virtual machine that have already been acknowledged must not lose write data, so the integrity of the write operation is not damaged;
S3: an adaptive update mechanism that gradually sends modified blocks to remote storage without impacting the performance of normal I/O operations;
in the practice of the present invention, the cache in vStore (the name of one of the products in which the invention is practiced) is a set associative cache with a write-back policy. The present invention treats the cache structure as a table. Each row in this table contains a plurality of block label columns. A chunk band contains contiguous chunks from one remote storage server, but adjacent chunk bands may be from different storage servers. Set associativity and cache line length are configurable;
a block band contains multiple 4KB blocks and a tail. The tail is 512 bytes in size, which contains the hash field and metadata. The hash field records the hash value of the block and is written upon a write operation. To avoid additional disk I/O due to individual metadata accesses, a chunk stripe entry consisting of multiple 4KB chunk and trailer pairs is constructed and written to cache in a single I/O operation;
as shown in fig. 3, a block stripe is a data storage unit, and a block stripe includes a plurality of 4KB data blocks and a tail. The tail is 512 bytes in size, which contains the hash field and metadata. A 4K data block and a 512 byte trailer constitute a block band entry containing n consecutive 4096+512 byte blocks.
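For illustration, the following Python sketch packs and verifies such a block band entry. The helper names and the use of SHA-256 for the hash field are assumptions made for this sketch; the patent fixes only the 4KB block size and the 512-byte tail.

    import hashlib

    BLOCK_SIZE = 4096   # 4KB data block
    TAIL_SIZE = 512     # 512-byte tail: hash field plus metadata
    HASH_LEN = 32       # SHA-256 digest length (assumed hash function)

    def make_tail(block: bytes, metadata: bytes = b"") -> bytes:
        # The hash field records the hash of the block and is written
        # during the write operation.
        digest = hashlib.sha256(block).digest()
        return (digest + metadata).ljust(TAIL_SIZE, b"\x00")

    def pack_entry(blocks: list) -> bytes:
        # A block band entry: n consecutive (4096 + 512)-byte units,
        # written to the cache in a single I/O operation.
        assert all(len(b) == BLOCK_SIZE for b in blocks)
        return b"".join(b + make_tail(b) for b in blocks)

    def entry_is_intact(entry: bytes) -> bool:
        # Recovery-time check: a mismatched tail hash marks an entry
        # whose write never completed, so the entry is ignored.
        unit = BLOCK_SIZE + TAIL_SIZE
        for off in range(0, len(entry), unit):
            block = entry[off:off + BLOCK_SIZE]
            stored = entry[off + BLOCK_SIZE:off + BLOCK_SIZE + HASH_LEN]
            if hashlib.sha256(block).digest() != stored:
                return False
        return True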
When designing the vStore cache replacement algorithm, the ordering of blocks in the cache space is considered. Traditional replacement policies, such as Least Recently Used (LRU) or Least Frequently Used (LFU), may be detrimental to I/O performance because they can eventually separate blocks that should be contiguous: the virtual machine may have issued consecutive read or write operations, but cache replacement may turn them into random accesses. vStore therefore seeks to keep sequential I/O sequential in the cache space;
the cache space on the local memory is a two-dimensional space containing rows and columns of blocks. Since this cache space shares blocks from multiple virtual disks, it is desirable to separate the cache blocks of different virtual disks into different cache regions of the space as much as possible. If two or more virtual disk blocks compete for overlapping cache space starting from the first row, some active blocks may be moved out and some blocks may be moved in. To separate them, the present invention uses virtual offset rows. The virtual offset line for a particular virtual disk represents the first line in the cache space in which block address 0 is located. The idea is to apply different offsets to different virtual disks so that their starting addresses in the cache space are different;
further, the step S1 includes,
the S11 hash value is used to check if a crash occurred during the write operation. If a crash occurs before the write is complete, the tail hash value of an entry in the chunk band does not match the hash value of the data chunk. Because the write is not complete and has not been returned to the virtual machine, any entries with such a non-matching hash may be ignored during the recovery process;
s12 for a read operation, vStore reads the entire entry containing the data block and tail. Although the read size of each 4KB block is increased by 512 bytes, the overhead is small because the reads are sequential;
the manner in which the S13vStore allocates buffers for newly arrived blocks is as follows. The cache address is first calculated from the sum of the virtual offset line and the block address. If the computed cache address contains a dirty valid block, it will look for other available blocks in the set. In selecting the replaced block, vStore uses the following 6 factors: (i) what is the most recently used data block? (ii) Is the data block dirty? (iii) Current data block order (iv) new data block order (v) current distance and (vi) new distance. The sequential factor considers the order of blocks in the cache before and after a move-out, with the intent of preventing sequential blocks from losing order through the move-out and selecting the moved-out block in a manner that increases order. (v) The distance factor of (vi) refers to the row difference between the virtual offset row and the new row. Rows closer to the virtual offset row are preferentially selected. Combining these 6 factors, the following formula is formed to give a score. xn represents the six factors above that,
Figure BDA0003216194850000121
the coefficient α represents the weight assigned to the factor. In the current vStore, all factors are assigned equal weights. The score Y is calculated for all entries in the set and the entry with the smallest value is selected as the replaced block. These weights may be configured so that the selection of the replaced block may be adjusted according to workload and settings.
In the practice of the present invention, step S2 includes,
S21: during a read, vStore returns data to the virtual machine immediately upon receiving the data block from remote storage. Management tasks such as metadata updates and cache updates are performed afterwards, so perceived I/O read latency is lower;
S22: during a write, metadata and cache updates must be completed before returning to the virtual machine; otherwise an unexpected host failure could corrupt the virtual machine's write. Once the write operation has returned, the data is guaranteed to persist in the vStore cache. The write path is therefore the source of most of the overhead. If a cache eviction is required, up to four disk I/Os may be needed in the worst case, as described below (a sketch of this flush path follows the list),
read the local (target) block to be moved out.
Remote reading of blocks in a block band: if there are any invalid blocks in the band of blocks that contain the block to be removed, the target block will be merged into that block to form a complete band of blocks.
Remote writing one complete band of blocks.
Write new block to local vStore cache.
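A sketch of this worst-case flush path; the BlockStore class and its read/write methods are hypothetical stand-ins for the local cache device and the remote storage server.

    class BlockStore:
        # Toy stand-in for a local disk or a remote storage server.
        def __init__(self):
            self.data = {}
        def read(self, addr):
            return self.data.get(addr)
        def write(self, addr, value):
            self.data[addr] = value

    def evict_and_fill(local, remote, victim_addr, band_addr,
                       band_index, new_block, band_len):
        # Worst case: four disk I/Os.
        victim = local.read(victim_addr)                  # (1) local read of target
        band = remote.read(band_addr) or [None] * band_len
        band[band_index] = victim                         # (2) remote read + merge
        remote.write(band_addr, band)                     # (3) remote write of band
        local.write(victim_addr, new_block)               # (4) local write of new block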
In the implementation of the present invention, the step S3 includes,
S31: a staged caching operation is triggered according to a configured parameter called the LOD (level of dirtiness). The method of the present invention determines the amount of staged caching, referred to as the window data size, based on remote and local I/O latency. When starting staged caching, vStore first determines the number of blocks to be transferred within a given time window t, introducing a parameter s_t: the number of bytes per millisecond (BPMS) that vStore is allowed to transmit over the network. The parameter s_t is adjusted dynamically as I/O and network conditions change. Here, the traffic includes normal network traffic as well as traffic caused by the staged caching mechanism;
S32: let M be the network delay one wants to maintain. An exponentially weighted moving average of the actual delay is calculated and maintained using a smoothing parameter α, e.g. M_t = α·M_{t-1} + (1 - α)·M. Then s_t is calculated using another smoothing parameter γ; the remote delay M_t and s_t are in an inverse relationship:
[equation image not reproduced: s_t is updated using the smoothing parameter γ and varies inversely with the remote delay M_t]
S33: let N be the local I/O latency and r_t the number of bytes per millisecond of local disk I/O. N_t = α·N_{t-1} + (1 - α)·N can be maintained in the same manner, and r_t is calculated as:
[equation image not reproduced: r_t is updated analogously and varies inversely with the local delay N_t]
S34: the smaller of s_t and r_t is set as the window data size. The number of I/Os for staged caching at time t, u_t, is
u_t = (min(s_t, r_t) × τ_t - P_t) / B;
where τ_t is the time between t and t-1 (milliseconds), B is the block band size, and P_t is the pending I/O generated by remote access, not by vStore. When u_t is greater than zero, staged caching starts.
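The window computation can be sketched as follows. The exact update rules for s_t and r_t are in equation images that are not reproduced above, so the inverse scaling by M/M_t and N/N_t below is an assumed plausible form of the stated inverse relationship, not the patent's own formula.

    def ewma(prev, sample, alpha):
        # Exponentially weighted moving average, e.g.
        # M_t = alpha * M_{t-1} + (1 - alpha) * M.
        return alpha * prev + (1.0 - alpha) * sample

    class StagedCacheWindow:
        def __init__(self, band_size, alpha=0.8, gamma=0.8,
                     target_remote_ms=1.0, target_local_ms=1.0,
                     init_bpms=1024.0):
            self.B = band_size            # block band size in bytes
            self.alpha, self.gamma = alpha, gamma
            self.M, self.N = target_remote_ms, target_local_ms
            self.Mt, self.Nt = target_remote_ms, target_local_ms
            self.st = init_bpms           # allowed network bytes/ms
            self.rt = init_bpms           # local disk bytes/ms

        def observe(self, remote_ms, local_ms):
            self.Mt = ewma(self.Mt, remote_ms, self.alpha)
            self.Nt = ewma(self.Nt, local_ms, self.alpha)
            # Assumed inverse relationship: rates shrink when measured
            # delay exceeds the target, smoothed by gamma.
            self.st = ewma(self.st, self.st * self.M / self.Mt, self.gamma)
            self.rt = ewma(self.rt, self.rt * self.N / self.Nt, self.gamma)

        def staged_ios(self, tau_ms, pending_bytes):
            # u_t = (min(s_t, r_t) * tau_t - P_t) / B; stage only if u_t > 0.
            u = (min(self.st, self.rt) * tau_ms - pending_bytes) / self.B
            return max(0, int(u))

With this shape, u_t falls automatically when either measured latency rises, so staging backs off while the system is busy and proceeds when it is idle.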
In the implementation of the present invention, the step S1 includes,
the cache in vStore is a set associative cache with a write-back policy. The present invention treats the cache structure as a table. Each row in this table contains a plurality of block label columns. A chunk band contains contiguous chunks from one remote storage server, but adjacent chunk bands may be from different storage servers. The set associativity and the cache line length are configurable. A block band contains multiple 4KB blocks and a tail. The tail is 512 bytes in size, which contains the hash field and metadata. The hash field records the hash value of the block and is written upon a write operation. To avoid additional disk I/O due to separate metadata access, the present invention constructs a chunk stripe entry consisting of multiple 4KB chunk and trailer pairs and writes the cache in a single I/O operation. The hash value is used to check whether a crash occurred during the write operation. If a crash occurs before the write is complete, the tail hash value of an entry in the chunk band does not match the hash value of the data chunk. Because the write is not complete and has not been returned to the virtual machine, any entries with such a non-matching hash may be ignored during the recovery process. For a read operation, the vStore reads the entire entry containing the data block and the tail. Although the read size of each 4KB block is increased by 512 bytes, the overhead is small because the reads are sequential.
The present invention takes into account the ordering of blocks in the cache space when designing the vStore cache replacement algorithm. Traditional replacement policies, such as Least Recently Used (LRU) or least recently used (LFU) policies, may be detrimental to I/O performance because they may eventually separate blocks that should be contiguous. The virtual machine may have issued consecutive read or write operations, but the cache replacement may turn them into random access. In vStore, it is sought to keep sequential I/O sequential in cache space.
The cache space on the local memory is a two-dimensional space containing rows and columns of blocks. Since this cache space shares blocks from multiple virtual disks, the present invention seeks to separate cache blocks of different virtual disks into different cache regions of the space as much as possible. If two or more virtual disk blocks compete for overlapping cache space starting from the first row, some active blocks may be moved out and some blocks may be moved in. To separate them, the present invention uses virtual offset rows. The virtual offset line for a particular virtual disk represents the first line in the cache space in which block address 0 is located. The idea is to apply different offsets to different virtual disks so that their starting addresses in the cache space are different.
The manner in which vStore allocates buffers for newly arrived blocks is as follows. The cache address is first calculated from the sum of the virtual offset line and the block address. If the computed cache address contains a dirty valid block, it will look for other available blocks in the set. In selecting the replaced block, vStore uses the following 6 factors: (i) what is the most recently used data block? (ii) Is the data block dirty? (iii) Current data block order (iv) new data block order (v) current distance and (vi) new distance. The sequential factor takes into account the order of blocks in the cache before and after a move, which may prevent sequential blocks from losing order through a move. The present invention selects the shifted out blocks in a manner that increases the order. (v) The distance factor of (vi) refers to the row difference between the virtual offset row and the new row. Rows closer to the virtual offset row are preferentially selected. Combining these 6 factors, the following formula is formed to give a score. x is the number ofnRepresenting the above six factors.
Figure BDA0003216194850000151
The coefficient α represents the weight assigned to the factor. In the current vStore, all factors are assigned equal weights. The score Y is calculated for all entries in the set and the entry with the smallest value is selected as the replaced block. These weights may be configured so that the selection of the replaced block may be adjusted according to workload and settings.
In the implementation of the present invention, the step S3 includes,
the staged caching function of vStore flushes modified blocks in the cache to the remote pool. The staged caching function aims to keep as low a proportion of dirty blocks in the cache space as possible. Since refresh operations of a dirty block may significantly impact response time at frequent I/O requests, the vStore attempts to refresh a portion of the dirty block using only idle time without impacting normal I/O performance. Successful dumping allows the virtual disk to be quickly detached for virtual machine migration because there are fewer blocks to refresh.
In designing an anti-staging cache mechanism, there are several goals. First, the performance overhead incurred by the data staging cache should be controllable compared to the normal operation of the vStore. Since staging cache increases the number of I/Os, some performance penalty results. Second, the phased cache should automatically adapt to changing workloads. The cache should be ready in stages when the system is busy with heavy I/O. Only when the system is idle should operation take place.
The staged caching operation is triggered according to a configured parameter called the LOD (level of dirtiness). The method of the present invention determines the amount of staged caching, referred to as the window data size, based on remote and local I/O latency. When starting staged caching, vStore first determines the number of blocks to be transferred within a given time window t, introducing a parameter s_t: the number of bytes per millisecond (BPMS) that vStore is allowed to transmit over the network. The parameter s_t is adjusted dynamically as I/O and network conditions change. Here, the traffic includes normal network traffic as well as traffic caused by the staged caching mechanism.
To calculate s_t, let M be the network delay one wants to maintain. An exponentially weighted moving average of the actual delay is calculated and maintained using a smoothing parameter α, e.g. M_t = α·M_{t-1} + (1 - α)·M. Then s_t is calculated using another smoothing parameter γ; the remote delay M_t and s_t are in an inverse relationship:
[equation image not reproduced: s_t is updated using the smoothing parameter γ and varies inversely with the remote delay M_t]
Let N be the local I/O latency and r_t the number of bytes per millisecond of local disk I/O. N_t = α·N_{t-1} + (1 - α)·N can be maintained in the same manner, and r_t is calculated as:
[equation image not reproduced: r_t is updated analogously and varies inversely with the local delay N_t]
The smaller of s_t and r_t is set as the window data size. The number of I/Os for staged caching at time t, u_t, is calculated as follows,
u_t = (min(s_t, r_t) × τ_t - P_t) / B
where τ_t is the time between t and t-1 (milliseconds), B is the block band size, and P_t is the pending I/O generated by remote access, not by vStore. When u_t is greater than zero, staged caching starts.
The meaning of English abbreviations in the present invention will be described below.
vStore represents a cloud node block-level caching strategy based on Hypervisor according to an embodiment of the present invention.
Hypervisor denotes a virtual machine monitor.
LOD denotes the level of dirtiness.
HDD denotes a hard disk drive.
LRU denotes the least recently used cache replacement algorithm.
LFU denotes the least frequently used cache replacement algorithm.
In the implementation of the invention, the above steps need not be performed in a fixed sequence.
An embodiment of the invention implements vStore using the blktap mechanism in a Xen environment. Xen follows a split-driver architecture for I/O, in which the front half of the driver (blkfront) is placed in the guest virtual machine and the back end (blkback) is placed in Dom0. Xen's blktap mechanism replaces the back-end portion of the split driver with a block-tap device; all guest virtual machine disk I/O, once it leaves the front end, is then redirected to a user process in Dom0 referred to as tapdisk. This mechanism allows the invention to capture all block requests from the virtual machine and conveniently implement the required functionality in user space.
The tapdisk process opens the block device according to the specified type. These types include block devices that perform synchronous I/O and asynchronous I/O. If opened in synchronous mode, all block requests are processed with normal read, write, open, and close system calls; if opened in asynchronous mode, tapdisk calls the Linux AIO library functions to handle block requests. For vStore, the invention creates a new tapdisk type and registers a set of callback functions.
In one embodiment of the invention, the evaluation environment consists of hosts with dual 3.40GHz Intel(R) Xeon(TM) CPUs. The hosts are placed in a rack of 20 physical machines that communicate at 1 Gbps. Virtual machine images are provided to the hosts on an NFS (network file system) volume; this NFS storage space is used only to store the virtual machine images and does not act as a remote storage server in the context of the invention. Separate storage volumes are attached to the virtual machines and the experiments are run on them, so that local storage can be fully reserved for cache space. The virtual machines are lightweight, with 512MB of memory, running Linux kernel 2.6.18-8.
The role of the remote storage server is played by independent hosts outside the rack and its sub-network. A TCP version of the network block device is installed on them so that they can act as remote storage servers, and the test virtual machines become network block device clients. The network block device server space can be accessed through conventional open, read, and/or write calls to the /dev/nbd0 device.
The present invention uses two workload benchmarks for the evaluation of the proposed model. The two load experiments exhibit different workload characteristics, as described below.
(1) Filebench: a flexible file system benchmark that allows users to specify desired behaviors using a custom workload model language. Several pre-built workloads ship with the Filebench benchmark; the invention selects Webserver, Varmail, and Fileserver.
(2) Postmark: a file system benchmark originally created by NetApp to faithfully simulate the pattern of small temporary files commonly found in Internet software such as e-mail, news, and e-commerce.
Table 1 compares the read-write ratios observed from the application and from the Hypervisor. The workload observed at the Hypervisor end is very different from that at the application level, owing to the locality of data blocks.
TABLE 1 comparison of read-write ratio of application and Hypervisor
[table image not reproduced]
Example 2: performing a time-cost experiment
The basis for comparison is the performance of local storage, not remote storage, because the performance of remote storage servers can vary widely, from high-end enterprise storage servers to low-end servers built from commodity hardware components. Compared with a high-end storage server, vStore would always be at a disadvantage; the reverse is also possible. Comparing the performance of vStore to local storage therefore shows how much real overhead vStore incurs. Since there is a 512-byte tail at the end of each 4KB block, the expected execution time overhead is at least 12.5%.
A virtual disk is prepared for the virtual machine under test using Xen's AIO tapdisk on the vStore cache file. This virtual machine image file resides on the local disk, so all virtual machine I/O requests become I/O on the local disk (i.e., the vStore cache space). The comparison can be performed fairly by using the same disk area on the local disk. The performance comparison is local disk access, with and without the vStore cache, through the AIO tapdisk.
The execution time performance overhead is shown in Table 2. The local disk capability is shown under the label AIO, which uses Xen's AIO disk through the Linux asynchronous I/O library. The label AIO-512 is a specially modified AIO in which every 4KB block is extended with an extra 512 bytes to emulate the vStore tail size; it is intended to reveal how much merely adding a tail affects the performance of the AIO mechanism. These experiments show how much the tail and the cache-processing logic affect the performance of vStore.
TABLE 2 execution time overhead
[table image not reproduced]
Table 2 shows that the execution time overhead of vStore is less than 12%. When the workload is heavy, as with Filebench Webserver and Filebench Fileserver, the overhead difference between AIO-512 and vStore is large; when the workload is moderate (i.e., Filebench Varmail), the difference is small.
In one embodiment of the invention, the workload is a mixture of reads and writes whose sequentiality changes continuously. By separating the different types of workloads, it can be seen which type has a greater impact on performance than the others. To this end, vStore was evaluated on the synthetic workloads shown in Table 3 to better understand the overhead incurred. By varying the read-write mix, sequentiality, and thread count, six types of workloads are generated. The results show that multi-threaded workload throughput is only about half that of the single-threaded workloads, because the probability of page cache hits is greater when multiple workloads are interleaved in the virtual machine. As can be seen from the slight differences between the AIO-512 and vStore columns, most of the overhead of vStore comes from processing the tail; among these workloads, the largest difference is for multi-threaded sequential reads. In the case of random workloads, normal AIO incurs many cache misses, whereas vStore can satisfy some block requests from its cache and exhibits lower overhead than AIO-512.
TABLE 3 overhead on various composite workloads
[table image not reproduced]
Example 3: multiple virtual machine operation experiment
To verify the benefit of vStore in improving the performance of multiple virtual machines sharing a remote storage server, the experiment tests two scenarios.
(1) Network saturation: multiple virtual machines attach virtual disks from the same remote storage server and generate high network traffic, saturating the network bandwidth. The goal of this experiment is to make network bandwidth the bottleneck; to this end, the four virtual disks are placed on different physical disks of the storage server.
(2) Storage I/O saturation: this scenario simulates disk I/O saturation due to a large number of I/O requests on the virtual disks. To make disk bandwidth the bottleneck rather than the network, the four virtual disks are placed on the same physical disk of the storage server.
The results of both cases are shown in fig. 6 and fig. 7. In both cases, as the number of virtual machines sharing the remote storage server increases, the number of I/Os per second decreases significantly; when the number of virtual machines reaches 4, performance drops to 20% of the single-virtual-machine case. Using vStore on the 4 virtual machines helps mitigate this degradation. While vStore does not completely prevent the performance loss, it mitigates the degradation with 2 or more virtual machines; in the Postmark case in particular, vStore provides performance comparable to the single-virtual-machine case. These tests show that vStore can be used to counter performance degradation due to network and storage I/O saturation at the remote storage.
Taken together, the comparison experiments show that the Hypervisor-based cloud node block-level caching strategy proposed by the invention is more effective.
Although the present invention has been disclosed in connection with the preferred embodiments shown and described in detail, it will be understood by those skilled in the art that various modifications may be made to the block level caching strategy of the Hypervisor-based cloud node proposed by the present invention without departing from the scope of the present invention. Therefore, the scope of the present invention should be determined by the contents of the appended claims.

Claims (9)

1. A cloud node block-level caching method is characterized by comprising the following steps,
S1: arranging the disk cache blocks;
S2: maintaining cache integrity;
S3: adaptive updating.
2. The cloud node block-level caching method according to claim 1, wherein the step S1 includes,
S11: if the tail hash value of an entry in the block band does not match the hash value of the data block, the entry with the mismatched hash is ignored during the recovery process;
S12: for a read operation, an entire entry containing a data block and its tail is read;
S13: the cache address is calculated as the sum of the virtual offset row and the block address; if the calculated cache address contains a dirty valid block, other available blocks are searched in the cache to replace it; when the replaced block is selected, the following six factors are used: (i) how recently the data block was used, (ii) whether the data block is dirty, (iii) the current data block order, (iv) the new data block order, (v) the current distance, and (vi) the new distance, the order factors considering the order of blocks in the cache before and after eviction; a score is obtained with the following formula,
Y = Σ α_n·x_n, n = 1..6
where x_n denotes the six factors above and the coefficient α_n denotes the weight assigned to each factor; the score Y is calculated for all entries in the set, and the block with the minimum Y is selected as the replaced block.
3. The cloud node block-level caching method according to claim 1, wherein said step S2 includes,
S21: during a read, after the data block is received from remote storage, the data is returned to the virtual machine first, and the metadata-update and cache-update management tasks are executed afterwards;
S22: during a write, the metadata and cache updates are completed before returning to the virtual machine; once the write operation has returned, the data is guaranteed to be stored in the cache.
4. The cloud node block-level caching method according to claim 1, wherein the step S3 includes,
S31: a staged caching operation is triggered according to a configured parameter; the amount of staged caching, called the window data size, is determined from the remote and local I/O delays; when staged caching starts, the number of blocks to be transferred within a given time window t is first determined; a parameter s_t indicates the number of bytes per millisecond allowed to be sent over the network, and s_t is adjusted dynamically as I/O and network conditions change;
S32: let M be the target network delay; an exponentially weighted moving average of the actual delay is calculated and maintained using a smoothing parameter α, M_t = α·M_{t-1} + (1 - α)·M, and s_t is calculated using a smoothing parameter γ, with the remote delay M_t and s_t in an inverse relationship:
[equation image not reproduced: s_t is updated using the smoothing parameter γ and varies inversely with the remote delay M_t]
S33: let N be the local I/O latency and r_t the number of bytes per millisecond of local disk I/O; N_t = α·N_{t-1} + (1 - α)·N is maintained in the same way, and r_t is calculated as:
[equation image not reproduced: r_t is updated analogously and varies inversely with the local delay N_t]
S34: the smaller of s_t and r_t is set as the window data size; the number of I/Os for staged caching at time t is u_t:
u_t = (min(s_t, r_t) × τ_t - P_t) / B;
where τ_t is the time between t and t-1, B is the block band size, and P_t is the pending I/O generated by remote access; staged caching starts when u_t is greater than zero.
5. The cloud node block-level caching method of claim 1,
the block-level cache structure comprises a plurality of block label columns; a block band comprises contiguous blocks from a remote storage server and contains a plurality of 4KB blocks and a tail; the tail is 512 bytes in size and contains a hash field and metadata; the hash field records the hash values of the blocks and is written during a write operation; a block band entry consists of a plurality of 4KB block and tail pairs and is written to the cache in a single I/O operation;
the block-level cache structure is stored in a cache space; the cache space is a two-dimensional space of rows and columns of blocks, shared by blocks from multiple virtual disks, and the virtual offset row of a particular virtual disk represents the first row in the cache space where block address 0 is located.
6. A storage device having stored therein a plurality of instructions, the instructions adapted to be loaded and executed by a processor to perform:
S1: arranging the disk cache blocks;
S2: maintaining cache integrity;
S3: adaptive updating.
7. A storage device according to claim 6,
the step S1 includes the steps of,
S11: if the tail hash value of an entry in the block band does not match the hash value of the data block, the entry with the mismatched hash is ignored during the recovery process;
S12: for a read operation, an entire entry containing a data block and its tail is read;
S13: the cache address is calculated as the sum of the virtual offset row and the block address; if the calculated cache address contains a dirty valid block, other available blocks are searched in the cache to replace it; when the replaced block is selected, the following six factors are used: (i) how recently the data block was used, (ii) whether the data block is dirty, (iii) the current data block order, (iv) the new data block order, (v) the current distance, and (vi) the new distance, the order factors considering the order of blocks in the cache before and after eviction; a score is obtained with the following formula,
Y = Σ α_n·x_n, n = 1..6
where x_n denotes the six factors above and the coefficient α_n denotes the weight assigned to each factor; the score Y is calculated for all entries in the set, and the block with the minimum Y is selected as the replaced block;
the step S2 includes the steps of,
S21: during a read, after the data block is received from remote storage, the data is returned to the virtual machine first, and the metadata-update and cache-update management tasks are executed afterwards;
S22: during a write, the metadata and cache updates are completed before returning to the virtual machine; once the write operation has returned, the data is guaranteed to be stored in the cache;
the step S3 includes the steps of,
S31: a staged caching operation is triggered according to a configured parameter; the amount of staged caching, called the window data size, is determined from the remote and local I/O delays; when staged caching starts, the number of blocks to be transferred within a given time window t is first determined; a parameter s_t indicates the number of bytes per millisecond allowed to be sent over the network, and s_t is adjusted dynamically as I/O and network conditions change;
S32: let M be the target network delay; an exponentially weighted moving average of the actual delay is calculated and maintained using a smoothing parameter α, M_t = α·M_{t-1} + (1 - α)·M, and s_t is calculated using a smoothing parameter γ, with the remote delay M_t and s_t in an inverse relationship:
[equation image not reproduced: s_t is updated using the smoothing parameter γ and varies inversely with the remote delay M_t]
S33: let N be the local I/O latency and r_t the number of bytes per millisecond of local disk I/O; N_t = α·N_{t-1} + (1 - α)·N is maintained in the same way, and r_t is calculated as:
[equation image not reproduced: r_t is updated analogously and varies inversely with the local delay N_t]
S34: the smaller of s_t and r_t is set as the window data size; the number of I/Os for staged caching at time t is u_t:
u_t = (min(s_t, r_t) × τ_t - P_t) / B;
where τ_t is the time between t and t-1, B is the block band size, and P_t is the pending I/O generated by remote access; staged caching starts when u_t is greater than zero;
the block-level cache structure comprises a plurality of block tag columns; a block band comprises contiguous blocks from the remote storage server; the block band comprises a plurality of 4KB blocks and a tail, the tail being 512 bytes in size and comprising a hash field and metadata; the hash field records the hash values of the blocks and is written during the write operation; a block band entry consists of a plurality of 4KB-block-and-tail pairs, and a block band entry writes its blocks into the cache in a single I/O operation;
the block-level cache structure is stored in a cache space; the cache space is a two-dimensional space of rows and columns of blocks and is shared by blocks from multiple virtual disks; the virtual offset line of a particular virtual disk indicates the first line in the cache space at which that disk's block address 0 is located.
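Step S13 of claim 7 selects a replacement victim by a weighted score over six factors. A minimal Python sketch follows, assuming the score is the linear weighted sum Y = Σ αₙxₙ reconstructed above; the factor values and weights are caller-supplied, since the claim fixes neither their scales nor the weight assignments:

from dataclasses import dataclass

@dataclass
class CandidateBlock:
    """A replacement candidate with its six S13 factors (x1..x6):
    recency, dirtiness, current/new sequentiality, current/new distance."""
    block_addr: int
    factors: tuple

def cache_address(virtual_offset_line, block_addr):
    # S13: the cache address is the sum of the virtual offset line
    # and the block address.
    return virtual_offset_line + block_addr

def replacement_score(factors, weights):
    """Y = sum(alpha_n * x_n) over the six judgment factors."""
    assert len(factors) == len(weights) == 6
    return sum(a * x for a, x in zip(weights, factors))

def pick_victim(candidates, weights):
    """Score every entry in the set; the minimum-Y block is replaced."""
    return min(candidates, key=lambda c: replacement_score(c.factors, weights))

Minimizing Y with weights chosen per deployment lets one scoring pass trade off recency, dirtiness, sequentiality, and distance instead of hard-coding a single eviction policy.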
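Steps S21 and S22 order the integrity work differently on the two paths: a read returns data to the virtual machine before the metadata/cache updates run, while a write completes those updates before acknowledging, so a returned write implies cached data. A sketch under the assumption of a single background worker; cache and remote here are hypothetical interfaces, not APIs from the patent:

import concurrent.futures

_background = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def handle_read(cache, remote, block_addr):
    """S21: return the remotely fetched block to the VM first; the
    metadata/cache update management task runs afterwards, off the
    read's critical path."""
    block = remote.fetch(block_addr)
    _background.submit(cache.update_metadata_and_cache, block_addr, block)
    return block

def handle_write(cache, block_addr, data):
    """S22: complete the metadata and cache updates *before* returning,
    so a returned write implies the data is already in the cache."""
    cache.update_metadata_and_cache(block_addr, data)
    return True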
8. A server, comprising:
a processor adapted to implement instructions; and
a storage device adapted to store a plurality of instructions, the instructions being adapted to be loaded and executed by the processor to:
S1: arranging the disk cache blocks;
S2: maintaining the integrity of the cache;
S3: adaptive updating.
9. The server according to claim 8,
the step S1 includes the steps of,
S11: if the tail hash value of an entry in the block band does not match the hash value of the data block, the entry with the mismatched hash is ignored during the recovery process;
S12: for a read operation, the entire entry containing the data block and the data block tail is read;
S13: the cache address is calculated as the sum of the virtual offset line and the block address; if the calculated cache address already contains a valid dirty block, other available blocks are searched in the cache to replace it; when the replacement block is selected, the following six judgment factors are evaluated: (i) how recently the data block was used, (ii) whether the data is dirty, (iii) the current data block sequentiality, (iv) the new data block sequentiality, (v) the current distance, and (vi) the new distance; the sequentiality factors are judged by the order of the block in the cache before and after the move; a score is obtained using the following formula,
Y = Σ αₙxₙ (n = 1, …, 6)
where xₙ represents each of the above six factors and the coefficient αₙ is the weight assigned to that factor; the score Y is calculated for all the entries in the set, and the block with the minimum Y is selected as the block to be replaced;
the step S2 includes the steps of,
S21: during a read, after the data block is received from the remote storage, the data is returned to the virtual machine first, and the metadata-update or cache-update management tasks are executed afterwards;
S22: during a write, the metadata and cache updates are completed before returning to the virtual machine, so that once the write operation has returned, the data is already stored in the cache;
the step S3 includes the steps of,
S31: a staged caching operation is triggered according to the configured parameters, and the staged cache amount, called the window data size, is determined from the remote and local I/O latencies; when staged caching starts, the number of blocks to be transferred within a given time window t is first determined; the parameter sₜ indicates the number of bytes per millisecond allowed to be sent over the network, and sₜ is adjusted dynamically as I/O and network conditions change;
S32: let M be the target network latency; an exponentially weighted moving average of the actual latency is calculated and maintained using a smoothing parameter α, Mₜ = α·Mₜ₋₁ + (1 − α)·M; sₜ is calculated using a smoothing parameter γ, with the remote latency Mₜ and sₜ in an inverse relationship:
(formula image FDA0003216194840000071: sₜ as a function of γ, M and Mₜ)
S33: let N be the local I/O latency and rₜ the number of bytes per millisecond for local disk I/O; with Nₜ = α·Nₜ₋₁ + (1 − α)·N maintained analogously, rₜ is calculated by the following equation:
(formula image FDA0003216194840000072: rₜ as a function of γ, N and Nₜ)
S34: the smaller of sₜ and rₜ is set as the window data size; the number of I/Os for staged caching at time t is uₜ:
uₜ = (min(sₜ, rₜ) × τₜ − Pₜ) / B;
τₜ is the time elapsed between t and t−1, the variable B is the size of the block band, and the variable Pₜ is the pending I/O generated by remote access; staged caching starts when uₜ is greater than zero (see the staged-caching sketch following this claim);
the block-level cache structure comprises a plurality of block tag columns; a block band comprises contiguous blocks from the remote storage server; the block band comprises a plurality of 4KB blocks and a tail, the tail being 512 bytes in size and comprising a hash field and metadata; the hash field records the hash values of the blocks and is written during the write operation; a block band entry consists of a plurality of 4KB-block-and-tail pairs, and a block band entry writes its blocks into the cache in a single I/O operation;
the block-level cache structure is stored in a cache space; the cache space is a two-dimensional space of rows and columns of blocks and is shared by blocks from multiple virtual disks; the virtual offset line of a particular virtual disk indicates the first line in the cache space at which that disk's block address 0 is located.
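Steps S31–S34 amount to a small feedback controller: EWMA-smoothed remote and local latencies bound the bytes-per-millisecond budget, and uₜ converts that budget into a number of block-band I/Os. A Python sketch follows; the multiplicative γ adjustment of sₜ and rₜ is an assumption standing in for the unrecoverable formula images, and the EWMAs are fed the measured latencies, matching the claims' "moving average of the actual delay". Only the EWMA form and the uₜ formula come from the claims:

class StagedCacheController:
    """Sketch of the S31-S34 adaptive staged-caching controller."""

    def __init__(self, target_net_ms, target_io_ms, band_bytes,
                 alpha=0.8, gamma=0.1, init_rate=1024.0):
        self.M = target_net_ms   # target network latency (S32)
        self.N = target_io_ms    # target local I/O latency (S33)
        self.B = band_bytes      # block-band size in bytes
        self.alpha = alpha       # EWMA smoothing parameter (from the claims)
        self.gamma = gamma       # rate smoothing parameter (from the claims)
        self.Mt = target_net_ms  # smoothed remote latency
        self.Nt = target_io_ms   # smoothed local latency
        self.st = init_rate      # allowed network bytes/ms
        self.rt = init_rate      # allowed local-disk bytes/ms

    def update(self, measured_net_ms, measured_io_ms, tau_ms, pending_bytes):
        """Return u_t, the number of staged-cache I/Os for this window."""
        # S32/S33: EWMAs of the actual latencies.
        self.Mt = self.alpha * self.Mt + (1 - self.alpha) * measured_net_ms
        self.Nt = self.alpha * self.Nt + (1 - self.alpha) * measured_io_ms
        # Assumed update rule (the patent's formula images are not
        # recoverable): move each rate inversely to its smoothed latency,
        # scaled by gamma.
        self.st = max(0.0, self.st * (1 + self.gamma * (self.M - self.Mt) / self.M))
        self.rt = max(0.0, self.rt * (1 + self.gamma * (self.N - self.Nt) / self.N))
        # S34: u_t = (min(s_t, r_t) * tau_t - P_t) / B; stage only if > 0.
        u_t = (min(self.st, self.rt) * tau_ms - pending_bytes) / self.B
        return max(0, int(u_t))

# Example: one control step over a 100 ms window with 64KB of pending
# remote I/O (all parameter values here are illustrative).
ctrl = StagedCacheController(target_net_ms=5.0, target_io_ms=1.0,
                             band_bytes=4 * 4096 + 512)
n_ios = ctrl.update(measured_net_ms=4.2, measured_io_ms=0.9,
                    tau_ms=100.0, pending_bytes=65536)

Taking the minimum of the network and local-disk budgets keeps staged caching from saturating whichever resource is currently the bottleneck.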
CN202110944182.8A 2021-08-17 2021-08-17 Cloud node block-level caching method, storage device and server Pending CN113946286A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110944182.8A CN113946286A (en) 2021-08-17 2021-08-17 Cloud node block-level caching method, storage device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110944182.8A CN113946286A (en) 2021-08-17 2021-08-17 Cloud node block-level caching method, storage device and server

Publications (1)

Publication Number Publication Date
CN113946286A true CN113946286A (en) 2022-01-18

Family

ID=79327763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110944182.8A Pending CN113946286A (en) 2021-08-17 2021-08-17 Cloud node block-level caching method, storage device and server

Country Status (1)

Country Link
CN (1) CN113946286A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116991542A (en) * 2023-09-26 2023-11-03 苏州元脑智能科技有限公司 Virtual machine snapshot method, system, electronic equipment and computer storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060090039A1 (en) * 2004-10-27 2006-04-27 Sanjeev Jain Method and apparatus to enable DRAM to support low-latency access via vertical caching
US20120179874A1 (en) * 2011-01-07 2012-07-12 International Business Machines Corporation Scalable cloud storage architecture
CN102612686A (en) * 2009-11-16 2012-07-25 微软公司 Managing virtual hard drives as blobs
CN103154909A (en) * 2010-10-06 2013-06-12 马维尔国际贸易有限公司 Distributed cache coherency protocol
CN104811493A (en) * 2015-04-21 2015-07-29 华中科技大学 Network-aware virtual machine mirroring storage system and read-write request handling method
CN112306404A (en) * 2020-10-10 2021-02-02 苏州浪潮智能科技有限公司 Tile recording disk data layout method, system and related equipment
CN113094392A (en) * 2020-01-09 2021-07-09 北京沃东天骏信息技术有限公司 Data caching method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060090039A1 (en) * 2004-10-27 2006-04-27 Sanjeev Jain Method and apparatus to enable DRAM to support low-latency access via vertical caching
CN102612686A (en) * 2009-11-16 2012-07-25 微软公司 Managing virtual hard drives as blobs
CN103154909A (en) * 2010-10-06 2013-06-12 马维尔国际贸易有限公司 Distributed cache coherency protocol
US20120179874A1 (en) * 2011-01-07 2012-07-12 International Business Machines Corporation Scalable cloud storage architecture
CN104811493A (en) * 2015-04-21 2015-07-29 华中科技大学 Network-aware virtual machine mirroring storage system and read-write request handling method
CN113094392A (en) * 2020-01-09 2021-07-09 北京沃东天骏信息技术有限公司 Data caching method and device
CN112306404A (en) * 2020-10-10 2021-02-02 苏州浪潮智能科技有限公司 Tile recording disk data layout method, system and related equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116991542A (en) * 2023-09-26 2023-11-03 苏州元脑智能科技有限公司 Virtual machine snapshot method, system, electronic equipment and computer storage medium
CN116991542B (en) * 2023-09-26 2024-02-13 苏州元脑智能科技有限公司 Virtual machine snapshot method, system, electronic equipment and computer storage medium

Similar Documents

Publication Publication Date Title
US10042760B2 (en) Scalable cloud storage architecture
KR102226017B1 (en) Adaptive caching replacement manager with dynamic updating granulates and partitions for shared flash-based storage system
US9529723B2 (en) Methods of cache preloading on a partition or a context switch
Byan et al. Mercury: Host-side flash caching for the data center
US9971513B2 (en) System and method for implementing SSD-based I/O caches
US7533239B2 (en) System and method for dynamic sizing of cache sequential list
US8255630B1 (en) Optimization of cascaded virtual cache memory
US10223026B2 (en) Consistent and efficient mirroring of nonvolatile memory state in virtualized environments where dirty bit of page table entries in non-volatile memory are not cleared until pages in non-volatile memory are remotely mirrored
US20080294846A1 (en) Dynamic optimization of cache memory
CN108604197A (en) Modular Data operating system
US10140212B2 (en) Consistent and efficient mirroring of nonvolatile memory state in virtualized environments by remote mirroring memory addresses of nonvolatile memory to which cached lines of the nonvolatile memory have been flushed
Fan et al. I/o-cache: A non-volatile memory based buffer cache policy to improve storage performance
Daly et al. Cache restoration for highly partitioned virtualized systems
US11513849B2 (en) Weighted resource cost matrix scheduler
CN113946286A (en) Cloud node block-level caching method, storage device and server
Kim et al. Flash-Conscious Cache Population for Enterprise Database Workloads.
Tak et al. Block-level storage caching for hypervisor-based cloud nodes
Bahn et al. Separation of virtual machine I/O in cloud systems
Lee et al. VM-aware flush mechanism for mitigating inter-VM I/O interference
EP3053040B1 (en) Consistent and efficient mirroring of nonvolatile memory state in virtualized environments
No et al. Multi-layered I/O virtualization cache on KVM/QEMU
Kim et al. I/O access frequency-aware cache method on KVM/QEMU
US10592124B2 (en) Predictable client latency via background file system operations
Jeremic et al. An adaptive ssd cache architecture simultaneously using multiple caches
Maruf Practical Memory Disaggregation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination