CN113742304B - Data storage method of hybrid cloud - Google Patents


Info

Publication number
CN113742304B
Authority
CN
China
Prior art keywords
file
compression
cloud
storage
files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111313263.4A
Other languages
Chinese (zh)
Other versions
CN113742304A (en)
Inventor
邱创和
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yaguan Technology Co ltd
Original Assignee
Hangzhou Yaguan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yaguan Technology Co ltd
Priority to CN202111313263.4A
Publication of CN113742304A
Application granted
Publication of CN113742304B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162 Delete operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files

Abstract

A data storage method for a hybrid cloud, belonging to the technical field of digital information transmission, comprises the following steps: step 1, establishing a hybrid cloud; step 2, the cache manager processes a request from a user; step 3, the cloud manager accesses the public cloud storage and downloads the file to the cache manager; step 4, the cache manager judges whether the file needs to be compressed and, if so, determines the specific compression method to use; step 5, the cloud manager determines the compression method for all files saved in the public cloud storage. By combining different compression methods, the scheme improves the performance of the private cloud side of the hybrid cloud and reduces the cost of the public cloud side. The private cloud makes a compression decision for each request that misses the cache, based on file characteristics and the kick-free list. The public cloud makes the compression decision for all stored files. The scheme finds a set of compression methods that minimizes cost while meeting the response limit time.

Description

Data storage method of hybrid cloud
Technical Field
The invention belongs to the technical field of digital information transmission, and particularly relates to a data storage method of a hybrid cloud.
Background
A hybrid cloud generally comprises a public cloud and a plurality of private clouds and combines the advantages of both. The public cloud can serve as a platform for data backup, cloud data processing, and remote data access for the private clouds, while the private clouds serve as a platform for hot data storage.
Chinese patent publication No. CN102263825A discloses a data transmission method for a hybrid cloud storage system based on cloud location. By analyzing the cloud locations of cloud storage nodes and cloud clients, it automatically adopts different data transmission modes for different cloud environments, meeting both the high performance required of data transmission in the private cloud environment and the high bandwidth utilization and high security required in the public cloud environment. That publication also holds that:
the private cloud undertakes the storage of data with high availability and performance requirements, and strategies such as compression cause high delay. The lower the relative cloud location, the less compression (or none) is applied to data transmission, which improves performance and reduces delay; the higher the relative cloud location, the higher the compression ratio adopted, which improves bandwidth utilization.
Therefore, in the conventional scheme, the low-location private cloud storage is used as a cache to ensure the data transmission performance of the private cloud, while the high-location public cloud storage is used as a backup.
The above method has the following disadvantage: the low-location private cloud storage applies little or no compression. However, if the private cloud storage applies a compression scheme, then on one hand the bandwidth consumed by transfers between the public cloud and the private cloud is reduced, and on the other hand the limited space of the private cloud storage is used more effectively: more files are retained in the private cloud storage, fewer files have to be fetched from the public cloud storage, and the cost of transferring files from the public cloud to the private cloud falls, which improves the performance of the hybrid cloud.
Therefore, it is necessary to apply a compression scheme to the low-location private cloud storage as well. However, the private cloud is responsible for storing data with high availability and performance requirements. Some files are accessed frequently, so repeated compression and decompression incur large overhead and high delay, while for other files the opposite holds. Applying the same compression method to files in different states gives poor results. Different compression schemes should therefore be adopted for files in different states to improve performance.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a data storage method of a hybrid cloud.
In order to achieve the above object, the present invention is achieved by the following technical solutions.
A data storage method of a hybrid cloud comprises the following steps:
step 1, establishing a hybrid cloud, wherein the hybrid cloud comprises a private cloud and a public cloud;
the private cloud comprises a private cloud storage and a cache manager; the public cloud comprises a public cloud storage and a cloud manager;
step 2, the cache manager processes a request from a user, and when the cache manager receives the user request, the cache manager checks whether a file corresponding to the request exists in the private cloud storage: if the file exists in the private cloud storage, allowing the user to directly access; if the file does not exist in the private cloud storage, the cache manager requests the cloud manager to download the file, and then the next step is carried out;
step 3, after receiving a file downloading request of the cache manager, the cloud manager accesses the public cloud storage and downloads the file to the cache manager;
step 4, after the file is downloaded and before the file is placed into the private cloud storage, the cache manager judges whether the file needs to be compressed or not and determines a specific compression method adopted when the file needs to be compressed;
step 5, the cloud manager determines the compression method for all files saved in the public cloud storage.
Further, the private cloud storage serves as cache storage; when a requested file is missing from the private cloud storage (a cache miss), the cache manager downloads the file and then selects a suitable compression method for it; the public cloud storage serves as backup storage; the cloud manager applies a combination of compression methods across all files, finding the combination with the minimum storage cost under a set response time constraint.
Further, step 4 comprises the steps of:
step 4a, the cache manager provides a default compression method for the downloaded file according to \(wrci_i\), the average write-read ratio of the requested file in the private cloud: if \(wrci_i > 0.5\), the file is write-dominated and the default compression method is set to the LZW compression method; otherwise the default compression method is set to the LZMA compression method;
step 4b, after compressing the downloaded file and before storing the compressed file in the private cloud storage, the cache manager determines the files to be deleted from the private cloud storage; the cache manager sets up two lists: a compressed list and an uncompressed list; the compressed list is the list of files that must be deleted from the private cloud storage if the downloaded file is stored in compressed form; the uncompressed list is the list of files that must be deleted from the private cloud storage if the downloaded file is stored uncompressed; the number of files in the uncompressed list is not less than the number of files in the compressed list;
step 4c, the files that are in the uncompressed list but not in the compressed list are classified into a kick-free list; the kick-free list contains the files that are spared from deletion because the downloaded file is compressed;
the overall time saved with compression is denoted by \(v\), which is calculated as: \(v = v_{co} - v_{ci}\)
wherein \(v_{co}\) denotes the time saved by keeping the files in the kick-free list, and \(v_{ci}\) denotes the response time cost of using compression;
if \(v > 0\), compression saves time overall, so the cache manager compresses the downloaded file and stores the compressed file in the private cloud storage; otherwise, the downloaded file is stored directly in the private cloud storage without compression.
Further, \(v_{co}\) is calculated as follows: when compression method \(J = alco_i\),
\[ v_{co} = \sum_{i \in AvoidKick} \left[ s_{iJ} \times (t_{co} + d_J) - s_{iJ} \times \left( t_{ci} + c_J \times wrci_i + d_J \times (1 - wrci_i) \right) \right] \]
wherein \(alco_i\) denotes the compression method used for file \(i\) in the public cloud storage; \(AvoidKick\) denotes the kick-free list; \(s_{iJ}\) denotes the size of file \(i\) after compression by compression method \(J\); \(t_{co}\) denotes the transmission time of the public cloud storage, in s/MB; \(d_J\) denotes the decompression time of compression method \(J\); \(t_{ci}\) denotes the transmission time of the private cloud storage, in s/MB; \(c_J\) denotes the compression time of compression method \(J\); \(wrci_i\) denotes the average write-read ratio of the requested file in the private cloud.
Further, in step 4c, \(v_{ci}\) is calculated as follows: when \(i\) is the downloaded file and compression method \(j = alci_i\),
\[ v_{ci} = s_{ij} \times \left[ t_{ci} + c_j \times wrci_i + d_j \times (1 - wrci_i) \right] - s_{i0} \times t_{ci} \]
wherein \(alci_i\) denotes the compression method used for file \(i\) in the private cloud storage; \(s_{ij}\) denotes the size of file \(i\) after compression by compression method \(j\); \(t_{ci}\) denotes the transmission time of the private cloud storage, in s/MB; \(c_j\) denotes the compression time of compression method \(j\); \(wrci_i\) denotes the average write-read ratio of the requested file in the private cloud; \(d_j\) denotes the decompression time of compression method \(j\); \(s_{i0}\) denotes the uncompressed size of file \(i\).
Further, in step 5, the files in the public cloud storage are classified: a timer tracks how long each file has gone without being accessed; files whose idle time exceeds an empirical threshold are classified as silent files, and the remaining files are classified as active files; the CM compression method is used for silent files; for active files, the compression method is given by the solution of a linear programming problem, namely minimizing the storage cost of the public cloud storage subject to a bound on the public cloud average response time.
The linear programming problem is expressed mathematically as follows:
minimizing storage costs:
[Equation image (objective function) not reproduced in the text]
the conditions to be satisfied are:
[Equation image (constraint) not reproduced in the text]
wherein \(Cost_{sto}\) denotes the storage cost of the cloud storage, in $/MB; \(s_{ij}\) denotes the size of file \(i\) after compression by compression method \(j\); \(freq_i\) denotes the access frequency of file \(i\) in the public cloud; \(t_{co}\) denotes the transmission time of the public cloud storage, in s/MB; \(c_j\) denotes the compression time of compression method \(j\); \(wrco_i\) denotes the average write-read ratio of the requested file in the public cloud; \(d_j\) denotes the decompression time of compression method \(j\); \(cloudLim\) denotes the response limit time of the public cloud.
Compared with the prior art, the invention has the following beneficial effects:
according to the scheme, different compression methods are combined, so that the performance of the private cloud end of the hybrid cloud is improved, and the cost of the public cloud end is saved. The private cloud makes a compression scheme decision for each missed request based on file characteristics and the list. The public cloud makes the compression scheme decision for all stored files. The scheme finds a set of compression methods to minimize the cost and meet the requirement of response limit time.
Simulations show that, compared with the traditional scheme (storing original, uncompressed files in private cloud storage), the method can improve performance by 50% and reduce cost by 75%. The main reason is that the scheme does not rely on a single compression method and can select a more appropriate compression scheme for each file.
Drawings
FIG. 1 is a diagram of overall response times for a private cloud;
FIG. 2 is a graph of the relationship between hit rate and compression rate for a private cloud;
FIG. 3 is a graph of response time of a public cloud;
FIG. 4 is a graph comparing average response time and storage cost of a public cloud;
FIG. 5 is a comparison graph of public cloud response times for various combinations of the present scheme;
FIG. 6 is a comparison graph of private cloud response times for various combinations of the present scheme.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Introduction of the compression method:
The Huffman compression method has a short compression time and a relatively high compression ratio (about 0.6; here a lower ratio means a smaller compressed file), and is suitable for write-dominated storage.
The LZ77 compression method has a short decompression time and a similarly high compression ratio (about 0.6), and is suitable for read-dominated storage. However, its compression time is far longer than its decompression time, which is a disadvantage in some cases. For example, when 9 of 10 requests are reads and 1 is a write, the LZ77 compression method takes 0.477 seconds and the Huffman compression method takes 0.2 seconds according to the data in Table 1 (taking 1 MB per request), so the LZ77 compression method is not the optimal choice here.
The compression ratios of both the LZMA compression method and the LZW compression method are lower than those of the Huffman and LZ77 compression methods. The lower compression ratio makes both algorithms more suitable for storage in the public cloud. Of the two, the LZMA compression method is suitable for read-dominated storage and the LZW compression method is suitable for write-dominated storage.
The CM compression method has the lowest compression ratio but long compression and decompression times. It is suitable for backing up files but not for private cloud storage, because files in private cloud storage need to be accessed frequently.
Table 1. Characteristics of the compression methods

| Compression method | Huffman | LZ77 | LZMA | LZW | CM |
|---|---|---|---|---|---|
| Compression ratio | 0.58 | 0.59 | 0.34 | 0.45 | 0.25 |
| Compression time (s/MB) | 0.02 | 0.45 | 0.64 | 0.35 | 1079 |
| Decompression time (s/MB) | 0.02 | 0.003 | 0.12 | 0.25 | 1075 |
| Suitable for | Writing | Reading | Reading | Writing | Backup |
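The comparison of LZ77 and Huffman for a mixed workload can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes every request touches a 1 MB file, that each write costs one compression and each read one decompression, and it copies the per-MB times from Table 1. The function name `codec_time` is hypothetical.

```python
# Per-MB (compression, decompression) times in s/MB, copied from Table 1.
TIMES = {
    "Huffman": (0.02, 0.02),
    "LZ77": (0.45, 0.003),
    "LZMA": (0.64, 0.12),
    "LZW": (0.35, 0.25),
}


def codec_time(method: str, writes: int, reads: int, size_mb: float = 1.0) -> float:
    """Total codec time in seconds for a mix of write and read requests.

    Assumption: each write compresses the file once and each read
    decompresses it once; transfer time is ignored.
    """
    compress_s, decompress_s = TIMES[method]
    return size_mb * (writes * compress_s + reads * decompress_s)


if __name__ == "__main__":
    # 10 requests: 1 write and 9 reads, as in the example in the text.
    for name in TIMES:
        print(f"{name}: {codec_time(name, writes=1, reads=9):.3f} s")
```

With 1 write and 9 reads, LZ77 comes to 0.45 + 9 × 0.003 = 0.477 s and Huffman to 10 × 0.02 = 0.2 s, matching the figures quoted above.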
By applying the compression scheme to the data storage of the private cloud, more files can be stored in the private cloud storage, the possibility of accessing the files from the cloud is further reduced, and the transmission cost is reduced. However, additional compression and decompression time is spent due to the compression scheme. Therefore, adaptive adjustments to the overall compression scheme are required.
On a cache miss, the private cloud selects a suitable compression method for the requested file: it decides whether to compress the file according to the overhead that downloading files from the public cloud storage would otherwise incur, and decides how to compress it mainly according to the write-read ratio.
A data storage method of a hybrid cloud comprises the following steps:
step 1, establishing a hybrid cloud, wherein the hybrid cloud comprises a private cloud and a public cloud;
the private cloud comprises a private cloud storage and a cache manager;
the private cloud storage is used as cache storage;
when a requested file is missing from the private cloud storage (a cache miss), the cache manager downloads the file and then selects a suitable compression method for it;
the public cloud comprises a public cloud storage and a cloud manager;
the public cloud storage is used as backup storage;
the cloud manager adopts a combined compression method for all files; and finding a compression method with the minimum storage cost under the constraint of a given response time.
Step 2, the cache manager processes a request from a user, and when the cache manager receives the user request, the cache manager checks whether a file corresponding to the request exists in the private cloud storage: if the file exists in the private cloud storage, allowing the user to directly access; if the file does not exist in the private cloud storage, the cache manager requests the cloud manager to download the file, and then the next step is performed.
Step 3, after receiving the file download request from the cache manager, the cloud manager accesses the public cloud storage and downloads the file to the cache manager.
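The hit and miss paths of steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `CacheManager`, its `download` callback (standing in for the cloud manager), and the in-memory dictionary used as private cloud storage are all hypothetical, and eviction and compression are left out.

```python
from typing import Callable, Dict


class CacheManager:
    """Hypothetical stand-in for the cache manager of steps 2 and 3."""

    def __init__(self, download: Callable[[str], bytes]) -> None:
        self.private_storage: Dict[str, bytes] = {}  # private cloud storage
        self.download = download  # stands in for the cloud manager
        self.hits = 0
        self.misses = 0

    def handle_request(self, name: str) -> bytes:
        # Step 2: serve directly when the file is in private cloud storage.
        if name in self.private_storage:
            self.hits += 1
            return self.private_storage[name]
        # Step 3: on a miss, ask the cloud manager to download the file.
        self.misses += 1
        data = self.download(name)
        # The compress-or-not decision and eviction would run here
        # before the file is stored.
        self.private_storage[name] = data
        return data


if __name__ == "__main__":
    public_storage = {"report.txt": b"hello"}
    manager = CacheManager(public_storage.__getitem__)
    manager.handle_request("report.txt")  # miss: fetched from the public side
    manager.handle_request("report.txt")  # hit: served from private storage
    print(manager.hits, manager.misses)
```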
Step 4, after the file is downloaded and before it is placed into the private cloud storage, the cache manager judges whether the file needs to be compressed and, if so, determines the specific compression method to use.
Step 4a, the cache manager provides a default compression method for the downloaded file according to \(wrci_i\), the average write-read ratio of the requested file in the private cloud: if \(wrci_i > 0.5\), the file is write-dominated and the default compression method is set to the LZW compression method; otherwise the default compression method is set to the LZMA compression method.
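The rule of step 4a amounts to a single threshold test. The sketch below assumes that the write-read ratio is expressed as the fraction of requests that are writes, in the range 0 to 1, which is how it is used in the formulas of this description; the function name `default_method` is hypothetical.

```python
def default_method(wrci: float) -> str:
    """Pick the default compression method from the write-read ratio.

    Assumption: wrci is the fraction of requests that are writes (0..1).
    """
    if not 0.0 <= wrci <= 1.0:
        raise ValueError("wrci must lie between 0 and 1")
    # Write-dominated files get LZW; everything else gets LZMA.
    return "LZW" if wrci > 0.5 else "LZMA"


if __name__ == "__main__":
    for ratio in (0.2, 0.5, 0.7):
        print(ratio, default_method(ratio))
```

Note that a ratio of exactly 0.5 falls on the LZMA side, since the condition in the text is strictly greater than 0.5.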
Step 4b, after compressing the downloaded file and before storing the compressed file in the private cloud storage, the cache manager determines the file to be deleted in the private cloud storage; the cache manager sets up two lists: a compressed list and an uncompressed list; the compression list represents a file list which is required to be deleted from the private cloud storage after the downloaded file is compressed and stored in the cache manager; the uncompressed list represents a file list which is required to be deleted from the private cloud storage after the downloaded file is uncompressed and stored in the cache manager; the number of files in the uncompressed list is not less than the number of files in the compressed list.
Step 4c, the files that are in the uncompressed list but not in the compressed list are classified into a kick-free list, AvoidKick; the kick-free list contains the files that are spared from deletion because the downloaded file is compressed. For example, when the downloaded file is 10 MB in size, there are two possibilities before it is stored in the private cloud storage: "compressed" and "uncompressed". If the file is stored compressed, the cache manager determines that file 1 and file 2 must be deleted from the private cloud storage, so the compressed list consists of file 1 and file 2. If the file is stored uncompressed, the cache manager determines that file 1, file 2, and file 3 must be deleted, so the uncompressed list consists of file 1, file 2, and file 3. Since AvoidKick contains the files that are in the uncompressed list but not in the compressed list, here it consists of file 3. The private cloud storage deletes old files to make room for new ones. With cyclic storage, the usual policy is to delete the oldest files first, although other deletion policies can also be used. In any case the order in which old files are deleted is fixed, so the uncompressed list necessarily includes the compressed list; that is, the number of files in the uncompressed list is not less than the number of files in the compressed list.
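The two deletion lists and the kick-free list can be derived by running the same eviction policy twice, once for the compressed size and once for the raw size. The sketch below assumes oldest-first eviction; the function names, the cached file sizes, and the 2 MB of free space are hypothetical values chosen so that the result reproduces the file 1 / file 2 / file 3 example in the text.

```python
from typing import List, Tuple


def files_to_evict(
    cached: List[Tuple[str, float]], free_mb: float, needed_mb: float
) -> List[str]:
    """Evict oldest-first until `needed_mb` fits.

    `cached` holds (name, size in MB) pairs ordered oldest first.
    """
    evicted: List[str] = []
    for name, size_mb in cached:
        if free_mb >= needed_mb:
            break
        evicted.append(name)
        free_mb += size_mb
    return evicted


def kick_free_list(
    cached: List[Tuple[str, float]],
    free_mb: float,
    raw_mb: float,
    compressed_mb: float,
) -> Tuple[List[str], List[str], List[str]]:
    """Return (compressed list, uncompressed list, AvoidKick)."""
    compressed_list = files_to_evict(cached, free_mb, compressed_mb)
    uncompressed_list = files_to_evict(cached, free_mb, raw_mb)
    # Files spared from deletion only because the download is compressed.
    avoid_kick = [f for f in uncompressed_list if f not in compressed_list]
    return compressed_list, uncompressed_list, avoid_kick


if __name__ == "__main__":
    cache = [("file 1", 1.0), ("file 2", 1.0), ("file 3", 6.0), ("file 4", 5.0)]
    # A 10 MB download that shrinks to 3.4 MB (ratio 0.34), 2 MB free.
    print(kick_free_list(cache, free_mb=2.0, raw_mb=10.0, compressed_mb=3.4))
```

Because both calls walk the cache in the same order, the compressed list is always a prefix of the uncompressed list, which is the inclusion property stated above.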
Let \(v_{co}\) denote the time saved by keeping the files in the kick-free list. It is calculated as follows.
When compression method \(J = alco_i\),
\[ v_{co} = \sum_{i \in AvoidKick} \left[ s_{iJ} \times (t_{co} + d_J) - s_{iJ} \times \left( t_{ci} + c_J \times wrci_i + d_J \times (1 - wrci_i) \right) \right] \]
wherein \(alco_i\) denotes the compression method used for file \(i\) in the public cloud storage; \(AvoidKick\) denotes the kick-free list; \(s_{iJ}\) denotes the size of file \(i\) after compression by compression method \(J\); \(t_{co}\) denotes the transmission time of the public cloud storage, in s/MB; \(d_J\) denotes the decompression time of compression method \(J\); \(t_{ci}\) denotes the transmission time of the private cloud storage, in s/MB; \(c_J\) denotes the compression time of compression method \(J\); \(wrci_i\) denotes the average write-read ratio of the requested file in the private cloud.
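The sum for the time saved by the kick-free list translates directly into code. The following is a sketch under stated assumptions: the `KickFreeFile` record and the function name `v_co` are hypothetical, and the sample values (an LZMA-compressed file of 2.04 MB, a public transfer time of 1.0 s/MB, a private transfer time of 0.0067 s/MB) are chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class KickFreeFile:
    size_mb: float  # s_iJ: size after compression with method J
    c: float        # c_J: compression time of method J, s/MB
    d: float        # d_J: decompression time of method J, s/MB
    wrci: float     # wrci_i: write-read ratio in the private cloud (0..1)


def v_co(files: Iterable[KickFreeFile], t_co: float, t_ci: float) -> float:
    """Time saved by keeping the kick-free files in private storage."""
    total = 0.0
    for f in files:
        # Cost of fetching the file again from the public cloud.
        fetch_from_public = f.size_mb * (t_co + f.d)
        # Cost of serving it from private storage in compressed form.
        serve_from_private = f.size_mb * (
            t_ci + f.c * f.wrci + f.d * (1.0 - f.wrci)
        )
        total += fetch_from_public - serve_from_private
    return total


if __name__ == "__main__":
    spared = [KickFreeFile(size_mb=2.04, c=0.64, d=0.12, wrci=0.2)]
    print(round(v_co(spared, t_co=1.0, t_ci=0.0067), 6))
```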
Let \(v_{ci}\) denote the response time cost of using compression. It is calculated as follows.
When \(i\) is the downloaded file and compression method \(j = alci_i\),
\[ v_{ci} = s_{ij} \times \left[ t_{ci} + c_j \times wrci_i + d_j \times (1 - wrci_i) \right] - s_{i0} \times t_{ci} \]
wherein \(alci_i\) denotes the compression method used for file \(i\) in the private cloud storage; \(s_{ij}\) denotes the size of file \(i\) after compression by compression method \(j\); \(t_{ci}\) denotes the transmission time of the private cloud storage, in s/MB; \(c_j\) denotes the compression time of compression method \(j\); \(wrci_i\) denotes the average write-read ratio of the requested file in the private cloud; \(d_j\) denotes the decompression time of compression method \(j\); \(s_{i0}\) denotes the uncompressed size of file \(i\).
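The cost term for the downloaded file can be evaluated the same way. The sketch below is illustrative only: it assumes that the compressed size is the raw size multiplied by the compression ratio of the chosen method, and the function name `v_ci` and the sample numbers (a 10 MB file, LZMA values, a write-read ratio of 0.2) are hypothetical.

```python
def v_ci(
    raw_mb: float, ratio: float, c: float, d: float, wrci: float, t_ci: float
) -> float:
    """Response-time cost of storing the downloaded file compressed.

    Assumption: s_ij = raw_mb * ratio, with `ratio` the compressed size
    divided by the original size.
    """
    s_ij = raw_mb * ratio
    # Per-access time when the file is kept compressed.
    with_compression = s_ij * (t_ci + c * wrci + d * (1.0 - wrci))
    # Per-access time when the file is kept as-is (s_i0 * t_ci).
    without_compression = raw_mb * t_ci
    return with_compression - without_compression


if __name__ == "__main__":
    cost = v_ci(raw_mb=10.0, ratio=0.34, c=0.64, d=0.12, wrci=0.2, t_ci=0.0067)
    print(round(cost, 5))
```

A positive result is time that compression adds for this file; it is then weighed against the time saved by the kick-free list through v = v_co - v_ci.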
The overall time saved with compression is denoted by \(v\) and calculated as:
\[ v = v_{co} - v_{ci} \]
Because the files in the kick-free list are retained in the private cloud storage, the time to download them from the public cloud storage is saved. \(v_{co}\) also subtracts the compression and decompression time, because the files in the kick-free list are then served from the private cloud storage.
\(v_{ci}\) focuses on the downloaded file and compares the time spent with and without compression. By its definition, \(v_{ci} < 0\) indicates that compression by itself saves response time for the downloaded file, while \(v_{ci} > 0\) is the extra time that compression costs.
If \(v > 0\), compression saves time overall, so the cache manager compresses the downloaded file and stores the compressed file in the private cloud storage; otherwise, the downloaded file is stored directly in the private cloud storage without compression.
Step 5, the cloud manager determines the compression method for all files saved in the public cloud storage.
Applying a compression scheme in the private cloud retains more files in the private cloud storage, which reduces the need to access the public cloud and thus the transmission cost. Applying a compression scheme in the public cloud also reduces cost, but for a different reason: compressed files occupy less storage space, which reduces the storage cost. However, storage cost is not the only consideration; the choice of compression scheme must also respect a response time constraint. The response time after applying the compression scheme in the public cloud should be within an acceptable range. The maximum acceptable response time is set as the response limit time cloudLim of the public cloud.
The public cloud storage serves as backup storage and does not carry the cache workload of the private cloud storage. Therefore, the compression method for the public cloud storage is decided per file rather than per request, which saves a great deal of computing overhead. At the same time, the public cloud average response time is introduced so that the compression scheme is not dominated by large files. The problem of which compression scheme the public cloud storage should adopt is thus reduced to a linear programming problem: minimizing the storage cost of the public cloud storage subject to a bound on the public cloud average response time.
The linear programming problem is expressed mathematically as follows:
minimizing storage costs:
[Equation image (objective function) not reproduced in the text]
the conditions to be satisfied are:
[Equation image (constraint) not reproduced in the text]
wherein \(Cost_{sto}\) denotes the storage cost of the cloud storage, in $/MB; \(s_{ij}\) denotes the size of file \(i\) after compression by compression method \(j\); \(freq_i\) denotes the access frequency of file \(i\) in the public cloud; \(t_{co}\) denotes the transmission time of the public cloud storage, in s/MB; \(c_j\) denotes the compression time of compression method \(j\); \(wrco_i\) denotes the average write-read ratio of the requested file in the public cloud; \(d_j\) denotes the decompression time of compression method \(j\); \(cloudLim\) denotes the response limit time of the public cloud. The unit of storage cost, $/MB, is the cost of storing 1 MB of data. In the art, costs are typically calculated in US dollars, for example: the API cost of PUT, POST, and LIST requests is $0.005 per 1000 requests; the network download cost is $0.09 per GB.
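Since the two formula images are not available in this text, the following sketch assumes one plausible reading of the problem from the symbol definitions: minimize the sum of Cost_sto × s_ij over all files, subject to the frequency-weighted average of s_ij × (t_co + c_j × wrco_i + d_j × (1 − wrco_i)) staying at or below cloudLim. It uses brute-force enumeration over a small set of files instead of an LP solver. The function name, the "none" option for uncompressed storage, and all sample numbers are assumptions made for illustration.

```python
from itertools import product
from typing import List, Optional, Tuple

# (compression ratio, compression s/MB, decompression s/MB) from Table 1.
# "none" (store uncompressed) is an assumption added for the illustration.
METHODS = {
    "none": (1.0, 0.0, 0.0),
    "LZMA": (0.34, 0.64, 0.12),
    "LZW": (0.45, 0.35, 0.25),
}


def choose_methods(
    files: List[Tuple[float, float, float]],  # (raw MB, freq_i, wrco_i)
    cost_sto: float,
    t_co: float,
    cloud_lim: float,
) -> Optional[Tuple[float, Tuple[str, ...]]]:
    """Cheapest per-file method assignment that meets the time limit."""
    total_freq = sum(freq for _, freq, _ in files)
    best: Optional[Tuple[float, Tuple[str, ...]]] = None
    for assignment in product(METHODS, repeat=len(files)):
        cost = 0.0
        weighted_time = 0.0
        for (raw_mb, freq, wrco), method in zip(files, assignment):
            ratio, c, d = METHODS[method]
            s_ij = raw_mb * ratio
            cost += cost_sto * s_ij
            weighted_time += freq * s_ij * (t_co + c * wrco + d * (1.0 - wrco))
        feasible = weighted_time / total_freq <= cloud_lim
        if feasible and (best is None or cost < best[0]):
            best = (cost, assignment)
    return best


if __name__ == "__main__":
    # One hot file (9 accesses) and one cold file (1 access), 10 MB each.
    demo = [(10.0, 9.0, 0.1), (10.0, 1.0, 0.1)]
    print(choose_methods(demo, cost_sto=1.0, t_co=0.05, cloud_lim=0.6))
    print(choose_methods(demo, cost_sto=1.0, t_co=0.05, cloud_lim=1.0))
```

Under these assumed numbers, a tight limit leaves the hot file uncompressed and compresses only the cold one, while a looser limit lets both files use LZMA. Enumeration grows exponentially with the number of files, which is consistent with the concern about computational overhead for large file counts.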
The public cloud average response time, which includes the transmission time to the private cloud and the compression and decompression time, depends on the file size, which in turn depends on the compression method. It is also bounded by the public cloud response limit time cloudLim. As a result, the computational overhead of solving this linear programming problem rises sharply as the number of files grows, and the scheme needs to be simplified further.
The files in the public cloud storage are classified: a timer tracks how long each file has gone without being accessed; files whose idle time exceeds an empirical threshold are classified as silent files, and the rest as active files. The CM compression method is used for silent files. The solution of the linear programming problem described above is used for active files.
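The split into silent and active files can be sketched as a comparison of idle time against a threshold. This is a minimal illustration: the function name `classify_files`, the use of last-access timestamps in seconds, and the 30-day threshold are assumptions, since the text only states that the threshold is an empirical value.

```python
from typing import Dict, List, Tuple


def classify_files(
    last_access: Dict[str, float], now: float, silent_after_s: float
) -> Tuple[List[str], List[str]]:
    """Split files into (silent, active) by time since last access.

    `last_access` maps file name to its last access time in seconds;
    `silent_after_s` is the empirical idle threshold (an assumption).
    """
    silent: List[str] = []
    active: List[str] = []
    for name, accessed_at in last_access.items():
        if now - accessed_at > silent_after_s:
            silent.append(name)  # candidate for CM compression
        else:
            active.append(name)  # handled by the optimization step
    return silent, active


if __name__ == "__main__":
    day = 86400.0
    stamps = {"old.log": 0.0, "recent.db": 39 * day}
    print(classify_files(stamps, now=40 * day, silent_after_s=30 * day))
```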
A workload is generated using the workload generator ProWGen. The system is set to contain 100 files; the capacity of the private cloud storage is set equal to the sum of the sizes of the files meeting the conventional distribution; the average file size is 50 MB; the cache size is 1500 MB; the file transmission speed from the public cloud ranges from 0.05 s/MB to 1.5 s/MB; the access speed of the private cloud storage is 0.0067 s/MB, about 200 times faster than that of the public cloud; and 25% of the files are accessed only once.
When the private cloud is simulated, all files in the public cloud storage are assumed to be compressed with the CM compression method, because the CM compression method has a low compression ratio and most operations on the public cloud are reads, which saves considerable transmission and storage cost.
Fig. 1 is a diagram of the overall response time of a private cloud, with the unit of the overall response time being s. As can be seen from fig. 1, the best results are obtained by this scheme, regardless of the write-read ratio. Overall, the overall response time increases with increasing write-read ratio because the compression time is greater than the decompression time. As the write-read ratio increases, indicating more write requests, the system takes more time to compress, thereby increasing the overall response time. Because the scheme adopts the combination of compression methods matched with the workload, the performance of the scheme is better than that of the scheme which only uses one compression method.
Fig. 2 is a graph of the relationship between hit rate and compression rate of a private cloud. The hit rate increases with increasing compression rate but eventually becomes smooth. An increase in compression rate indicates that more files are compressed. An increase in hit rate, which means an increase in the probability of requesting a file in the private cloud, reduces the need to access the public cloud.
When simulating the public cloud, the write-read ratio of the public cloud is preset to be the same as that of the private cloud, and the most frequently accessed 30% of files account for more than 60% of requests. The mathematical programming solver lp_solve is used to solve the linear programming problem for the public cloud.
FIG. 3 is a graph of the response time of the public cloud, in s. FIG. 3 shows that the uncompressed scheme has the best response time, because it spends no time on compression or decompression, but its storage cost is the highest because it stores the original files. With the present scheme, the response time depends on the response limit time cloudLim of the public cloud, but for every value of cloudLim tested it is shorter than that of a scheme using only the LZW compression method or only the LZMA compression method.
FIG. 4 is a graph comparing average response time in units of s and storage cost in units of $/MB for a public cloud. Fig. 4 shows that as the response limit time cloudLim of the public cloud becomes more relaxed, the scheme tends to select a compression method that saves more storage cost. Therefore, as the response time increases, the storage cost becomes smaller.
The results of the private cloud and the public cloud are then considered together. The results of using the present scheme at both the private-cloud end and the public-cloud end are compared with the results of using it at only one end and a single algorithm at the other end.
Fig. 5 compares the response times of various combinations involving the present scheme; the average response time of the public cloud is in seconds. In Fig. 5, the name before "+" denotes the compression scheme of the private cloud, the name after "+" denotes the compression scheme of the public cloud, and Auto denotes the scheme of this patent. The "Auto + Auto" combination performs best, improving on the "LZW + Auto" combination by 40%. Since the public cloud uses the present scheme in both combinations, the main factor affecting the response time is the compression method of the private cloud. The present scheme therefore performs better in the private cloud.
Fig. 6 compares the response times of various combinations of the private cloud with the present scheme, in seconds. In Fig. 6, the name before "+" denotes the compression scheme of the private cloud, the name after "+" denotes the compression scheme of the public cloud, and Auto denotes the scheme of this patent. The "Auto + Auto" combination performs best. The "Auto + LZMA" combination also performs well, because the compression method used for each file at both ends is similar in the two combinations.
In conclusion, the scheme combines different compression methods, which improves the performance of the private-cloud end of the hybrid cloud and reduces the cost of the public-cloud end. The private cloud makes a compression decision for each missed request based on the file characteristics and the lists. The public cloud makes the compression decision for all stored files. The scheme finds a set of compression methods that minimizes cost while meeting the response limit time.
Simulations show that, compared with the traditional scheme (in which the private cloud stores the original files), the method improves performance by 50% and saves 75% of the cost. The main reason is that the scheme does not rely on a single compression method and can select a more appropriate compression scheme for each file.
The applicable parameters of the scheme are summarized as follows:
$t_{ci}$, the transmission time of the private cloud storage, in s/MB;
$t_{co}$, the transmission time of the public cloud storage, in s/MB;
cacheSize, the storage capacity of the private cloud storage;
cloudLim, the response limit time of the public cloud;
$Cost_{sto}$, the storage cost of the cloud storage, in $/MB;
$Cost_{trn}$, the transmission cost of the cloud storage, in $/MB;
$s_{ij}$, the size of file $i$ after compression by compression method $j$;
$wrci_i$, the average write-read ratio of the requested files in the private cloud;
$wrco_i$, the average write-read ratio of the requested files in the public cloud;
$freq_i$, the access frequency of file $i$ when accessing the public cloud;
$alci_i$, the compression method used by file $i$ in the private cloud storage;
$alco_i$, the compression method used by file $i$ in the public cloud storage;
$c_j$, the compression time of compression method $j$;
$d_j$, the decompression time of compression method $j$.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is written this way for clarity only; those skilled in the art should treat the description as a whole, and the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.

Claims (6)

1. A data storage method of a hybrid cloud is characterized by comprising the following steps:
step 1, establishing a hybrid cloud, wherein the hybrid cloud comprises a private cloud and a public cloud;
the private cloud comprises a private cloud storage and a cache manager; the public cloud comprises a public cloud storage and a cloud manager;
step 2, the cache manager processes a request from a user, and when the cache manager receives the user request, the cache manager checks whether a file corresponding to the request exists in the private cloud storage: if the file exists in the private cloud storage, allowing the user to directly access; if the file does not exist in the private cloud storage, the cache manager requests the cloud manager to download the file, and then the next step is carried out;
step 3, after receiving a file downloading request of the cache manager, the cloud manager accesses the public cloud storage and downloads the file to the cache manager;
step 4, after the file is downloaded and before the file is placed into the private cloud storage, the cache manager judges whether the file needs to be compressed or not and determines a specific compression method adopted when the file needs to be compressed;
step 4 comprises the following steps:
step 4a, the cache manager provides a default compression method for the downloaded file according to the average write-read ratio $wrci_i$ of the requested files in the private cloud: if $wrci_i$ is greater than 0.5, the file is more write-oriented and the default compression method is set to the LZW compression method; otherwise, the default compression method is set to the LZMA compression method;
step 4b, after compressing the downloaded file and before storing the compressed file in the private cloud storage, the cache manager determines the files to be deleted from the private cloud storage; the cache manager sets up two lists: a compressed list and an uncompressed list; the compressed list represents the list of files that need to be deleted from the private cloud storage when the downloaded file is stored in compressed form; the uncompressed list represents the list of files that need to be deleted from the private cloud storage when the downloaded file is stored uncompressed; the number of files in the uncompressed list is not less than the number of files in the compressed list;
step 4c, classifying the files that are in the uncompressed list but not in the compressed list as a kick-free list; the kick-free list indicates the files that are spared from deletion because the downloaded file is compressed;
the overall time saved with compression is denoted by $v$, which is calculated as: $v = v_{co} - v_{ci}$
wherein $v_{co}$ represents the time saved by using the kick-free list, and $v_{ci}$ represents the response time spent on compression;
if $v$ is greater than 0, compression saves time overall, so the cache manager compresses the downloaded file and stores the compressed file in the private cloud storage; otherwise, the downloaded file is stored directly in the private cloud storage without being compressed;
step 5, the cloud manager determines the compression method of all files saved in the public cloud storage.
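Steps 4a to 4c can be sketched as follows. This is an illustrative sketch only: the claim does not fix how deletion candidates are chosen, so the helper `files_to_evict` and its assumption that the cache is already ordered from first-to-delete to last-to-delete are hypothetical, as are all file names and sizes.

```python
def default_method(wrci):
    """Step 4a: LZW for write-heavy files (wrci > 0.5), LZMA otherwise."""
    return "LZW" if wrci > 0.5 else "LZMA"


def files_to_evict(cache, capacity, incoming_size):
    """Hypothetical helper: names of files to delete so the incoming file fits.

    cache -- list of (name, size_mb), assumed ordered first-to-delete first
    """
    used = sum(size for _, size in cache)
    evicted = []
    for name, size in cache:
        if used + incoming_size <= capacity:
            break
        evicted.append(name)
        used -= size
    return evicted


def kick_free_list(cache, capacity, raw_size, compressed_size):
    """Steps 4b and 4c: files deleted without compression but kept with it."""
    uncompressed_list = files_to_evict(cache, capacity, raw_size)
    compressed_list = files_to_evict(cache, capacity, compressed_size)
    return [name for name in uncompressed_list if name not in compressed_list]


# Hypothetical cache of three 30 MB files in a 100 MB private cloud storage.
cache = [("a", 30), ("b", 30), ("c", 30)]
spared = kick_free_list(cache, capacity=100, raw_size=60, compressed_size=20)
print(default_method(0.7), spared)
```

In this example, storing the 60 MB file uncompressed would delete files a and b, while storing the 20 MB compressed version deletes only a, so b forms the kick-free list.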
2. The data storage method of a hybrid cloud according to claim 1, wherein the private cloud storage serves as cache storage; when the file corresponding to a request is missing from the private cloud storage, the cache manager selects a suitable compression method for the file after downloading it; the public cloud storage serves as backup storage; the cloud manager applies a combination of compression methods to all files and finds the compression methods with the minimum storage cost under the constraint of a set response time.
3. The data storage method of the hybrid cloud according to claim 1, wherein $v_{co}$ in step 4c is calculated as follows: when compression method $J$ is $alco_i$,
$$v_{co} = \sum_{i \in \mathrm{AvoidKick}} \left[ s_{iJ} \times (t_{co} + d_J) - s_{iJ} \times \left( t_{ci} + c_J \times wrci_i + d_J \times (1 - wrci_i) \right) \right];$$
wherein $alco_i$ represents the compression method used by file $i$ in the public cloud storage; AvoidKick represents the kick-free list; $s_{iJ}$ represents the size of file $i$ after compression by compression method $J$; $t_{co}$ represents the transmission time of the public cloud storage, in s/MB; $d_J$ represents the decompression time of compression method $J$; $t_{ci}$ represents the transmission time of the private cloud storage, in s/MB; $c_J$ represents the compression time of compression method $J$; $wrci_i$ represents the average write-read ratio of the requested files in the private cloud.
4. The data storage method of the hybrid cloud according to claim 3, wherein $v_{ci}$ in step 4c is calculated as follows: when $i$ is the downloaded file and compression method $j$ is $alci_i$,
$$v_{ci} = s_{ij} \times \left[ t_{ci} + c_j \times wrci_i + d_j \times (1 - wrci_i) \right] - s_{i0} \times t_{ci}$$
wherein $alci_i$ represents the compression method used by file $i$ in the private cloud storage; $s_{ij}$ represents the size of file $i$ after compression by compression method $j$; $t_{ci}$ represents the transmission time of the private cloud storage, in s/MB; $c_j$ represents the compression time of compression method $j$; $wrci_i$ represents the average write-read ratio of the requested files in the private cloud; $d_j$ represents the decompression time of compression method $j$; $s_{i0}$ represents the uncompressed size of file $i$.
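The formulas for v_co and v_ci and the decision rule v > 0 can be checked numerically with a small sketch. The class and function names below are hypothetical, and all timings and sizes are invented for illustration; only the arithmetic follows the formulas as stated in claims 3 and 4.

```python
from dataclasses import dataclass


@dataclass
class Method:
    c: float  # compression time, s/MB
    d: float  # decompression time, s/MB


@dataclass
class KickFreeFile:
    size: float     # s_iJ: size in MB under its public-cloud method J
    method: Method  # the method J = alco_i
    wrci: float     # average write-read ratio in the private cloud


def private_time(size, m, wrci, t_ci):
    """size * (t_ci + c * wrci + d * (1 - wrci)), the private-cloud term."""
    return size * (t_ci + m.c * wrci + m.d * (1.0 - wrci))


def v_co(kick_free, t_co, t_ci):
    """Time saved because kick-free files need not be fetched from the public cloud."""
    return sum(
        f.size * (t_co + f.method.d) - private_time(f.size, f.method, f.wrci, t_ci)
        for f in kick_free
    )


def v_ci(s_ij, s_i0, m, wrci, t_ci):
    """Time of serving the compressed download minus serving it uncompressed."""
    return private_time(s_ij, m, wrci, t_ci) - s_i0 * t_ci


# Hypothetical values: slow public link, fast private link.
T_CO, T_CI = 0.5, 0.05
lzma = Method(c=0.1, d=0.02)
saved = v_co([KickFreeFile(size=10, method=lzma, wrci=0.2)], T_CO, T_CI)
spent = v_ci(s_ij=8, s_i0=20, m=lzma, wrci=0.2, t_ci=T_CI)
v = saved - spent
compress = v > 0  # store the download compressed only if v is positive
print(round(saved, 3), round(spent, 3), round(v, 3), compress)
```

With these numbers v_co is about 4.34 s and v_ci is about −0.312 s, so v is positive and the downloaded file would be stored in compressed form.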
5. The data storage method of the hybrid cloud according to claim 1, wherein in step 5 the files in the public cloud storage are classified: the time for which each file has not been accessed is measured by a timer, files whose unaccessed time exceeds an empirical value are classified as silent files, and the remaining files are classified as active files; the CM compression method is used for silent files; for active files, a linear programming problem is solved, i.e., the storage cost of the public cloud storage is minimized subject to a bound on the average response time of the public cloud.
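The silent/active classification of claim 5 can be sketched as a simple threshold test. This is a minimal illustration: the function name, the use of plain timestamps in seconds, and the threshold value are assumptions, since the claim only states that an empirical value is used.

```python
def classify_files(last_access, now, silent_after):
    """Split files into silent and active by time since last access.

    last_access  -- dict mapping file name to last access timestamp (s)
    now          -- current timestamp (s)
    silent_after -- empirical threshold (s); hypothetical parameter name
    """
    silent, active = [], []
    for name, accessed_at in last_access.items():
        if now - accessed_at > silent_after:
            silent.append(name)   # candidates for the CM compression method
        else:
            active.append(name)   # handled by the linear programming step
    return silent, active


# Hypothetical timestamps: "old.log" was last accessed 10 days ago.
DAY = 86400
last_access = {"old.log": 0, "report.doc": 9 * DAY, "db.bak": 10 * DAY - 60}
silent, active = classify_files(last_access, now=10 * DAY, silent_after=7 * DAY)
print(silent, active)
```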
6. The data storage method of the hybrid cloud according to claim 5, wherein the solution of the linear programming problem is represented by the following mathematical formulas:
minimizing storage costs:
[Formula image 897442DEST_PATH_IMAGE001: objective function, not reproduced in the text]
the conditions to be satisfied are:
[Formula image 345741DEST_PATH_IMAGE002: constraint, not reproduced in the text]
wherein $Cost_{sto}$ represents the storage cost of the cloud storage, in $/MB; $s_{ij}$ represents the size of file $i$ after compression by compression method $j$; $freq_i$ represents the access frequency of file $i$ when accessing the public cloud; $t_{co}$ represents the transmission time of the public cloud storage, in s/MB; $c_j$ represents the compression time of compression method $j$; $wrco_i$ represents the average write-read ratio of the requested files in the public cloud; $d_j$ represents the decompression time of compression method $j$; cloudLim represents the response limit time of the public cloud.
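Because the two formula images are not available in the text, the exact objective and constraint cannot be quoted here. The sketch below assumes one plausible form built only from the symbols defined above: minimize the sum of Cost_sto × s_ij over the chosen methods, subject to the frequency-weighted response time, the sum of freq_i × s_ij × (t_co + c_j × wrco_i + d_j × (1 − wrco_i)), not exceeding cloudLim. That form is an assumption, and exhaustive search stands in for the linear programming solver; all numbers are invented.

```python
from itertools import product


def choose_methods(files, methods, cost_sto, t_co, cloud_lim):
    """Pick one method per file by brute force (assumed objective/constraint).

    files   -- list of dicts with 'sizes' (method -> MB), 'freq', 'wrco'
    methods -- dict mapping method name to (c, d) in s/MB
    Returns (cost, choice) for the cheapest feasible choice, or None.
    """
    best = None
    for choice in product(list(methods), repeat=len(files)):
        cost = 0.0
        resp = 0.0
        for f, j in zip(files, choice):
            c, d = methods[j]
            s = f["sizes"][j]
            cost += cost_sto * s
            resp += f["freq"] * s * (t_co + c * f["wrco"] + d * (1.0 - f["wrco"]))
        if resp <= cloud_lim and (best is None or cost < best[0]):
            best = (cost, choice)
    return best


# Hypothetical methods and files; "none" means stored uncompressed.
methods = {"none": (0.0, 0.0), "LZW": (0.05, 0.03), "LZMA": (1.0, 0.5)}
files = [
    {"sizes": {"none": 10, "LZW": 6, "LZMA": 4}, "freq": 0.7, "wrco": 0.3},
    {"sizes": {"none": 20, "LZW": 12, "LZMA": 8}, "freq": 0.3, "wrco": 0.3},
]
tight = choose_methods(files, methods, cost_sto=0.01, t_co=0.1, cloud_lim=1.2)
medium = choose_methods(files, methods, cost_sto=0.01, t_co=0.1, cloud_lim=2.5)
loose = choose_methods(files, methods, cost_sto=0.01, t_co=0.1, cloud_lim=5.0)
print(tight, medium, loose)
```

Under these assumptions, relaxing cloudLim moves the selection from LZW for both files toward LZMA for both files, and the storage cost falls accordingly, which is the trade-off the description attributes to Fig. 4.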
CN202111313263.4A 2021-11-08 2021-11-08 Data storage method of hybrid cloud Active CN113742304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111313263.4A CN113742304B (en) 2021-11-08 2021-11-08 Data storage method of hybrid cloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111313263.4A CN113742304B (en) 2021-11-08 2021-11-08 Data storage method of hybrid cloud

Publications (2)

Publication Number Publication Date
CN113742304A CN113742304A (en) 2021-12-03
CN113742304B true CN113742304B (en) 2022-02-15

Family

ID=78727719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111313263.4A Active CN113742304B (en) 2021-11-08 2021-11-08 Data storage method of hybrid cloud

Country Status (1)

Country Link
CN (1) CN113742304B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230259406A1 (en) * 2022-02-14 2023-08-17 International Business Machines Corporation Workflow Data Redistribution in Hybrid Public/Private Computing Environments

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102263825B (en) * 2011-08-08 2014-08-13 浪潮电子信息产业股份有限公司 Cloud-position-based hybrid cloud storage system data transmission method
CN102523446B (en) * 2011-12-26 2014-06-04 南京鹏力系统工程研究所 Adaptive compression method of radar video in vessel traffic navigation system
CN106210015B (en) * 2016-07-05 2019-12-31 福州大学 Cloud storage method for hot data caching in hybrid cloud structure
CN107678685B (en) * 2017-09-11 2020-01-17 清华大学 Key value storage management method based on flash memory storage path optimization
WO2020252614A1 (en) * 2019-06-17 2020-12-24 Beijing Voyager Technology Co., Ltd. Systems and methods for data processing
CN112764686A (en) * 2021-01-26 2021-05-07 东北大学 Big data processing system energy-saving method based on data compression

Also Published As

Publication number Publication date
CN113742304A (en) 2021-12-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant