CN112860189A - Cost-driven cold and hot layered cloud storage redundancy storage method and system - Google Patents

Publication number
CN112860189A
Authority
CN
China
Prior art keywords: data, cold, hot, cloud storage, cloud
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110189368.7A
Other languages
Chinese (zh)
Other versions
CN112860189B (en)
Inventor
潘丽
刘明宇
刘士军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Application filed by Shandong University
Priority to CN202110189368.7A
Publication of CN112860189A
Application granted
Publication of CN112860189B
Legal status: Active

Classifications

    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F11/1004 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
    • G06F11/1469 Backup restoration techniques
    • G06F3/064 Management of blocks
    • G06F3/0647 Migration mechanisms
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06Q30/0206 Price or cost determination based on market factors

Abstract

The present disclosure provides a cost-driven cold and hot layered cloud storage redundancy storage method and system. The method comprises: encoding data uploaded by a user with an (m, n) erasure code, dividing the data into n data blocks, and uploading the n data blocks to n cloud storages respectively; for data that a user requests to download, first selecting the m cloud storages with the lowest access price from the n cloud storages, downloading the data blocks from those m cloud storages, and then recovering the original data; and, for data stored in cloud storage, periodically and dynamically adjusting the storage layer of the data according to the heat of the data to reduce cost. The scheme reduces the risk a user faces when using cloud storage, adaptively selects a suitable storage layer for data in the cloud according to the data heat, solves the problem that data cannot be accessed or is even lost when one or more cloud storage services are unavailable, and at the same time optimizes the cost of using cloud storage.

Description

Cost-driven cold and hot layered cloud storage redundancy storage method and system
Technical Field
The disclosure belongs to the technical field of cloud storage, and particularly relates to a cost-driven cold and hot layered cloud storage redundant storage method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In recent years, cloud storage technologies and services have rapidly developed. The cloud storage provides high reliability and high durability, ensures data safety, can realize scale effect and elastic expansion, and effectively reduces operation cost. By storing the data in the cloud, a user does not need to purchase a device self-built data center, a large amount of device purchase and maintenance cost can be saved, complex setting and management tasks are simplified, and the user can conveniently access the data at any time and any place.
To improve competitiveness and attract more users, different cloud storage service providers set different pricing policies and service modes, and as user demands and technologies develop, providers may change their pricing policies or offer new service modes. For example, data stored in the cloud may be divided into cold data and hot data according to its access frequency, so the access frequency is also referred to as the heat of the data. Cold data has low heat and hot data has high heat. Accordingly, most cloud storage service providers offer cold and hot tiered storage services. The hot layer charges a lower access fee but a higher storage fee and is suitable for data that is accessed frequently; the cold layer charges a lower storage fee but a higher access fee and is suitable for data that is accessed infrequently. The user can select an appropriate storage layer according to the heat of the data to reduce cost.
The inventors have found that, although migrating data into the cloud and selecting an appropriate storage tier for the data may result in a cost reduction for the user, the user may face the following technical problems and difficulties when using the cloud to store and select the storage tier:
(1) Although cloud storage services provide high availability, there is no guarantee that availability is 100%. Service outages have occurred at major cloud service providers both in China and abroad. For example, Aliyun went down at around 9:30 on June 21, 2015 and did not recover until 22:00; the total downtime of Google Cloud, Azure, and Amazon AWS over the whole of 2016 was 47 minutes, 270 minutes, and 108 minutes, respectively; Amazon S3 (Simple Storage Service) suffered a service interruption on February 28, 2017, which caused slow loading of user applications such as Netflix, Reddit, Adobe, and Imgur; and on December 14, 2020, a problem in Google's authentication system made all systems and services that rely on Google authentication unusable. Once a cloud service is interrupted or crashes, the normal business of a user is affected, and data stored in the cloud may even be lost, which inevitably causes great economic loss to the user. To guard against this, a user may store data redundantly in multiple cloud storages, so that even if one cloud storage service is interrupted or loses data, the user can keep the business running through the other cloud storage services. However, storing the same data in multiple cloud storages inevitably multiplies the usage cost. Therefore, finding a cost-driven method for redundant data storage across multiple cloud storages is an urgent technical problem to be solved.
(2) Cloud storage service providers usually exhibit "provider locking" (vendor lock-in), which pushes a user to keep using one cloud storage service and discourages switching to another service midway. That is, once a user has stored data in one cloud storage, the current provider charges a high fee when the user wants to migrate the data to another cloud storage. This characteristic may prevent users from migrating data to lower-priced cloud storage.
(3) When a user selects a storage tier based on the heat of the data, the user often cannot select the right tier because the future heat of the data is unknown. Depending on the pricing strategies of the cold and hot storage tiers, additional cost may be incurred if the selected tier does not match the data heat. In addition, switching the storage tier itself incurs a certain cost. The user therefore cannot switch the storage tier of the data blindly; doing so may not only fail to save cost but actually generate more cost.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides a cost-driven cold and hot layered cloud storage redundancy storage method and system, which can store data of a user in multiple cloud storages in blocks to improve reliability and availability, and can effectively reduce "provider locking" cost when data is migrated to a new cloud storage; meanwhile, the data blocks in each cloud storage can be transferred between the cold and hot layers according to the heat degree of the data blocks, and the cost of using the cloud storage can be effectively reduced.
According to a first aspect of the embodiments of the present disclosure, there is provided a cost-driven hot and cold hierarchical cloud storage redundant storage method, including:
encoding data uploaded by a user by using an erasure code, dividing the data into n data blocks, and uploading the n obtained data blocks to n cloud storages respectively;
for data requested by a user, selecting m cloud storages with the lowest access price from the n cloud storages, downloading corresponding data blocks in the cloud storages, recovering the data blocks into original data, and returning the original data to the user;
and for each data block in the cloud storage, using a hot/cold-aware data migration algorithm to select an appropriate storage layer for the data according to the data heat, with the aim of reducing the cost of using the cloud storage.
Further, the hot/cold-aware data migration algorithm comprises the following steps:
calculating, from the historical access records of the data block on which the algorithm is executed, its number of accesses $\beta_\Delta$ in the last $\Delta t$ days;
if the data block on which the algorithm is executed is stored in the hot layer, calculating the reference number of accesses $\beta_h$ at which the costs over $\Delta t$ days of staying in the hot layer and of migrating to the cold layer are equal; if $\beta_\Delta < \beta_h$, migrating the data block to the cold layer; otherwise, continuing to store the data block in the hot layer;
if the data block on which the algorithm is executed is stored in the cold layer, calculating the reference number of accesses $\beta_c$ at which the costs over $\Delta t$ days of staying in the cold layer and of migrating to the hot layer are equal; if $\beta_\Delta > \beta_c$, migrating the data block to the hot layer; otherwise, continuing to store the data block in the cold layer.
Further, the reference numbers of accesses $\beta_h$ and $\beta_c$ respectively satisfy the following formulas:
$S_h(\Delta t) + A_h(\beta_h) = T_{h \to c} + S_c(\Delta t) + A_c(\beta_h)$
$S_c(\Delta t) + A_c(\beta_c) = T_{c \to h} + S_h(\Delta t) + A_h(\beta_c)$
wherein $S_h(\Delta t)$ and $S_c(\Delta t)$ represent the storage costs incurred by storing the data for a duration of $\Delta t$ in the hot layer and the cold layer, respectively; $A_h(\beta_h)$ and $A_c(\beta_h)$ represent the access costs incurred by $\beta_h$ accesses in the hot layer and the cold layer, respectively; $A_h(\beta_c)$ and $A_c(\beta_c)$ represent the access costs incurred by $\beta_c$ accesses in the hot layer and the cold layer, respectively; and $T_{h \to c}$ and $T_{c \to h}$ represent the costs incurred by migrating the data from the hot layer to the cold layer and from the cold layer to the hot layer, respectively.
According to a second aspect of the embodiments of the present disclosure, there is provided a cost-driven hot and cold tiered cloud storage redundant storage system, comprising: a cloud storage price acquisition module, an application programming interface server, a data segmentation module, a cloud storage management module, a data migration module and a log module; wherein:
the cloud storage price acquisition module periodically acquires pricing of a plurality of cloud storage services through a crawler program or an SDK (software development kit) provided by a cloud storage service provider and stores the pricing into a database;
the application programming interface server provides the user with an accessible programming interface compatible with the Amazon S3 protocol, receives user requests, dispatches and processes them, and responds to them;
the data segmentation module encodes data uploaded by a user by using an erasure code, divides the data into n data blocks, calculates the metadata of each data block, and pads the data blocks;
the cloud storage management module is used for managing a plurality of cloud storage services and is responsible for uploading data blocks to corresponding cloud storage or downloading required data blocks from the cloud storage;
the data migration module obtains the request history of data/data blocks from the operation log, calculates the heat of the data blocks, and migrates the data blocks stored in each cloud storage between the cold and hot layers by using the hot/cold-aware data migration algorithm so as to save cost;
the log module records the operation history of the user, and the operation history comprises the operation of requesting an application programming interface server by the user, the operation of uploading and downloading data blocks by the cloud storage management module, and the operation of transferring data between a cold layer and a hot layer by the data transfer module.
Further, the cloud storage management module manages the plurality of cloud storage services by: selecting the cloud storage services to be used by means of a configuration file; adding a new cloud storage service that supports the Amazon S3 protocol by means of a configuration file; and seamlessly adding a cloud storage service that does not support the Amazon S3 protocol through the application programming interface provided by the system.
According to a third aspect of the embodiments of the present disclosure, a distributed deployment of the cost-driven hot and cold hierarchical cloud storage redundant storage system is provided, in which each server includes the modules of the system of the second aspect of the present disclosure, the server nodes connect and communicate through distributed technology, and user requests are distributed to the application programming interface server on each server through a load balancing algorithm.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the above cost-driven cold and hot hierarchical cloud storage redundancy storage method.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a cost-driven cold-hot hierarchical cloud storage redundancy storage method as described above.
Compared with the prior art, the beneficial effect of this disclosure is:
(1) The present disclosure greatly reduces the risk a user faces when using cloud storage. The data is stored in blocks across a plurality of different cloud storages by means of an erasure code, so that even if a few cloud storage services are unavailable at the same time, the complete original data can be recovered from the data blocks in the cloud storages that remain available. This avoids the situation in which data is stored in a single cloud storage and an outage of that service affects the normal business of the user or even causes data loss.
(2) The present disclosure alleviates the high cost problem of switching cloud storage caused by "provider locking". According to the data storage method and device, the data are stored in the plurality of cloud storage in the blocking mode, when the cloud storage service provider is replaced, a certain block of data can be selected and migrated to the new cloud storage, and accordingly cost of provider locking is reduced in proportion.
(3) The method effectively utilizes the characteristic of cold and hot layer storage in cloud storage, adaptively selects a proper storage layer according to the data heat degree, reduces the use cost of the cloud storage, and effectively relieves the problem of cost rise caused by user autonomous decision making errors.
(4) The method and system of the present disclosure have the following advantages:
the system of the present disclosure can easily be deployed in a distributed manner; the system is compatible with the Amazon S3 protocol, which is widely used in the industry at present, so a user can connect a business system to it with only minor changes to that business system; and the system provides cold and hot layer migration without requiring any intervention or decision-making assistance from the user.
Advantages of additional aspects of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
Fig. 1 is a schematic flowchart of a cost-driven hot and cold hierarchical cloud storage redundancy storage method according to a first embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of the hot/cold-aware data migration algorithm in the method according to the first embodiment of the present disclosure;
Fig. 3 is a structural diagram of a cost-driven hot and cold hierarchical cloud storage redundant storage system according to a second embodiment of the present disclosure;
Fig. 4 is a structural diagram of a system architecture in which a cost-driven hot and cold hierarchical cloud storage redundant storage system is deployed in a distributed manner on two servers according to a third embodiment of the present disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
The general idea proposed by the present disclosure: using erasure codes (m, n) to encode data uploaded by a user, dividing the data into n data blocks, and uploading the n data blocks to n cloud storages respectively; for data requested by a user, selecting m cloud storages with the lowest access price from the n cloud storages, acquiring m data blocks, recovering original data through the m data blocks, and returning the original data to the user; and for the data stored in the cloud, a cold and hot sensing data migration algorithm is applied, the data in the cloud is migrated between a cold layer and a hot layer according to the data heat, and a proper storage layer is selected to reduce the cost of using the cloud storage.
The first embodiment is as follows:
This embodiment aims to provide a cost-driven cold and hot layered cloud storage redundancy storage method.
The execution flow of the cost-driven cold and hot layered cloud storage redundancy storage method is shown in Fig. 1 and comprises the following steps:
Step 1: The user selects n cloud storages according to the user's own situation; this embodiment selects Alibaba Cloud, Amazon S3, Microsoft Azure, Baidu Cloud, Huawei Cloud, JD Cloud, Kingsoft Cloud, Qiniu Cloud, Tencent Cloud and IBM Cloud. The prices of these cloud storage services are collected periodically by using a crawler program or an SDK provided by the cloud storage service provider and are stored in a database. To facilitate distributed deployment of the method and system of the present disclosure, a NoSQL database that supports distributed deployment, such as Redis or MongoDB, may be used. Because persistent storage is not required and read-write efficiency matters, this embodiment selects a distributed, in-memory Redis database.
Step 2: Encode the data uploaded by a user by using an (m, n) erasure code, divide the data into n data blocks, and upload the n data blocks to the n cloud storages respectively. An (m, n) erasure code means that the data is divided into n data blocks and the original data can be recovered from any m of them. It can be implemented with popular algorithm libraries such as fec (forward error correction) and zfec; this embodiment uses the zfec library to erasure-code the data.
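The cost motivation for using an erasure code instead of full replication can be illustrated with a short calculation. The following is a minimal sketch, assuming a maximum-distance-separable code in which each block holds roughly 1/m of the padded data (the disclosure does not state the block size explicitly); the function names are illustrative and do not come from the disclosure.

```python
def storage_overhead(m: int, n: int) -> float:
    """Total stored size divided by original size for an (m, n) code.

    Assumption: each of the n blocks is about 1/m of the original data,
    so the total stored volume is n/m times the original.
    """
    if not 0 < m <= n:
        raise ValueError("require 0 < m <= n")
    return n / m


def tolerated_failures(m: int, n: int) -> int:
    """Number of cloud storages that may be unavailable while the data
    can still be recovered from the remaining m blocks."""
    if not 0 < m <= n:
        raise ValueError("require 0 < m <= n")
    return n - m


if __name__ == "__main__":
    # Example parameters chosen only for illustration.
    print(storage_overhead(3, 5), tolerated_failures(3, 5))
    # Full replication on 3 clouds corresponds to m = 1, n = 3.
    print(storage_overhead(1, 3), tolerated_failures(1, 3))
```

Under this assumption, a (3, 5) code tolerates two unavailable cloud storages while storing about 1.67 times the original volume, whereas keeping three full copies tolerates the same two failures at 3 times the volume.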
Before data of size V is erasure-coded with the zfec library, the data needs to be padded until V is divisible by n, which simplifies the subsequent calculation; this embodiment pads by appending "\0" bytes to the end of the data. In addition, metadata must be computed before and after erasure coding, including the size of the original data, its MD5 digest, and the number of each data block. The binary encoding of this metadata is appended to each data block so that the original data can later be recovered from m data blocks.
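The padding and metadata handling described above can be sketched as follows. This is only an illustration: the fixed 26-byte trailer layout (`TRAILER`), the helper names, and the use of `struct` are assumptions, since the disclosure does not specify a binary format, and the erasure-coding call itself is left out.

```python
import hashlib
import struct

# Hypothetical trailer layout (not specified in the disclosure):
# original size (8 bytes), MD5 digest (16 bytes), block number (2 bytes).
TRAILER = struct.Struct(">Q16sH")


def pad_to_multiple(data: bytes, n: int) -> bytes:
    """Append zero bytes until len(data) is divisible by n."""
    remainder = len(data) % n
    if remainder == 0:
        return data
    return data + b"\0" * (n - remainder)


def attach_metadata(block: bytes, original: bytes, block_number: int) -> bytes:
    """Append the metadata trailer to one encoded block."""
    digest = hashlib.md5(original).digest()
    return block + TRAILER.pack(len(original), digest, block_number)


def split_metadata(stored: bytes):
    """Separate a stored block into (block, original_size, md5, block_number)."""
    block, trailer = stored[:-TRAILER.size], stored[-TRAILER.size:]
    size, digest, block_number = TRAILER.unpack(trailer)
    return block, size, digest, block_number
```

Storing the original size in the trailer is what later allows the zero padding to be removed unambiguously, even when the original data itself ends in zero bytes.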
Step 3: For data requested by a user, first sort the access prices of the n cloud storages held in the database and select the m cloud storages with the lowest prices; then download the corresponding data blocks through the SDK provided by each cloud storage service, recover the original data from them, and return it to the user. Before the original data is restored, the metadata appended to each data block is first split off. Given the block numbers from the metadata and the data blocks with the metadata removed, the zfec library can restore the data. The restored data is then truncated to the original size recorded in the metadata, which removes the padding and yields the original data. Finally, the MD5 digest of the recovered data is calculated and compared with the MD5 digest contained in the metadata: if they are the same, the recovered data is correct; if they differ, an error occurred during data transmission or recovery. In the latter case an error must be returned to the user, who may abandon the request or initiate a new one as needed.
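The provider selection and the final integrity check in step 3 can be sketched as follows. The provider names and prices in the usage example are hypothetical, the function names are illustrative, and the erasure decoding between the two steps is omitted.

```python
import hashlib
import heapq
from typing import Dict, List


def cheapest_providers(access_price: Dict[str, float], m: int) -> List[str]:
    """Return the m providers with the lowest access price, cheapest first."""
    if not 0 < m <= len(access_price):
        raise ValueError("m must be between 1 and the number of providers")
    return heapq.nsmallest(m, access_price, key=access_price.get)


def finalize(recovered: bytes, original_size: int, expected_md5: bytes) -> bytes:
    """Strip the padding and verify the MD5 digest of the recovered data."""
    data = recovered[:original_size]
    if hashlib.md5(data).digest() != expected_md5:
        # Corresponds to returning an error to the user in step 3.
        raise ValueError("MD5 mismatch: transmission or recovery error")
    return data


if __name__ == "__main__":
    # Hypothetical access prices per request; not real provider pricing.
    prices = {"cloud_a": 0.5, "cloud_b": 0.1, "cloud_c": 0.3, "cloud_d": 0.9}
    print(cheapest_providers(prices, 2))
```

Selecting by access price alone follows the text of step 3; a deployment might additionally skip providers that are currently unreachable, which the disclosure's redundancy makes possible but which is not shown here.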
In addition, the time of each user request and the corresponding data name need to be written into the log, so that the subsequent algorithm can calculate the data heat conveniently. The log may be saved in a local file, or in a distributed file storage system, or in a database. For simplicity and convenience of distributed deployment, the embodiment stores the logs in a distributed Redis database.
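Computing the heat from such a log amounts to counting the requests for a given data name inside a sliding time window. The sketch below assumes the log is available as an in-memory sequence of (timestamp, data name) pairs; this embodiment keeps the log in Redis, so the in-memory list and the function name are stand-ins for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def access_count(
    log: Iterable[Tuple[datetime, str]],
    name: str,
    now: datetime,
    delta_t_days: int,
) -> int:
    """Count requests for `name` within the last `delta_t_days` days."""
    window_start = now - timedelta(days=delta_t_days)
    return sum(
        1
        for timestamp, data_name in log
        if data_name == name and window_start <= timestamp <= now
    )
```

The returned count plays the role of the access count over the last Δt days that the migration algorithm in step 4 takes as input.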
Step 4: For the data/data blocks stored in the cloud, periodically calculate the data heat from the request log and run the hot/cold-aware data migration algorithm, which selects an appropriate storage layer for each data/data block in each cloud storage to reduce the usage cost. The execution flow of the hot/cold-aware data migration algorithm is shown in Fig. 2 and includes the following steps:
the input of the algorithm is the number of accesses $\beta_\Delta$ of a given data block in the past $\Delta t$ days and the prices of the n cloud storages obtained from the database;
if the data block on which the algorithm is executed is stored in the hot layer, calculate the reference number of accesses $\beta_h$ at which the costs over $\Delta t$ days of staying in the hot layer and of migrating to the cold layer are equal; if $\beta_\Delta < \beta_h$, migrate the data block to the cold layer through the SDK provided by the cloud storage service; otherwise, continue to store the data block in the hot layer;
if the data block on which the algorithm is executed is stored in the cold layer, calculate the reference number of accesses $\beta_c$ at which the costs over $\Delta t$ days of staying in the cold layer and of migrating to the hot layer are equal; if $\beta_\Delta > \beta_c$, migrate the data block to the hot layer through the SDK provided by the cloud storage service; otherwise, continue to store the data block in the cold layer.
Further, the reference numbers of accesses $\beta_h$ and $\beta_c$ respectively satisfy the following formulas:
$S_h(\Delta t) + A_h(\beta_h) = T_{h \to c} + S_c(\Delta t) + A_c(\beta_h)$
$S_c(\Delta t) + A_c(\beta_c) = T_{c \to h} + S_h(\Delta t) + A_h(\beta_c)$
wherein $S_h(\Delta t)$ and $S_c(\Delta t)$ represent the storage costs incurred by storing the data for a duration of $\Delta t$ in the hot layer and the cold layer, respectively; $A_h(\beta_h)$ and $A_c(\beta_h)$ represent the access costs incurred by $\beta_h$ accesses in the hot layer and the cold layer, respectively; $A_h(\beta_c)$ and $A_c(\beta_c)$ represent the access costs incurred by $\beta_c$ accesses in the hot layer and the cold layer, respectively; and $T_{h \to c}$ and $T_{c \to h}$ represent the costs incurred by migrating the data from the hot layer to the cold layer and from the cold layer to the hot layer, respectively.
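Further, $S_h(\Delta t)$, $S_c(\Delta t)$, $A_h(\beta_h)$, $A_c(\beta_h)$, $A_h(\beta_c)$, $A_c(\beta_c)$, $T_{h \to c}$ and $T_{c \to h}$ are calculated as follows:
$S_h(\Delta t) = \Delta t \cdot V \cdot S_h$
$S_c(\Delta t) = \Delta t \cdot V \cdot S_c$
$A_h(\beta_h) = G_h \beta_h$
$A_c(\beta_h) = (G_c + R V) \beta_h$
$A_h(\beta_c) = G_h \beta_c$
$A_c(\beta_c) = (G_c + R V) \beta_c$
$T_{h \to c} = P_c$
$T_{c \to h} = P_h + R V$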
The above formulas are explained as follows:
the storage cost of a data item in a given cloud storage is: storage duration × data size × storage unit price of the layer in which the data resides;
the access cost of a data item in the hot layer of a given cloud storage is: unit price of a hot-layer read request × number of accesses occurring in the hot layer;
the access cost of a data item in the cold layer of a given cloud storage is: (unit price of a cold-layer read request + data retrieval unit price × data size) × number of accesses occurring in the cold layer;
the cost of migrating a data item from the hot layer to the cold layer of a given cloud storage is: unit price of a write operation on the cold layer;
the cost of migrating a data item from the cold layer to the hot layer of a given cloud storage is: unit price of a write operation on the hot layer + cost of retrieving the data from the cold layer, where the cost of retrieving the data from the cold layer is: data retrieval unit price × data size.
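Substituting the component costs into the two break-even equations gives closed forms for the two reference values. The rearrangement below is a worked derivation added for illustration rather than text from the original, and it assumes $G_c + RV > G_h$, that is, one access in the cold layer costs more than one access in the hot layer.

```latex
\begin{align*}
\Delta t\,V S_h + G_h\,\beta_h &= P_c + \Delta t\,V S_c + (G_c + RV)\,\beta_h \\
\beta_h &= \frac{\Delta t\,V\,(S_h - S_c) - P_c}{G_c + RV - G_h} \\[6pt]
\Delta t\,V S_c + (G_c + RV)\,\beta_c &= P_h + RV + \Delta t\,V S_h + G_h\,\beta_c \\
\beta_c &= \frac{\Delta t\,V\,(S_h - S_c) + P_h + RV}{G_c + RV - G_h}
\end{align*}
```

Under the same assumption, $\beta_c - \beta_h = (P_c + P_h + RV)/(G_c + RV - G_h) > 0$, so the two thresholds leave a band of access counts in which a data block stays where it is, which keeps it from being migrated back and forth when its access count hovers near a single threshold.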
Further, $S_h$, $S_c$, $G_h$, $G_c$, $P_h$, $P_c$, $R$ and $V$ respectively denote:
$S_h$: the unit price of storing data in the hot layer of a given cloud storage, per GB per day;
$S_c$: the unit price of storing data in the cold layer of a given cloud storage, per GB per day;
$G_h$: the unit price of a read request for data in the hot layer of a given cloud storage, per request;
$G_c$: the unit price of a read request for data in the cold layer of a given cloud storage, per request;
$P_h$: the unit price of a write request for data in the hot layer of a given cloud storage, per request;
$P_c$: the unit price of a write request for data in the cold layer of a given cloud storage, per request;
$R$: the unit price of data retrieval in a given cloud storage, per GB;
$V$: the size of a single data item stored in a given cloud storage, in GB.
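The decision rule and cost formulas above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the disclosure: the names `Prices`, `thresholds` and `decide` are hypothetical, the closed forms are obtained by solving the two break-even equations for $\beta_h$ and $\beta_c$, and the sketch assumes that one cold-layer access costs more than one hot-layer access.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Prices:
    """Unit prices of one cloud storage service (hypothetical container)."""
    s_hot: float      # S_h: hot-layer storage price per GB per day
    s_cold: float     # S_c: cold-layer storage price per GB per day
    g_hot: float      # G_h: hot-layer read-request price per request
    g_cold: float     # G_c: cold-layer read-request price per request
    p_hot: float      # P_h: hot-layer write-request price per request
    p_cold: float     # P_c: cold-layer write-request price per request
    retrieval: float  # R: data retrieval price per GB


def thresholds(p: Prices, size_gb: float, dt_days: float) -> Tuple[float, float]:
    """Return (beta_h, beta_c), the break-even access counts."""
    # Extra cost of one access in the cold layer compared with the hot layer.
    extra_per_access = p.g_cold + p.retrieval * size_gb - p.g_hot
    if extra_per_access <= 0:
        raise ValueError("sketch assumes cold-layer access costs more than hot-layer access")
    # Storage cost saved by keeping the block in the cold layer for dt_days.
    storage_saving = dt_days * size_gb * (p.s_hot - p.s_cold)
    beta_h = (storage_saving - p.p_cold) / extra_per_access
    beta_c = (storage_saving + p.p_hot + p.retrieval * size_gb) / extra_per_access
    return beta_h, beta_c


def decide(layer: str, beta_delta: int, p: Prices, size_gb: float, dt_days: float) -> str:
    """Return the layer the block should occupy after this execution cycle."""
    beta_h, beta_c = thresholds(p, size_gb, dt_days)
    if layer == "hot":
        return "cold" if beta_delta < beta_h else "hot"
    if layer == "cold":
        return "hot" if beta_delta > beta_c else "cold"
    raise ValueError(f"unknown layer: {layer!r}")
```

With made-up prices, `decide("hot", 10, prices, size_gb=1.0, dt_days=30)` returns the layer for a 1 GB block that was read 10 times in the last 30 days; real prices would come from the price acquisition module.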
This embodiment describes in detail one implementation of the cost-driven cold and hot layered cloud storage redundancy storage method. A user of the disclosed method is not limited to the implementation described in this embodiment and may make appropriate adjustments according to the user's own business and actual conditions.
Example two:
the embodiment aims to provide a cost-driven cold and hot layered cloud storage redundant storage system.
A cost-driven cold and hot layered cloud storage redundant storage system is provided, the architecture of which is shown in Fig. 3. Specifically, the system includes six modules: an application programming interface server (APIServer), a data segmentation module, a cloud storage management module, a log module, a data migration module and a price acquisition module.
The APIServer implements a programming interface compatible with the Amazon S3 protocol, which is widely accepted and used in the industry, and so provides users with a convenient access mode. Its main functions are to receive data uploaded by a user and pass it to the data segmentation module for blocking, to receive and dispatch user requests, and to return the complete requested data to the user. This embodiment uses the non-blocking Tornado library to implement these functions.
The data segmentation module uses the zfec library to erasure-code the incoming data. The coding is configured as (m, n); in this embodiment m = n − 1, that is, the data is encoded into n data blocks and the original data can be recovered from any n − 1 of them. Before erasure coding, the data must be padded and its metadata calculated, as described in step 2 of embodiment one. The n data blocks produced by the erasure coding are passed to the cloud storage management module for uploading.
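The m = n − 1 configuration can be illustrated with a single XOR parity block, which is the simplest erasure code that tolerates the loss of any one of n blocks. The sketch below deliberately swaps this plain-Python stand-in for zfec so that it stays self-contained; the function names are hypothetical, and the zero padding and the recorded original length are assumed placeholders for the padding and metadata of step 2 of embodiment one, not a reproduction of them.

```python
from functools import reduce
from typing import Dict, List, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, n: int) -> Tuple[List[bytes], int]:
    """Split data into n-1 equal data blocks plus one XOR parity block."""
    if n < 2:
        raise ValueError("n must be at least 2")
    m = n - 1
    block_len = max(1, -(-len(data) // m))     # ceiling division, at least 1 byte
    padded = data.ljust(m * block_len, b"\0")  # assumed zero padding
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(m)]
    parity = reduce(xor_bytes, blocks)
    # The original length is the only metadata this sketch keeps.
    return blocks + [parity], len(data)


def decode(available: Dict[int, bytes], n: int, original_len: int) -> bytes:
    """Rebuild the data from any n-1 of the n blocks, keyed by block index."""
    m = n - 1
    if len(available) < m:
        raise ValueError("need at least n-1 blocks")
    blocks = dict(available)
    missing = [i for i in range(m) if i not in blocks]
    if missing:
        # Exactly one data block is missing; XOR of all other blocks restores it.
        (lost,) = missing
        blocks[lost] = reduce(xor_bytes, [blocks[i] for i in range(n) if i != lost])
    return b"".join(blocks[i] for i in range(m))[:original_len]
```

Each of the n blocks would then be uploaded to a different cloud storage; a download needs only n − 1 of them.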
The cloud storage management module is responsible for managing the cloud storage services configured by the user, and uploads and downloads data through the SDK provided by each cloud storage service. The cloud storage services adopted in this embodiment include Alibaba Cloud, Amazon S3, Microsoft Azure, Baidu Cloud, Huawei Cloud, JD Cloud, Kingsoft Cloud, Qiniu Cloud, Tencent Cloud and IBM Cloud. The cloud storage management module is also responsible for adding and deleting cloud storage service resources; both can be done by modifying the configuration file, without writing any code. If a user wants to add a cloud storage service that is not compatible with the Amazon S3 protocol, the user only needs to implement the application programming interface provided by this module and does not need to modify the source code of the disclosed system.
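One way to picture the extension point described above is an abstract interface that each storage backend implements, with backends instantiated from configuration. The sketch below is hypothetical throughout: the disclosure does not give the interface, so the class names, method names and configuration shape are assumptions, and the in-memory backend stands in for a real provider SDK.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class CloudStorage(ABC):
    """Hypothetical interface a storage backend would implement."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def upload(self, key: str, block: bytes, layer: str = "hot") -> None:
        """Store a data block in the given layer."""

    @abstractmethod
    def download(self, key: str) -> bytes:
        """Return a previously stored data block."""

    @abstractmethod
    def migrate(self, key: str, layer: str) -> None:
        """Move a stored data block to another layer."""


class InMemoryStorage(CloudStorage):
    """Stand-in backend; a real one would call the provider's SDK instead."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self._blocks: Dict[str, bytes] = {}
        self.layers: Dict[str, str] = {}

    def upload(self, key: str, block: bytes, layer: str = "hot") -> None:
        self._blocks[key] = block
        self.layers[key] = layer

    def download(self, key: str) -> bytes:
        return self._blocks[key]

    def migrate(self, key: str, layer: str) -> None:
        if key not in self._blocks:
            raise KeyError(key)
        self.layers[key] = layer


# Registry mapping a configured type name to its backend class.
BACKENDS: Dict[str, Type[CloudStorage]] = {"memory": InMemoryStorage}


def load_backends(config: Dict[str, Dict[str, str]]) -> Dict[str, CloudStorage]:
    """Instantiate one backend per configuration entry."""
    return {name: BACKENDS[entry["type"]](name) for name, entry in config.items()}
```

Adding or removing a cloud storage then amounts to editing the configuration mapping, and supporting a new provider amounts to registering one more subclass.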
The log module records the time of each user request to the APIServer for a given data item, the upload/download time of the cloud storage management module, the size of each data block and the response of the cloud storage service. The recorded logs can be stored in a local file, in a distributed file storage system or in a database; this embodiment stores them in a Redis database deployed in a distributed manner.
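The request log is what the migration algorithm reads to obtain $\beta_\Delta$, the number of accesses to a data item in the past $\Delta t$ days. A minimal sketch of that count follows; it assumes the log is available as (key, timestamp) pairs, which is an illustrative record shape rather than the format used by the disclosure, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def count_recent_accesses(
    log: Iterable[Tuple[str, datetime]],
    key: str,
    now: datetime,
    dt_days: float,
) -> int:
    """Count requests for `key` with a timestamp in (now - dt_days, now]."""
    start = now - timedelta(days=dt_days)
    return sum(1 for k, ts in log if k == key and start < ts <= now)
```

In a deployment that keeps the log in Redis, the same count would be computed from the stored records instead of an in-memory list.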
The cloud storage price acquisition module collects cloud storage prices through the SDK provided by each cloud storage service provider; for cloud storage services whose prices cannot be obtained through the SDK, it collects them with a crawler developed on the Scrapy framework. This embodiment stores the collected prices in a Redis database deployed in a distributed manner. Cloud storage prices generally remain stable over long periods, and price adjustments, when they occur, are small; in this embodiment the price collection period is therefore set to 30 days, and a Web page for collecting prices immediately is provided to the user. After receiving a price-change notification from a cloud storage service provider, the user can open the Web page in a browser on a mobile phone or computer and obtain the updated prices immediately through a simple interaction.
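The combination of a 30-day collection period and an on-demand refresh can be sketched as a small cache. Everything here is an illustrative assumption: `PriceCache` and its methods are hypothetical names, `fetch` stands for whatever SDK call or crawler actually retrieves the prices, and the injectable clock exists only to make the behaviour easy to check.

```python
import time
from typing import Callable, Dict, Optional

REFRESH_SECONDS = 30 * 24 * 3600  # 30-day collection period used in this embodiment


class PriceCache:
    """Keeps the last collected prices and re-collects them when stale."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, float]],
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._prices: Optional[Dict[str, float]] = None
        self._fetched_at = 0.0

    def refresh(self) -> Dict[str, float]:
        """Collect prices now, e.g. after a price-change notification."""
        self._prices = self._fetch()
        self._fetched_at = self._clock()
        return self._prices

    def get(self) -> Dict[str, float]:
        """Return cached prices, collecting them first if missing or stale."""
        if self._prices is None or self._clock() - self._fetched_at >= REFRESH_SECONDS:
            return self.refresh()
        return self._prices
```

The Web page mentioned above would correspond to a handler that calls `refresh()`; the periodic collection corresponds to the staleness check in `get()`.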
The data migration module is mainly responsible for executing the cold-hot aware data migration algorithm; for its detailed implementation, see step 4 of embodiment one. After the algorithm has determined from the data heat the storage layer in which each data item or data block should reside, the module initiates the data migration requests through the SDK.
Example three:
the embodiment aims to provide a distributed deployed cost-driven cold and hot layered cloud storage redundancy storage system.
A cost-driven, cold-hot layered cloud storage redundant storage system for distributed deployment. Fig. 4 shows an architectural diagram thereof.
In this embodiment, only the case of deploying the system of the present disclosure in two servers in a distributed manner is described, which can be easily generalized to the case of deploying in multiple servers, and the deployment steps are consistent with those of the two servers.
Each module of the system described in this embodiment is consistent with the configuration of the corresponding module in the second embodiment.
For distributed deployment, the system only needs to be deployed on multiple servers, with the logs generated by the log module and the data collected by the price acquisition module on each server written to the same database. Because this data does not need persistent storage and read-write efficiency matters, this embodiment uses a memory-based Redis database deployed in a distributed manner.
When a user accesses the distributed deployment of the disclosed system, the user's requests are distributed to the different servers by a load balancing algorithm, which can be implemented by setting up an Nginx server.
In further embodiments, there is also provided:
an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor performing the method of embodiment one. For brevity, no further description is provided herein.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of embodiment one.
The method in the first embodiment may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM, EPROM, registers or other storage media well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, this is not described in detail here.
Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
One or more of the above embodiments have the following technical effects:
(1) The risk of using cloud storage is greatly reduced, since the user no longer puts all eggs in one basket. Data is erasure-coded and stored as blocks across multiple cloud storages, so even if one or more cloud storage services are unavailable, the user can restore the original data from the data blocks in the remaining available cloud storages.
(2) The characteristics of cold and hot layer storage in cloud storage are fully utilized, a proper storage layer is selected in a self-adaptive mode according to the data heat degree, and the use cost of the cloud storage is reduced.
(3) The system is compatible with the Amazon S3 protocol widely used in industry. Users can easily migrate from other cloud storage to the system described in this disclosure, even without writing any code.
(4) Distributed deployment can be easily carried out, and the increased business requirements of users are met.
(5) Cloud storage services used in the system of the present disclosure are easy to add or remove, which effectively reduces the cost of vendor lock-in. Because each cloud storage holds data blocks much smaller than the original data, the vendor lock-in cost a user pays when migrating the data blocks in one cloud storage to a new cloud storage decreases proportionally.
The cost-driven cold and hot layered cloud storage redundant storage method and system provided by the embodiment can be realized, and have a wide application prospect.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. A cost-driven cold and hot layered cloud storage redundancy storage method is characterized by comprising the following steps:
encoding data uploaded by a user by using an erasure code, dividing the data into n data blocks, and uploading the n obtained data blocks to n cloud storages respectively;
for data requested by a user, selecting m cloud storages with the lowest access price from the n cloud storages, downloading corresponding data blocks in the cloud storages, recovering the data blocks into original data, and returning the original data to the user;
and for each data block in the cloud storages, using a cold-hot aware data migration algorithm to select a suitable storage layer for the data block according to the data heat, with the aim of reducing the cost of using cloud storage.
2. The cost-driven, cold-hot tiered cloud storage redundancy storage method of claim 1, wherein the erasure coding comprises:
encoding the data using an erasure code algorithm and dividing the data into n data blocks, wherein the original data can be recovered from m of the n data blocks.
3. The cost-driven cold-hot layered cloud storage redundant storage method according to claim 1, wherein the cold-hot aware data migration algorithm needs to be executed periodically, and each execution cycle comprises the following steps:
acquiring the number of accesses $\beta_\Delta$ to the data in the past $\Delta t$ days;
if the data is stored in the hot layer, calculating the access-count reference value $\beta_h$ at which the costs incurred over $\Delta t$ days in the hot layer and in the cold layer are equal; if $\beta_\Delta < \beta_h$, migrating the data to the cold layer, otherwise keeping the data in the hot layer;
if the data is stored in the cold layer, calculating the access-count reference value $\beta_c$ at which the costs incurred over $\Delta t$ days under the decision to remain in the cold layer and under the decision to migrate to the hot layer are equal; if $\beta_\Delta > \beta_c$, migrating the data to the hot layer, otherwise keeping the data in the cold layer.
4. The cost-driven cold-hot layered cloud storage redundant storage method according to claim 1, wherein the access-count reference values $\beta_h$ and $\beta_c$ in the cold-hot aware data migration algorithm satisfy the following formulas:
$S_h(\Delta t) + A_h(\beta_h) = T_{h \to c} + S_c(\Delta t) + A_c(\beta_h)$
$S_c(\Delta t) + A_c(\beta_c) = T_{c \to h} + S_h(\Delta t) + A_h(\beta_c)$
where $S_h(\Delta t)$ and $S_c(\Delta t)$ denote the storage costs incurred by storing the data for $\Delta t$ days in the hot layer and the cold layer, respectively; $A_h(\beta_h)$ and $A_c(\beta_h)$ denote the access costs incurred by $\beta_h$ accesses in the hot layer and the cold layer, respectively; $A_h(\beta_c)$ and $A_c(\beta_c)$ denote the access costs incurred by $\beta_c$ accesses in the hot layer and the cold layer, respectively; and $T_{h \to c}$ and $T_{c \to h}$ denote the costs of migrating the data from the hot layer to the cold layer and from the cold layer to the hot layer, respectively.
5. A cost-driven, cold-hot layered cloud storage redundancy storage system, comprising:
an application programming interface server configured to: receiving data uploaded by a user, receiving a data downloading request of the user and returning data requested by the user;
a data partitioning module configured to: encoding and blocking data uploaded by a user by using an erasure code;
a cloud storage management module configured to: uploading and downloading data/data blocks, and managing cloud storage service resources;
a log module configured to: collecting and storing records of data downloading requested by a user;
a price collection module configured to: collecting prices of a plurality of cloud storage services using a crawler program or an SDK provided by a cloud service provider;
and the data migration module is configured to periodically and dynamically calculate a proper storage layer of the data according to the data heat degree and make a reservation or migration decision.
6. The cost-driven, hot and cold tiered, cloud storage redundant storage system of claim 5 wherein the cloud storage management module uploads and downloads data/data blocks from the cloud using a cloud storage service provider's SDK, and adds and deletes cloud storage service resources used in the system using profiles.
7. The cost-driven cold and hot layered cloud storage redundant storage system according to claim 5, wherein the data migration module takes as input a request log recorded by the log module and a price of the cloud storage service collected by the price collection module, calculates a storage tier corresponding to the heat of the data by using a cold and hot sensing data migration algorithm, and performs data migration through the SDK.
8. A distributed deployment cost-driven, cold-hot layered cloud storage redundant storage system, characterized in that each server comprises the modules of the system according to any one of claims 5 to 7, the server nodes are connected and communicate via a distributed technology, and user requests are distributed to the application programming interface servers on the servers via a load balancing algorithm.
9. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor implementing a cost-driven, cold-hot hierarchical cloud storage redundancy storage method according to any one of claims 1 to 4.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement a cost-driven cold-hot layered cloud storage redundancy storage method according to any one of claims 1 to 4.
CN202110189368.7A 2021-02-19 2021-02-19 Cost-driven cold and hot layered cloud storage redundancy storage method and system Active CN112860189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110189368.7A CN112860189B (en) 2021-02-19 2021-02-19 Cost-driven cold and hot layered cloud storage redundancy storage method and system

Publications (2)

Publication Number Publication Date
CN112860189A true CN112860189A (en) 2021-05-28
CN112860189B CN112860189B (en) 2022-12-30

Family

ID=75989648

Country Status (1)

Country Link
CN (1) CN112860189B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102904934A (en) * 2012-09-25 2013-01-30 东莞宇龙通信科技有限公司 Terminal data transmission method and system thereof
CN103095847A (en) * 2013-02-04 2013-05-08 华中科技大学 Cloud storage safety-ensuring method and system thereof
CN103944981A (en) * 2014-04-14 2014-07-23 中国科学院计算技术研究所 Cloud storage system and implement method based on erasure code technological improvement
CN106462605A (en) * 2014-05-13 2017-02-22 云聚公司 Distributed secure data storage and transmission of streaming media content
CN107251523A (en) * 2015-12-29 2017-10-13 深圳大学 Date storage method, integrality detection method and device, terminal device based on cloud service
CN107908360A (en) * 2017-11-09 2018-04-13 郑州云海信息技术有限公司 A kind of data-storage system and date storage method based on mixing cloud storage
CN108600316A (en) * 2018-03-23 2018-09-28 深圳市网心科技有限公司 Data managing method, system and the equipment of cloud storage service
CN109358821A (en) * 2018-12-12 2019-02-19 山东大学 A kind of cold and hot data store optimization method of cloud computing of cost driving
CN110084049A (en) * 2019-04-18 2019-08-02 湖北工业大学 A kind of medical data protection and access system and method based on cloudy end
CN110636141A (en) * 2019-10-17 2019-12-31 中国人民解放军陆军工程大学 Multi-cloud storage system based on cloud and mist cooperation and management method thereof
CN111930599A (en) * 2020-09-29 2020-11-13 北京海联捷讯科技股份有限公司 Operation and maintenance data processing method and device of cloud service system and storage medium
CN112000523A (en) * 2020-08-25 2020-11-27 浪潮云信息技术股份公司 Cloud backup system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘明宇,潘丽,刘士军: ""To Transfer or Not: An Online Cost Optimization Algorithm for Using Two-Tier Storage-as-a-Service Clouds"", 《IEEE》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114004979A (en) * 2021-11-05 2022-02-01 江苏赞奇科技股份有限公司 High-cost-performance data storage method and system in cloud rendering
CN114004979B (en) * 2021-11-05 2023-09-01 江苏赞奇科技股份有限公司 High-cost performance data storage method and system in cloud rendering
CN114422600A (en) * 2021-12-31 2022-04-29 成都鲁易科技有限公司 File scheduling system based on cloud storage and file scheduling method based on cloud storage
CN114422600B (en) * 2021-12-31 2023-11-07 成都鲁易科技有限公司 File scheduling system based on cloud storage and file scheduling method based on cloud storage

Also Published As

Publication number Publication date
CN112860189B (en) 2022-12-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant