CN107249035B - Shared repeated data storage and reading method with dynamically variable levels

Info

Publication number
CN107249035B
CN107249035B (application CN201710506611.7A)
Authority
CN
China
Prior art keywords
throughput
tenant
priority
tenants
data
Prior art date
Legal status
Active
Application number
CN201710506611.7A
Other languages
Chinese (zh)
Other versions
CN107249035A (en
Inventor
谭玉娟
赵亚军
晏志超
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University
Priority to CN201710506611.7A
Publication of CN107249035A
Application granted
Publication of CN107249035B
Active legal status (current)
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00: Arrangements for monitoring or testing data switching networks
    • H04L 43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L 43/0876: Network utilisation, e.g. volume of load or congestion level
    • H04L 43/0888: Throughput
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/50: Network services
    • H04L 67/60: Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L 67/61: Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources taking into account QoS or priority requirements

Abstract

The invention provides a shared duplicate data storage and reading mechanism with dynamically variable levels for a cloud backup system, suited to the cloud backup system's requirement for hierarchical and fair quality of service. The method cooperates closely with data deduplication, provides different quality-of-service policies for tenants of different levels, improves the system's deduplication ratio, and achieves an optimal service effect. The invention ensures that tenants enjoy fair, level-differentiated quality of service in a cloud backup environment.

Description

Shared repeated data storage and reading method with dynamically variable levels
Technical Field
The invention belongs to the technical field of computer information storage. To meet the quality-of-service requirements of a multi-tenant cloud backup system that performs data deduplication, it provides a shared duplicate data storage and reading method with dynamically variable levels, so that tenants receive fair, level-guaranteed quality of service.
Background
In a cloud backup system, different tenants purchase backup resources from the cloud backup provider according to their respective service requirements, and they expect the cloud backup system to provide fair, level-guaranteed quality of service. With the adoption of data deduplication, shared duplicate data makes quality of service harder to maintain, and traditional methods alone can no longer satisfy the tenants' requirement for differentiated service levels.
To address these problems of the cloud backup system, the invention provides a shared duplicate data storage and reading method with dynamically variable levels that gives tenants quality of service that is both fair and level-differentiated. Unlike existing methods, it is a hierarchical quality-of-service control method built on data deduplication: resource allocation and throughput monitoring use fine-grained, data-block-level control, which yields fairer level-differentiated quality of service than other methods and can also improve the system's deduplication ratio, throughput, and data recovery speed.
Disclosure of Invention
The invention provides a method for storing and reading shared duplicate data with dynamically variable levels. It fully takes into account the resource allocation and throughput of each data block in the backup and recovery processing stages and, on the premise of stable system performance, realizes hierarchical quality of service for the cloud backup system from three aspects: tenant leveling, fair resource allocation, and shared duplicate data processing.
One core idea of the invention is fair resource allocation, which ensures that tenants obtain fair service in every processing stage of data backup and data recovery. The resource allocation method assigns each tenant a fair share of memory space according to its service level. The method comprises the following steps: (1) first, apply for a memory space from the cloud backup system; (2) quantify the memory space by dividing the applied memory capacity by the metadata size of one data block, giving the total number of memory units available to the cloud backup system; (3) allocate memory space to each tenant using formula (1), according to the weight of the tenant's level and the number of tenants at each level.
$$\mathrm{Memory}_L=\frac{P_L}{\sum_{n=1}^{N}P_nA_n}\times\mathrm{Memory}_{total}\tag{1}$$
wherein Memory_L represents the memory space allocated to a tenant of level L, Memory_total is the total memory space applied for by the system, N represents the number of levels of the current system, P_n and A_n respectively represent the memory-space weight and the number of tenants of the n-th level of the current system, ∑_{n=1}^{N} P_n·A_n represents the sum of the weights of all tenants, P_L represents the memory-space weight of a level-L tenant, P_L / ∑_{n=1}^{N} P_n·A_n is the proportion of the total memory space occupied by a level-L tenant, and (P_L / ∑_{n=1}^{N} P_n·A_n) × Memory_total represents the memory space a level-L tenant obtains from the total memory space.
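As a hedged illustration of formula (1), the sketch below (Python) computes the per-tenant memory share for each level; the level weights, tenant counts, metadata size, and all names are assumed example values, not part of the patent.

```python
# Minimal sketch of formula (1): per-tenant memory allocation by level.
# Weights, tenant counts and METADATA_BYTES are assumed example values.

METADATA_BYTES = 64  # assumed size of the metadata kept for one data block

def allocate_memory(total_memory_bytes, level_weights, tenants_per_level):
    """Memory_L = P_L / sum(P_n * A_n) * Memory_total for every level L."""
    total_weight = sum(p * a for p, a in zip(level_weights, tenants_per_level))
    return [p / total_weight * total_memory_bytes for p in level_weights]

# Step (2) of the method: quantify the applied memory into block-metadata units.
total_memory = 512 * 1024 * 1024                  # memory applied for from the system
print(total_memory // METADATA_BYTES, "metadata units available in total")

per_tenant = allocate_memory(total_memory,
                             level_weights=[4, 2, 1],       # P_1, P_2, P_3
                             tenants_per_level=[2, 5, 10])  # A_1, A_2, A_3
for level, share in enumerate(per_tenant, start=1):
    print(f"level {level}: {share / METADATA_BYTES:.0f} metadata units per tenant")
```

Multiplying each level's per-tenant share by its tenant count and summing over levels returns exactly the total memory, which is the fairness property the formula is built around.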
The second core idea of the invention is throughput monitoring, whose purpose is to dynamically adjust each tenant's memory space and throughput threshold and thereby guarantee the tenants' hierarchical quality of service. The method comprises the following steps: (1) periodically monitor, in real time, the throughput of each tenant during data backup and data recovery; (2) sum the monitored tenant throughputs and, according to the weight of each tenant level and the number of tenants at each level in the system, calculate the average throughput of each level using formula (2).
$$\mathrm{Throughput}_{L,a}=\frac{P_L}{\sum_{n=1}^{N}P_nA_n}\times\mathrm{Throughput}_{total}\tag{2}$$
wherein Throughput_{L,a} represents the average throughput of a tenant of level L, Throughput_total is the total throughput of the system, N represents the number of levels of the current system, P_n and A_n respectively represent the throughput weight and the number of tenants of the n-th level of the current system, ∑_{n=1}^{N} P_n·A_n represents the sum of the weights of all tenants, P_L is the throughput weight of a level-L tenant, P_L / ∑_{n=1}^{N} P_n·A_n is the ratio of the level-L tenant's throughput to the total throughput, and (P_L / ∑_{n=1}^{N} P_n·A_n) × Throughput_total represents the average throughput of a level-L tenant.
(3) Initialize each tenant's throughput threshold with the throughput monitored in the first period. After each monitoring period, if a tenant's real-time throughput is not equal to the average throughput of its level, dynamically adjust the tenant's memory space and throughput threshold using formula (3), shown below.
$$\Delta\mathrm{Throughput}=\mathrm{Throughput}_{L,a}-\mathrm{Throughput}_i,\qquad \Delta\mathrm{Memory}=\frac{\mathrm{Throughput}_{L,a}-\mathrm{Throughput}_i}{\mathrm{Throughput}_i}\times\mathrm{Memory}_i\tag{3}$$
wherein Throughput_{L,a} represents the average throughput of level L, Throughput_i is the current real-time throughput of tenant i, ΔMemory and ΔThroughput are the adjustments to the memory space and the throughput threshold when the tenant's current throughput is not equal to the average throughput of its level, Throughput_{L,a} - Throughput_i represents the amount of throughput lost by tenant i, Memory_i represents tenant i's current memory space, and (Throughput_{L,a} - Throughput_i) / Throughput_i × Memory_i is the memory space compensated to tenant i.
(4) According to the increment ΔMemory of the memory space and the increment ΔThroughput of the throughput threshold calculated by formula (3), increase the tenant's memory space by ΔMemory and its throughput threshold by ΔThroughput.
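The monitoring loop can be sketched as below; this is an illustrative reading of formulas (2) and (3), and the Tenant structure, the example weights, and all names are assumptions.

```python
# Sketch of throughput monitoring (formula (2)) and dynamic adjustment (formula (3)).
# The Tenant structure and the monitoring loop around it are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Tenant:
    level: int               # 0-based index into the weight/count tables
    throughput: float        # real-time throughput measured this period
    memory: float            # current memory space
    throughput_threshold: float

def level_average_throughput(total_throughput, weights, counts):
    """Formula (2): Throughput_{L,a} = P_L / sum(P_n * A_n) * Throughput_total."""
    total_weight = sum(p * a for p, a in zip(weights, counts))
    return [p / total_weight * total_throughput for p in weights]

def adjust(tenant, level_avg):
    """Formula (3): compensate the memory space and threshold of a lagging tenant."""
    delta_throughput = level_avg - tenant.throughput           # throughput lost (or gained)
    delta_memory = delta_throughput / tenant.throughput * tenant.memory
    tenant.memory += delta_memory
    tenant.throughput_threshold += delta_throughput

tenants = [Tenant(level=0, throughput=80.0, memory=4096.0, throughput_threshold=80.0),
           Tenant(level=1, throughput=70.0, memory=2048.0, throughput_threshold=70.0)]
total = sum(t.throughput for t in tenants)
averages = level_average_throughput(total, weights=[2, 1], counts=[1, 1])
for t in tenants:
    if t.throughput != averages[t.level]:
        adjust(t, averages[t.level])
        print(f"level {t.level + 1}: memory -> {t.memory:.1f}, "
              f"threshold -> {t.throughput_threshold:.1f}")
```

Note that when a tenant runs above its level average the deltas are negative, which matches the claim's branch that reduces the memory space and throughput threshold.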
The third core idea of the invention is shared duplicate data processing. The shared duplicate data processing method comprises a shared duplicate data storage method and a shared data reading method, specifically as follows:
(a) Shared duplicate data storage method: when, within a unit time period, a new data block to be backed up is shared by a high-priority tenant and a low-priority tenant, check, using the tenant throughputs and the per-level average throughputs obtained from the throughput monitoring module, whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the corresponding level-average throughputs. If it is, the high-priority tenant can complete the backup of the block without affecting its own performance, so the backup task for the block is assigned to the high-priority tenant and the low-priority tenant's copy is marked as a duplicate pointing to that block; otherwise, the low-priority tenant completes the backup of the block and the high-priority tenant's copy is marked as a duplicate pointing to that block.
(b) Shared data reading method: when, within a unit time period, a data block to be recovered is shared by a high-priority tenant and a low-priority tenant and is not in the cache, check, using the tenant throughputs and the per-level average throughputs obtained from the throughput monitoring module, whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the corresponding level-average throughputs. If it is, the high-priority tenant can cache the block without affecting its own performance, so the high-priority tenant caches the block and completes the recovery, its memory space is increased by the metadata size of one data block, and the low-priority tenant's memory space is decreased by the metadata size of one data block; otherwise, the low-priority tenant caches the block and completes the recovery.
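Both policies reduce to the same test before the shared block is handed to one tenant or the other. A minimal sketch follows, reading the condition as a comparison of the two tenants' throughput ratio with the ratio of their level averages; the function name and arguments are assumptions.

```python
# Sketch of the shared-block decision used by both the storage policy (a) and the
# read policy (b). Names and the caller-side bookkeeping are assumptions.

def high_priority_handles_block(t_high, t_low, avg_high, avg_low):
    """True when the high-priority tenant should store/cache the shared block.

    t_high, t_low     -> measured throughput of the two tenants this period
    avg_high, avg_low -> average throughput of their respective levels (formula (2))
    """
    return t_high / t_low >= avg_high / avg_low

# Example: the high-priority tenant is running ahead of its level average,
# so it absorbs the shared block without hurting its own service level.
print(high_priority_handles_block(t_high=120.0, t_low=45.0,
                                  avg_high=100.0, avg_low=40.0))  # True
```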
The method for storing and reading shared duplicate data with dynamically variable levels provided by the invention mainly comprises two parts: data backup and data recovery.
The data backup method comprises the following specific steps:
(10) The client splits the data stream that the tenant needs to back up into data blocks, computes a fingerprint for each block with a hash algorithm, and sends the data fingerprints together with the tenant level information to the server.
(11) After receiving the data information sent by the client, the server performs the following steps:
(11.1) Establish a corresponding priority for the tenant's backup service according to the tenant's service level; the resource fair allocation module allocates memory space and a throughput threshold to the tenant using formula (1), according to the weight of that service level.
(11.2) Periodically monitor the throughput of each tenant during data backup in real time. Sum the monitored tenant throughputs and, according to the weight of each tenant level and the number of tenants at each level in the system, calculate the average throughput of each level using formula (2). At the end of each monitoring period, if a tenant's throughput is not equal to the average throughput of its level, adjust the memory space and throughput threshold of step (11.1) using formula (3).
(11.3) After the service priorities are determined in step (11.1), traverse the fingerprint sequences sent in step (10) in order of tenant backup priority from high to low and query each fingerprint in the fingerprint index table. If a fingerprint is not in the table, mark the corresponding data block as a new data block; otherwise the corresponding data block is already stored, so mark it as a duplicate data block and record its storage address.
(11.4) Store the new data blocks, specifically as follows:
(a) If the new data block is backed up by both a high-priority tenant and a low-priority tenant within the unit time period, apply the shared duplicate data storage policy and update the fingerprint index table according to the storage address of the new data block. The policy is as follows:
Using the tenant throughputs and per-level average throughputs obtained in step (11.2), check whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the corresponding level-average throughputs. If it is, the high-priority tenant completes the storage of the new data block; otherwise the low-priority tenant completes it.
(b) If the new data block is not backed up by both a high-priority tenant and a low-priority tenant within the unit time period, the tenant to which the block belongs stores it, and the fingerprint index table is updated according to the storage address of the new data block.
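For concreteness, a condensed, hypothetical sketch of the backup path of steps (10) and (11.3)-(11.4) is given below; fixed-size chunking, SHA-1 fingerprints, and the in-memory index are assumptions (the patent does not fix a chunking scheme or hash), and the priority scheduling and shared-duplicate hand-off of steps (11.1)-(11.2) are sketched separately above.

```python
# Illustrative backup path for steps (10) and (11.3)-(11.4). Fixed-size chunking,
# SHA-1 fingerprints and the in-memory index are assumptions, not the patent's choices.
import hashlib

CHUNK_SIZE = 4096
fingerprint_index = {}            # fingerprint -> storage address
storage = []                      # stand-in for the on-disk block store

def chunk_and_fingerprint(data: bytes):
    """Step (10): split the tenant's stream into blocks and hash each block."""
    for off in range(0, len(data), CHUNK_SIZE):
        block = data[off:off + CHUNK_SIZE]
        yield hashlib.sha1(block).hexdigest(), block

def backup(stream: bytes):
    """Steps (11.3)-(11.4): dedup against the fingerprint index, store new blocks."""
    recipe = []
    for fp, block in chunk_and_fingerprint(stream):
        if fp in fingerprint_index:               # duplicate block: record its address only
            recipe.append(fingerprint_index[fp])
        else:                                     # new block: store it and index it
            storage.append(block)
            fingerprint_index[fp] = len(storage) - 1
            recipe.append(fingerprint_index[fp])
    return recipe                                 # addresses needed for later recovery

recipe = backup(b"example tenant data " * 500)
print(len(recipe), "block references,", len(storage), "unique blocks stored")
```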
The data recovery method comprises the following specific steps:
(20) The client reads the addresses of the data the tenant needs to recover and sends the addresses together with the tenant level information to the server.
(21) The server receives the recovery data address and the tenant grade information sent by the client, and the following steps are carried out:
and (21.1) establishing corresponding priority for the recovery service of the tenant according to the service level of the tenant.
(21.2) Look up, by the recovery data addresses, the metadata describing where the data is stored on disk.
(21.3) According to the weight of the tenant's service level, allocate memory space and a throughput threshold to the tenant using formula (1).
(21.4) Periodically monitor the throughput of each tenant during data recovery in real time. Sum the monitored tenant throughputs and, according to the weight of each tenant level and the number of tenants at each level in the system, calculate the average throughput of each level using formula (2). At the end of each monitoring period, if a tenant's throughput is not equal to the average throughput of its level, adjust the memory space and throughput threshold of step (21.3) using formula (3).
(21.5) After the service priorities are determined in step (21.1), scan the metadata of the data to be recovered found in step (21.2) in order of tenant recovery priority from high to low and look each block up in the server cache. If the data block corresponding to the metadata is in the cache, recover it directly; if it is not in the cache, perform the following steps:
(a) If the data block is recovered by both a high-priority tenant and a low-priority tenant within the unit time period, apply the shared data reading policy, which is as follows:
Using the tenant throughputs and per-level average throughputs obtained in step (21.4), check whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the corresponding level-average throughputs; this determines whether caching the block would affect the high-priority tenant's performance. If the condition holds, the high-priority tenant caches the data block, its memory space is increased by the metadata size of one data block, and the low-priority tenant's memory space is decreased by the metadata size of one data block; otherwise the low-priority tenant caches the block and completes the recovery;
(b) If the data block is not recovered by both a high-priority tenant and a low-priority tenant within the unit time period, the tenant to which the block belongs caches it and completes the recovery.
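A matching, equally hypothetical sketch of the recovery path of steps (20)-(21.5) follows; the fixed-capacity FIFO cache and all names are assumptions, since the patent only requires a server-side cache bounded by each tenant's allotted memory, with the shared-read policy above deciding who caches a shared block.

```python
# Illustrative recovery path for steps (20)-(21.5). The FIFO cache and all names
# are assumptions made for the sketch.
from collections import OrderedDict

class BlockCache:
    """Fixed-capacity cache keyed by storage address."""
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self._blocks = OrderedDict()          # storage address -> block bytes

    def get(self, address):
        return self._blocks.get(address)

    def put(self, address, block):
        if len(self._blocks) >= self.capacity:
            self._blocks.popitem(last=False)  # evict the oldest cached block
        self._blocks[address] = block

def recover(recipe, storage, cache):
    """Step (21.5): serve blocks from the cache, reading misses from the store."""
    out = []
    for address in recipe:
        block = cache.get(address)
        if block is None:                     # cache miss: read from disk, then cache
            block = storage[address]
            cache.put(address, block)
        out.append(block)
    return b"".join(out)

storage = {0: b"A" * 4096, 1: b"B" * 4096}
restored = recover([0, 1, 0], storage, BlockCache(capacity_blocks=8))
print(len(restored), "bytes restored")        # 12288
```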
The invention is characterized in that it is a hierarchical quality-of-service control method built on data deduplication: resource allocation and throughput monitoring use fine-grained, data-block-level control, the data blocks of every stage of data backup and data recovery are accounted for precisely, the quality-of-service unfairness caused by deduplication is resolved, and deduplication is fitted more closely to the cloud backup system.
Drawings
FIG. 1 is a schematic block diagram of the module structure;
FIG. 2 is a flowchart of the data deduplication method and the shared data reading method.
Detailed Description
Fig. 1 is a schematic diagram of the module structure of the invention. The invention involves a client 100 and a server 200. The client comprises a fingerprint processing module 110, which chunks the backup data set into data blocks and computes a fingerprint for each block with a hash function. The server comprises a tenant level management module 210, a resource fair allocation module 220, a shared duplicate data processing module 230 and a throughput monitoring module 240. The tenant level management module 210 establishes a priority for each tenant according to its service level and schedules tenant data from high priority to low priority for data backup or data recovery. The resource fair allocation module 220 and the throughput monitoring module 240 guarantee the tenants' hierarchical quality of service: module 220 allocates a fair memory space to each tenant using formula (1), while module 240 monitors the throughput of tenant data processing in real time and dynamically adjusts each tenant's memory space and throughput threshold using formulas (2) and (3). When a high-priority tenant and a low-priority tenant share a data block to be stored or cached within a unit time period, the shared duplicate data processing module 230 checks whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the corresponding level-average throughputs; if it is, the high-priority tenant completes the storage or caching of the block, otherwise the low-priority tenant does.
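As a rough, assumed wiring of the FIG. 1 modules, the sketch below invents class and method names; only the division of responsibilities and the scheduling order follow the description.

```python
# Illustrative wiring of the FIG. 1 modules (210, 220, 230, 240). Class and method
# names are assumptions; the client-side fingerprint module 110 only supplies
# (fingerprint, level) pairs and is omitted here.

class TenantLevelManager:                     # module 210: priority scheduling
    def order(self, requests):
        # schedule tenant requests from high priority (small level number) to low
        return sorted(requests, key=lambda r: r["level"])

class FairResourceAllocator:                  # module 220: formula (1)
    def allocate(self, total_memory, weights, counts):
        total = sum(p * a for p, a in zip(weights, counts))
        return [p / total * total_memory for p in weights]

class ThroughputMonitor:                      # module 240: formulas (2) and (3)
    def level_averages(self, total_throughput, weights, counts):
        total = sum(p * a for p, a in zip(weights, counts))
        return [p / total * total_throughput for p in weights]

class SharedDuplicateProcessor:               # module 230: storage/read policies
    def high_priority_wins(self, t_high, t_low, avg_high, avg_low):
        return t_high / t_low >= avg_high / avg_low

requests = [{"tenant": "B", "level": 2}, {"tenant": "A", "level": 1}]
print([r["tenant"] for r in TenantLevelManager().order(requests)])  # ['A', 'B']
```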
FIG. 2 is a processing flow diagram of the data deduplication method and the shared data reading method of the invention, comprising two parts: data backup and data recovery.
The data backup method comprises the following specific steps:
(10) The fingerprint processing module 110 of the client 100 splits the data stream that the tenant needs to back up into data blocks, computes a fingerprint for each block with a hash algorithm, and sends the data fingerprints together with the tenant level information to the server.
(11) After receiving the data information sent by the client, the server 200 performs the following steps:
(11.1) The tenant level management module 210 establishes a corresponding priority for the tenant's backup service according to the tenant's service level; the resource fair allocation module 220 allocates memory space and a throughput threshold to the tenant using formula (1), according to the weight of that service level.
(11.2) The throughput monitoring module 240 periodically monitors the throughput of each tenant during data backup in real time, sums the monitored tenant throughputs and, according to the weight of each tenant level and the number of tenants at each level in the system, calculates the average throughput of each level using formula (2). At the end of each monitoring period, if a tenant's throughput is not equal to the average throughput of its level, the memory space and throughput threshold of step (11.1) are adjusted using formula (3).
(11.3) After the service priorities are determined in step (11.1), the tenant level management module 210 traverses the fingerprint sequences sent in step (10) in order of tenant backup priority from high to low and queries each fingerprint in the fingerprint index table. If a fingerprint is not in the table, the corresponding data block is marked as a new data block; otherwise the corresponding data block is already stored, so it is marked as a duplicate data block and its storage address is recorded.
(11.4) Store the new data blocks, specifically as follows:
(a) If the new data block is backed up by both a high-priority tenant and a low-priority tenant within the unit time period, the shared duplicate data processing module 230 applies the shared duplicate data storage policy and updates the fingerprint index table according to the storage address of the new data block. The policy is as follows: using the tenant throughputs and per-level average throughputs obtained in step (11.2), check whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the corresponding level-average throughputs; if it is, the high-priority tenant completes the storage of the new data block, otherwise the low-priority tenant completes it.
(b) If the new data block is not backed up by both a high-priority tenant and a low-priority tenant within the unit time period, the tenant to which the block belongs stores it, and the fingerprint index table is updated according to the storage address of the new data block.
The data recovery method comprises the following specific steps:
(20) The client 100 reads the addresses of the data the tenant needs to recover and sends the addresses together with the tenant level information to the server.
(21) The server 200 receives the recovery data address and the tenant level information sent by the client, and performs the following steps:
(21.1) the tenant level management module 210 establishes a corresponding priority for the recovery service of the tenant according to the service level of the tenant.
(21.2) Look up, by the recovery data addresses, the metadata describing where the data is stored on disk.
(21.3) The resource fair allocation module 220 allocates memory space and a throughput threshold to the tenant using formula (1), according to the weight of the tenant's service level.
(21.4) The throughput monitoring module 240 periodically monitors the throughput of each tenant during data recovery in real time, sums the monitored tenant throughputs and, according to the weight of each tenant level and the number of tenants at each level in the system, calculates the average throughput of each level using formula (2). At the end of each monitoring period, if a tenant's throughput is not equal to the average throughput of its level, the memory space and throughput threshold of step (21.3) are adjusted using formula (3).
(21.5) After the service priorities are determined in step (21.1), the tenant level management module 210 scans the metadata of the data to be recovered found in step (21.2) in order of tenant recovery priority from high to low and looks each block up in the server cache. If the data block corresponding to the metadata is in the cache, it is recovered directly; if it is not in the cache, the following steps are performed:
(a) If the data block is recovered by both a high-priority tenant and a low-priority tenant within the unit time period, the shared duplicate data processing module 230 applies the shared data reading policy, which is as follows: using the tenant throughputs and per-level average throughputs obtained in step (21.4), check whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the corresponding level-average throughputs; this determines whether caching the block would affect the high-priority tenant's performance. If the condition holds, the high-priority tenant caches the data block, its memory space is increased by the metadata size of one data block, and the low-priority tenant's memory space is decreased by the metadata size of one data block; otherwise the low-priority tenant caches the block and completes the recovery;
(b) If the data block is not recovered by both a high-priority tenant and a low-priority tenant within the unit time period, the tenant to which the block belongs caches it and completes the recovery.

Claims (1)

1. A shared repeated data storage and reading method with dynamically variable levels, mainly comprising two parts: data backup and data recovery;
the data backup method comprises the following specific steps:
(10) the client splits the data stream that a tenant needs to back up into data blocks, calculates a fingerprint for each block using a hash algorithm, and sends the data block fingerprints and the tenant level information to a server;
(11) after receiving the data information sent by the client, the server performs the following steps:
(11.1) establishing corresponding priority for backup services of the tenants according to the service levels of the tenants, and distributing memory spaces in corresponding proportion for the tenants according to the weights corresponding to the service levels:
$$\mathrm{Memory}_L=\frac{P_L}{\sum_{n=1}^{N}P_nA_n}\times\mathrm{Memory}_{total}\tag{1}$$
wherein Memory_L represents the memory space allocated to a tenant of level L, Memory_total is the total memory space applied for by the system, N represents the number of levels of the current system, P_n and A_n respectively represent the memory-space weight and the number of tenants of the n-th level of the current system, ∑_{n=1}^{N} P_n·A_n represents the sum of the weights of all tenants, P_L represents the memory-space weight of a level-L tenant, P_L / ∑_{n=1}^{N} P_n·A_n is the proportion of the total memory space occupied by a level-L tenant, and (P_L / ∑_{n=1}^{N} P_n·A_n) × Memory_total represents the memory space a level-L tenant obtains from the total memory space;
(11.2) carrying out periodic real-time monitoring on the throughput of each tenant in the data backup, summing the monitored throughputs of the tenants, and calculating the average throughputs of different levels according to the corresponding weight of the tenant level and the number of the tenants in each level in the system:
$$\mathrm{Throughput}_{L,a}=\frac{P_L}{\sum_{n=1}^{N}P_nA_n}\times\mathrm{Throughput}_{total}\tag{2}$$
wherein Throughput_{L,a} represents the average throughput of a tenant of level L, Throughput_total is the total throughput of the system, N represents the number of levels of the current system, P_n and A_n respectively represent the throughput weight and the number of tenants of the n-th level of the current system, ∑_{n=1}^{N} P_n·A_n represents the sum of the weights of all tenants, P_L is the throughput weight of a level-L tenant, P_L / ∑_{n=1}^{N} P_n·A_n is the ratio of the level-L tenant's throughput to the total throughput, and (P_L / ∑_{n=1}^{N} P_n·A_n) × Throughput_total represents the average throughput of a level-L tenant;
initializing each tenant's throughput threshold with the throughput monitored in the first period; after each monitoring period, if a tenant's throughput is lower than the average throughput of its level, increasing the tenant's memory space and throughput threshold, and if it is higher than the average throughput of its level, reducing them:
$$\Delta\mathrm{Throughput}=\mathrm{Throughput}_{L,a}-\mathrm{Throughput}_i,\qquad \Delta\mathrm{Memory}=\frac{\mathrm{Throughput}_{L,a}-\mathrm{Throughput}_i}{\mathrm{Throughput}_i}\times\mathrm{Memory}_i\tag{3}$$
wherein Throughput_{L,a} represents the average throughput of level L, Throughput_i is the current real-time throughput of tenant i, ΔMemory and ΔThroughput are the adjustments to the memory space and the throughput threshold when the tenant's current throughput is not equal to the average throughput of its level, Throughput_{L,a} - Throughput_i represents the amount of throughput lost by tenant i, Memory_i represents tenant i's current memory space, and (Throughput_{L,a} - Throughput_i) / Throughput_i × Memory_i is the memory space compensated to tenant i;
(11.3) after the service priority is determined in the step (11.1), according to the tenant backup service priority, traversing the fingerprint sequence sent in the step (10) from high priority to low priority in sequence, inquiring in a fingerprint index table, and if the fingerprint does not exist, marking the corresponding data block as a new data block; otherwise, marking the data block as a repeated data block, and recording the storage address of the data block;
(11.4) storing the new data block, and specifically comprising the following steps:
(a) if the new data block is data commonly backed up by a high-priority tenant and a low-priority tenant in a unit time period, adopting a shared repeated data storage strategy, and updating a fingerprint index table according to a storage address of the new data block, wherein the shared repeated data storage strategy specifically comprises the following steps:
checking, according to the tenant throughputs and the per-level average throughputs obtained in step (11.2), whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the corresponding level-average throughputs; if it is, the high-priority tenant completes the storage of the new data block, otherwise the low-priority tenant completes the storage of the new data block;
(b) if the new data block is not the data commonly backed up by the high-priority tenant and the low-priority tenant in the unit time period, the tenant to which the data block belongs completes data block storage, and the fingerprint index table is updated according to the storage address of the new data block;
the data recovery method comprises the following specific steps:
(20) the client reads the address of the data needing to be recovered by the tenant, and sends the address of the data needing to be recovered and the tenant level information to the server;
(21) the server receives the recovery data address and the tenant grade information sent by the client, and the following steps are carried out:
(21.1) establishing corresponding priority for the recovery service of the tenant according to the service level of the tenant;
(21.2) looking up, by the recovery data addresses, the metadata describing where the data is stored on disk;
(21.3) distributing memory space for the tenant according to the weight corresponding to the service level;
$$\mathrm{Memory}_L=\frac{P_L}{\sum_{n=1}^{N}P_nA_n}\times\mathrm{Memory}_{total}\tag{1}$$
wherein Memory_L represents the memory space allocated to a tenant of level L, Memory_total is the total memory space applied for by the system, N represents the number of levels of the current system, P_n and A_n respectively represent the memory-space weight and the number of tenants of the n-th level of the current system, ∑_{n=1}^{N} P_n·A_n represents the sum of the weights of all tenants, P_L represents the memory-space weight of a level-L tenant, P_L / ∑_{n=1}^{N} P_n·A_n is the proportion of the total memory space occupied by a level-L tenant, and (P_L / ∑_{n=1}^{N} P_n·A_n) × Memory_total represents the memory space a level-L tenant obtains from the total memory space;
(21.4) carrying out periodic real-time monitoring on the throughput of each tenant in data recovery, summing the monitored throughputs of the tenants, and calculating the average throughputs of different levels according to the corresponding weight of the tenant level and the number of the tenants in each level in the system:
$$\mathrm{Throughput}_{L,a}=\frac{P_L}{\sum_{n=1}^{N}P_nA_n}\times\mathrm{Throughput}_{total}\tag{2}$$
wherein Throughput_{L,a} represents the average throughput of a tenant of level L, Throughput_total is the total throughput of the system, N represents the number of levels of the current system, P_n and A_n respectively represent the throughput weight and the number of tenants of the n-th level of the current system, ∑_{n=1}^{N} P_n·A_n represents the sum of the weights of all tenants, P_L is the throughput weight of a level-L tenant, P_L / ∑_{n=1}^{N} P_n·A_n is the ratio of the level-L tenant's throughput to the total throughput, and (P_L / ∑_{n=1}^{N} P_n·A_n) × Throughput_total represents the average throughput of a level-L tenant;
initializing each tenant's throughput threshold with the throughput monitored in the first period; after each monitoring period, if a tenant's throughput is lower than the average throughput of its level, increasing the tenant's memory space and throughput threshold, and if it is higher than the average throughput of its level, reducing them:
$$\Delta\mathrm{Throughput}=\mathrm{Throughput}_{L,a}-\mathrm{Throughput}_i,\qquad \Delta\mathrm{Memory}=\frac{\mathrm{Throughput}_{L,a}-\mathrm{Throughput}_i}{\mathrm{Throughput}_i}\times\mathrm{Memory}_i\tag{3}$$
wherein Throughput_{L,a} represents the average throughput of level L, Throughput_i is the current real-time throughput of tenant i, ΔMemory and ΔThroughput are the adjustments to the memory space and the throughput threshold when the tenant's current throughput is not equal to the average throughput of its level, Throughput_{L,a} - Throughput_i represents the amount of throughput lost by tenant i, Memory_i represents tenant i's current memory space, and (Throughput_{L,a} - Throughput_i) / Throughput_i × Memory_i is the memory space compensated to tenant i;
(21.5) after the service priorities are determined in step (21.1), scanning the metadata of the data to be recovered found in step (21.2) in order of tenant recovery priority from high to low and looking it up in the server cache; if the data block corresponding to the metadata is in the cache, recovering it directly; if it is not in the cache, performing the following steps:
(a) if the data block is recovered by both a high-priority tenant and a low-priority tenant within a unit time period, applying a shared data reading policy, the policy being as follows: checking, according to the tenant throughputs and the per-level average throughputs obtained in step (21.4), whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the corresponding level-average throughputs; if it is, the high-priority tenant caches the data block, the memory space of the high-priority tenant is increased by the metadata size of one data block, and the memory space of the low-priority tenant is decreased by the metadata size of one data block; otherwise, the low-priority tenant caches the data block and completes the recovery;
(b) if the data block is not recovered by both a high-priority tenant and a low-priority tenant within the unit time period, the tenant to which the data block belongs caches it and completes the recovery.
CN201710506611.7A 2017-06-28 2017-06-28 Shared repeated data storage and reading method with dynamically variable levels Active CN107249035B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710506611.7A CN107249035B (en) 2017-06-28 2017-06-28 Shared repeated data storage and reading method with dynamically variable levels

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710506611.7A CN107249035B (en) 2017-06-28 2017-06-28 Shared repeated data storage and reading method with dynamically variable levels

Publications (2)

Publication Number Publication Date
CN107249035A CN107249035A (en) 2017-10-13
CN107249035B true CN107249035B (en) 2020-05-26

Family

ID=60013512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710506611.7A Active CN107249035B (en) 2017-06-28 2017-06-28 Shared repeated data storage and reading method with dynamically variable levels

Country Status (1)

Country Link
CN (1) CN107249035B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108733511B (en) * 2018-03-23 2022-05-24 赵浩茗 Electronic data processing method based on big data
CN110609807B (en) * 2018-06-15 2023-06-23 伊姆西Ip控股有限责任公司 Method, apparatus and computer readable storage medium for deleting snapshot data
CN110083309B (en) * 2019-04-11 2020-05-26 重庆大学 Shared data block processing method, system and readable storage medium
CN110955522B (en) * 2019-11-12 2022-10-14 华中科技大学 Resource management method and system for coordination performance isolation and data recovery optimization
CN113407338A (en) * 2021-05-29 2021-09-17 国网辽宁省电力有限公司辽阳供电公司 A/D conversion chip resource allocation method of segmented architecture
CN114116323B (en) * 2022-01-27 2022-04-19 天津市城市规划设计研究总院有限公司 Data backup strategy management method and system based on permission level
CN116126596B (en) * 2023-02-13 2023-08-18 北京易华录信息技术股份有限公司 Information processing system and method based on block chain
CN117435144B (en) * 2023-12-20 2024-03-22 山东云天安全技术有限公司 Intelligent data hierarchical security management method and system for data center

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101741536A (en) * 2008-11-26 2010-06-16 中兴通讯股份有限公司 Data level disaster-tolerant method and system and production center node
CN102541751A (en) * 2010-11-18 2012-07-04 微软公司 Scalable chunk store for data deduplication
CN103377285A (en) * 2012-04-25 2013-10-30 国际商业机器公司 Enhanced reliability in deduplication technology over storage clouds
US9128948B1 (en) * 2010-09-15 2015-09-08 Symantec Corporation Integration of deduplicating backup server with cloud storage
CN105302669A (en) * 2015-10-23 2016-02-03 浙江工商大学 Method and system for data deduplication in cloud backup process
CN106066818A (en) * 2016-05-25 2016-11-02 重庆大学 A kind of data layout's method improving data de-duplication standby system restorability


Also Published As

Publication number Publication date
CN107249035A (en) 2017-10-13

Similar Documents

Publication Publication Date Title
CN107249035B (en) Shared repeated data storage and reading method with dynamically variable levels
EP2327024B1 (en) Techniques for resource location and migration across data centers
CN106599308B (en) distributed metadata management method and system
US10394452B2 (en) Selecting pages implementing leaf nodes and internal nodes of a data set index for reuse
CN106933868A (en) A kind of method and data server for adjusting data fragmentation distribution
CN103631894A (en) Dynamic copy management method based on HDFS
US10747665B2 (en) Cost-based garbage collection scheduling in a distributed storage environment
CN113655969B (en) Data balanced storage method based on streaming distributed storage system
CN109492429B (en) Privacy protection method for data release
CN113486026A (en) Data processing method, device, equipment and medium
US11765099B2 (en) Resource allocation using distributed segment processing credits
US20050097130A1 (en) Tracking space usage in a database
Si Salem et al. Enabling long-term fairness in dynamic resource allocation
CN102609508A (en) High-speed access method of files in network storage
US8290906B1 (en) Intelligent resource synchronization
WO2017049488A1 (en) Cache management method and apparatus
CN116820323A (en) Data storage method, device, electronic equipment and computer readable storage medium
CN113742304A (en) Data storage method of hybrid cloud
CN110083309B (en) Shared data block processing method, system and readable storage medium
CN102096723A (en) Data query method based on copy replication algorithm
Li Dynamic Load Balancing Method for Urban Surveillance Video Big Data Storage Based on HDFS
Jian et al. A HDFS dynamic load balancing strategy using improved niche PSO algorithm in cloud storage
CN110554916A (en) Distributed cluster-based risk index calculation method and device
CN113297003A (en) Method, electronic device and computer program product for managing backup data
CN106033434A (en) Virtual asset data replica processing method based on data size and popularity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant