CN109154905A - Multiple data set backup versions across multi-tiered storage - Google Patents

Publication number
CN109154905A
Authority
CN
China
Prior art keywords
backup
data
data set
cloud
storage tier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201780031635.XA
Other languages
Chinese (zh)
Other versions
CN109154905B (en
Inventor
K·瓦德瓦
S·A·迪伦
A·P·S·库什瓦
S·C·凯亚森纳哈利
S·P·T·纳加拉杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Network Area Storage Technology Co Ltd
Original Assignee
Network Area Storage Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Network Area Storage Technology Co Ltd filed Critical Network Area Storage Technology Co Ltd
Publication of CN109154905A publication Critical patent/CN109154905A/en
Application granted granted Critical
Publication of CN109154905B publication Critical patent/CN109154905B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604: Improving or facilitating administration, e.g. storage management
    • G06F3/0605: Improving or facilitating administration, e.g. storage management by facilitating the interaction with a user or administrator
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1448: Management of the data involved in backup or backup restore
    • G06F11/1453: Management of the data involved in backup or backup restore using de-duplication of the data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1458: Management of the backup or restore process
    • G06F11/1464: Management of the backup or restore process for networked environments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638: Organizing or formatting or addressing of data
    • G06F3/064: Management of blocks
    • G06F3/0641: De-duplication techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646: Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647: Migration mechanisms
    • G06F3/0649: Lifecycle management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646: Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652: Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0668: Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

A storage tier manager creates different versions of a data set backup with different retention periods. Each version is distinctly identifiable, even though the versions initially represent the same data set backup. One version can be described as the cached version of the data set backup, and another as the cloud version. When the retention period of the cached version expires, the storage tier manager migrates the cloud version of the data set backup from the cache storage tier to the cloud storage tier. The storage tier manager can then reclaim the storage space occupied by the migrated data, provided that the data is not shared, as a result of deduplication, with other cached versions of other data set backups.

Description

Multiple data set backup versions across multi-tiered storage
Technical field
The disclosure generally relates to the field of data processing, and more particularly to backing up data.
Background
Organizations back up to public and/or private cloud storage ("cloud backup") to reduce information technology ("IT") costs. With cloud backup, an organization can scale more easily because its IT department avoids the time and monetary cost of expanding its storage infrastructure. For cloud backup, an organization's data is usually deduplicated and compressed before it is stored in public or private cloud storage.
Brief description of the drawings
Aspects of the disclosure may be better understood by referring to the accompanying drawings.
Fig. 1 is a conceptual block diagram of a backup appliance that creates two different representations of a data set backup with different retention periods.
Fig. 2 is a flowchart of example operations for creating multiple representations of a data set backup for multiple retention periods in response to explicit requests from a backup application for the different representations.
Fig. 3 is a conceptual diagram of an example storage tier manager that uses an arrangement of references to deduplicated data to create a cloud-type backup representation efficiently.
Fig. 4 is a flowchart of example operations for efficiently creating a cloud backup object based on a cache backup object of a data set.
Fig. 5 is a flowchart of example operations for migrating a cloud backup object and its underlying data, arranged in data slabs, to object-based cloud storage.
Fig. 6 is a flowchart of example operations for releasing the cached representation of a data set backup after the cloud representation of the data set backup has been migrated to a cloud target.
Fig. 7 is a flowchart of example operations for reclaiming storage space of the storage system that hosts the storage tier manager after the retention period of the cached representation of a data set backup expires.
Fig. 8 depicts an example storage system with a storage tier manager that generates multiple representations of a data set backup based on multiple retention periods.
Detailed description
Description
The description that follows includes example systems, methods, techniques, and program flows that embody aspects of the disclosure. However, it is understood that this disclosure may be practiced without these specific details. For instance, this disclosure refers to temporarily storing data at a local backup appliance before the data set backup is migrated to cloud storage. Migrating data from a backup appliance to the cloud is only one example of tier-to-tier migration. Aspects of this disclosure can be applied to other tier-to-tier data migrations, such as migration between two cloud targets that differ in input/output performance capability. In other instances, well-known instruction instances, protocols, structures, and techniques are not shown in detail so as not to obscure the description.
Introduction
To facilitate cloud backup while also allowing local recovery of data, an organization can use a backup appliance that integrates a local cache with cloud backup (an "integrated cloud backup appliance"). When data is to be backed up to the cloud, the data, in its various forms, travels from a storage server through the integrated cloud backup appliance to the cloud. The integrated cloud backup appliance stores the data set backup locally, which allows the data set backup to be restored efficiently from the appliance. The integrated cloud backup appliance then migrates the data backup to a designated cloud target. When storing data locally, the integrated cloud backup appliance can deduplicate, compress, and encrypt the data from the storage server. Thus, the integrated cloud backup appliance can migrate compressed, encrypted data to the designated cloud target.
Overview
An application that manages data at a storage tier (a "storage tier manager") can be designed to create different representations, or versions, of a data set backup. Each representation of the data set backup is distinctly identifiable, even though the representations initially represent the same data set backup. The different representations are associated with different retention periods. This gives an administrator greater control over data lifecycle management and enables additional data management functions. A representation corresponds to the structured metadata of a data set backup. Although deduplication causes the representations to reference the same data set backup, the representations are logically two different data set backups that can diverge under subsequent manipulation. One representation is a cached backup version of the data set backup (the "cache backup" or "cached representation"), which resides at the storage tier of the backup appliance for a relatively short retention period under the lifecycle management policy and provides low-latency access. Another representation is a cloud backup version of the data set backup (the "cloud backup" or "cloud representation"), which is retained in cloud storage for a longer retention period under the lifecycle management policy. When the retention period of the cached version of the data set backup expires, the storage tier manager migrates the cloud version of the data set backup from the cache storage tier to the cloud storage tier. The storage tier manager then reclaims the storage space occupied by the migrated data, provided that the data is not shared, as a result of deduplication, with other cached versions of other data set backups.
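The relationship between the two versions can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the `Representation` class, its field names, and the helper functions are hypothetical and do not come from the disclosure. It shows two distinctly identified records for the same data set, each with its own retention period, and a check for the expiration that triggers migration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
import uuid


@dataclass
class Representation:
    """Hypothetical record for one version of a data set backup."""
    dataset_id: str
    tier: str                  # "cache" or "cloud"
    retention: timedelta       # retention period for this version
    created: datetime
    rep_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def expired(self, now: datetime) -> bool:
        return now >= self.created + self.retention


def create_representations(dataset_id, cache_days, cloud_days, now):
    """Create a cached version and a cloud version of the same backup."""
    cached = Representation(dataset_id, "cache", timedelta(days=cache_days), now)
    cloud = Representation(dataset_id, "cloud", timedelta(days=cloud_days), now)
    return cached, cloud


def should_migrate(cached: Representation, now: datetime) -> bool:
    """Expiration of the cached version triggers migration of the cloud version."""
    return cached.expired(now)
```

Both records carry the same `dataset_id` but different `rep_id` values, which is what lets each version be managed separately.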
Example illustrations
Fig. 1 is a conceptual block diagram of a backup appliance that creates two different representations of a data set backup with different retention periods. From the perspective of data site 103, backup appliance 110 operates as a cache for data set backups. Data site 103 also includes data store 115 and backup server 116. Data store 115 hosts backup application 114, and backup server 116 hosts backup application 113.
Fig. 1 is annotated with a series of letters A-D. These letters represent stages of operations, each of which may include one or more operations. The stages are not necessarily mutually exclusive and can overlap. Although the stages are ordered for this example, they illustrate one example to aid understanding of the disclosure and should not be used to limit the claims. Subject matter falling within the scope of the claims can vary with respect to the order and some of the operations.
Stage A includes backup application 114 transferring data set A 112 to backup appliance 110 for backup of data set A 112. Stage A is triggered by backup application 113 requesting that backup appliance 110 back up data set A to a cloud target. In this illustration, the cloud target is in cloud storage 140. Before writing data set A 112, backup application 114 opens a connection with backup appliance 110 according to a storage networking protocol, such as the Common Internet File System (CIFS) protocol, the Network File System (NFS) protocol, or a proprietary protocol that uses custom remote procedure calls (RPCs). Backup application 114 communicates a request that indicates the identifier of data set A 112, cloud storage 140 as the cloud target for the cloud backup, and an indication that two retention periods (NP and NC) govern the backup of data set A. Data lifecycle management policy 101 defines the two retention periods. Retention period NP specifies the time period (typically on the order of days or weeks) for which the data set A backup is cached at backup appliance 110, and retention period NC specifies the time period (typically on the order of months or years) for which the data set A backup remains in cloud storage 140. In some cases, expiration of time period NP can trigger migration of the data set A backup to cloud storage with different attributes (for example, different accessibility and different recovery guarantees). After establishing the connection with backup appliance 110, backup application 114 sets backup appliance 110 as the backup target, which is effectively an intermediate or temporary backup target, and begins transferring data set A 112 to backup appliance 110. Backup application 114 transfers data set A 112 as constituent data units of fixed or variable size (for example, extents, data blocks, etc.).
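As a rough illustration of what such a request carries and how a data set might be split for transfer, consider the following Python sketch. The `BackupRequest` fields, the target name "cloud-140", and the fixed unit size are assumptions made for the example; the disclosure does not define a message format, and it allows variable-size units as well.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterator


@dataclass(frozen=True)
class BackupRequest:
    """Hypothetical request: data set, cloud target, and two retention periods."""
    dataset_id: str
    cloud_target: str
    cache_retention: timedelta   # NP: time cached at the backup appliance
    cloud_retention: timedelta   # NC: time kept in cloud storage


def constituent_units(data: bytes, unit_size: int) -> Iterator[bytes]:
    """Yield fixed-size constituent data units; the last unit may be shorter."""
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    for start in range(0, len(data), unit_size):
        yield data[start:start + unit_size]


request = BackupRequest("dataset-A", "cloud-140",
                        timedelta(weeks=2), timedelta(days=365))
units = list(constituent_units(b"abcdefghij", 4))
```

Concatenating the units in order reproduces the original data, which is the property any chunking scheme used for transfer has to preserve.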
Stage B includes backup appliance 110 creating a first representation of the backup of data set A 112 on backup appliance 110. Stage B is triggered by the request from backup application 114. As mentioned, the request indicates that the backup of data set A 112 has two retention periods. Based on the indication of the two retention periods, backup appliance 110 creates a first representation of the data set A backup, which corresponds to retention period NC. This first representation includes data set A backup metadata 122. Data set A backup metadata 122 includes an identifier of metadata 122 (for example, a unique identifier in a namespace managed by backup appliance 110). Metadata 122 also includes the identifier of data set A 112 and metadata of data set A (for example, permissions, size, creation date, etc.). As it receives the constituent data units of data set A 112, backup appliance 110 performs storage efficiency operations that include deduplication. Backup appliance 110 can also apply compression and encryption to the backup of data set A 112. As data set A 112 is processed and stored locally, backup appliance 110 updates metadata 122 with references to the constituent units of the data set A backup. Because deduplication is performed, metadata 122 can reference both deduplicated and non-deduplicated data. In this illustration, backup application 113 transferred data set B 111 to backup appliance 110 for backup before stage A or at some point overlapping stage A. When performing deduplication, backup appliance 110 finds duplicate data between data set A 112 and data set B 111. Therefore, metadata 122 references non-deduplicated data 129 (that is, data with no duplicate on backup appliance 110) and deduplicated data 130. To avoid overcomplicating the illustration, deduplicated data 130 is referenced only by metadata 122 and data set B cache backup metadata 121. Data set B cache backup metadata 121 also references non-deduplicated data 124, which corresponds to data set B 111.
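The way deduplication leaves two backups referencing one stored copy can be sketched as follows. This is a minimal sketch that assumes content fingerprinting with SHA-256 and an in-memory store; the disclosure does not specify the deduplication technique, and `DedupStore` and its attributes are hypothetical names.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical units are stored once."""

    def __init__(self):
        self.units = {}      # fingerprint -> unit bytes
        self.refcount = {}   # fingerprint -> number of references held

    def put(self, unit: bytes) -> str:
        fingerprint = hashlib.sha256(unit).hexdigest()
        if fingerprint not in self.units:
            self.units[fingerprint] = unit          # first copy is stored
        self.refcount[fingerprint] = self.refcount.get(fingerprint, 0) + 1
        return fingerprint

    def backup(self, units):
        """Return backup metadata as a list of references to stored units."""
        return [self.put(unit) for unit in units]


store = DedupStore()
meta_b = store.backup([b"B-only", b"shared"])   # data set B arrives first
meta_a = store.backup([b"A-only", b"shared"])   # data set A shares one unit
shared = set(meta_a) & set(meta_b)
```

Here three units are stored for four references: the shared unit plays the role of the deduplicated data referenced by both backups, and the other two are the non-deduplicated data of each data set.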
Stage C includes backup appliance 110 creating a second representation of the backup of data set A 112 based on retention period NP. Stage C can be triggered by an explicit request from backup application 114 to create the second representation of the data set A backup, by an implicit request arising from the indication of two retention periods for the data set A backup, or by a default operation for data with particular attributes (for example, a second representation is created for all of an organization's data or only for the data of certain departments). Backup appliance 110 creates the second representation based on the first representation of the data set A backup. Backup appliance 110 creates the second representation to include data set A cloud backup metadata 123 and data set A backup 125. Backup appliance 110 can create the second representation by copying the first representation. Although it is a copy, the result of copying the first representation will at least have a different identifier for the second representation. Backup appliance 110 can copy data set A 112 with deduplication suppressed to create data set A backup 125, which keeps the underlying data separate. However, backup appliance 110 can also allow deduplication, in which case data set A backup 125 refers to the same data of data set A 112 that is represented in deduplicated data 130 and non-deduplicated data 129. In other words, data set A backup 125 can be references to data. These references can be part of data set A cloud backup metadata 123 or a separate structure referenced by data set A cloud backup metadata 123.
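Creating the second representation by copying the first, with deduplication allowed, amounts to copying references rather than data. The sketch below assumes a hypothetical `BackupMetadata` record and uses `dataclasses.replace` to produce the copy with a fresh identifier; none of these names come from the disclosure.

```python
import uuid
from dataclasses import dataclass, field, replace


@dataclass
class BackupMetadata:
    """Hypothetical metadata for one representation of a data set backup."""
    dataset_id: str
    rep_type: str          # "cache" or "cloud"
    unit_refs: list        # references to constituent data units
    rep_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def derive_cloud_representation(first: BackupMetadata) -> BackupMetadata:
    """Copy the first representation; the copy gets its own identifier and
    its own list of references, but the referenced data is not duplicated."""
    return replace(first,
                   rep_type="cloud",
                   unit_refs=list(first.unit_refs),
                   rep_id=str(uuid.uuid4()))


first = BackupMetadata("dataset-A", "cache", ["ref-A1", "ref-S1", "ref-AN"])
second = derive_cloud_representation(first)
```

Because `second.unit_refs` is a separate list, later changes to one representation do not alter the other, which matches the idea that the two representations can diverge.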
Stage D includes backup appliance 110 migrating metadata 123 and data set A backup 125 to cloud storage 140. Assuming that cloud storage 140 uses object-based storage technology, the migration of metadata 123 and data set A backup 125 results in object 141, which includes metadata 142 and data set backup 143. The form in which metadata 123 and data set A backup 125 are migrated can vary depending on the service and/or protocol used (for example, an Amazon storage service, the Microsoft Azure platform, Webscale object storage software, a Swift object/blob storage interface, etc.). Retention period NC governs object 141 in cloud storage 140. After NC expires, object 141 can be migrated again or moved to a different level of storage/archive. After metadata 123 and data set A backup 125 are migrated successfully, metadata 123 can be removed from backup appliance 110, and backup appliance 110 can notify backup application 114 that the backup of data set A 112 has been stored in cloud storage 140. The notification includes the identifier of object A 141 to allow retrieval from cloud storage 140. After notifying backup application 114, backup appliance 110 can begin removing the data set A backup from backup appliance 110 to the extent that its constituent data units are not shared by other backups on backup appliance 110.
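The final step, removing only those constituent data units that no other backup shares, can be illustrated with reference counting. This is an assumed mechanism chosen for the sketch; the disclosure states the condition (not shared by other backups) but not how sharing is tracked, and the function and variable names are hypothetical.

```python
def release_backup(unit_refs, refcount, units):
    """Drop one backup's references and reclaim units nobody else references.

    unit_refs: references held by the backup being removed
    refcount:  reference -> number of backups still referencing the unit
    units:     reference -> stored unit bytes
    Returns the list of references whose storage was reclaimed.
    """
    reclaimed = []
    for ref in unit_refs:
        refcount[ref] -= 1
        if refcount[ref] == 0:        # no other backup shares this unit
            del refcount[ref]
            del units[ref]
            reclaimed.append(ref)
    return reclaimed


units = {"a1": b"A-only", "s1": b"shared", "b1": b"B-only"}
refcount = {"a1": 1, "s1": 2, "b1": 1}
reclaimed = release_backup(["a1", "s1"], refcount, units)
```

After the data set A references are released, the unit shared with data set B stays on local storage while the unit unique to data set A is reclaimed.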
Data site 103 can include various other hardware and/or software elements that are not depicted to avoid unnecessarily complicating Fig. 1. Fig. 1 depicts data store 115 in contrast with backup server 116 to avoid the misunderstanding that the disclosure is limited to receiving data sets for backup from a backup server. For example, a client device can host a backup application that transfers a data set for backup to backup appliance 110.
In addition to allowing separate retention periods to be managed for a data set backup, creating multiple representations of the data set backup allows greater control over other data lifecycle management variables (for example, backup latency, backup policies specified for different user groups, etc.). Providing a unique identifier for each of the multiple representations of a data set backup facilitates this control by allowing different variables to be associated with a specific one of the representations. A service level agreement ("SLA") and/or storage lifecycle policy ("SLP") can be assigned to each uniquely identified representation of a data set backup. The uniqueness of representation identifiers can be ensured by using mutually exclusive namespaces. For example, representations can occupy mutually exclusive namespaces by having their identifiers modified with a directory or subdirectory, by being stored in different logical containers (for example, volumes), by being stored on media with different mount points, and so on.
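One of the namespace options, qualifying an identifier with a directory, can be sketched as below. The directory names, the policy dictionary, and its keys are assumptions made for illustration only.

```python
from pathlib import PurePosixPath


def qualified_id(namespace: str, local_id: str) -> str:
    """Qualify an identifier with a directory so that equal local names in
    different namespaces cannot collide."""
    return str(PurePosixPath("/") / namespace / local_id)


# Hypothetical per-representation policies keyed by the qualified identifier.
policies = {
    qualified_id("cache", "dataset-A"): {"retention_days": 14},
    qualified_id("cloud", "dataset-A"): {"retention_days": 3650},
}
```

Both representations keep the local name "dataset-A", yet each has a distinct key to which an SLA or SLP can be attached.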
Creating at least a cached representation and a cloud representation of a data set backup allows data isolation, for example for security compliance. Permissions can restrict the cached representation to being created on and/or stored to restricted storage devices or memory. The cloud representation can be subject to or governed by a security policy that restricts movement of the cloud representation to a limited set of destinations (for example, only to particular cloud targets) and that allows transfer to that limited set of destinations only over secure connections and/or protocols.
Creating multiple representations of a data set backup also allows the different representations to be associated with different SLAs/SLPs that satisfy jurisdictional requirements. For example, a backup appliance can create two cloud representations of a data set backup, each of which will be stored in cloud storage in a different jurisdiction. Because jurisdictions can take different approaches to data privacy, separate cloud representations allow jurisdiction-specific SLAs/SLPs to be applied efficiently. For example, a data owner can avoid creating an overall data management policy that contains the rules of every jurisdiction and evaluating those rules for each data set backup migrated to cloud storage. In addition, further representations can be created with their own SLAs/SLPs and migrated to a different backup appliance as a standby for failover.
Fig. 2 is a flowchart of example operations for creating multiple representations of a data set backup for multiple retention periods in response to explicit requests from a backup application for the different representations. Fig. 2 refers to a storage tier manager as performing the example operations. The term "storage tier manager" is used because the creation of multiple representations for multiple retention periods corresponds to backup of data to cloud storage. Fig. 2 refers to a storage tier manager rather than a backup appliance to avoid an interpretation that requires a specially configured device (for example, a storage appliance). For example, a storage tier manager can execute in a virtual machine. Dashed lines in Fig. 2 represent indirect or asynchronous flow between the depicted example operations.
At block 201, the storage tier manager detects a request for a low-latency backup of a data set. The request can be an explicit request to create a backup in low-latency storage, or it can be implicit. A backup request can implicitly request a low-latency backup as a side effect of requesting that the data set be backed up to cloud storage. The storage tier manager can be programmed to process a request for a cloud backup as a request for both a cloud backup and a low-latency backup. As another example, the storage tier manager can process any request to back up a data set as a request to create two representations of the data set, each with a different retention period.
At block 203, the storage tier manager creates a backup of the data set. The storage tier manager creates a backup representation of the data set, assigns an identifier to the created backup representation, and sets an indication that the backup representation is of the cache type. The cache-type indication signifies that the underlying data resides locally; the underlying data does not necessarily reside in memory of a conventional cache type. Although the data set is being backed up to storage that is local relative to cloud storage, the data set will eventually be backed up to cloud storage. Local storage has lower access latency than cloud storage. As illustrated in Fig. 1, local storage can be storage (for example, a disk storage array or flash storage array) managed by a device (for example, a storage appliance) at the data site that is "local" to the source data. "Local" can mean being in the same local network as the source data, in the same building, and so on. Regardless of the specific deployment, a cache-type backup representation can be accessed with lower latency than cloud storage. The backup representation includes metadata for both the representation and the data set. The backup representation metadata includes the identifier (for example, a universally unique identifier (UUID)) that the storage tier manager assigns to the backup representation. The backup representation can be considered to include the data set backup or can include references to the data set backup. The storage tier manager also sets the indication that the backup representation is of the cache type, for example in the backup representation metadata. The type can later be used for operations that act on representations by type. A backup representation is logically regarded as including the data set backup, even though the backup representation may contain references to the constituent units of the data set backup. More generally, the type indication can be a value that represents the corresponding storage tier. For example, the type indication for a first storage tier of low-access-latency storage can be the value "1", corresponding to the first storage tier, or can be "cache" to indicate that it corresponds to the low-access-latency storage tier.
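The identifier and type indication described for block 203 could be recorded as in the following sketch. The value 1 for the low-latency tier comes from the example in the text; the value 2 for the cloud tier, the `TierType` name, and the metadata keys are assumptions made for illustration.

```python
import uuid
from enum import IntEnum


class TierType(IntEnum):
    CACHE = 1   # first storage tier: low-access-latency storage
    CLOUD = 2   # assumed value for the cloud storage tier


def new_representation_metadata(dataset_id: str, tier: TierType) -> dict:
    """Build representation metadata with a UUID and a type indication."""
    return {
        "id": str(uuid.uuid4()),          # identifier of the representation
        "dataset": dataset_id,            # identifier of the data set
        "type": int(tier),                # numeric tier value
        "type_name": tier.name.lower(),   # equivalent string form
    }


meta = new_representation_metadata("dataset-A", TierType.CACHE)
```

Storing the type inside the metadata makes it available later to operations that select representations by type, such as eviction or garbage collection.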
At block 206, the storage tier manager communicates the cache backup representation identifier and the cache-type indication to the requester. The storage tier manager communicates the identifier and type indication according to the communication protocol or storage networking protocol that the backup application uses to communicate with the storage tier manager. The storage tier manager provides the identifier and type indication to the backup application so that the backup application can manage the multiple representations of the data set backup for the control or manipulation described earlier. For example, the requester can notify the storage tier manager when the retention period of the cache backup representation expires.
At block 207, the storage tier manager detects a request that indicates a cloud target and the data set. The backup application can send another request that indicates the data set previously indicated in the other request. In some implementations, the backup application communicates a single backup request to the storage tier manager, which can process the single request as a request to create multiple representations.
At block 208, the storage tier manager creates another backup representation of the data set. The storage tier manager assigns a different identifier to this additional backup representation and sets an indication that the representation is of the cloud type. The storage tier manager sets the cloud-type indication to indicate that the backup representation and the represented data set backup will be stored in cloud storage. A cloud-type backup representation has a longer retention period, usually substantially longer than that of a cache-type backup representation (for example, days compared to years).
At block 209, the storage tier manager communicates the cloud backup representation identifier and the cloud-type indication to the requester. The requester can use the cloud backup representation identifier to access the data set backup in cloud storage. The storage tier manager can use the type to distinguish data set backups and metadata when evicting data or performing garbage collection.
At block 210, the storage tier manager migrates the cloud-type backup representation of the data set to the cloud target in response to a migration trigger. The migration trigger can be expiration of the retention period of the cache backup representation. If the cloud-type backup representation references an underlying data set backup, the underlying data set backup is also migrated, and the relationship between the two is maintained in cloud storage as part of the migration.
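The flow of Fig. 2 as described above can be condensed into a small sketch. It assumes an in-memory manager and handles a single request that yields both representations; the class, method names, and target name are hypothetical, and the real operations involve storage protocols and asynchronous triggers that are not modeled here.

```python
import uuid


class StorageTierManager:
    """Toy model of the Fig. 2 flow; all names are illustrative."""

    def __init__(self):
        self.local = {}   # rep_id -> representation held on local storage
        self.cloud = {}   # cloud target -> {rep_id: representation}

    def create_representation(self, dataset_id, rep_type, retention_days):
        # Create a representation, assign an identifier, set its type,
        # and return both to the requester.
        rep_id = str(uuid.uuid4())
        self.local[rep_id] = {"dataset": dataset_id, "type": rep_type,
                              "retention_days": retention_days}
        return rep_id, rep_type

    def handle_backup_request(self, dataset_id, cache_days, cloud_days):
        # Treat one request as a request for both representations.
        cache = self.create_representation(dataset_id, "cache", cache_days)
        cloud = self.create_representation(dataset_id, "cloud", cloud_days)
        return cache, cloud

    def migrate(self, rep_id, cloud_target):
        # On a migration trigger, move the cloud-type representation.
        if self.local[rep_id]["type"] != "cloud":
            raise ValueError("only cloud-type representations are migrated")
        self.cloud.setdefault(cloud_target, {})[rep_id] = self.local.pop(rep_id)


manager = StorageTierManager()
(cache_id, cache_type), (cloud_id, cloud_type) = manager.handle_backup_request(
    "dataset-A", cache_days=14, cloud_days=365)
manager.migrate(cloud_id, "target-1")
```

After the migration, the cloud-type representation is found under the cloud target while the cache-type representation remains local until it is released.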
Fig. 2 presents example operations that cover variations in creating and managing multiple representations. Figs. 3-4 provide example illustrations of multiple representations of a backup with deduplicated data and multiple retention periods. With deduplicated data, the metadata can be arranged in a manner that allows the multiple representations of a backup to be created faster. Fig. 3 is a conceptual diagram of an example storage tier manager that uses an arrangement of references to deduplicated data to create a cloud-type backup representation efficiently. Fig. 4 is a flowchart of example operations for efficiently creating a cloud-type backup representation.
In Fig. 3, storage shelf manager 302 backs up data to cloud storage 340, which uses object storage technology. Storage shelf manager 302 is illustrated as managing backups of data set A and data set B. Due to deduplication, the data set A backup and the data set B backup share some data. Storage shelf manager 302 aggregates the data units that constitute a data set ("constituent data units"). An aggregation of constituent data units is referred to herein as a "data plate". Storage shelf manager 302 can form a data plate from the constituent data units of multiple data sets, and each constituent data unit of a data plate can be shared by multiple data sets. Storage shelf manager 302 can form data plates based on a configured data plate size: it can accumulate deduplicated constituent data units into a data plate until the configured size is reached, with or without padding. Storage shelf manager 302 maintains metadata for each data set in order to restore the data set from the data plates. This metadata is referred to herein as a constituent data map. The constituent data map of a data set includes the identifiers of the data plates that hold the constituent data units of the data set, together with location information for each data plate. The location information indicates where each constituent data unit of the data set begins within a data plate (the "offset in the data plate") and the length or size of the constituent data unit. The constituent data map can also indicate the compression algorithm and encryption applied to a data plate.
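The constituent data map described above can be sketched in a few lines of Python. This is only an illustrative model, not the disclosed implementation: the names `UnitLocation` and `restore_data_set`, the plate identifiers, and the byte contents are all hypothetical. It shows how a list of (data plate identifier, offset, length) entries is enough to rebuild a data set from shared data plates.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UnitLocation:
    """One entry of a (hypothetical) constituent data map."""
    plate_id: str  # identifier of the data plate that holds the unit
    offset: int    # where the unit starts inside the data plate
    length: int    # size of the unit in bytes


def restore_data_set(mapping: List[UnitLocation], plates: Dict[str, bytes]) -> bytes:
    """Rebuild a data set by reading each unit from its plate, in map order."""
    return b"".join(
        plates[loc.plate_id][loc.offset:loc.offset + loc.length] for loc in mapping
    )


# Two plates; the unit b"SHARED" is stored once and used by both data sets.
plates = {"plate-1": b"AAAASHARED", "plate-2": b"BBBB"}
map_a = [UnitLocation("plate-1", 0, 4), UnitLocation("plate-1", 4, 6)]
map_b = [UnitLocation("plate-2", 0, 4), UnitLocation("plate-1", 4, 6)]

data_set_a = restore_data_set(map_a, plates)
data_set_b = restore_data_set(map_b, plates)
```

Because both maps point at the same bytes in `plate-1`, the shared unit is stored only once, which is the deduplication benefit the paragraph describes.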
In the case of the data set A backup and the data set B backup, storage shelf manager 302 has formed the data plates 309 in Fig. 3. Data set A cache backup object 301 refers to constituent data map 305A. Constituent data map 305A identifies data plates 307, which are a subset of data plates 309. Data set A cache backup object 301 includes the metadata of data set A, an indication that the object is of the cache type, and an object identifier. Fig. 3 refers to objects rather than representations because the metadata of data set A is separate from the metadata for the underlying constituent data units of the data set A backup. This arrangement can be viewed as decomposing a backup into three parts: 1) the data units that constitute the data set, 2) the metadata for locating or retrieving the constituent data units (retrieval metadata), and 3) the data set backup metadata. Data plates 307 include the constituent data units of data set A. For this limited example, several constituent data units are identified as A_1, S_1, and A_N, where S_1 denotes a shared constituent data unit. Constituent data unit S_1 is also a constituent data unit of data set B. Constituent data map 311 of the data set B backup references constituent data unit S_1 in data plates 307 as well as elements of other data plates in data plates 309. Data set B cache backup metadata 313 refers to constituent data map 311.
As in Fig. 1, Fig. 3 is annotated with a series of letters A-C. These letters denote stages of operation, and each stage may include one or more operations. The stages are not necessarily mutually exclusive and can overlap. Although the stages are ordered for this example, they illustrate one example to aid understanding of the disclosure and should not be used to limit the claims. Subject matter falling within the scope of the claims can vary with respect to the order and some of the operations.
Stage A includes storage shelf manager 302 copying and modifying data set A cache backup object 301 to create data set A cloud backup object 303. Storage shelf manager 302 modifies the copy at least to indicate a new identifier for object 303 and to designate the object type as cloud type. Stage A is triggered by an explicit or implicit request from a backup application to back up data set A, where the data set A backup is governed by multiple retention periods.
Stage B includes storage shelf manager 302 copying constituent data map 305A to create constituent data map 305B. Storage shelf manager 302 modifies data set A cloud backup object 303 to refer to the copied constituent data map 305B. At this point, storage shelf manager 302 has created two representations of the data set A backup without the overhead of copying the underlying data. In addition, storage shelf manager 302 can rely on deduplication program code to manage subsequent modifications that cause the underlying data of the two representations to diverge. If a request modifies the cache backup of data set A, the deduplication program code manages the references so that constituent data map 305A is updated to refer to the changed data, which may be in a different data plate, while constituent data map 305B continues to refer to the unchanged data.
Stage C includes storage shelf manager 302 migrating the cloud backup of data set A. To migrate the cloud backup of data set A, the storage shelf manager communicates data set A cloud backup object 303, constituent data map 305B, and data plates 307 to object-based cloud storage 340, possibly after transformation (e.g., compression, encryption). The migration results in four objects stored in cloud storage 340: 1) data set A cloud backup metadata object 315; 2) constituent data map object 317; 3) object 319 for a first data plate of data plates 307; and 4) object 321 for a second data plate of data plates 307. Storage shelf manager 302 creates these objects in cloud storage 340 with object keys based on the identifiers of the corresponding structures managed by storage shelf manager 302. Storage shelf manager 302 creates object 321 with an object key based on the data plate identifier of its corresponding data plate. Similarly, storage shelf manager 302 creates object 319 with an object key based on the data plate identifier of its corresponding data plate. Storage shelf manager 302 creates constituent data map object 317 with an object key based on the identifier of constituent data map 305B. Finally, storage shelf manager 302 creates data set A cloud backup metadata object 315 with an object key based on the identifier of data set A cloud backup object 303.
Although driven to some extent by the data management/efficiency functionality of storage shelf manager 302, separating the backup into these multiple objects allows efficient retrieval of different aspects of the data set and efficient expiration in storage. With metadata object 315, the metadata of data set A can be retrieved without retrieving the underlying data set A, which would involve retrieving data plate objects 319, 321 and then reconstructing data set A from the retrieved data plate objects 319, 321. The overhead of reconstruction varies with the transformations applied to the data before migration into cloud storage. For example, storage shelf manager 302 can compress and then encrypt a data plate before storing it into cloud storage. To reconstruct the data set, a retrieved data plate is decrypted and then decompressed before the constituent data units of the data set can be extracted from it. The storage efficiency gained from deduplication at storage shelf manager 302 carries into cloud storage 340, because the data plate objects contain constituent data units shared across multiple data set backups. After successful migration, storage shelf manager 302 may, depending on the governing SLP, remove data set A cloud backup object 303 and constituent data map 305B from the managed storage. For example, an SLP may allow a data set backup to exist on multiple layers with overlapping retention periods. After the removal and/or, depending on the governing SLP, after confirmation of migration to cloud storage 340, migration of the cloud backup of data set A can be considered complete, and storage shelf manager 302 can notify the requesting backup application that data set A has been successfully stored in cloud storage 340. Storage shelf manager 302 can also remove or evict the constituent data units of data set A that are not constituent data units of other data sets.
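The ordering constraint in the compression and encryption example (compress, then encrypt on the way in; decrypt, then decompress on the way out) can be illustrated with a small sketch. The `toy_cipher` function below is a deliberately trivial XOR placeholder and is not secure; it only stands in for whatever cipher an implementation would actually use. The key, function names, and sample plate contents are all assumptions made for illustration.

```python
import zlib

KEY = b"demo-key"  # hypothetical key, for illustration only


def toy_cipher(data: bytes, key: bytes = KEY) -> bytes:
    """Placeholder XOR transform standing in for real encryption (NOT secure).

    XOR with the same key is its own inverse, so this one function both
    "encrypts" and "decrypts" in this sketch.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def prepare_for_cloud(plate: bytes) -> bytes:
    """Compress first, then encrypt, before the plate is stored as an object."""
    return toy_cipher(zlib.compress(plate))


def recover_from_cloud(stored: bytes) -> bytes:
    """Reverse the transformations in the opposite order: decrypt, then decompress."""
    return zlib.decompress(toy_cipher(stored))


plate = b"A1" * 64 + b"S1" * 64   # stand-in for a data plate's contents
stored = prepare_for_cloud(plate)
recovered = recover_from_cloud(stored)
```

Compressing before encrypting matters in practice because well-encrypted data has little redundancy left to compress.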
At some point, storage shelf manager 302 can apply similar operations to data set B cache backup metadata 313 and constituent data map 311. Storage shelf manager 302 will copy and modify metadata 313 to produce a data set B cloud backup object. Storage shelf manager 302 also copies constituent data map 311, which is referenced by metadata 313, and updates the data set B cloud backup object to reference the copy of constituent data map 311. Storage shelf manager 302 will then migrate the data set B cloud backup object and the copy of constituent data map 311, with references modified as described above for data set A. The storage shelf manager also migrates those of data plates 309 that are referenced by the copy of constituent data map 311 and have not already been migrated to cloud storage 340. For this example, storage shelf manager 302 already created data plate object 321 when migrating the data set A backup. Thus, the migrated data set B backup will reference data plate object 321. Storage shelf manager 302 can use different techniques to track the migration of data plates. Storage shelf manager 302 can locally mark a data plate with an indication of successful migration into cloud storage and/or maintain a separate data structure that lists the data plates in the managed storage layer that have been migrated to another storage layer and identifies the target storage layer. Storage shelf manager 302 can also use a function defined by a cloud service API to determine whether a data plate has already been migrated to cloud storage 340.
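One of the tracking techniques mentioned above, a separate structure that lists migrated data plates and their target layer, might look like the following sketch. The function name, the plate identifiers, the `"cloud-340"` label, and the use of plain dictionaries in place of a real object store are all hypothetical.

```python
from typing import Dict, Iterable, List


def migrate_unmigrated_plates(
    plate_ids: Iterable[str],
    local_plates: Dict[str, bytes],
    cloud: Dict[str, bytes],
    migrated_to: Dict[str, str],
    target: str = "cloud-340",
) -> List[str]:
    """Create cloud objects only for plates not yet recorded as migrated.

    `migrated_to` maps a plate identifier to the layer it was migrated to.
    Returns the identifiers of the plates uploaded by this call.
    """
    created = []
    for plate_id in plate_ids:
        if plate_id in migrated_to:
            continue  # already in the cloud; the new backup just references it
        cloud[plate_id] = local_plates[plate_id]
        migrated_to[plate_id] = target
        created.append(plate_id)
    return created


local_plates = {"plate-1": b"A1", "plate-2": b"S1AN", "plate-3": b"B1"}
cloud: Dict[str, bytes] = {}
migrated_to: Dict[str, str] = {}

# Data set A references plates 1 and 2; data set B references plates 2 and 3.
first = migrate_unmigrated_plates(["plate-1", "plate-2"], local_plates, cloud, migrated_to)
second = migrate_unmigrated_plates(["plate-2", "plate-3"], local_plates, cloud, migrated_to)
```

In this sketch the second migration uploads only the plate that data set A did not already bring to the cloud.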
Fig. 4 depicts a flowchart of exemplary operations for efficiently creating a cloud backup object based on a cache backup object of a data set. For consistency with Figs. 2-3, Fig. 4 refers to a storage shelf manager as performing the operations. The operations of Fig. 4 provide an exemplary illustration of one embodiment of creating another backup representation, as indicated at box 208 of Fig. 2. At the time the operations of Fig. 4 begin, the storage shelf manager has determined that the data set backup will have two representations.
At box 402, the storage shelf manager creates a copy of the cache backup object to create the cloud backup object. The copy operation copies the backup object but assigns a different identifier to the copy. This identifier can be generated and assigned to the copy by the operating system in which the storage shelf manager executes.
At box 403, the storage shelf manager modifies the copy to indicate that the object is a cloud-type backup object. The storage shelf manager can also update the copy with a name/path provided by the backup application. For example, cache backup objects and cloud backup objects may be written to different paths and/or to different sets of storage media that correspond to the object type.
At box 404, the storage shelf manager copies the constituent data map referenced by the copy, i.e., by the cloud backup object. When the storage shelf manager copied the cache backup object, the reference to the constituent data map was copied as well.
At box 408, the storage shelf manager updates the cloud backup object to reference the copy of the constituent data map. The storage shelf manager copies the constituent data map and updates the reference in the cloud backup object to point to the copy in order to ensure that the backup objects are distinct. Changes to the cache backup object will then no longer affect the cloud backup object.
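The operations of boxes 402-408 can be sketched as follows. This is a minimal model under stated assumptions: the `BackupObject` and `ConstituentMap` classes, their fields, and the use of `uuid` for identifiers are hypothetical and chosen only to make the copy-then-repoint sequence concrete. No underlying data is copied, only metadata.

```python
import copy
import uuid
from dataclasses import dataclass


@dataclass
class ConstituentMap:
    map_id: str
    units: list  # (plate_id, offset, length) tuples


@dataclass
class BackupObject:
    object_id: str
    kind: str            # "cache" or "cloud"
    metadata: dict
    data_map: ConstituentMap


def create_cloud_backup_object(cache_obj: BackupObject) -> BackupObject:
    # Box 402: copy the cache backup object and give the copy a new identifier.
    # The shallow copy still references the original constituent data map.
    cloud_obj = copy.copy(cache_obj)
    cloud_obj.object_id = str(uuid.uuid4())
    cloud_obj.metadata = dict(cache_obj.metadata)
    # Box 403: mark the copy as a cloud-type backup object.
    cloud_obj.kind = "cloud"
    # Box 404: copy the constituent data map that the copy references.
    map_copy = copy.deepcopy(cache_obj.data_map)
    map_copy.map_id = str(uuid.uuid4())
    # Box 408: point the cloud backup object at the copied map.
    cloud_obj.data_map = map_copy
    return cloud_obj


cache = BackupObject(
    "cache-A", "cache", {"name": "data set A"},
    ConstituentMap("map-A", [("plate-1", 0, 4), ("plate-1", 4, 6)]),
)
cloud = create_cloud_backup_object(cache)

# A later change to the cache backup's map does not affect the cloud backup.
cache.data_map.units[0] = ("plate-9", 0, 4)
```

Copying the map, rather than sharing it, is what lets the two representations diverge later without interfering with each other.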
Fig. 5 is a flowchart of exemplary operations for migrating a cloud backup object and the underlying data, arranged in data plates, to object-based cloud storage. Fig. 5 refers to the operations represented by stage C of Fig. 3.
At box 501, the storage shelf manager detects a migration trigger. The migration trigger can be sent from a backup application. The migration trigger can be detection of the expiration of retention period N_P (the retention period defined for the cache backup object) and/or a notification of that expiration. Although a retention period can be defined for each type of backup object, embodiments can set multiple retention periods for a data set backup. If a cloud backup object exists for the data set backup, the storage shelf manager can, as a default action, interpret expiration of a retention period of the data set backup as expiration of the cache backup object. In addition, a backup application or other entity can notify the storage shelf manager that N_P has expired and can communicate an instruction to migrate the cloud backup object.
Box 503 begins a processing loop in the exemplary operations of Fig. 5 that, in conjunction with box 509 (below), causes boxes 505 and 507 to be repeated at least once for substantially each data plate referenced by the constituent data map of the cloud backup object. "Substantially each data plate" means that, under certain conditions, a particular data plate referenced by the constituent data map may not be included in the processing loop formed by boxes 503 and 509. For example, the storage shelf manager can perform additional operations to determine whether a data plate has already been migrated to the cloud target and then avoid migrating the same data plate again. The storage shelf manager accesses the cloud backup object to determine the constituent data map referenced by the cloud backup object. With the constituent data map of the cloud backup object, the storage shelf manager can begin iterating over the references in the constituent data map.
During each iteration of the processing loop established by boxes 503 and 509, at box 505, the storage shelf manager creates an object in the cloud target for the dereferenced data plate of the iteration. For example, the storage shelf manager can invoke a function defined by the application programming interface of the cloud service provider to create the object. One argument of the function can be the data plate, perhaps after compression and encryption transformations, and another argument can be the object key that will identify the object being created.
At box 507, the storage shelf manager updates the constituent data map to indicate the object key of the created data plate object. The storage shelf manager ultimately updates the constituent data map referenced by the cloud backup object with the object key that identifies the created data plate object, in place of the reference to the data plate at the backup appliance.
At box 509, the storage shelf manager determines whether there is another data plate referenced by the constituent data map. If so, control returns to box 503 to process the next referenced data plate. Otherwise, control continues to box 511.
At box 511, the backup application creates an object in the cloud target with the constituent data map. After the referenced data plates have been migrated to the cloud target, the constituent data map includes data plate object keys instead of references to data plates in the low-latency storage layer or local storage layer (i.e., local with respect to the storage shelf manager). If the constituent data map object is retrieved from the cloud target, the data plate object keys will be used to retrieve the needed data plates. The constituent data map object will still include the information for restoring the constituent data units of the data set (e.g., location information, decryption information, compression information).
At box 513, after confirming that the constituent data map object has been created, the storage shelf manager updates the cloud backup object to indicate the object key of the constituent data map object. In essence, the storage shelf manager replaces the local reference to the constituent data map with a cloud reference (i.e., the object key).
At box 515, the storage shelf manager creates an object in the cloud target with the cloud backup object. For example, the storage shelf manager invokes the previously mentioned create-object function with the cloud backup object as an argument. The storage shelf manager can use the identifier of the cloud backup object as the object key, or can derive the object key from the identifier or exposed name (e.g., a file system handle) of the cloud backup object.
At box 517, the storage shelf manager generates an indication that the data set backup is retained in the cloud target. The storage shelf manager can communicate to the backup application that the data set backup has been stored in the cloud target and can provide the object key of the cloud backup object.
At box 519, the storage shelf manager removes the cloud backup object and the constituent data map referenced by the cloud backup object from the associated low-latency storage layer. If the constituent data units in the data plates referenced by the constituent data map are not referenced by other objects, they can be removed by garbage collection.
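The bottom-up order of Fig. 5 (data plates first, then the map, then the backup object, with each local reference replaced by an object key) can be sketched with a dictionary standing in for the object store. Everything here is an assumption made for illustration: the key prefixes, the dictionary layout of the backup object and map, and the helper names. A real cloud service would be reached through its own create-object API, and the local cleanup of box 519 is omitted.

```python
from typing import Dict


def migrate_cloud_backup(backup: dict, data_map: dict,
                         local_plates: Dict[str, bytes],
                         cloud: Dict[str, object]) -> str:
    """Migrate a cloud backup object and return its object key."""
    # Boxes 503-509: one object per referenced data plate.
    for entry in data_map["units"]:
        plate_id = entry["plate"]
        plate_key = f"plate/{plate_id}"
        cloud[plate_key] = local_plates[plate_id]   # box 505: create the object
        entry["plate"] = plate_key                  # box 507: local ref -> object key
    # Box 511: create an object for the constituent data map itself.
    map_key = f"map/{data_map['id']}"
    cloud[map_key] = data_map
    # Box 513: the backup object now references the map by object key.
    backup["map"] = map_key
    # Box 515: create an object for the backup metadata.
    backup_key = f"backup/{backup['id']}"
    cloud[backup_key] = backup
    return backup_key                               # box 517: report the key


def restore_from_cloud(cloud: Dict[str, object], backup_key: str) -> bytes:
    """Follow the object keys back down to rebuild the data set."""
    data_map = cloud[cloud[backup_key]["map"]]
    return b"".join(
        cloud[u["plate"]][u["offset"]:u["offset"] + u["length"]]
        for u in data_map["units"]
    )


local_plates = {"p1": b"AAAASHARED", "p2": b"ZZ"}
data_map = {"id": "305B", "units": [
    {"plate": "p1", "offset": 0, "length": 4},
    {"plate": "p2", "offset": 0, "length": 2},
]}
backup = {"id": "303", "type": "cloud", "map": "305B"}
cloud: Dict[str, object] = {}

backup_key = migrate_cloud_backup(backup, data_map, local_plates, cloud)
restored = restore_from_cloud(cloud, backup_key)
```

Creating the referenced objects before the objects that reference them means a key written into the map or the backup object always points at something that already exists in the cloud target.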
Removal of the constituent data units of a migrated data set backup can be carried out in different ways. Fig. 6 depicts a flowchart of exemplary operations for removing constituent data units as part of deleting the cached representation of a data set backup from the local/low-latency storage layer. Fig. 7 incorporates garbage collection aspects into removing the constituent data units of an expired cached representation of a data set backup.
Fig. 6 is a flowchart of exemplary operations for releasing the cached representation of a data set backup after the cloud representation of the data set backup has been migrated to the cloud target. The storage shelf manager removes the cached representation in response to a storage space reclamation trigger. Examples of a storage space reclamation trigger include completion of migration of the data set backup to a different storage layer, a request to delete the data set backup from the current layer, and/or expiration of the retention period associated with the current layer.
At box 601, the storage shelf manager detects a retention-period-based trigger to remove the cache backup. The retention-period-based trigger corresponds to expiration of data retention period N_P. However, the trigger is not necessarily the expiration itself. The trigger can be successful migration of the corresponding cloud backup, which is itself triggered in response to expiration of the retention period.
At box 602, the storage shelf manager generates a list of the constituent data units referenced by the cache backup metadata. The storage shelf manager can populate an array, hash table, linked list, etc., with references to the constituent data units (e.g., logical addresses) and/or identifiers of the constituent data units (e.g., block numbers).
At box 603, the storage shelf manager begins processing each of the constituent data units indicated in the list. The storage shelf manager traverses the list and selects each indicated constituent data unit for processing.
At box 605, the storage shelf manager determines whether the metadata of another cache backup references the selected constituent data unit. If a fingerprint database or associated structure identifies the objects that reference the data represented by a fingerprint, the storage shelf manager can use the fingerprint database to determine whether the selected constituent data unit is shared. If the fingerprint database or associated structure does not identify the referring objects, the storage shelf manager can traverse all cache backups to determine whether any other cache backup refers to the selected constituent data unit. In some embodiments, the storage shelf manager can access the fingerprint database to determine whether an entry exists for the selected constituent data unit. If no entry exists, or if the reference counter is set to 1, the storage shelf manager can proceed as if no other cache backup references the selected constituent data unit. If the reference counter is greater than 1, the storage shelf manager continues in order to determine whether the additional reference comes from a cache backup or from a cloud backup. If another cache backup references the selected constituent data unit, control flows to box 606. Otherwise, control proceeds to box 607.
At box 606, the storage shelf manager removes the indication of the selected constituent data unit from the list. If another cache backup references the selected constituent data unit, releasing it would be inappropriate.
At box 607, the storage shelf manager determines whether the list includes another constituent data unit that has not yet been selected. If so, control flows back to box 603. Otherwise, control flows to box 609.
At box 609, the storage shelf manager deletes from the low-latency storage layer all constituent data units still indicated in the list. At this point, the list should indicate only constituent data units referenced solely by the cache backup whose retention period has expired. A cloud backup should not reference a constituent data unit that is not referenced by its corresponding cache backup. Thus, a referring cloud backup will likely be removed or is in the process of being removed.
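The loop of boxes 602-609 amounts to filtering the expired backup's units by whether any other cache backup still references them. The sketch below takes the "traverse all cache backups" route rather than a fingerprint database; the function name, the dictionary of per-backup reference sets, and the unit names are hypothetical.

```python
from typing import Dict, Set


def release_cache_backup(expired: str,
                         cache_refs: Dict[str, Set[str]],
                         layer: Dict[str, bytes]) -> Set[str]:
    """Delete the units referenced only by the expired cache backup.

    `cache_refs` maps each cache backup to the units its metadata references.
    Returns the set of units that were released.
    """
    candidates = set(cache_refs[expired])                 # box 602: build the list
    for unit in list(candidates):                         # boxes 603/607: iterate
        shared = any(                                     # box 605: other referrers?
            unit in refs for name, refs in cache_refs.items() if name != expired
        )
        if shared:
            candidates.discard(unit)                      # box 606: keep shared units
    for unit in candidates:                               # box 609: delete the rest
        layer.pop(unit, None)
    return candidates


cache_refs = {"A": {"A1", "S1", "AN"}, "B": {"S1", "B1"}}
layer = {"A1": b"a1", "S1": b"s1", "AN": b"an", "B1": b"b1"}

freed = release_cache_backup("A", cache_refs, layer)
```

The shared unit `S1` survives because the cache backup of data set B still references it.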
Fig. 7 is a flowchart of exemplary operations for reclaiming storage space managed by the storage shelf manager after the retention period of a cache backup expires. Storage space reclamation is implemented with a broad scan of substantially all data blocks on the low-latency layer. Fig. 7 reclaims the storage space of data units that have been migrated but still reside on the managed layer.
At box 701, the storage shelf manager detects expiration of the retention period of a cache backup. Similar to 601, the retention-period-based trigger corresponds to expiration of data retention period N_P. However, the trigger is not necessarily the expiration itself. The trigger can be successful migration of the corresponding cloud backup, which is triggered in response to expiration of the retention period.
At box 702, the storage shelf manager generates a list of the constituent data units referenced by the cache backup metadata. Similar to 602, the storage shelf manager can populate an array, hash table, linked list, etc., with references to the constituent data units (e.g., logical addresses) and/or identifiers of the constituent data units (e.g., block numbers).
At box 703, the storage shelf manager begins scanning the storage space of the storage layer that it manages for data units. Each data unit that the storage shelf manager encounters during the scan is referred to as the selected data unit. The description of the scan operations in Fig. 7 uses "data unit" rather than "constituent data unit" because a discovered data unit may not constitute any data set.
At box 705, the storage shelf manager determines whether the metadata of another cache backup references the selected data unit. Similar to 605 of Fig. 6, how the storage shelf manager makes this determination depends on the information maintained for data set backups, such as the specific implementation of a fingerprint database. If a fingerprint database or associated structure identifies the objects that reference the data represented by a fingerprint, the storage shelf manager can use the fingerprint database to determine whether the selected data unit is shared. If the fingerprint database or associated structure does not identify the referring objects, the storage shelf manager can traverse all cache backups to determine whether any other cache backup refers to the selected data unit. In some embodiments, the storage shelf manager can access the fingerprint database to determine whether an entry exists for the selected data unit. If no entry exists, or if the entry exists and the reference counter is set to 1, the storage shelf manager can proceed as if no other cache backup references the selected data unit. If the reference counter is greater than 1, the storage shelf manager continues in order to determine whether the additional reference comes from a cache backup or from a cloud backup. If another cache backup references the selected data unit, control flows to box 707. Otherwise, control proceeds to box 709.
At box 707, the storage shelf manager removes the indication of the selected data unit from the list, if it is indicated there. The selected data unit may be referenced by another cache backup but not by the current cache backup, in which case the selected data unit will not be in the list. Control flows from box 707 to box 713.
If the storage shelf manager determines at box 705 that the metadata of another cache backup does not reference the selected data unit, then, at box 709, the storage shelf manager determines 1) whether the list of constituent data units includes the selected data unit and 2) whether cloud backup metadata also references the selected data unit. The storage shelf manager makes this determination to identify those selected data units that are referenced both by the metadata of the cache backup and by the metadata of a cloud backup that still resides in the managed storage layer. If the list indicates the selected data unit and the metadata of a cloud backup references it, control flows to box 711. Otherwise, control flows to box 713.
At box 711, the storage shelf manager marks the indication of the selected data unit in the list. The storage shelf manager marks the indication with a data flag (e.g., a bit value or multi-bit value). The data flag identifies constituent data units that are to be retained in the cloud storage layer. After box 711, flow continues to box 713.
At box 713, the storage shelf manager determines whether the scan of the storage layer it manages is complete. If so, control flows to box 715. Otherwise, control flows back to box 703.
At box 715, the storage shelf manager migrates the constituent data units that are still indicated in the list and are marked with the data flag. The storage shelf manager migrates the marked data units to the cloud storage layer. This migration can create objects in the cloud storage layer. As mentioned in the description of 709, the constituent data units migrated to the cloud storage layer are referenced by the metadata of a cloud backup. That cloud backup is not necessarily the cloud backup corresponding to the cache backup. In other words, the scan of the storage layer ensures migration of constituent data units referenced by the metadata of a cloud backup whose migration to the cloud storage layer has not yet completed. Alternatively, a cloud backup may not yet be migratable, even though the corresponding cache backup has expired, because the cloud backup metadata references, for example, a data plate that is not yet ready to migrate to the cloud storage layer. The constituent data units indicated in the list without a mark are constituent data units referenced by the metadata of the cache backup for which the cloud backup has already been migrated to the cloud storage layer.
At box 717, after the migration to the cloud storage layer, the storage shelf manager deletes both the unmarked and the marked constituent data units indicated in the list. That is, once migration of the marked constituent data units indicated in the list has completed successfully, the storage shelf manager can delete or expire the constituent data units indicated in the list without regard to marking. The unmarked constituent data units in the list were previously migrated, perhaps as shared data units migrated for another data set backup. Thus, their removal can be considered garbage collection, which, assuming idempotent migration, also avoids the resource cost of migrating them again.
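The scan, mark, migrate, and delete sequence of Fig. 7 can be sketched as follows. As before, this is an illustrative model rather than the disclosed implementation: the function name, the `pending_cloud_refs` set (units referenced by cloud backup metadata still resident in the managed layer), and the dictionaries standing in for the storage layers are all hypothetical.

```python
from typing import Dict, Set


def reclaim_after_expiry(expired: str,
                         cache_refs: Dict[str, Set[str]],
                         pending_cloud_refs: Set[str],
                         layer: Dict[str, bytes],
                         cloud: Dict[str, bytes]) -> Dict[str, bool]:
    """Return the final list as {unit: marked}; marked units were migrated."""
    listed = {unit: False for unit in cache_refs[expired]}    # box 702
    for unit in list(layer):                                  # boxes 703/713: scan
        referenced_elsewhere = any(                           # box 705
            unit in refs for name, refs in cache_refs.items() if name != expired
        )
        if referenced_elsewhere:
            listed.pop(unit, None)                            # box 707
        elif unit in listed and unit in pending_cloud_refs:   # box 709
            listed[unit] = True                               # box 711: set the flag
    for unit, marked in listed.items():                       # box 715: migrate marked
        if marked:
            cloud[unit] = layer[unit]
    for unit in listed:                                       # box 717: delete all listed
        layer.pop(unit, None)
    return listed


cache_refs = {"A": {"A1", "S1", "AN"}, "B": {"S1"}}
pending_cloud_refs = {"AN"}            # AN has not reached the cloud layer yet
layer = {"A1": b"a1", "S1": b"s1", "AN": b"an", "X": b"x"}
cloud = {"A1": b"a1"}                  # A1 was migrated earlier

result = reclaim_after_expiry("A", cache_refs, pending_cloud_refs, layer, cloud)
```

Here `A1` is deleted without being uploaded again, `AN` is migrated and then deleted, and `S1` and the unrelated unit `X` stay on the managed layer.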
Variations
The foregoing exemplary illustrations refer to managing a data set backup according to two retention periods by means of multiple representations of the data set backup. However, embodiments can create multiple representations of a data set backup to accommodate several retention periods and more than two storage layers. For example, each storage layer can host a storage shelf manager. The storage shelf manager at storage layer N hosts representation N of the data set backup. The storage shelf manager can be notified that the data set backup is subject to retention period N_N and to retention periods N_{N+1,J1} and N_{N+1,J2}, both of which are greater than N_N. The symbols J1 and J2 denote migration targets under different jurisdictions. Based on the multiple retention periods, the storage shelf manager creates representation N_{N+1,J1} and representation N_{N+1,J2}. When N_N expires, the storage shelf manager migrates representation N_{N+1,J1} to the cloud target under jurisdiction J1 and migrates representation N_{N+1,J2} to the cloud target under jurisdiction J2.
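As a rough sketch of the multi-jurisdiction variation, the following shows how the expiration of one retention period could select which longer-lived representations to migrate and where. The `Representation` class, the target names, and the retention values in days are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Representation:
    name: str            # e.g., the representation for layer N+1 under jurisdiction J1
    target: str          # hypothetical identifier of the cloud target
    retention_days: int  # retention period governing this representation


def plan_migrations(expired_days: int, reps: List[Representation]) -> Dict[str, str]:
    """Map each representation that outlives the expired period to its target."""
    return {r.name: r.target for r in reps if r.retention_days > expired_days}


reps = [
    Representation("N+1,J1", "cloud-J1", 365 * 7),
    Representation("N+1,J2", "cloud-J2", 365 * 10),
]

# When the 30-day retention period of layer N expires, both longer-lived
# representations are migrated, each to the target in its own jurisdiction.
plan = plan_migrations(30, reps)
```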
In addition, migration is not necessarily to storage layers of progressively lower performance (e.g., lower reliability or higher access latency). In some cases, expiration of the retention period of a data set backup representation can trigger migration to a higher-performing storage layer. To illustrate, the financial documents of an enterprise can be migrated to a high-access-latency storage layer for nine months and then migrated to a low-access-latency storage layer for the duration of tax season.
In addition, the terms used herein are flexible to some extent. For example, the exemplary illustrations refer to representations of a data set backup. The disclosure then describes decomposing a representation of a data set backup into the metadata of the data set backup and the data set. The exemplary illustrations then separate the metadata into a backup object and a data map. Logically, different representations of a data set backup can be regarded as different backups of the data set. Although the different backups can have the same constituent data units (e.g., refer to the same data blocks, extents, or plates), the metadata are identified as distinct, which allows the backups to be manipulated and accessed individually. To illustrate, a backup appliance may, in response to a request to back up file EX1, create backup file EX1_Cache and backup file EX2_Cloud. The two files EX1_Cache and EX2_Cloud have pointers that resolve to the same constituent data units. The backup files are distinct, but they initially share the same constituent data units because they back up the same data set.
Flow chart is provided to help to understand explanation, and is not limited to the range of claims.Flow chart description can The exemplary operation changed within the scope of the claims.It can carry out additional operation;It can carry out less operation;It can be parallel Ground is operated;And it can be operated by different order.It will be understood that flow chart illustrates and/or each box of block diagram, and Flow chart illustrates and/or the combination of the box in block diagram can be realized by program code.Program code be can provide to general meter The processor of calculation machine, special purpose computer or other programmable machines or device.
As will be appreciated, aspects of the disclosure may be embodied as a system, method, or program code/instructions stored in one or more machine-readable media. Accordingly, aspects may take the form of hardware, software (including firmware, resident software, micro-code, etc.), or a combination of software and hardware aspects that may all generally be referred to herein as a "circuit," "module," or "system." The functionality presented as individual modules/units in the example illustrations can be organized differently in accordance with any one of platform (operating system and/or hardware), application ecosystem, interfaces, programmer preferences, programming language, administrator preferences, etc.
Any combination of one or more machine-readable media may be utilized. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable storage medium may be, for example, but not limited to, a system, apparatus, or device that employs any one of or a combination of electronic, magnetic, optical, electromagnetic, infrared, or semiconductor technology to store program code. More specific examples (a non-exhaustive list) of the machine-readable storage medium include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a machine-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine-readable storage medium is not a machine-readable signal medium.
Program code embodied on a machine-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as C++; dynamic programming languages such as Python; scripting languages such as the Perl programming language or the PowerShell scripting language; and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on a stand-alone machine, may execute in a distributed manner across multiple machines, and may execute on one machine while providing results and/or accepting input on another machine.
The program code/instructions may also be stored in a machine-readable medium that can direct a machine to function in a particular manner, such that the instructions stored in the machine-readable medium produce an article of manufacture including instructions that implement the function/act specified in the flowchart and/or block diagram block or blocks.
Fig. 8 depicts an example storage system with a storage tier manager that generates multiple representations of a dataset backup based on multiple retention periods. The storage system includes a processor unit 801 (possibly including multiple processors, multiple cores, multiple nodes, and/or implementing multi-threading, etc.). The storage system includes memory 807. The memory 807 may be system memory (e.g., one or more of cache, SRAM, DRAM, zero capacitor RAM, twin transistor RAM, eDRAM, EDO RAM, DDR RAM, EEPROM, NRAM, RRAM, SONOS, PRAM, etc.) or any one or more of the possible realizations of machine-readable media already described above. The storage system also includes a bus 803 (e.g., PCI, ISA, PCI-Express, NuBus, etc.) and a network interface 805 (e.g., a Fibre Channel interface, an Ethernet interface, an Internet Small Computer System Interface, a SONET interface, a wireless interface, etc.). The system also includes a storage tier manager 811 and a set of storage media 815. The storage media 815 may be a disk array, a flash storage array, a hybrid array of flash storage and disk devices, etc.
The storage tier manager 811 creates multiple representations of a dataset backup and allows lifecycle management of each of the representations based on the corresponding retention period and storage tier. When the storage system is a cache storage tier or a low access latency storage tier, the storage tier manager 811 creates the multiple representations and can store each of the representations on mutually exclusive storage media of the set of storage media 815, or at least in mutually exclusive logical containers within the set of storage media 815. When removing constituent data units from a storage tier, the storage tier manager 811 preserves those constituent data units that, as a result of deduplication, are shared by other representations of other dataset backups in the managed storage tier. The storage tier manager 811 can use information created by deduplication program code to determine which constituent data units are shared by backup representations of the managed storage tier and which are not. Any one of the previously described functionalities may be partially (or entirely) implemented in hardware and/or on the processor unit 801. For example, the functionality may be implemented with an application-specific integrated circuit, in logic implemented in the processor unit 801, in a co-processor on a peripheral device or card, etc. Further, realizations may include fewer components or additional components not illustrated in Fig. 8 (e.g., video cards, audio cards, additional network interfaces, peripheral devices, etc.). The processor unit 801 and the network interface 805 are coupled to the bus 803. Although illustrated as being coupled to the bus 803, the memory 807 may be coupled to the processor unit 801.
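How a tier manager might preserve deduplicated data units that other representations still reference can be sketched as follows. This is a minimal sketch under stated assumptions: the `refs_by_unit` mapping stands in for "information created by deduplication program code", and both function names are hypothetical rather than taken from the source.

```python
from typing import Dict, Set


def reclaimable_units(expired_repr: str,
                      refs_by_unit: Dict[str, Set[str]]) -> Set[str]:
    """Return ids of units referenced only by the expired representation."""
    return {
        unit_id
        for unit_id, referrers in refs_by_unit.items()
        if referrers == {expired_repr}
    }


def remove_representation(expired_repr: str,
                          refs_by_unit: Dict[str, Set[str]],
                          tier_units: Dict[str, bytes]) -> None:
    """Drop an expired representation from a tier.

    Units still referenced by other representations are preserved;
    only units with no remaining referrer are deleted.
    """
    # The set is computed up front, so deleting entries below is safe.
    for unit_id in reclaimable_units(expired_repr, refs_by_unit):
        del tier_units[unit_id]
        del refs_by_unit[unit_id]
    # Shared units stay in the tier but lose the expired referrer.
    for referrers in refs_by_unit.values():
        referrers.discard(expired_repr)
```

A per-unit set of referrers is used here instead of a plain counter so that the sketch can tell which representation holds each reference; a real deduplication index may record this differently.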
Plural instances may be provided for components, operations, or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the disclosure. In general, structures and functionality presented as separate components in the example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure.
Terminology
For efficiency and ease of explanation, this description uses shorthand terms related to the cloud. When referring to "a cloud," this description is referring to the resources of a cloud service provider. For instance, a cloud can encompass the servers, virtual machines, and storage devices of a cloud service provider. The terms "cloud storage" and "cloud storage tier" refer to a logical collection of "cloud targets." The term "cloud target" refers to an entity that has a network address that can be used as an endpoint for a network connection. The entity may be a physical device (e.g., a server) or may be a virtual entity (e.g., a virtual server or virtual storage device). In more general terms, a cloud service provider resource accessible to customers is a resource owned/managed by the cloud service provider entity that is accessible via network connections. Often, the access is in accordance with an application programming interface or software development kit provided by the cloud service provider.
Use of the phrase "at least one of" preceding a list with the conjunction "and" should not be treated as an exclusive list and should not be construed as a list of categories with one item from each category, unless specifically stated otherwise. A clause that recites "at least one of A, B, and C" can be infringed with only one of the listed items, with multiple of the listed items, or with one or more of the listed items together with another item that is not listed.

Claims (12)

1. A method comprising:
detecting a trigger related to expiration of a first retention period, the first retention period being associated with a first backup of a dataset, wherein the first backup is a backup of a first type; and
after detecting the trigger,
identifying a plurality of constituent data units of the dataset in a first storage tier;
determining a set of one or more of the plurality of constituent data units that are not shared with another backup of the first type of a different dataset;
migrating the plurality of constituent data units from the first storage tier to a second storage tier; and
removing the set of one or more constituent data units from the first storage tier.
2. The method of claim 1, wherein migrating comprises:
creating, in the second storage tier, an object for each of the plurality of constituent data units.
3. The method of any one of claims 1 to 2, wherein determining the set of one or more constituent data units that are not shared comprises:
scanning the first storage tier to determine references to the plurality of constituent data units;
determining the backups that correspond to the references; and
determining the types of the backups that correspond to the references.
4. The method of any one of claims 1 to 2, wherein determining the set of one or more constituent data units that are not shared with another backup of the first type comprises:
identifying a plurality of backups in the first storage tier that share at least one of the plurality of constituent data units; and
for each of the plurality of backups, determining whether the backup is of the first type.
5. The method of any one of claims 1 to 3, or of claims 1 to 2 and 4,
wherein identifying the plurality of constituent data units comprises creating a data structure that includes identifiers of the plurality of constituent data units;
wherein determining the set of one or more constituent data units that are not shared with another backup of the first type comprises,
determining those of the constituent data units that are shared by another backup of the first type; and
removing, from the data structure, those of the identifiers that identify the constituent data units shared by another backup of the first type.
6. The method of claim 5, wherein determining the set of one or more constituent data units that are not shared with another backup of the first type further comprises:
determining those of the plurality of constituent data units that are shared by another backup of a second type; and
marking those of the identifiers that identify the constituent data units shared by another backup of the second type,
wherein migrating the set of constituent data units comprises migrating each constituent data unit that is still identified in the data structure after removing, from the data structure, those of the identifiers that identify the constituent data units shared by another backup of the first type.
7. The method of claim 5, wherein the first type corresponds to the first retention period and a first access latency, and the second type corresponds to a second retention period that is longer than the first retention period and to a second access latency that is higher than the first access latency.
8. The method of any preceding claim, further comprising modifying a second backup of the dataset to indicate the migrated plurality of constituent data units in the second storage tier instead of the plurality of constituent data units in the first storage tier, wherein the second backup is a backup of a second type.
9. The method of claim 8, further comprising migrating the modified second backup to the second storage tier after detecting the trigger.
10. A machine-readable medium having program code executable by a processor to perform the method of claim 1.
11. A deduplication storage comprising:
a processor unit; and
a machine-readable medium comprising program code executable by the processor unit to cause the deduplication storage to,
create, in a first storage tier, a plurality of representations of a dataset backup for a plurality of retention periods associated with the dataset backup,
wherein each of the plurality of representations comprises metadata about the dataset backup and references to a plurality of data units that constitute the dataset backup,
wherein a first representation of the plurality of representations corresponds to the first storage tier and to a first retention period of the plurality of retention periods,
wherein the first storage tier corresponds to the deduplication storage;
copy the plurality of data units and a second representation of the plurality of representations to a second storage tier, wherein the second storage tier corresponds to a second retention period of the plurality of retention periods; and
after expiration of the first retention period, reclaim the storage space occupied by those of the plurality of data units that are not referenced by a representation of another dataset backup of the first storage tier.
12. The deduplication storage of claim 11, wherein the program code to copy comprises program code executable by the processor unit to cause the deduplication storage to:
create, in the second storage tier, a first set of one or more objects for the plurality of data units, wherein the second storage tier implements object storage; and
create, in the second storage tier, an object for the second representation with references to the first set of one or more objects instead of references to the plurality of data units in the first storage tier.
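The steps recited in claim 1 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Backup` type, the `on_retention_expired` function, the type labels, and the use of dictionaries as storage tiers are all hypothetical. Repointing a second-type backup of the same dataset at the migrated units (claim 8) is omitted.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Backup:
    """Hypothetical backup record used only for this sketch."""
    name: str
    dataset: str
    backup_type: str          # illustrative label, e.g. "first" or "second"
    unit_ids: FrozenSet[str]  # constituent data units the backup references


def on_retention_expired(expired: Backup,
                         backups: Iterable[Backup],
                         tier1: Dict[str, bytes],
                         tier2: Dict[str, bytes]) -> Set[str]:
    """Handle expiration of a first-type backup's retention period.

    Returns the ids of the units removed from the first tier.
    """
    # Identify the constituent data units of the dataset in the first tier.
    units = set(expired.unit_ids)

    # Determine which units are shared with another backup of the same
    # type that belongs to a different dataset.
    shared: Set[str] = set()
    for other in backups:
        if other is expired:
            continue
        if (other.backup_type == expired.backup_type
                and other.dataset != expired.dataset):
            shared |= units & other.unit_ids
    unshared = units - shared

    # Migrate every constituent unit to the second tier.
    for unit_id in units:
        tier2[unit_id] = tier1[unit_id]

    # Remove only the unshared units from the first tier.
    for unit_id in unshared:
        del tier1[unit_id]
    return unshared
```

In this sketch a second-type backup of the same dataset does not keep a unit in the first tier, because only sharing with first-type backups of other datasets is checked before removal.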
CN201780031635.XA 2016-03-25 2017-03-24 Multiple data set backup versions across multiple tiers of storage Active CN109154905B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US15/081,546 2016-03-25
US15/081,546 US10620834B2 (en) 2016-03-25 2016-03-25 Managing storage space based on multiple dataset backup versions
PCT/US2017/024156 WO2017165857A1 (en) 2016-03-25 2017-03-24 Multiple dataset backup versions across multi-tiered storage

Publications (2)

Publication Number Publication Date
CN109154905A true CN109154905A (en) 2019-01-04
CN109154905B CN109154905B (en) 2022-03-25

Family

ID=58548880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780031635.XA Active CN109154905B (en) 2016-03-25 2017-03-24 Multiple data set backup versions across multiple tiers of storage

Country Status (4)

Country Link
US (1) US10620834B2 (en)
EP (1) EP3433739B1 (en)
CN (1) CN109154905B (en)
WO (1) WO2017165857A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11310137B2 (en) 2017-02-05 2022-04-19 Veritas Technologies Llc System and method to propagate information across a connected set of entities irrespective of the specific entity type
US11429640B2 (en) 2020-02-28 2022-08-30 Veritas Technologies Llc Methods and systems for data resynchronization in a replication environment
US11531604B2 (en) 2020-02-28 2022-12-20 Veritas Technologies Llc Methods and systems for data resynchronization in a replication environment
US11928030B2 (en) * 2020-03-31 2024-03-12 Veritas Technologies Llc Optimize backup from universal share

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107870728B (en) * 2016-09-23 2021-02-09 伊姆西Ip控股有限责任公司 Method and apparatus for moving data
US11169960B2 (en) * 2017-06-29 2021-11-09 Ashish Govind Khurange Data transfer appliance method and system
US10721304B2 (en) 2017-09-14 2020-07-21 International Business Machines Corporation Storage system using cloud storage as a rank
US10372363B2 (en) 2017-09-14 2019-08-06 International Business Machines Corporation Thin provisioning using cloud based ranks
US10581969B2 (en) * 2017-09-14 2020-03-03 International Business Machines Corporation Storage system using cloud based ranks as replica storage
US10817204B1 (en) * 2017-10-11 2020-10-27 EMC IP Holding Company LLC Migration of versioned data between storage devices
US10936238B2 (en) * 2017-11-28 2021-03-02 Pure Storage, Inc. Hybrid data tiering
US10990282B1 (en) 2017-11-28 2021-04-27 Pure Storage, Inc. Hybrid data tiering with cloud storage
US11436344B1 (en) 2018-04-24 2022-09-06 Pure Storage, Inc. Secure encryption in deduplication cluster
US11392553B1 (en) 2018-04-24 2022-07-19 Pure Storage, Inc. Remote data management
WO2019209392A1 (en) * 2018-04-24 2019-10-31 Pure Storage, Inc. Hybrid data tiering
US11106378B2 (en) 2018-11-21 2021-08-31 At&T Intellectual Property I, L.P. Record information management based on self describing attributes
US11853575B1 (en) 2019-06-08 2023-12-26 Veritas Technologies Llc Method and system for data consistency across failure and recovery of infrastructure
US11593215B2 (en) * 2020-02-05 2023-02-28 EMC IP Holding Company LLC Method and system for generating immutable backups with configurable retention spans
US11593017B1 (en) * 2020-08-26 2023-02-28 Pure Storage, Inc. Protection of objects in an object store from deletion or overwriting
US11436103B2 (en) * 2020-10-13 2022-09-06 EMC IP Holding Company LLC Replication for cyber recovery for multiple tier data

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101499924A (en) * 2008-01-31 2009-08-05 杭州美创科技有限公司 On-line switchover method for computer production system
US20120095968A1 (en) * 2010-10-17 2012-04-19 Stephen Gold Storage tiers for different backup types
US20120117029A1 (en) * 2010-11-08 2012-05-10 Stephen Gold Backup policies for using different storage tiers
CN102917072A (en) * 2012-10-31 2013-02-06 北京奇虎科技有限公司 Device, system and method for carrying out data migration between data server clusters
CN102982085A (en) * 2012-10-31 2013-03-20 北京奇虎科技有限公司 System and method of data migration
CN103544075A (en) * 2011-12-31 2014-01-29 华为数字技术(成都)有限公司 Data processing method and system
US20150261792A1 (en) * 2014-03-17 2015-09-17 Commvault Systems, Inc. Maintaining a deduplication database

Family Cites Families (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5875481A (en) * 1997-01-30 1999-02-23 International Business Machines Corporation Dynamic reconfiguration of data storage devices to balance recycle throughput
US6088694A (en) 1998-03-31 2000-07-11 International Business Machines Corporation Continuous availability and efficient backup for externally referenced objects
US9075851B2 (en) 2003-12-09 2015-07-07 Emc Corporation Method and apparatus for data retention in a storage system
US7536424B2 (en) * 2004-05-02 2009-05-19 Yoram Barzilai System and methods for efficiently managing incremental data backup revisions
JP4377790B2 (en) * 2004-09-30 2009-12-02 株式会社日立製作所 Remote copy system and remote copy method
US8527468B1 (en) 2005-02-08 2013-09-03 Renew Data Corp. System and method for management of retention periods for content in a computing system
US8825971B1 (en) * 2007-12-31 2014-09-02 Emc Corporation Age-out selection in hash caches
US8484162B2 (en) * 2008-06-24 2013-07-09 Commvault Systems, Inc. De-duplication systems and methods for application-specific data
US20100199036A1 (en) * 2009-02-02 2010-08-05 Atrato, Inc. Systems and methods for block-level management of tiered storage
US20100274772A1 (en) 2009-04-23 2010-10-28 Allen Samuels Compressed data objects referenced via address references and compression references
US20100293147A1 (en) 2009-05-12 2010-11-18 Harvey Snow System and method for providing automated electronic information backup, storage and recovery
US8554735B1 (en) * 2009-05-27 2013-10-08 MiMedia LLC Systems and methods for data upload and download
US8356017B2 (en) * 2009-08-11 2013-01-15 International Business Machines Corporation Replication of deduplicated data
US8850142B2 (en) 2009-09-15 2014-09-30 Hewlett-Packard Development Company, L.P. Enhanced virtual storage replication
US8694469B2 (en) * 2009-12-28 2014-04-08 Riverbed Technology, Inc. Cloud synthetic backups
US8799413B2 (en) * 2010-05-03 2014-08-05 Panzura, Inc. Distributing data for a distributed filesystem across multiple cloud storage systems
US8473886B2 (en) 2010-09-10 2013-06-25 Synopsys, Inc. Parallel parasitic processing in static timing analysis
US9128948B1 (en) * 2010-09-15 2015-09-08 Symantec Corporation Integration of deduplicating backup server with cloud storage
US8909845B1 (en) * 2010-11-15 2014-12-09 Symantec Corporation Systems and methods for identifying candidate duplicate memory pages in a virtual environment
US8886901B1 (en) * 2010-12-31 2014-11-11 Emc Corporation Policy based storage tiering
US9715434B1 (en) * 2011-09-30 2017-07-25 EMC IP Holding Company LLC System and method for estimating storage space needed to store data migrated from a source storage to a target storage
US9262449B2 (en) 2012-03-08 2016-02-16 Commvault Systems, Inc. Automated, tiered data retention
US9116851B2 (en) 2012-12-28 2015-08-25 Futurewei Technologies, Inc. System and method for virtual tape library over S3
US9563517B1 (en) * 2013-12-30 2017-02-07 EMC IP Holding Company LLC Cloud snapshots
US11315197B2 (en) 2014-03-13 2022-04-26 Fannie Mae Dynamic display of representative property information with interactive access to source data
US10380072B2 (en) * 2014-03-17 2019-08-13 Commvault Systems, Inc. Managing deletions from a deduplication database
US10089185B2 (en) 2014-09-16 2018-10-02 Actifio, Inc. Multi-threaded smart copy
US11783898B2 (en) 2014-09-18 2023-10-10 Jonker Llc Ephemeral storage elements, circuits, and systems
US9659047B2 (en) * 2014-12-03 2017-05-23 Netapp, Inc. Data deduplication utilizing extent ID database

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11310137B2 (en) 2017-02-05 2022-04-19 Veritas Technologies Llc System and method to propagate information across a connected set of entities irrespective of the specific entity type
US11429640B2 (en) 2020-02-28 2022-08-30 Veritas Technologies Llc Methods and systems for data resynchronization in a replication environment
US11531604B2 (en) 2020-02-28 2022-12-20 Veritas Technologies Llc Methods and systems for data resynchronization in a replication environment
US11847139B1 (en) 2020-02-28 2023-12-19 Veritas Technologies Llc Methods and systems for data resynchronization in a replication environment
US11928030B2 (en) * 2020-03-31 2024-03-12 Veritas Technologies Llc Optimize backup from universal share

Also Published As

Publication number Publication date
EP3433739B1 (en) 2020-02-05
US20170277435A1 (en) 2017-09-28
WO2017165857A1 (en) 2017-09-28
CN109154905B (en) 2022-03-25
EP3433739A1 (en) 2019-01-30
US10620834B2 (en) 2020-04-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: California, USA
Applicant after: NetApp, Inc.

Address before: California, USA
Applicant before: Network Area Storage Technology Co., Ltd.

GR01 Patent grant