CN110226153A - Garbage collection system and process - Google Patents

Garbage collection system and process


Publication number
CN110226153A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201780073649.8A
Other languages
Chinese (zh)
Inventor
马克·莱斯利·考克斯
马克·亚力山大·休米·埃姆伯森
泰勒·韦恩·帕威尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pure Storage Inc
Original Assignee
Pure Storage Inc
Priority claimed from US15/825,073 (US20180107404A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G06F3/0643 Management of files
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A garbage collection process for a data deduplication storage system is disclosed. In one implementation, a method of performing garbage collection is disclosed that works efficiently across a scale-out cluster and very large amounts of data. The method compacts the data in the object store of the scale-out cluster by examining data in a reference map of the data blocks in the object store to determine which locations in the back-end objects in the object store are referenced and which locations are no longer referenced. The back-end objects in the object store are altered to remove block data from the locations that are no longer referenced, and a hash location table is updated to remove the entries for the removed block data.

Description

Garbage collection system and process
Priority and related application
This application claims the benefit of U.S. Provisional Application No. 62/427353, filed November 29, 2016, and U.S. Provisional Application No. 62/591197, filed November 28, 2017; and is a continuation-in-part of U.S. Patent Application No. 15/600641, filed May 19, 2017, which is a continuation-in-part of U.S. Patent Application No. 15/298897, filed October 20, 2016, which claims the benefit of U.S. Provisional Application No. 62/249885, filed November 2, 2015, U.S. Provisional Application No. 62/373328, filed August 10, 2016, and U.S. Provisional Application No. 62/339090, filed May 20, 2016; the entireties of these applications are incorporated herein by reference.
Technical field
The claimed embodiments relate to a method for reducing data storage using data deduplication, and more particularly to performing garbage collection on deduplicated data in the memory of one or more network-capable servers.
Background
A garbage collection system is disclosed that uses an intermediary networking device to store data objects on one or more remotely located object storage devices.
Data deduplication is a specialized data compression technique for eliminating duplicate copies of data. To reduce the cost of storing data, deduplication is typically performed by a specially configured storage device that has a deduplication engine directly connected internally to its storage drives.
The deduplication engine in the storage device receives data from external devices and creates a hash of the received data. A table of stored hashes is scanned to determine whether an identical hash was previously stored. If not, the received data is stored in the cloud object store, and a pointer to the location of the received data is stored in an entry in the table together with the hash of the received data. When a duplicate of received data is detected, an entry is stored in the table that contains the hash and an index pointing to the location in the cloud object store where the duplicate data is stored.
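The lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `DedupStore`, the use of SHA-256, and the in-memory dictionaries standing in for the hash table and the cloud object store are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy deduplication engine (hypothetical): a hash table in front of an object store."""

    def __init__(self):
        self.object_store = {}   # location -> bytes; stand-in for the cloud object store
        self.hash_table = {}     # hash -> location of the stored data
        self._next_location = 0

    def put(self, data: bytes) -> int:
        """Store data once per distinct hash and return the location that holds it."""
        digest = hashlib.sha256(data).hexdigest()
        location = self.hash_table.get(digest)
        if location is None:
            # Hash not seen before: store the data and record where it went.
            location = self._next_location
            self._next_location += 1
            self.object_store[location] = data
            self.hash_table[digest] = location
        # Hash already present: reuse the existing location, store nothing new.
        return location


store = DedupStore()
first = store.put(b"hello")
second = store.put(b"hello")   # duplicate of the first write
third = store.put(b"world")
```

In this sketch the duplicate write returns the location of the existing copy, so only two payloads are held in the object store after three writes.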
Such a system couples the deduplication engine directly to internal storage drives in order to maintain low-latency, fast storage for the hash table. The data itself, however, is stored in the cloud object store.
When an object managed by the deduplication engine is deleted by a client, the storage space used in the cloud object store is not reclaimed immediately. Some blocks may be referenced by multiple objects, so only blocks that are no longer referenced can be physically deleted and their storage space freed. The process of finding the blocks that are no longer referenced and freeing the corresponding storage space is known as garbage collection.
Performing garbage collection in a way that scales to large amounts of data is one of the most difficult problems for a deduplication engine. The complexity of distributing data across a cluster of servers exacerbates this difficulty.
Summary of the invention
In one implementation, a method of performing garbage collection is disclosed that compacts the data in data blocks in an object store and works efficiently across a system distributed over multiple servers (a scale-out cluster) and across very large amounts of data. Compacting the data in the object store includes storing back-end objects in the object store and examining data in a reference map of the object store to determine which locations in the back-end objects in the object store are referenced in the map and which locations are no longer referenced. The back-end objects in the object store are altered to remove block data from the locations that are no longer referenced, and the hash location table is updated to remove the entries for the removed block data.
The method describes a series of messages, data structures and data stores that can be used to perform garbage collection for a data deduplication system distributed across multiple servers in a scale-out cluster.
The method can be a two-phase process: a tracking process followed by a compaction process. The tracking process determines which locations contain data that is still live, that is, still referenced. The compaction process removes data from the locations that are no longer referenced.
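The two phases can be sketched in a few lines. This is a simplified single-process illustration under stated assumptions: the function names `trace` and `compact`, the integer locations, and the dictionaries standing in for the back-end objects and the hash location table are hypothetical and are not taken from the patent.

```python
def trace(object_refs):
    """Tracking phase: collect every location still referenced by a live object."""
    live = set()
    for refs in object_refs.values():
        live.update(refs)
    return live


def compact(backend, hash_to_location, live):
    """Compaction phase: remove unreferenced block data and its hash table entries."""
    dead = set(backend) - live
    for location in dead:
        del backend[location]
    stale = [h for h, location in hash_to_location.items() if location in dead]
    for h in stale:
        del hash_to_location[h]
    return dead


# Hypothetical state: three stored blocks, two live objects.
backend = {0: b"a", 1: b"b", 2: b"c"}              # location -> block data
hash_to_location = {"ha": 0, "hb": 1, "hc": 2}     # block hash -> location
object_refs = {"obj1": [0, 1], "obj2": [1]}        # object key -> referenced locations

live = trace(object_refs)
removed = compact(backend, hash_to_location, live)
```

Location 1 is shared by both objects and survives; location 2 is referenced by neither, so its block data and its hash table entry are removed together.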
In another implementation, a system is provided that performs garbage collection to compact data. The system includes an object store that stores back-end objects, and one or more network-capable servers that include the object store. The system includes circuitry to create a reference map in the object store, the reference map indicating which locations in the back-end objects stored in the object store are currently referenced and which locations in the back-end objects stored in the object store are no longer referenced. The system includes circuitry to alter the back-end objects stored in the object store to remove block data from the locations that are no longer referenced, and circuitry to remove the entries in the hash location table that identify the locations of the removed block data in the back-end objects.
Brief description of the drawings
Specific embodiments are described with reference to the accompanying drawings. In the drawings, the leftmost digit(s) of a reference number identify the figure in which the reference number first appears. The same reference numbers are used in different figures to indicate similar or identical items.
Fig. 1 is a simplified schematic diagram of a deduplication storage system that performs data deduplication using an intermediary networking device;
Fig. 2 is a simplified schematic diagram and flow chart of a storage system in which a client application on a client device communicates through an application interface (API) directly connected to a cloud object store;
Fig. 3 is a simplified schematic diagram and flow chart of a deduplication storage system in which a client application communicates via a network with an application interface (API) at an intermediate computing device that performs data deduplication, and the data is then stored in a cloud object store via a network.
Fig. 3A is a simplified schematic diagram and flow chart of an alternative deduplication storage system in which a client application communicates via a network with a scale-out cluster. The scale-out cluster includes an application interface (API) at multiple intermediate computing devices that perform data deduplication, and the data is then transmitted via a network to be stored in a cloud object store. Fig. 3A also shows how the intermediate computing devices initiate garbage collection by exchanging messages.
Fig. 4 is a simplified schematic diagram of the intermediate computing device shown in Fig. 3.
Fig. 5 is a flow chart of a process for storing and deduplicating data, performed by the intermediate computing device shown in Fig. 3.
Fig. 6 is a flow chart showing a process for storing and deduplicating data.
Fig. 7 is a flow chart showing a process for storing and deduplicating data, performed on the client device of Fig. 3.
Fig. 8 is a data diagram illustrating how data is divided into blocks for storage.
Fig. 9 is a data diagram showing how the divided data blocks are stored in memory.
Fig. 10 is a data diagram showing the relationship between hashes and the data blocks stored in memory.
Fig. 11 is a data diagram showing a file or object table that maps file or object names to the location addresses of the stored files.
Fig. 12 is a data diagram showing the garbage collection coordination process, which coordinates garbage collection from an arbitrarily selected StorReduce server in Fig. 3A.
Fig. 13 is a data diagram showing the tracking process, which traces the references in each key shard on the StorReduce servers in Fig. 3A.
Fig. 14 is a data diagram showing the compaction process, which compacts the data in each block shard stored on the StorReduce servers in Fig. 3A.
Fig. 15 is a data diagram showing the process for compacting the data in the cloud object store, providing a more detailed view of the processing shown in step 1414 of Fig. 14.
Detailed description
With reference to Fig. 1, a data deduplication storage system 100 is shown. Storage system 100 includes a client system 102 coupled via a network 104 to an intermediate computing system 106. Intermediate computing system 106 is coupled via a network 108 to a remotely located file storage system 110.
Client system 102 sends data objects to intermediate computing system 106 via network 104. Intermediate computing system 106 includes a process for storing the received data objects on file storage system 110 that reduces duplication of the data objects when they are stored on file storage system 110.
Client system 102 sends requests for data stored on file storage system 110 to intermediate computing system 106 via network 104. Intermediate computing system 106 responds to the requests by obtaining the deduplicated data from file storage system 110, and sends the obtained data to client system 102.
With reference to Fig. 2, a storage system 200 includes a client application 202 on a client device 204 that communicates via a network 206 through an application interface (API) 203 directly connected to cloud object store 204. In one implementation, the cloud object store can be a non-transitory memory storage device coupled to a server.
With reference to Fig. 3, a data deduplication storage system 300 is shown that includes a client application 302, which transmits data via a network 304 to an application interface (API) 306 at an intermediate computing device 308. The data is deduplicated on intermediate computing device 308, and the unique data is then stored via network 310 and API 311 (API 203 in Fig. 2) on a remotely located computing device 312, for example a cloud object storage system that may typically be managed by an object storage service.
Exemplary networks 304 and 310 include, but are not limited to, an Ethernet local area network, a wide area network, the Internet, a wireless local area network, an 802.11g standard network, a WiFi network, and a wireless wide area network running protocols such as GSM, WiMAX or LTE.
Examples of intermediate computing device 308 include, but are not limited to, a physical server, a personal computing device, a virtual server, a virtual private server, a network appliance and a router/firewall.
Examples of the remotely located computing device 312 include, but are not limited to, a network file server, an object store, an object storage service, a network attached device, and a web server with or without WebDAV.
Examples of cloud object stores include, but are not limited to, OpenStack Swift, IBM Cloud Object Storage and Cloudian HyperStore. Examples of object storage services include, but are not limited to, Amazon S3, Microsoft Azure Blob Service and Google Cloud Storage.
During operation, client application 302 sends a file to be stored via network 304 by supplying an API endpoint 306 (for example, http://my-storereduce.com) that corresponds to the network address of intermediate device 308. Intermediate device 308 then deduplicates the file, as described herein, and stores the deduplicated data on the remotely located computing device 312 via API endpoint 311. In one example implementation, the API endpoint 306 on the intermediate device is virtually identical to the API endpoint 311 on the remotely located computing device 312.
If the client application needs to retrieve a stored data file, client application 302 sends a request for the file to API endpoint 306. Intermediate device 308 responds to the request by requesting the deduplicated data from the remotely located computing device 312 via API endpoint 311. Cloud object store 312 and API endpoint 311 satisfy the request by returning the deduplicated data to intermediate device 308, which then rehydrates the data, that is, reverses the deduplication. Intermediate device 308 returns the file to client application 302 via API 306.
In one implementation, device 308 and the cloud object store on device 312 present the same API to the network. In one implementation, client application 302 uses the same set of operations to store and retrieve objects. Preferably, intermediate device 308 is almost transparent to the client application. Client application 302 does not need to know that intermediate API 306 and intermediate device 308 are present. When migrating from a system without intermediate processing device 308 (as shown in Fig. 2) to a system with the intermediate processing device, the only change for client application 302 is in its configuration: the location of the endpoint where it stores data has changed (for example, from http://objectstore to http://mystorreduce). The intermediate processing device can be located physically close to the client application to reduce the amount of data crossing network 310, which may be a low-bandwidth wide area network.
With reference to Fig. 3A, an alternative data deduplication storage system 300a is shown that includes a client application 302a, which transmits data via a network 304a to a storage-reduction scale-out cluster 305. Cluster 305 includes an application programming interface (API) 306a and a load balancer 308 coupled to server 1 309a through server n 309n. Server 1 309a through server n 309n are coupled via a network 310a and API 311a to a cloud object store 312a. Computing device 308 can be a load balancer located at exemplary network address http://my-storreduce. Servers 309a-309n can be located at exemplary network addresses http://storreduce-1 through http://storreduce-n.
The data is deduplicated using server 1 309a through server n 309n to determine the unique data. The unique data determined by the deduplication process is stored via network 310a and API 311a (API 211 in Fig. 2) on a remotely located computing device 312a, for example a public cloud object storage system that provides an object storage service, or a private object storage system.
Exemplary networks 304a and 310a include, but are not limited to, an Ethernet local area network, a wide area network, the Internet, a wireless local area network, an 802.11g standard network, a WiFi network, and a wireless wide area network running protocols such as GSM, WiMAX or LTE.
Examples of load balancer 308a and servers 309a-309n include, but are not limited to, a physical server, a personal computing device, a virtual server, a virtual private server, a network appliance and a router/firewall.
Examples of the remotely located computing device 312a include, but are not limited to, a network file server, an object store, an object storage service, a network attached device, and a web server with or without WebDAV.
Examples of cloud object stores include, but are not limited to, OpenStack Swift, IBM Cloud Object Storage and Cloudian HyperStore. Examples of object storage services include, but are not limited to, Amazon S3, Microsoft Azure Blob Storage and Google Cloud Storage.
During operation, client application 302a sends a file to be stored (request 1A) via network 304a by using an API endpoint 306a (for example, http://my-storreduce.com) that corresponds to the network address of load balancer 308. Load balancer 308 selects a server to receive the request and forwards the request (1A), in this case to server 309a. This coordinating server (309a) splits the file into data blocks and computes a hash of each block. Each block is assigned to a shard based on its hash, and each shard is assigned to one of servers 309a-309n. The coordinating server sends each data block to the server (309a to 309n) responsible for that shard, shown in the figure as "key shard and block shard requests".
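The routing performed by the coordinating server can be sketched as follows. The block size, shard count, server names, the fixed-size splitting, and the modulo mapping from hash to shard are all assumptions chosen to keep the example small; the text above does not specify how hashes are mapped to shards or shards to servers.

```python
import hashlib

BLOCK_SIZE = 4                        # assumed; tiny so the example is easy to follow
NUM_SHARDS = 4                        # assumed number of block shards
SERVERS = ["server-1", "server-2"]    # hypothetical server names


def split_blocks(data, size=BLOCK_SIZE):
    """Split a file into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shard_for(digest, num_shards=NUM_SHARDS):
    """Map a block hash to a shard; identical blocks always map to the same shard."""
    return int.from_bytes(digest[:8], "big") % num_shards


def server_for(shard):
    """Map a shard to the server responsible for it."""
    return SERVERS[shard % len(SERVERS)]


def route(data):
    """Return (block, shard, server) for every block of the file, in file order."""
    plan = []
    for block in split_blocks(data):
        digest = hashlib.sha256(block).digest()
        shard = shard_for(digest)
        plan.append((block, shard, server_for(shard)))
    return plan


plan = route(b"abcdabcdxy")
```

Because the shard is derived only from the block's hash, the two identical `b"abcd"` blocks are sent to the same server, which is what allows that server to detect the duplicate locally.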
Servers 309a-309n each deduplicate the data blocks sent to them, as described herein (step 1b), and store the deduplicated data on the remotely located computing device 312a via API endpoint 311a (requests "1C (shard 1)" through "1C (shard n)" in Fig. 3A). In one example implementation, the API endpoint 306a on the intermediate device is virtually identical to the API endpoint 311a on the remotely located computing device 312a.
Servers 309a-309n each send the location information for their block data back to the coordinating server. The coordinating server then arranges for the location information to be stored.
If the client application needs to retrieve a stored data file, client application 302a sends a request for the file (2A) to API endpoint 306a. Load balancer 308 selects a server to receive the request and forwards the request (2A), in this case to server 309b. This coordinating server (309b) obtains the location information for each block in the file, including the shard to which each data block was assigned.
In one implementation, the coordinating server sends a request to fetch each data block to the server (309a to 309n) responsible for that shard, shown in the figure as "key shard and block shard requests".
Servers 309a-309n respond to the block shard requests by requesting the deduplicated data from the remotely located computing device 312a via API endpoint 311a (requests "2B (shard 1)" through "2B (shard n)" in Fig. 3A). Cloud object store 312a and API endpoint 311a satisfy the requests by returning the deduplicated data to servers 309a-309n (responses "2C (shard 1)" through "2C (shard n)" in Fig. 3A). Servers 309a-309n return the block data to the coordinating server (in this case server 309b).
In an alternative implementation, the coordinating server obtains each data block directly from the remotely located computing device 312a via API endpoint 311a. Cloud object store 312a and API endpoint 311a satisfy the requests by returning the deduplicated data to the coordinating server.
The coordinating server then rehydrates the data, that is, reverses the deduplication. The resulting file (2E) is returned to load balancer 308, which then returns the file to client application 302a via API 306a.
In one implementation, device 309a and the cloud object store on device 312a present the same API to the network. In one implementation, client application 302a uses the same set of operations to store and retrieve objects. Preferably, the intermediate scale-out cluster 300a is almost transparent to the client application. Client application 302a does not need to know that intermediate API 306a and intermediate scale-out cluster 300a are present. When migrating from a system without the intermediate scale-out cluster 300a (as shown in Fig. 2) to a system with intermediate processing, the only change for client application 302a is in its configuration: the location of the endpoint where it stores data has changed (for example, from http://objectstore to http://mystorreduce). The intermediate scale-out cluster 300a can be located physically close to the client application to reduce the amount of data crossing network 310, which may be a low-bandwidth wide area network.
Objects managed by system 300a each have an object key, and these keys are used to divide the objects into sets referred to as key shards. Each key shard is assigned to a server in the cluster, and that server is then responsible for managing the information about each object in the key shard. In particular, the information about the blocks that make up an object's data is managed by the server for that object's key shard.
The unique data blocks managed by system 300 are each identified by their hash, computed using a secure hash algorithm. The hash namespace is divided into subsets, referred to as block shards. Each block shard is assigned to a server in the cluster, and that server is then responsible for operations on the blocks whose hashes fall into that subset of the hash namespace. In particular, the block shard server can answer the question "Is the block with this hash new/unique, or have we already stored it?". The block shard server is also responsible for storing and retrieving the blocks whose hashes fall into its subset of the hash namespace. During garbage collection, the block shard server collects and merges the reference maps from each key shard (as shown in Fig. 14), and then runs the compaction process (as shown in Fig. 15) to remove the blocks that are no longer referenced.
Each block shard is responsible for storing blocks in the underlying object store (also referred to as the "cloud object store"). Multiple blocks can be grouped together into an aggregate block, in which case all of the blocks in the aggregate block are stored in a single "file" (object) in the underlying object store.
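One way to picture an aggregate block is as a concatenation of block payloads plus a record of where each block lies inside it. The sketch below is an assumption for illustration only: the function names, the `(object name, offset, length)` location tuple, and the object name `agg-0` are hypothetical and do not come from the patent.

```python
def pack_blocks(blocks, object_name):
    """Concatenate blocks into one back-end object and record each block's position."""
    payload = bytearray()
    locations = {}
    for digest, data in blocks:
        # Location = (back-end object name, byte offset, length in bytes).
        locations[digest] = (object_name, len(payload), len(data))
        payload += data
    return bytes(payload), locations


def read_block(payload, location):
    """Extract a single block from an aggregate payload using its recorded location."""
    _, offset, length = location
    return payload[offset:offset + length]


# Two hypothetical blocks, identified by placeholder hashes "h1" and "h2".
payload, locations = pack_blocks([("h1", b"aaa"), ("h2", b"bb")], "agg-0")
block = read_block(payload, locations["h2"])
```

With a layout like this, removing an unreferenced block from a back-end object means rewriting the payload without that byte range and updating the recorded locations of the blocks that remain.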
When an object is written to the system, each block is hashed and sent to the appropriate block shard, which looks up the block's hash, stores the block data if the block is unique, and returns a reference to the block. After all of the blocks have been stored, the references from the individual block shards are collated. A key is assigned to the object, and the corresponding key shard stores the list of references to the blocks that make up the object.
When an object is read back from the system, the key is supplied by the client, and the corresponding key shard supplies the list of references to the blocks that make up the object. For each reference, the block data is retrieved from the cloud object store. The data for all of the blocks is then assembled and returned to the client.
When an object is deleted, the key is supplied by the client, and the corresponding key shard deletes the information about the object, including the list of references to the blocks that make up the object. No changes are made in the block shards for these blocks.
After an object has been deleted, each of its blocks may or may not still be referenced by other objects, so no blocks are deleted at this stage and no storage space is reclaimed; that is the purpose of the garbage collection process. Deleting an object only removes, from the object's key shard, the object's references to its data blocks.
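The write, read and delete paths described above can be sketched with one key shard and one block shard held in memory. The class names `KeyShard` and `BlockShard`, the use of the SHA-256 hex digest as the block reference, and the object keys are assumptions made for the example.

```python
import hashlib


class BlockShard:
    """Hypothetical block shard: stores each unique block once, keyed by its hash."""

    def __init__(self):
        self.blocks = {}   # block hash -> block data

    def store(self, data):
        ref = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(ref, data)   # store only if the block is new
        return ref

    def fetch(self, ref):
        return self.blocks[ref]


class KeyShard:
    """Hypothetical key shard: maps each object key to its ordered block references."""

    def __init__(self):
        self.objects = {}   # object key -> list of block references

    def put(self, key, chunks, block_shard):
        self.objects[key] = [block_shard.store(chunk) for chunk in chunks]

    def get(self, key, block_shard):
        return b"".join(block_shard.fetch(ref) for ref in self.objects[key])

    def delete(self, key):
        # Only the reference list is removed; the block shard is left untouched.
        del self.objects[key]


blocks = BlockShard()
keys = KeyShard()
keys.put("a.txt", [b"foo", b"bar"], blocks)
keys.put("b.txt", [b"foo"], blocks)        # shares the b"foo" block with a.txt
restored = keys.get("a.txt", blocks)
keys.delete("a.txt")
blocks_after_delete = len(blocks.blocks)
```

After the delete, the `b"bar"` block is no longer referenced by any object but still occupies space in the block shard; reclaiming it is left to garbage collection.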
Example Computing Device framework
FIG. 4 shows selected modules in a computing device 400 that uses processes 500 and 600, shown in FIGS. 5-6, to store and retrieve deduplicated data objects, respectively. Computing device 400 (for example, intermediate computing device 308 shown in FIG. 3 and intermediate computing devices 309a-n shown in FIG. 3A) includes a processing device 404 and memory 412. Computing device 400 may include one or more microprocessors, microcontrollers, or any such devices for accessing memory 412 (also referred to as a non-transitory medium), as well as hardware 422. Computing device 400 has processing capability and memory suitable for storing and executing computer-executable instructions.
Computing device 400 executes instructions stored in memory 412 and, in response, processes signals from hardware 422. Hardware 422 may include an optional display 424, an optional input device 426, and an I/O communication device 428. I/O communication device 428 may include network and communication circuitry for communicating with networks 304 and 310 or with an external memory storage device.
Optional input device 426 receives input from a user of computing device 400 and may include a keyboard, mouse, track pad, microphone, audio input device, video input device, or touch-screen display. Optional display 424 may include an LED, LCD, CRT, or any other type of display device that enables the user to preview information stored or processed by computing device 400.
Memory 412 may include volatile and non-volatile memory, and removable and non-removable media, implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Such memory includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, RAID storage systems, or any other medium that can be used to store the desired information and that can be accessed by a computer system.
Memory 412 of computing device 400 may store an operating system 414, a data deduplication system application 420, and a library or database 416 of other applications. Application 420 can use operating system 414 to control the hardware and the various software components within computing device 400. Operating system 414 may include drivers that allow device 400 to communicate with I/O communication device 428. The database or library 418 may include preconfigured parameters (or parameters set by a user before or after initial operation), for example server operating parameters, server libraries, HTML libraries, APIs, and configuration. An optional graphical user interface or command-line interface 423 may be provided so that application 420 can communicate with display 424.
Application 420 includes a receiver module 430, a divider module 432, a hash value creator module 434, a determiner/comparator module 438, and a storage module 436.
Receiver module 430 includes instructions for receiving one or more files from the remotely located computing device 302 via network 304. Divider module 432 includes instructions for dividing the one or more received files into one or more data objects. Hash value creator module 434 includes instructions for creating one or more hash values for the one or more data objects. Example algorithms for creating hash values include, but are not limited to, MD2, MD4, MD5, SHA1, SHA2, SHA3, RIPEMD, WHIRLPOOL, SKEIN, Buzhash, cyclic redundancy check (CRC), CRC32, CRC64, and Adler-32.
Determiner/comparator module 438 includes instructions that, in response to receiving from a networked computing device (for example, the device hosting application 302) an additional one of the one or more files that includes one or more data objects, determine whether the one or more data objects are identical to one or more data objects previously stored in the one or more remotely located storage systems (for example, device 312). The determination is made by comparing the one or more hash values of the data objects with the one or more hash values stored in one or more records of the storage table.
Storage module 436 includes instructions for storing the one or more data objects at one or more location addresses of the one or more remotely located storage systems (for example, remotely located computing device 312, using API 311), and instructions for storing, for each of the one or more data objects, the one or more hash values and the corresponding one or more location addresses in one or more records of the storage table. The storage module further includes instructions that, if the one or more data objects are identical to one or more data objects previously stored in the one or more remotely located storage systems (for example, device 312), store the one or more hash values of the received data objects and the corresponding one or more location addresses in one or more records of the storage table for each received data object, without storing again, in the one or more remotely located storage systems (for example, device 312), the received data objects that are identical to the previously stored data objects.
FIGS. 5 and 6 show example processes 500 and 600 for deduplicating data stored across a network. Such example processes 500 and 600 can each be a collection of blocks in a logical flow chart, representing a sequence of operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer-executable instructions that, when executed by one or more processors, perform the operations. In general, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described blocks can be combined in any order and/or in parallel to implement the process. For purposes of discussion, the processes are described with reference to FIG. 4, but they can be implemented in other system architectures.
Referring to FIG. 5, a flow chart is shown of process 500, which is performed by deduplication application 420 (see FIG. 4; hereinafter also "application 420"). In one implementation, process 500 is executed on a computing device such as intermediate computing device 308 (FIG. 3). When executed by the processing device, application 420 uses processor 404 and modules 416-438 shown in FIG. 4.
In block 502, application 420 on computing device 308 receives one or more files via network 304 from a remotely located computing device (for example, the device hosting application 302).
In block 503, application 420 divides the received files into data objects, creates hash values for the data objects or portions thereof, and stores the hash values in a storage table in memory on the intermediate computing device (e.g., an external computing device or system 312).
In block 504, application 420 stores the one or more files on the remotely located storage system 312 over network 310 via API 311.
In block 505, optionally, the API in system 312 stores the hash values, together with the corresponding location addresses identifying the network locations at which the data objects are stored in system 312, in records of a storage table located on system 312.
In block 518, for each of the one or more data objects, application 420 stores the one or more hash values and the corresponding one or more network location addresses in one or more records of a storage table located on intermediate device 308 or on an auxiliary remote storage system (not shown). Application 420 also stores the name of the file received in block 502, together with the location addresses created in block 505, in a file table (FIG. 11).
In one implementation, for each of one or more data objects, the one or more of storage table One or more hashed values of records store data object and corresponding one or more location address, without will be identical Data object is stored in the storage system of one or more remote arrangements.In another implementation, one or more is hashed Value is sent to the storage system of remote arrangement to store together with one or more data objects.Hashed value and one corresponding Or multiple new location address can store in one or more records of storage table.In addition, one or more data objects can To be collectively stored in one or more positions in the storage system of one or more remote arrangements with one or more hashed values At address.
In frame 520, application 420 receives another in one or more files from networked computing device.
In frame 522, in response to receiving the one or more including one or more data objects from networked computing device Another in file, application 420 by by one or more hashed values of data object be stored in one of storage table or One or more hashed values in multiple records are compared, to determine whether one or more data objects are previously stored in one In the storage system 312 of a or multiple remote arrangements.
In one implementation, application 420 can be by including reading to be stored in the storage system of remote arrangement Instructing to carry out data de-duplication to the data object being previously stored in any storage system for one or more files, will One or more files are divided into one or more data objects, and create one or more of one or more file data objects A hashed value.Once creating hashed value, then one or more data objects can be stored in using 420 one or more remote It, will for each of one or more data objects at one or more location address in the storage system of journey arrangement One or more hashed values and corresponding one or more location address are stored in one or more records of storage table, and And in response to being received from networked computing device include another in one or more files of one or more data objects, lead to It crosses and dissipates one or more hashed values of data object with one or more of the one or more records for being stored in storage table Train value is compared, to determine whether one or more data objects are previously stored in the storage system of one or more remote arrangements On system.The location address of the filename and data de-duplication data object (come from the first file) of file and from file only The location address of one data object is collectively stored in file table (Figure 11).
Referring to FIG. 6, an alternative embodiment of a system architecture diagram is shown, illustrating process 600 for storing data objects with deduplication. Process 600 can be implemented using application 420 in intermediate computing device 308 shown in FIG. 3.
In block 602, the process includes an application (for example, application 420) receiving, from a client (for example, the "client system" in FIG. 1), a request to store an object (for example, a file). The request typically includes an object key (for example, a file name), the object data (a byte stream), and some metadata.
In block 604, the data stream is divided into blocks using a block partitioning algorithm. In one implementation, the partitioning algorithm can generate variable-length blocks, using an algorithm such as that described in U.S. Patent No. 5,990,810 (incorporated herein by reference); it can generate fixed-length blocks of a predefined size; or it can use some other algorithm that generates blocks with a high probability of matching blocks already stored. When a block boundary is found in the data stream, the block is sent to the next stage. A block can be of almost any size.
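As an illustration of the simplest option named above, fixed-length blocks of a predefined size, the following is a minimal sketch. The function name `split_fixed` and the 4-byte block size are assumptions chosen for readability; the variable-length algorithm of U.S. Patent No. 5,990,810 is not reproduced here.

```python
from typing import Iterator


def split_fixed(data: bytes, block_size: int = 4) -> Iterator[bytes]:
    """Yield consecutive fixed-length blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(data), block_size):
        # Each slice ends at a block boundary and would be handed to the next stage.
        yield data[start:start + block_size]


blocks = list(split_fixed(b"abcdefghij", block_size=4))
```

Joining the resulting blocks in order reproduces the original byte stream, which is the property the later stages rely on.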
In block 606, each data block is hashed using a cryptographic hash algorithm such as MD5, SHA1, or SHA2 (or one of the other algorithms mentioned previously). Preferably, the constraint is that the probability of different blocks having the same hash must be very low.
In block 608, the hash of each data block is looked up in a table that maps the hashes of blocks already encountered to their block locations in the cloud object store (for example, a hash-to-location table). If the hash is found, the block location is recorded, the data block is discarded, and block 616 is run. If the hash is not found in the table, then in block 610 the data block is compressed using a lossless text compression algorithm (for example, Deflate, described in U.S. Patent No. 5,051,745, or LZW, described in U.S. Patent No. 4,558,302, the contents of which are incorporated herein by reference).
In block 612, data blocks are optionally aggregated into a series of larger aggregate blocks for efficient storage. In block 614, the blocks (or aggregate blocks) are then stored in the underlying object store 618 (the "cloud object store" 312 in FIG. 3). When stored, the blocks are ordered by naming them with monotonically increasing numbers in object store 618.
In block 616, after the data blocks have been stored in cloud object store 618, the hash-to-location table is updated by adding the hash of each block and its location in cloud object store 618.
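The path from hashing to the table update can be sketched as follows. This is an in-memory toy under stated assumptions, not the patented implementation: the class `DedupStore`, its method names, and the use of Python dictionaries in place of the cloud object store and the database are all hypothetical. SHA-256 is used as one member of the SHA2 family, and `zlib` supplies Deflate-based compression; the optional aggregation step is skipped.

```python
import hashlib
import zlib


class DedupStore:
    """Toy stand-in for the cloud object store and the hash-to-location table."""

    def __init__(self):
        self.object_store = {}      # block number -> compressed bytes
        self.hash_to_location = {}  # block hash -> block number
        self.next_number = 0        # monotonically increasing block name

    def put_block(self, block: bytes) -> int:
        """Return the location of `block`, storing it only if its hash is new."""
        digest = hashlib.sha256(block).hexdigest()    # hash the block
        location = self.hash_to_location.get(digest)  # look the hash up
        if location is not None:
            return location                           # duplicate: data is discarded
        compressed = zlib.compress(block)             # lossless compression
        location = self.next_number                   # store under the next number
        self.next_number += 1
        self.object_store[location] = compressed
        self.hash_to_location[digest] = location      # update the table
        return location

    def get_block(self, location: int) -> bytes:
        return zlib.decompress(self.object_store[location])


store = DedupStore()
locations = [store.put_block(b) for b in (b"alpha", b"beta", b"alpha")]
```

The third block repeats the first, so it resolves to the existing location and nothing new is written to the store.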
The hash-to-location table (referenced here and in block 608) is stored in a database (for example, database 620), which is in turn stored in fast, unreliable storage attached directly to the computer that receives the request. A block location takes one of the following forms: the number of the aggregate block stored in block 614, the offset of the block within the aggregate, and the length of the block; or the number of the block stored in block 614.
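The first form of block location (aggregate number, offset, length) can be sketched as below. The names `BlockLocation`, `aggregate`, and `read_block` are hypothetical, and the per-block headers shown in FIG. 8 are left out to keep the sketch short.

```python
from typing import NamedTuple


class BlockLocation(NamedTuple):
    aggregate_number: int  # number of the aggregate block that was stored
    offset: int            # byte offset of the block within the aggregate
    length: int            # length of the block in bytes


def aggregate(number, blocks):
    """Concatenate blocks into one aggregate; return it with each block's location."""
    payload = bytearray()
    locations = []
    for block in blocks:
        locations.append(BlockLocation(number, len(payload), len(block)))
        payload += block
    return bytes(payload), locations


def read_block(payload, location):
    """Slice one block back out of its aggregate."""
    return payload[location.offset:location.offset + location.length]


payload, locs = aggregate(7, [b"abc", b"de", b"fghi"])
```

With this layout a single block can be read back from an aggregate without touching its neighbours, which is what makes aggregation compatible with block-level references.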
In block 616, the list of network locations from blocks 608-614 can be stored in an object-key-to-location table (FIG. 11) in fast, unreliable storage attached directly to the computer that receives the request. Preferably, the object key and block locations are also recorded in cloud object store 618, using the same monotonically increasing naming scheme as the blocks. Each file sent to the system is identified by an object key. For each file, the object-key-to-location table contains a list of the locations of the blocks that make up the file. Each of these locations is called a "reference" to the corresponding block. The hash-to-location table is independent of the object-key-to-location table. It contains an entry for every block stored in the system, regardless of whether that block is referenced in the object-key-to-location table.
The process may then return to block 602, where a response is sent to the client device (mentioned in block 602) indicating that the data object has been stored.
FIG. 7 shows an example process 700, implemented by client application 302 (see FIG. 3), for deduplicating data stored across a network. Such an example process 700 can be a collection of blocks in a logical flow chart, representing a sequence of operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer-executable instructions that, when executed by one or more processors, perform the operations. In general, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described blocks can be combined in any order and/or in parallel to implement the process. For purposes of discussion, the processes are described with reference to FIG. 3, but they can be implemented in other system architectures.
In block 702, client application 302 prepares a request, for transmission to intermediate computing device 308, to store a data object. In block 704, client application 302 sends the data object to intermediate computing device 308 for storage.
In block 706, process 500 or 600 is executed by device 308 to store the data object.
In block 708, the client application receives from the intermediate computing system a response notification indicating that the data object has been stored.
Referring to FIG. 8, an example aggregate data object 800 generated by block 612 is shown. The data object includes headers 802n-802nm, block numbers 804n-804nm, and offset indications 806n-806nm, and includes the data blocks.
Referring to FIG. 9, a set of example aggregate data objects 902a-902n stored in memory is shown. Data objects 902a-902n each include a header (for example, 904a), as described in connection with FIG. 8, and a data block (for example, 906a).
Referring to FIG. 10, an example relationship is shown between hashes (for example, H1-H8), which are stored in a separate deduplication table, and two separate data objects D1 and D2. Portions in blocks B1-B4 of file D1 are represented by hashes H1-H4, and portions in blocks B1, B2, B4, B7 and B8 of file D2 are represented by hashes H1, H2, H4, H6, H7 and H8, respectively. Note that portions of data objects with the same hash value are stored in memory only once, and their storage location in memory is recorded together with the hash value in the deduplication table.
Referring to FIG. 11, table 1100 is shown, in which the file names ("File name 1" through "File name N") are stored in a file table together with the network location addresses of each file's data objects. The example data objects of "File name 1" are stored at network location addresses 1-5. The example data objects of "File name 2" are stored at location addresses 6, 7, 3, 4, 8 and 9. The data objects of "File name 2" at location addresses 3 and 4 are shared with "File name 1". "File name 3" is a clone of "File name 1" and shares the data objects at location addresses 1, 2, 3, 4 and 5. "File name N" shares the data objects at location addresses 7, 3 and 9 with "File name 1" and "File name 2".
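The sharing described for FIG. 11 can be checked with a short sketch. The dictionary below is an assumed in-memory rendering of the file table; only the three rows whose address lists are fully spelled out above are included, and the variable names are hypothetical.

```python
from collections import Counter

# File table modelled on the FIG. 11 description.
file_table = {
    "File name 1": [1, 2, 3, 4, 5],
    "File name 2": [6, 7, 3, 4, 8, 9],
    "File name 3": [1, 2, 3, 4, 5],  # clone of File name 1
}

# Count how many files reference each location address.
reference_counts = Counter(
    address for addresses in file_table.values() for address in set(addresses)
)
shared = sorted(a for a, n in reference_counts.items() if n > 1)
unique_stored = len(reference_counts)
total_references = sum(len(a) for a in file_table.values())
```

In this sketch the three files hold 16 references between them but only 9 distinct data objects are stored, which is the saving that deduplication provides.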
FIG. 12 shows an example process 1200, implemented by servers 309a-309n (see FIG. 3A) and garbage collection coordinator module 438 (FIG. 4), for deduplicating and garbage collecting data stored across a network. The garbage collection coordinator module 438 in one of servers 309a-309n is designated to orchestrate the garbage collection process; this can be any server to which the load balancer forwards the "start garbage collection" request. The module is abbreviated "GC coordinator" below and in FIGS. 12 to 15. Such an example process 1200 can be a collection of blocks in a logical flow chart, representing a sequence of operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer-executable instructions that, when executed by one or more processors, perform the operations. In general, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described blocks can be combined in any order and/or in parallel to implement the process. For purposes of discussion, the processes are described with reference to FIG. 3A, but they can be implemented in other system architectures.
Each key fragment is assigned to a particular server among 309a-309n, called the key fragment server for that fragment. Each block fragment is assigned to a particular server among 309a-309n, called the block fragment server for that fragment. To keep the discussion below concise, we refer to sending a message to a "block fragment" or a "key fragment". In each case, the message is actually sent to the fragment's key fragment server or block fragment server (309a-309n), and is then routed internally to the key fragment component or block fragment component for that fragment within the server. A reference mapping is a data structure for recording a set of references to specific block locations, in order to determine which blocks are "in use", as opposed to those that can be deleted. Various data structures can be used to implement a reference mapping.
The GC coordinator sends a message to each key fragment to start the trace operation for that key fragment. Each request contains the block range information for every block fragment. The trace operation finds all references to blocks, across all block fragments, that should prevent those blocks from being deleted.
Specifically, in block 1202, an incoming request to start garbage collection arrives at the scale-out cluster via the load balancer. In block 1202, each block fragment (in servers 309a-309n) is notified by message to prepare for garbage collection (see 1402).
In block 1204, the GC coordinator waits to receive a "confirm ready for garbage collection" message from each block fragment (see 1406). This message contains the fragment's block range.
In block 1206, a message is sent to each key fragment (in servers 309a-309n) to start the trace (see 1302), and in block 1208, the coordinator waits for confirmation from each key fragment that the trace is complete (see 1306).
In block 1210, the coordinator sends a message to each block fragment to perform compaction (see 1414).
In block 1212, the coordinator waits for confirmation from each block fragment that compaction is complete (see 1416).
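The ordering enforced by the coordinator (prepare, then trace, then compact) can be sketched with ordinary method calls standing in for the messages and confirmations. Everything here is an assumption made for illustration: the classes `BlockFragment` and `KeyFragment`, the function `run_garbage_collection`, and the use of synchronous calls rather than network messages.

```python
class BlockFragment:
    def __init__(self, name, max_location, log):
        self.name, self.max_location, self.log = name, max_location, log

    def prepare(self):
        self.log.append(f"{self.name}: ready")
        return (0, self.max_location)  # block range covered by this GC run

    def compact(self):
        self.log.append(f"{self.name}: compacted")


class KeyFragment:
    def __init__(self, name, log):
        self.name, self.log = name, log

    def trace(self, block_ranges):
        self.log.append(f"{self.name}: traced")


def run_garbage_collection(key_fragments, block_fragments):
    # Blocks 1202-1204: every block fragment prepares and reports its block range.
    ranges = {bf.name: bf.prepare() for bf in block_fragments}
    # Blocks 1206-1208: every key fragment traces; each call returns when done.
    for kf in key_fragments:
        kf.trace(ranges)
    # Blocks 1210-1212: only after all traces finish are fragments compacted.
    for bf in block_fragments:
        bf.compact()
    return ranges


log = []
ranges = run_garbage_collection(
    [KeyFragment("key-0", log), KeyFragment("key-1", log)],
    [BlockFragment("block-0", 99, log), BlockFragment("block-1", 49, log)],
)
```

The point of the three-phase ordering is that no block fragment starts compaction until every key fragment has reported its references; compacting earlier could delete blocks that a slower key fragment still references.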
FIG. 13 shows an example process 1300, implemented by the key fragment modules in servers 309a-309n (FIG. 3A), for performing the trace operation during the garbage collection process across a network. Such an example process 1300 can be a collection of blocks in a logical flow chart, representing a sequence of operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer-executable instructions that, when executed by one or more processors, perform the operations. In general, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described blocks can be combined in any order and/or in parallel to implement the process. For purposes of discussion, the processes are described with reference to FIG. 3A, but they can be implemented in other system architectures.
The key fragment server performs the following trace process:
A) Create a partial reference mapping for each block fragment to record the references found. Each block location that is referenced (still in use) as part of a file is recorded in a reference mapping. The purpose is to find the blocks that are no longer referenced so that they can be deleted. The key fragment server traces every entry in the object-key-to-location table of each fragment and collects all references. The references can then be compared with the list of managed blocks to find the blocks that are no longer needed (because the files that referenced them have been deleted).
B) The key fragment traverses the object-key-to-location table for all objects that it manages, and records each reference to a block in the appropriate partial reference mapping.
C) After the key fragment has finished recording references, it sends each partial reference mapping to its corresponding block fragment server.
D) After all reference mappings have been sent, the key fragment server responds to the GC coordinator, confirming that the trace operation for the key fragment is complete.
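Steps A and B of the trace can be sketched as follows. The table contents, the object keys, and the representation of a reference as a pair (block fragment id, block location) are assumptions made for illustration, and a Python set stands in for whatever data structure implements the reference mapping.

```python
from collections import defaultdict

# Hypothetical object-key-to-location table held by one key fragment.
# Each reference is (block fragment id, block location within that fragment).
object_key_to_location = {
    "report.txt": [(0, 1), (0, 2), (1, 5)],
    "backup.tar": [(0, 2), (1, 5), (1, 6)],
}


def trace(table):
    """Build one partial reference mapping (a set of locations) per block fragment."""
    partial_maps = defaultdict(set)
    for locations in table.values():  # walk every object the fragment manages
        for fragment_id, location in locations:
            partial_maps[fragment_id].add(location)  # record in the matching map
    return dict(partial_maps)


partial_maps = trace(object_key_to_location)
# In step C, partial_maps[f] would be sent to the server for block fragment f.
```

A location referenced by several objects appears once in its partial mapping, since the mapping only needs to record whether a location is in use.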
Specifically, in block 1302, after waiting for an incoming message from the garbage collection coordinator (see 1206) to start process 1300, all object keys in the key fragment are traced, and the object-key-to-location table (see FIG. 11) is used to build a reference mapping for each block fragment, which is stored as a partial reference mapping.
In block 1304, the key fragment reads the partial reference mapping for each block fragment and sends each partial reference mapping to the corresponding block fragment (see 1410).
In block 1306, confirmation that the trace is complete is sent to the garbage collection coordinator (see 1208). Once all trace operations are complete, the garbage collection coordinator can start the compaction operation.
FIG. 14 shows an example process 1400, implemented by the block fragment modules in servers 309a-309n (FIG. 3A), for performing the compaction operation during the garbage collection process across a network. Such an example process 1400 can be a collection of blocks in a logical flow chart, representing a sequence of operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer-executable instructions that, when executed by one or more processors, perform the operations. In general, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described blocks can be combined in any order and/or in parallel to implement the process. For purposes of discussion, the processes are described with reference to FIG. 3A, but they can be implemented in other system architectures.
For each block fragment, the corresponding block fragment server performs the following procedure:
A) Record the current maximum block location of the fragment. This defines the block location range of the fragment, which is the set of block locations that the GC operation will cover.
B) Create a blank reference mapping covering the block range. The partial reference mappings generated during the trace operation will be merged into this reference mapping.
C) The block fragment server responds to the GC coordinator, confirming that it is now ready for GC and providing information about the block range that the GC operation will cover.
For each block fragment, the block fragment server receives a partial reference mapping from each key fragment server, containing the results of that server's trace operation. Each incoming partial reference mapping is merged with the block fragment's existing reference mapping, adding further references to blocks. Once the partial reference mappings from all key fragment servers have been received and merged, the resulting mapping contains the full list of references to blocks in the block fragment (within the block location range).
Specifically, in block 1402, the block fragment module waits for an incoming message from the GC coordinator and defines the block location range for the garbage collection operation, referring to the hash-to-location table.
In block 1404, the block fragment module creates a blank reference mapping in a reference mapping table, and in block 1406, the block fragment module sends a confirmation to the GC coordinator.
In block 1408, the block fragment module waits for an incoming partial reference mapping from each key fragment (see 1304); then, in block 1410, each incoming partial reference mapping is merged into the fragment's existing reference mapping. Where the reference mapping is implemented as a bitmap, the merge operation is performed by applying a bitwise OR to each pair of corresponding bits of the two bitmaps, thereby merging the two sets of references.
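The bitmap variant of the merge can be sketched as below, assuming one bit per block location packed eight to a byte. The helper names and the 16-location range are hypothetical.

```python
def merge_bitmaps(existing: bytearray, incoming: bytes) -> None:
    """OR `incoming` into `existing` in place; a set bit means 'referenced'."""
    if len(existing) != len(incoming):
        raise ValueError("bitmaps must cover the same block location range")
    for i, byte in enumerate(incoming):
        existing[i] |= byte


def set_bit(bitmap: bytearray, location: int) -> None:
    bitmap[location // 8] |= 1 << (location % 8)


def is_referenced(bitmap, location: int) -> bool:
    return bool(bitmap[location // 8] & (1 << (location % 8)))


reference_map = bytearray(2)  # blank mapping covering locations 0..15
partial_a = bytearray(2)      # from one key fragment: locations 1 and 9
set_bit(partial_a, 1)
set_bit(partial_a, 9)
partial_b = bytearray(2)      # from another key fragment: locations 1 and 4
set_bit(partial_b, 1)
set_bit(partial_b, 4)
for partial in (partial_a, partial_b):
    merge_bitmaps(reference_map, partial)
referenced = [i for i in range(16) if is_referenced(reference_map, i)]
```

Because OR is commutative and idempotent, the partial mappings can arrive in any order, and a location reported by several key fragments is simply marked once.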
In block 1412, it is determined whether incoming partial reference mappings have been received from all key fragments. If not, blocks 1408-1410 are repeated. If all incoming reference mappings have been received, and a "start compaction" message has been received from the GC coordinator (see 1210), then in block 1414 the data in the cloud object store is compacted (for more detail, see FIG. 15).
After the data in the cloud object store has been compacted, in block 1416 a confirmation is sent to the GC coordinator (see 1212).
FIG. 15 shows an example process 1500, implemented by the block fragment modules in servers 309a-309n (FIG. 3A), for compacting data in the cloud object store during the compaction operation, in particular for block 1414 of the garbage collection process (FIG. 14). Such an example process 1500 can be a collection of blocks in a logical flow chart, representing a sequence of operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer-executable instructions that, when executed by one or more processors, perform the operations. In general, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described blocks can be combined in any order and/or in parallel to implement the process. For purposes of discussion, the processes are described with reference to FIG. 3A and FIG. 14, but they can be implemented in other system architectures.
For each block fragment, the block fragment server performs the following compaction process: the block fragment server traverses each back-end object in the cloud object store that is managed by the fragment. Each back-end object can contain one or more data blocks, and can therefore span multiple locations in the block fragment.
Each back-end object can be compacted using the following procedure:
A. Check the reference mapping to determine which locations in the back-end object are referenced and which are no longer referenced.
B. Modify the back-end object in the cloud object store to remove the block data at the locations that are no longer referenced. Only the block data that is still referenced is retained.
C. Update the hash-to-location table to remove the entries for the blocks removed by the compaction process.
D. After every back-end object in the fragment's cloud object store has been compacted, the reference mapping of the block fragment can be deleted.
E. The block fragment server responds to the GC coordinator, confirming that the compaction operation for the block fragment is complete.
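Steps A to C for a single back-end object can be sketched as follows. The function `compact_object`, the dictionary representation of a back-end object, the placeholder hash strings, and the use of a set as the reference mapping are all assumptions for illustration.

```python
def compact_object(blocks, reference_map, hash_to_location):
    """Compact one back-end object.

    blocks: {location: data} for the blocks held in this back-end object.
    reference_map: set of locations that are still referenced.
    hash_to_location: {hash: location}; entries for removed blocks are dropped.
    Returns the surviving blocks.
    """
    # Steps A and B: keep only the block data at locations still referenced.
    kept = {loc: data for loc, data in blocks.items() if loc in reference_map}
    removed = set(blocks) - set(kept)
    # Step C: drop hash-to-location entries that point at removed blocks.
    stale = [h for h, loc in hash_to_location.items() if loc in removed]
    for digest in stale:
        del hash_to_location[digest]
    return kept


backend_object = {10: b"aa", 11: b"bb", 12: b"cc"}
hash_to_location = {"h-aa": 10, "h-bb": 11, "h-cc": 12}
reference_map = {10, 12}
backend_object = compact_object(backend_object, reference_map, hash_to_location)
```

Removing the table entry matters as much as removing the data: if the entry stayed, a later upload of the same block would be treated as a duplicate of data that no longer exists.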
Specifically, in block 1502, after waiting for an incoming message from the GC coordinator to compact the fragment (see 1210), the back-end objects to be compacted are determined using the hash-to-location table.
In block 1504, the information from the hash-to-location table and the reference mapping is used to determine which blocks in the back-end object are still referenced.
In block 1506, the back-end object is modified or rewritten in the cloud object store to remove unused blocks. The back-end object can be modified in place, or it can be rewritten by writing a new version of the object that replaces the old version. The new version of the object omits the data blocks that are no longer needed.
For example, if a back-end object contains example blocks 1, 2, 3, 4, 5 and 6, and the system determines that blocks 3 and 4 are no longer referenced and can be deleted, the system rewrites the back-end object so that it contains only blocks 1, 2, 5 and 6. This changes the offsets within the back-end object at which blocks 5 and 6 are stored; they are now closer to the beginning of the back-end object. The offsets of blocks 1 and 2 do not change. The amount of storage required for the back-end object is reduced because it no longer contains blocks 3 and 4.
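The effect of this example on byte offsets can be reproduced with a small sketch. The function `rewrite_object` is hypothetical, and a uniform block size of 4 bytes is assumed purely so that the offsets are easy to read.

```python
def rewrite_object(blocks, referenced):
    """Rewrite a back-end object, keeping only referenced blocks.

    blocks: list of (block number, data) in stored order.
    Returns the new payload and {block number: byte offset in the payload}.
    """
    payload = bytearray()
    offsets = {}
    for number, data in blocks:
        if number in referenced:
            offsets[number] = len(payload)
            payload += data
    return bytes(payload), offsets


# Six illustrative 4-byte blocks; blocks 3 and 4 are no longer referenced.
blocks = [(n, bytes([n]) * 4) for n in range(1, 7)]
old_payload, old_offsets = rewrite_object(blocks, {1, 2, 3, 4, 5, 6})
new_payload, new_offsets = rewrite_object(blocks, {1, 2, 5, 6})
```

Under these assumptions blocks 5 and 6 move from offsets 16 and 20 to offsets 8 and 12, blocks 1 and 2 stay at offsets 0 and 4, and the object shrinks from 24 to 16 bytes.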
Each location is an offset within a specific back-end object (for example, fragment 5, object number 1,234,567, offset 20,000 bytes from the beginning of the object). In one implementation, this is the position in the object store at which the bytes making up the data block are stored.
In block 1508, the hash-to-location table is updated to remove the entries for the blocks removed from the cloud object store.
In block 1512, it is determined whether there are more back-end objects in the block location range to be compacted. If there are more back-end objects, blocks 1504-1508 are repeated. If there are no further objects, the process is complete.
Although the detailed description above has shown, described, and identified several novel features of the invention as applied to preferred embodiments, it will be understood that those skilled in the art can make various omissions, substitutions, and changes in the form and details of the described embodiments without departing from the spirit of the invention. Accordingly, the scope of the invention should not be limited to the foregoing discussion, but should be defined by the appended claims.

Claims (15)

1. A method of performing garbage collection to compact data in the memory of one or more network-capable servers, comprising:
storing one or more back-end objects in an object store;
creating, in a reference mapping for the object store, data indicating which locations within the one or more back-end objects in the object store are currently referenced by an object-key-to-location table and which locations within the one or more back-end objects are no longer referenced;
modifying the one or more back-end objects in the object store to remove block data from the locations within the one or more back-end objects that are no longer referenced; and
updating a hash-to-location table to remove the entries in the table that correspond to the removed block data.
2. according to the method described in claim 1, further including being quoted using the Hash location table in the object storage Position in the rear end object.
3. The method of claim 1, further comprising identifying which locations in the backend objects in the object store are currently referenced and which locations are no longer referenced, by running a tracking process that determines which locations in the backend objects contain data that is still currently referenced.
4. The method of claim 3, wherein the tracking process comprises:
creating a partial reference map for each block shard, to record the references found;
iterating, in each key shard, over the object key location table of the objects managed by that key shard, and, for each block location that appears in the object key location table, recording a reference in the partial reference map; and
sending the partial reference maps to the corresponding block shard servers.
5. The method of claim 1, further comprising:
deleting the reference map after the reference map has been used to update the hash location table to remove all entries in the table that correspond to block data removed from the object store.
6. The method of claim 4, further comprising:
collecting, using the block shard server, the reference maps from each key shard; and
removing, using the block shard server, the blocks that are no longer referenced.
7. A system for performing garbage collection to compact data, the system comprising:
an object store that stores backend objects;
one or more multi-network-capable servers including a memory;
a reference map created in the memory, the reference map indicating which locations in the backend objects stored in the object store are currently referenced, and which locations in the backend objects stored in the object store are no longer referenced;
circuitry for altering the backend objects stored in the object store to remove block data from the locations in those backend objects that are no longer referenced; and
circuitry for removing, from a hash location table, the entries that identify the locations of the removed block data in the backend objects.
8. The system of claim 7, further comprising:
circuitry for deleting the reference map after all entries corresponding to the removed block data have been removed from the hash location table.
9. The system of claim 7, further comprising:
circuitry for running a tracking process that identifies which locations in the backend objects contain data that is still currently referenced and which locations are no longer referenced.
10. The system of claim 9, wherein the circuitry for running the tracking process comprises:
circuitry for creating a partial reference map for each block shard, to record the references found;
circuitry for iterating over the key shards using the object key location table of the objects managed by each key shard, and, for each block location that appears in the object key location table, recording a reference in the partial reference map; and
circuitry for sending the partial reference maps to the corresponding block shard servers.
11. A device, comprising:
at least one non-transitory medium for execution by a processor in a server, the at least one non-transitory medium comprising at least:
one or more instructions for creating data in a reference map in a memory to indicate which locations in backend objects in an object store are currently referenced and which locations are no longer referenced;
one or more instructions for altering the backend objects in the object store to remove block data from the locations that are no longer referenced; and
one or more instructions for updating a hash location table, which identifies the locations of block data in the backend objects, to remove the entries in the table that identify the locations of the removed block data.
12. The device of claim 11, wherein the at least one non-transitory medium further comprises at least:
instructions for using the hash location table to reference locations in the backend objects in the object store.
13. The device of claim 12, wherein the at least one non-transitory medium further comprises at least:
instructions for identifying which locations in the backend objects in the object store are currently referenced and which locations are no longer referenced, by running tracking process instructions that determine which locations in the backend objects contain data that is still currently referenced.
14. The device of claim 12, wherein the tracking process instructions comprise:
instructions for creating a partial reference map for each block shard, to record the references found;
instructions for iterating over the key shards using the object key location table of the objects managed by each key shard, and, for each block location that appears in the object key location table, recording a reference in the partial reference map; and
instructions for sending the partial reference maps to the corresponding block shard servers.
15. The device of claim 11, wherein the at least one non-transitory medium further comprises at least:
instructions for deleting the reference map in response to the hash location table having been updated to remove all entries in the table that correspond to the removed block data.
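The tracking process recited in claims 4, 10, and 14 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the `Location` tuple layout, the function names `trace_key_shard` and `merge_partial_maps`, and the use of plain dictionaries are all hypothetical, and "sending" a partial reference map to a block shard server is modeled simply as returning it keyed by block shard.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical encoding of a block location: (block shard, object number, byte offset).
Location = Tuple[int, int, int]


def trace_key_shard(object_key_table: Dict[str, List[Location]]) -> Dict[int, Set[Location]]:
    """Build one partial reference map per block shard for a single key shard."""
    partial_maps: Dict[int, Set[Location]] = defaultdict(set)
    # Iterate over the object key location table of the objects this key shard manages.
    for locations in object_key_table.values():
        for loc in locations:
            # Record the reference in the partial map of the owning block shard.
            partial_maps[loc[0]].add(loc)
    return dict(partial_maps)


def merge_partial_maps(maps: Iterable[Set[Location]]) -> Set[Location]:
    """On a block shard server, combine the partial maps received from every key shard."""
    merged: Set[Location] = set()
    for partial in maps:
        merged |= partial
    return merged


key_shard_a = {"k1": [(5, 1, 0), (6, 2, 100)]}
key_shard_b = {"k2": [(5, 1, 0), (5, 1, 4096)]}
maps_a = trace_key_shard(key_shard_a)
maps_b = trace_key_shard(key_shard_b)
referenced_on_shard_5 = merge_partial_maps([maps_a[5], maps_b[5]])
```

Any location held by block shard 5 that is absent from `referenced_on_shard_5` would be a candidate for removal in this model.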

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201662427353P 2016-11-29 2016-11-29
US62/427,353 2016-11-29
US15/825,073 US20180107404A1 (en) 2015-11-02 2017-11-28 Garbage collection system and process
US15/825,073 2017-11-28
PCT/US2017/063673 WO2018102392A1 (en) 2016-11-29 2017-11-29 Garbage collection system and process

Publications (1)

Publication Number Publication Date
CN110226153A true CN110226153A (en) 2019-09-10

Family

ID=62242710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780073649.8A Pending CN110226153A (en) 2016-11-29 2017-11-29 Garbage collection system and process

Country Status (3)

Country Link
EP (1) EP3532939A4 (en)
CN (1) CN110226153A (en)
WO (1) WO2018102392A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060173939A1 (en) * 2005-01-31 2006-08-03 Baolin Yin Garbage collection and compaction
CN102567218A (en) * 2010-12-17 2012-07-11 微软公司 Garbage collection and hotspots relief for a data deduplication chunk store
US8224875B1 (en) * 2010-01-05 2012-07-17 Symantec Corporation Systems and methods for removing unreferenced data segments from deduplicated data systems
US20130138902A1 (en) * 2011-11-30 2013-05-30 International Business Machines Corporation Optimizing Migration/Copy of De-Duplicated Data

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7085789B1 (en) * 2001-07-30 2006-08-01 Microsoft Corporation Compact garbage collection tables
US20080270436A1 (en) * 2007-04-27 2008-10-30 Fineberg Samuel A Storing chunks within a file system
US8880775B2 (en) * 2008-06-20 2014-11-04 Seagate Technology Llc System and method of garbage collection in a memory device
US8930648B1 (en) * 2012-05-23 2015-01-06 Netapp, Inc. Distributed deduplication using global chunk data structure and epochs
US9208080B2 (en) * 2013-05-30 2015-12-08 Hewlett Packard Enterprise Development Lp Persistent memory garbage collection
US9268806B1 (en) * 2013-07-26 2016-02-23 Google Inc. Efficient reference counting in content addressable storage

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
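Jia Weiwei et al.: "Design and Implementation of Garbage Data Collection in a Cloud Storage Log File System", Computer Applications and Software (《计算机应用与软件》) *
Jia Weiwei et al.: "Design and Implementation of Garbage Data Collection in a Cloud Storage Log File System", Computer Applications and Software (《计算机应用与软件》), no. 08, 15 August 2016 (2016-08-15) *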

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597070A (en) * 2020-11-16 2021-04-02 新华三大数据技术有限公司 Object recovery method and device
CN112597070B (en) * 2020-11-16 2022-10-21 新华三大数据技术有限公司 Object recovery method and device

Also Published As

Publication number Publication date
EP3532939A1 (en) 2019-09-04
WO2018102392A1 (en) 2018-06-07
EP3532939A4 (en) 2020-06-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination