CN101183323A - Data stand-by system based on finger print - Google Patents


Info

Publication number
CN101183323A
Authority
CN
China
Prior art keywords
backup
job
server
file
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2007101687158A
Other languages
Chinese (zh)
Other versions
CN100547555C (en)
Inventor
冯丹
刘景宁
杨天明
周可
牛中盈
张航
刘高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CNB2007101687158A (patent CN100547555C)
Publication of CN101183323A
Application granted
Publication of CN100547555C
Expired - Fee Related
Anticipated expiration

Abstract

The invention relates to a data backup system based on fingerprints, belonging to the technical field of computer storage backup, which aims at reducing management, storage and network overhead of data backup and improving backup performance. The invention comprises a backup server, a backup agent, a storage server and a Web server which complete data backup and recovery through mutual network communication. The invention is characterized in that redundant data of backup files is recognized with the file segmentation technology based on anchors, thus the invention has the advantages that the modification stability is good and computation cost is low; data segmentations with fingerprints as the index are stored on a plurality of disk arrays of the storage server so as to eliminate backup of redundant data and save disk storage space; the data segmentations are not erased once stored and can be continuously appended on the disk so as to eliminate disk storage fragments; since the effective backup buffer strategy is adopted, the invention also has the advantages of reduced network overhead of backup, increased data backup speed and lowered backup influence to application servers.

Description

A fingerprint-based data backup system
Technical field
The invention belongs to the field of computer storage backup and relates in particular to a data backup system.
Background art
In today's information age, data is a precious resource for enterprises and individuals alike. At best, data loss disrupts business continuity and temporarily costs a company its competitive edge; at worst, it drives the business to failure. Data loss has many causes, including hardware and software faults, human error or sabotage, and force majeure such as natural disasters and war. The traditional way to protect data from such accidents is to copy it periodically to removable media such as tape or optical disc and then transport the media offline to a relatively safe site, so that the data can be recovered when needed. This traditional approach has significant shortcomings: (1) Removable storage media such as tape and optical discs wear out or degrade over time, which lowers their reliability and makes them unsuitable for long-term data storage. (2) Tape, the medium most commonly used for backing up large volumes of data, is slow to read and write. Because it is a sequential-access device, restoring data usually involves frequent mechanical rewinding, and if the backup is spread over several tapes, loading and unloading them costs additional time. Backing up to and recovering from tape is therefore very time-consuming. (3) Staff must be employed to transport the backup media to a remote site and to keep the data secure during transport and storage. Traditional data backup thus relies on manual intervention for many tasks and is costly and tedious.
To improve the efficiency of backup and recovery and to overcome these shortcomings, well-known IT companies and research institutions around the world have developed a wide variety of data backup systems over the past two decades, including IBM TotalStorage; HP OpenView storage mirroring software, CASA, XPCA and EVACA; EMC SRDF and MirrorView; and VERITAS NetBackup. These commercial systems have no data deduplication function. To store the large volume of redundant data produced by backups, they often use disk-to-tape (D2T) technology: high-speed disks serve as a backup buffer to improve online backup efficiency, and the buffered data is then migrated in the background to slow, high-capacity media such as tape libraries or optical disc libraries. The back-end storage devices therefore still require considerable manpower and resources for routine maintenance.
Disk storage is easier to manage and faster to access than tape storage, and as disk technology has developed, disk-based backup systems have attracted growing attention. With current disk technology it is easy to build a storage system at the terabyte or even petabyte scale, and the falling price per bit makes permanent archiving on disk practical. For a disk-based backup system, storing backup data permanently without erasing it has several advantages. First, data can be written to disk contiguously, and no fragmentation arises from space reclamation. Second, the user's data history is preserved in full, so the user can easily browse any historical version of a file. Third, the user's backup data is protected against accidental deletion of important data.
The greatest challenge for a permanent, disk-based backup system is the ever-growing volume of user backup data. Enterprise data is typically highly redundant: the system holds many duplicate files, and successive edited versions of a file share much of their content. The file-level backup techniques in wide use today cannot identify redundant data across files, so more and more duplicate data is backed up into the system. This lowers the disk space utilization of the backup system, and transmitting large amounts of redundant data over the network needlessly increases the network overhead of backup and lengthens the backup time.
It follows that developing a permanent, disk-based backup system that uses a new backup technique to remove redundant backup data and improve storage efficiency is of real value.
Summary of the invention
The invention proposes a fingerprint-based data backup system. The system stores backup data permanently on disk and uses a fingerprint-based backup technique to remove redundant data from backups. Its purpose is to reduce the management, storage and network overhead of data backup and to improve backup performance.
The fingerprint-based data backup system of the invention comprises a backup server, a backup agent, a storage server and a Web server, which communicate with one another over a network to carry out data backup and recovery, and is characterized in that:
The backup server holds a configuration file and a catalog database. The configuration file records the user-defined job objects; a job object contains attributes that specify how the system runs a job task, and the backup server controls the entire backup and recovery process through job objects. The catalog database stores job records, which hold the management information of job runs;
The backup agent is installed on every host in the network whose data needs to be backed up. During backup, the backup agent reads the files to be backed up from the host's file system, divides each file into chunks using anchor-based chunking, computes the fingerprint of each chunk, and sends the fingerprints, together with only those chunks whose data is needed, to the storage server over the network. During recovery, the backup agent receives file data from the storage server over the network and writes it to a specified directory in the host's file system;
The storage server is equipped with a large-capacity disk array, which is the destination of data backup. During backup, it receives fingerprints or chunks from the corresponding backup agent over the network, stores the chunks on disk, and builds the index of each file. During recovery, it reconstructs files from the disk array according to their file indexes and sends the file data to the corresponding backup agent over the network;
The Web server provides the system's browser/server (B/S) web management interface. After logging in to the Web server, the user can instruct the system to carry out interactive backup or recovery jobs, monitor the status of automatically scheduled jobs, modify the backup server's configuration file, customize job objects, and manage devices.
The fingerprint-based data backup system is further characterized in that the backup server comprises a backup server initialization module, a command listening module, a command processing module, a job processing module and a network communication module;
The backup server initialization module performs initialization: it reads the configuration file, builds the in-memory resource linked lists, checks the state of the catalog database, ensures the consistency and integrity of the data in the configuration file and the catalog database, opens the command listening port to accept user commands from the Web server, initializes the job queue and the user command queue, loads job objects into the job queue, and starts the jobs and the network listening service;
The command listening module is a network listening thread created by the system. It authenticates connection requests from the Web server so that only a Web server authorized by the system can connect, and it listens for command requests sent by the authenticated Web server. When a command request arrives, it adds the request to the user command queue to await processing;
The command processing module comprises a user command queue and up to N command worker threads. When the user command queue is full, the command listening module goes to sleep. The command worker threads continually read commands from the user command queue and execute them, performing different functions depending on the command. When the command listening module adds a command to the user command queue, a new command worker thread is created if no command worker thread is currently idle and the number of active command worker threads has not reached N. Each time a command worker thread reads a command from the user command queue, it checks the state of the command listening module and wakes it if it is asleep;
The job processing module comprises a job queue, up to L job worker threads and a job queue loading thread. When the job queue is full, the job queue loading thread goes to sleep. The job worker threads continually take job objects from the job queue and execute them, calling on different resources and performing different functions depending on the attributes of the job object. The job queue loading thread performs job scheduling: it checks the scheduling policy attribute of each job object in the job resource list and adds the job objects that are due to run to the job queue; if no job worker thread is currently idle and the number of active job worker threads has not reached L, a new job worker thread is created. Each time a job worker thread reads a job object from the job queue, it checks the state of the job queue loading thread and wakes it if it is asleep;
The network communication module wraps the standard network communication application programming interface (API) and provides a network communication interface to the command worker threads and the job worker threads. This interface implements the data transfer protocol among the backup server, the backup agent and the storage server.
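The command processing and job processing modules described above share one pattern: a bounded queue served by worker threads that are created on demand up to a fixed limit. The following is a minimal Python sketch of that pattern, not the patented implementation. The class name `CommandPool` and its methods are hypothetical, and the blocking `put()` of `queue.Queue` stands in for the listener going to sleep on a full queue and being woken when a worker removes an item.

```python
import queue
import threading

class CommandPool:
    """Bounded command queue served by at most max_workers lazily created threads."""

    def __init__(self, max_workers, queue_size):
        self.commands = queue.Queue(maxsize=queue_size)  # the user command queue
        self.max_workers = max_workers                    # N in the text
        self.workers = []
        self.idle = 0
        self.lock = threading.Lock()

    def submit(self, command):
        # Called by the listener. Spawn a worker only if none is idle and the
        # limit has not been reached.
        with self.lock:
            if self.idle == 0 and len(self.workers) < self.max_workers:
                worker = threading.Thread(target=self._run, daemon=True)
                self.workers.append(worker)
                worker.start()
        # put() blocks while the queue is full (the listener "sleeps") and
        # resumes as soon as a worker takes a command (the listener is "woken").
        self.commands.put(command)

    def _run(self):
        while True:
            with self.lock:
                self.idle += 1
            command = self.commands.get()
            with self.lock:
                self.idle -= 1
            command()                     # execute the command
            self.commands.task_done()
```

The same structure would serve the job processing module, with job objects in place of commands and L in place of N.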
The fingerprint-based data backup system is further characterized in that the backup agent comprises a backup agent initialization module, a request listening module, a job processing module, a file chunking module and a network communication module;
The backup agent initialization module performs initialization: it reads the backup agent configuration file, builds the in-memory resource linked lists, initializes the job queue, and starts the request listening module that listens for backup server requests;
The request listening module listens for connection requests from backup servers on the network and authenticates each connecting backup server. Once authentication succeeds, it creates a network connection socket for communicating with that backup server and adds the socket to the job queue;
The job processing module comprises a job queue and up to M job worker threads. When the job queue is full, the request listening module goes to sleep. After a job worker thread takes a network connection socket from the job queue, it first creates a job control record for the job and links the socket into a member variable of the job control record. It then interacts with the backup server through this socket, converting the relevant attributes of the backup server's job object and assigning them to the corresponding member variables of the job control record. Next, it uses the job ticket obtained from the backup server to connect to the corresponding storage server, creating a network connection socket for communicating with the storage server and linking it into a member variable of the job control record. When the request listening module adds a network connection socket to the job queue, a new job worker thread is created if no job worker thread is currently idle and the number of active job worker threads has not reached M. Each time a job worker thread takes a network connection socket from the job queue, it checks the state of the request listening module and wakes it if it is asleep;
The file chunking module carries out the file chunking task of a backup job on command from a job worker thread in the job processing module. It opens each file in the file set on the client file system, divides the file into chunks using anchor-based chunking, computes the chunk fingerprints, and cooperates with the corresponding storage server to run the backup algorithm of the first backup procedure;
The network communication module consists of the network connection sockets of the jobs. Each backup agent job has two network connection sockets, used to communicate with the backup server job and the storage server job that correspond to it.
The fingerprint-based data backup system is further characterized in that the storage server comprises a storage server initialization module, a connection listening module, a job ticket table, a job processing module and a network communication module, as well as an index buffer, a chunk buffer, a chunk hash table and a disk log;
The storage server initialization module performs initialization: it parses the storage server configuration file, builds the in-memory resource linked lists, and starts the related service threads;
The connection listening module listens for connection requests from backup servers and backup agents. It authenticates a connecting backup server and, once authentication succeeds, creates a network connection socket for communicating with that backup server and adds it to the job queue. For a connecting backup agent, it checks the job ticket presented by the agent against the job ticket table; once authentication succeeds, it creates a network connection socket for communicating with that backup agent and links it into a member variable of the corresponding job control record;
The job ticket table stores the tickets that jobs use to authenticate backup agents;
The job processing module comprises a job queue and up to W job worker threads. When the job queue is full, the connection listening module enters the "refuse backup server connection requests" state. After a job worker thread takes a network connection socket from the job queue, it first creates a job control record for the job and links the socket into a member variable of the job control record. It then interacts with the backup server through this socket, converting the relevant attributes of the backup server's job object and assigning them to the corresponding member variables of the job control record; it also generates a random job ticket, registers it in the job ticket table, and sends it to the backup server's job object. When the connection listening module adds a network connection socket to the job queue, a new job worker thread is created if no job worker thread is currently idle and the number of active job worker threads has not reached W. Each time a job worker thread takes a network connection socket from the job queue, it checks the state of the connection listening module; if the module is in the "refuse backup server connection requests" state, the thread cancels that state so that the module accepts backup server connection requests again;
The network communication module consists of the network connection sockets of the jobs. Each storage server job has two network connection sockets, used to communicate with the backup server job and the backup agent job that correspond to it;
The index buffer is infrastructure used by storage server jobs in the first and second backup procedures. It is implemented as an in-memory hash table and stores all fingerprints contained in Job_x(t_{n-1}), the job instance that precedes the current instance Job_x(t_n) in the job chain, together with the fingerprints newly generated while the current job runs;
The chunk buffer is infrastructure used by storage server jobs in the first and second backup procedures. It is implemented on a separate disk array and temporarily stores the chunks whose fingerprints were not found in the index buffer during the first backup procedure;
The chunk hash table is infrastructure used by storage server jobs in the second backup procedure. It is implemented on a separate disk array and maps a chunk's fingerprint to the chunk's storage address on the disk log;
The disk log is infrastructure used by storage server jobs in the second backup procedure. It is implemented on a separate disk array and stores the data blocks together with the file indexes, which are themselves stored as chunks.
The advantages of the invention are:
1. Anchor-based file chunking divides a file into variable-size chunks in order to identify redundant data within a file or across files. The chunking is stable under modification: a change to a file affects only the chunks adjacent to the modified region, and the boundaries of the other chunks do not move. An incremental backup of a file therefore needs to back up only the few modified chunks, while the other chunks are shared with the earlier backup. Anchors are computed with a sliding window, so the computational cost is low.
2. Chunks are stored on the storage server's disk arrays indexed by their fingerprints. This ties the storage address of data to its content, departing from the traditional separation of storage address and content, and it eliminates the backup of redundant data and saves disk space.
3. Once stored, a chunk is never erased, and chunks are appended to the disk contiguously, which eliminates disk fragmentation. The user's data history is preserved in full, so the user can easily browse any historical version of a file, and important data is protected against accidental deletion.
4. An effective backup buffering strategy reduces the network overhead of backup, increases backup speed, and lessens the impact of backup on application servers.
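Advantage 1 can be illustrated with a small Python sketch of anchor-based (content-defined) chunking. It is only an illustration under stated assumptions: the text above does not give the window hash, the anchor condition or the chunk size limits, so the polynomial rolling hash and the constants `WINDOW`, `MASK`, `BASE`, `MOD`, `MIN_SIZE` and `MAX_SIZE` are all hypothetical. SHA-1 is likewise an assumption, chosen because it produces 20-byte fingerprints.

```python
import hashlib

WINDOW = 16                     # sliding-window width in bytes (assumed)
MASK = (1 << 6) - 1             # anchor when the low 6 hash bits are zero (assumed)
BASE = 257                      # rolling-hash base (assumed)
MOD = (1 << 31) - 1             # rolling-hash modulus (assumed)
MIN_SIZE, MAX_SIZE = 32, 1024   # chunk size bounds in bytes (assumed)

def chunk_boundaries(data: bytes):
    """Yield (start, end) of variable-size chunks cut at content-defined anchors."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte that leaves the window
    start, h = 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # slide: drop the oldest byte
        h = (h * BASE + byte) % MOD                   # slide: add the newest byte
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            yield start, i + 1
            start, h = i + 1, 0
    if start < len(data):
        yield start, len(data)          # trailing chunk without an anchor

def fingerprint_chunks(data: bytes):
    """Return (fingerprint, chunk) pairs; SHA-1 gives 20-byte fingerprints."""
    return [(hashlib.sha1(data[s:e]).digest(), data[s:e])
            for s, e in chunk_boundaries(data)]
```

Because an anchor depends only on the bytes inside the window, inserting a few bytes into a file leaves the chunks before the edit unchanged, and chunk boundaries after the edit typically fall back into step, so most fingerprints of the old version reappear in the new one.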
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall structure of the invention;
Fig. 2 is a schematic diagram of the backup server structure;
Fig. 3 is a schematic diagram of the backup agent structure;
Fig. 4 is a schematic diagram of the storage server structure;
Fig. 5 is a schematic diagram of how a file is stored on the disk log;
Fig. 6 is a schematic diagram of several files sharing data blocks and index blocks on the disk log;
Fig. 7 is a structural diagram of the index buffer of the invention;
Fig. 8 is a schematic diagram of file chunking in the anchor-based file chunking technique.
Detailed description
The invention is described in more detail below with reference to the drawings and embodiments.
1. Overall system structure
Fig. 1 is a schematic diagram of the system architecture of the invention. The invention comprises a backup server, a backup agent, a storage server and a Web server, which communicate with one another over a network to carry out data backup and recovery.
Fig. 2 is a schematic diagram of the backup server structure. The backup server comprises a backup server initialization module, a command listening module, a command processing module, a job processing module and a network communication module; it also holds a configuration file and a catalog database.
The backup server is the command center of the whole network backup system and controls the entire backup and recovery process through job objects. A job object gives the user a window for customizing backup and recovery jobs. It contains many attributes that specify how the system runs a job task. For example, the backup agent attribute specifies which host the job backs up data from or restores data to; the file set attribute specifies the directories the job backs up or restores; and the scheduling policy attribute specifies the policy by which the system schedules the job. Denote a job object by Job_x. When the job object is scheduled to run at time t, it produces a job instance Job_x(t). The instances of Job_x in chronological order, Job_x(t_0), Job_x(t_1), ..., Job_x(t_n) (t_0 < t_1 < ... < t_n), form the job chain of this job object, denoted Job_x(t_0, t_1, ..., t_n). The backup server also maintains a catalog database that records the management information of Job_x(t). Specifically, the management information of Job_x(t) is stored in the job record Job_x(t).Record in the catalog database.
Catalog database: stores the management information of job runs, namely Job_x(t).Record. Job_x(t).Record mainly stores the root blocks of the files included in the job and the job's fingerprint file Job_x(t).FF. Every completed job Job_x(t) saves a fingerprint file Job_x(t).FF in the catalog database, which stores all fingerprints contained in Job_x(t). Job_x(t_n).FF is used to initialize the index buffer of job Job_x(t_{n+1}).
Fig. 3 is a schematic diagram of the backup agent structure. The backup agent comprises a backup agent initialization module, a request listening module, a job processing module, a file chunking module and a network communication module.
Fig. 4 is a schematic diagram of the storage server structure. The storage server comprises a storage server initialization module, a connection listening module, a job ticket table, a job processing module and a network communication module, as well as an index buffer, a chunk buffer, a chunk hash table and a disk log.
The storage server manages a large-capacity disk array (RAID) for storing chunks. Chunks are stored on the disk array indexed by their fingerprints. Once written to disk, a chunk is never erased, so the whole disk array behaves like a log: chunks are appended to the disk without gaps, which eliminates storage fragmentation. The disk used to store chunks is called the disk log. The storage server keeps the chunk hash table on a dedicated disk array; the chunk hash table maps a chunk's fingerprint to the chunk's storage address on the disk log. All data blocks of a backed-up file are indexed by index blocks, and all index blocks of a file form an index tree. Each file also has a unique root block, which stores the index of the root of the file's index tree together with the file's metadata and some management information. The root block and index blocks of a file are stored on the disk log as chunks, just like data blocks.
The storage server uses a backup buffering strategy to increase backup speed. Specifically: (1) An in-memory index buffer stores all fingerprints contained in Job_x(t_{n-1}), the job instance that precedes the current instance Job_x(t_n) in the job chain, together with the fingerprints newly generated while the current job runs. (2) A dedicated disk array serves as the chunk buffer, which temporarily stores chunks whose fingerprints were not found in the index buffer during backup. (3) The backup of a job is completed in two stages, called the first backup procedure and the second backup procedure.
In the first backup procedure, the backup agent and the storage server interact to back up the file chunks. The index buffer is used to look up chunk fingerprints, and the chunk buffer stores the chunks whose fingerprints were not found in the index buffer. As far as the backup agent is concerned, the backup of the job is finished once the first backup procedure completes. Because this procedure queries fingerprints in the in-memory index buffer and avoids the time-consuming chunk hash table lookup, it is fast.
The second backup procedure is run by the storage server when the system is relatively idle. It moves the chunks temporarily held in the chunk buffer to the disk log, using the chunk hash table to look up fingerprints, and at the same time builds the file index trees on the disk log. Because the storage server carries out the second backup procedure alone in the background, the application servers running backup agents are not affected.
When a file is restored, the storage server reconstructs the file from its file index and sends the file data to the corresponding backup agent over the network.
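The two backup procedures described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the class `StorageServerJob` and the function `backup` are hypothetical names, a `set`, a `list`, a `dict` and a `bytearray` stand in for the in-memory index buffer, the buffer disk array, the on-disk fingerprint-to-address hash table and the append-only disk log, and SHA-1 is an assumed fingerprint function. Building the file index tree is omitted.

```python
import hashlib

class StorageServerJob:
    def __init__(self, previous_fingerprints, chunk_table, disk_log):
        self.index_buffer = set(previous_fingerprints)  # fingerprints of Job_x(t_{n-1})
        self.chunk_buffer = []          # chunks parked during the first procedure
        self.chunk_table = chunk_table  # fingerprint -> offset in the disk log
        self.disk_log = disk_log        # append-only bytearray

    def offer(self, fingerprint):
        """First procedure: return True if the agent must send the chunk data."""
        if fingerprint in self.index_buffer:
            return False
        self.index_buffer.add(fingerprint)   # newly generated fingerprint
        return True

    def receive(self, fingerprint, chunk):
        """First procedure: park a chunk whose fingerprint was not found."""
        self.chunk_buffer.append((fingerprint, chunk))

    def flush(self):
        """Second procedure: move parked chunks to the disk log, skipping known ones."""
        for fingerprint, chunk in self.chunk_buffer:
            if fingerprint not in self.chunk_table:
                self.chunk_table[fingerprint] = len(self.disk_log)
                self.disk_log += chunk
        self.chunk_buffer.clear()

def backup(job, chunks):
    """Agent side of the first procedure; returns how many chunk bodies were sent."""
    sent = 0
    for chunk in chunks:
        fingerprint = hashlib.sha1(chunk).digest()
        if job.offer(fingerprint):
            job.receive(fingerprint, chunk)
            sent += 1
    return sent
```

In this sketch the agent transmits a chunk body only when the server does not already know its fingerprint, and the slower table lookup is deferred to `flush()`, which mirrors the division of work between the two procedures.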
Web server: the invention provides a web user interface in browser/server (B/S) mode. From anywhere, the user can log in to the system's management interface through a Web browser to instruct the system to carry out interactive backup or recovery jobs, monitor the status of automatically scheduled jobs, customize jobs, configure the backup server, manage devices, and so on.
2. Storage server disk log
In the invention, backup chunks are stored on the storage server's disk log indexed by their fingerprints. This guarantees that no two identical chunks are stored on disk at the same time, which eliminates the backup of redundant data. Once stored, a chunk is never erased, so chunks can be appended to the disk log contiguously, which eliminates disk fragmentation. The data blocks belonging to a backed-up file are indexed by index blocks, and the index blocks of the file are also stored on the disk log.
2.1 Chunk header
For ease of management, a header is prepended to every chunk. The header provides the information needed for system management, including integrity checking and reconstruction of the file index and the chunk hash table. The header is 39 bytes long in total and consists of the following fields:
magic: a 6-character header marker;
fingerprint: the fingerprint of this chunk, 20 bytes;
type: the type of this chunk. There are three chunk types: data block, index block and file root block, denoted dc, ic and rc respectively;
size: the size of this chunk, excluding the header. For index blocks, the system limits the size to at most 16 KB;
offset: the storage address of this chunk on the disk log.
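A 39-byte header with these fields could be laid out as in the Python sketch below. The text fixes only the total length, the 6-character marker and the 20-byte fingerprint; the widths chosen here for type (1 byte), size (4 bytes) and offset (8 bytes) are assumptions that happen to add up to 39, and the marker value `MAGIC` and the numeric type codes are hypothetical.

```python
import struct

# magic(6) + fingerprint(20) + type(1) + size(4) + offset(8) = 39 bytes.
# Only the magic and fingerprint widths come from the text; the rest is assumed.
HEADER = struct.Struct(">6s20sBIQ")
MAGIC = b"CHUNK\x00"                   # hypothetical 6-byte marker
TYPES = {"dc": 0, "ic": 1, "rc": 2}    # data, index and root block (codes assumed)

def pack_header(fingerprint, chunk_type, size, offset):
    """Serialize a chunk header into its 39-byte on-disk form."""
    return HEADER.pack(MAGIC, fingerprint, TYPES[chunk_type], size, offset)

def unpack_header(raw):
    """Parse a 39-byte header, checking the marker as a basic integrity test."""
    magic, fingerprint, code, size, offset = HEADER.unpack(raw)
    if magic != MAGIC:
        raise ValueError("bad chunk header marker")
    names = {value: key for key, value in TYPES.items()}
    return fingerprint, names[code], size, offset
```

Because every header records the chunk's own fingerprint, type, size and address, a sequential scan of the disk log is enough to rebuild the fingerprint-to-address table after a failure, which is the reconstruction role the text assigns to the header.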
2.2 File index
Fig. 5 shows how a file is stored on the disk log. The data blocks belonging to a file are indexed by index blocks, which are also stored on the disk log, and all index blocks of a file form an index tree. Each file stores a unique root block on the disk log; the root block holds the index of the root of the file's index tree together with the file's metadata and some management information. After a file has been backed up, its root block is also stored in the job record of the catalog database as management information of the job. In Fig. 5, F_0 denotes a file, D_i a data block and I_i an index block. An index block consists of index entries. P(X) denotes an index entry, which is a triple <H(X), offset, type>, where X is the indexed chunk, H(X) is the fingerprint of X, offset is the storage address of X on the disk log, and type is the type of X. X may be an index block I_i or a data block D_i. The arrows in the figure show the correspondence between an indexed chunk and its index entry. M(F_0) denotes the metadata and management information of file F_0. Index blocks I_0, I_1 and I_2 form the index tree of F_0, with I_0 as its root. R_0 denotes the root block of F_0; it consists of M(F_0) and the index entry P(I_0), which points to the root I_0 of the file's index tree. All data blocks and index blocks on the disk log can be shared by different files. Fig. 6 shows different files sharing data blocks and index blocks; the symbols in Fig. 6 have the same meaning as in Fig. 5.
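Restoring a file from this structure amounts to a depth-first walk of the index tree starting at the root block. The Python sketch below is a toy model under stated assumptions: the disk log is represented as a dictionary from offset to already-decoded chunk content, and the function name `read_file` and the placeholder fingerprint strings are hypothetical.

```python
def read_file(disk_log, root_offset):
    """Rebuild a file's bytes from its root block by walking the index tree."""
    metadata, root_entry = disk_log[root_offset]   # R_0 = (M(F_0), P(I_0))
    out = bytearray()

    def walk(entry):
        _fingerprint, offset, chunk_type = entry    # P(X) = <H(X), offset, type>
        if chunk_type == "dc":
            out.extend(disk_log[offset])            # data block: raw bytes
        else:
            for child in disk_log[offset]:          # index block: list of entries
                walk(child)

    walk(root_entry)
    return metadata, bytes(out)

# A file F_0 laid out like Fig. 5: I_0 indexes I_1 and I_2, which index D_0..D_2.
example_log = {
    0: b"he", 1: b"ll", 2: b"o",                        # D_0, D_1, D_2
    3: [("H(D0)", 0, "dc"), ("H(D1)", 1, "dc")],        # I_1
    4: [("H(D2)", 2, "dc")],                            # I_2
    5: [("H(I1)", 3, "ic"), ("H(I2)", 4, "ic")],        # I_0, root of the tree
    6: ({"name": "F0"}, ("H(I0)", 5, "ic")),            # R_0
}
```

Since entries refer to chunks only by fingerprint and offset, two root blocks can point at the same index or data blocks, which is how the sharing shown in Fig. 6 arises.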
3. Storage server chunk hash table
The chunk hash table of the storage server maps a chunk's fingerprint to the chunk's storage address on the Disk Logs. It consists of buckets of equal size. The number of buckets is determined by the size of the Disk Logs: the larger the capacity of the Disk Logs, the more buckets the table contains, which lowers the probability of bucket collisions. Depending on the number of buckets, the system takes the first n bits of a fingerprint as the bucket number and maps the fingerprint to that bucket. Each fingerprint is stored in its bucket as a triple <fingerprint, offset, type>, where fingerprint is the fingerprint of the chunk, offset is the storage address on the Disk Logs of the chunk with that fingerprint, and type is the type of that chunk. If a collision occurs in a bucket, the fingerprint's triple is stored in an adjacent bucket.
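The bucket mapping described above can be sketched as follows. This is only an illustrative in-memory model, not the patented on-disk layout: the class name `ChunkHashTable`, the `bucket_capacity` parameter, and the choice to overflow into the next bucket in sequence (wrapping around) are assumptions, since the text only says that a colliding triple goes to "an adjacent bucket".

```python
from typing import Iterator, List, Optional, Tuple

Entry = Tuple[bytes, int, str]  # <fingerprint, offset, type>


class ChunkHashTable:
    """Toy in-memory stand-in for the chunk hash table (assumed layout)."""

    def __init__(self, n_bits: int = 8, bucket_capacity: int = 4) -> None:
        self.n_bits = n_bits                    # first n bits select the bucket
        self.bucket_capacity = bucket_capacity  # hypothetical fixed bucket size
        self.buckets: List[List[Entry]] = [[] for _ in range(1 << n_bits)]

    def bucket_no(self, fingerprint: bytes) -> int:
        # Interpret the fingerprint as a big-endian integer and keep its top n bits.
        total_bits = len(fingerprint) * 8
        return int.from_bytes(fingerprint, "big") >> (total_bits - self.n_bits)

    def _probe(self, fingerprint: bytes) -> Iterator[int]:
        # Home bucket first, then each following bucket (assumed overflow order).
        start = self.bucket_no(fingerprint)
        for step in range(len(self.buckets)):
            yield (start + step) % len(self.buckets)

    def insert(self, fingerprint: bytes, offset: int, type_: str) -> int:
        for idx in self._probe(fingerprint):
            if len(self.buckets[idx]) < self.bucket_capacity:
                self.buckets[idx].append((fingerprint, offset, type_))
                return idx  # bucket that actually received the triple
        raise RuntimeError("hash table is full")

    def lookup(self, fingerprint: bytes) -> Optional[Entry]:
        for idx in self._probe(fingerprint):
            for entry in self.buckets[idx]:
                if entry[0] == fingerprint:
                    return entry
            if len(self.buckets[idx]) < self.bucket_capacity:
                return None  # entries are never deleted, so a non-full bucket ends the search
        return None
```

Because chunks are never erased once stored, a lookup in this sketch can stop at the first bucket that still has free space.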
4. Storage server index buffer
Figure 7 shows the structure of the index buffer. The index buffer is an in-memory hash table consisting of a bucket array and a number of linked lists. The bucket array has 1024*1024 buckets in total, numbered 00000H to FFFFFH. A bucket may be empty; a non-empty bucket holds a pointer to its linked list, whose entries store the fingerprint information hashed to that bucket. When hashing, the first 20 bits of a fingerprint are used as the bucket number, and the fingerprint is placed in the linked list that the bucket points to.
The structure of a linked-list entry is:
Tag: identifier, 4 bits, indicating the state of this fingerprint during the first and second backup procedures;
FingerprintTail: the last 140 bits of the chunk's fingerprint. Because the first 20 bits are implied by the bucket number, only the last 140 bits need to be stored here;
Offset: storage address, 64 bits. If non-empty, it is the storage address on the Disk Logs of the chunk with this fingerprint;
Next: 32 bits, a pointer to the next entry.
The "fingerprint" part of Fig. 7 shows how a fingerprint 7E54F36A4EC62...3B is hashed into the index buffer. Step (1) uses the first 20 bits of the fingerprint, "7E54F", as the bucket number (bucketNo) to find the bucket numbered 7E54FH. Step (2) searches the linked list of that bucket for an entry whose fingerprintTail is "36A4EC62...3B". If such an entry is found, fingerprint 7E54F36A4EC62...3B is already stored in the index buffer; if not, a new entry is created to store the fingerprint's information.
The tag of an index buffer list entry takes one of three values, with the following meanings:
0000: the fingerprint comes from the fingerprint file of the previous job and has not been hit during this backup;
1000: the fingerprint comes from the fingerprint file of the previous job and has been hit during this backup;
1100: the fingerprint was newly generated during this backup.
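As a rough illustration of the index buffer lookup and the three tag values, the sketch below splits a 160-bit fingerprint into a 20-bit bucket number and a 140-bit tail. The names `IndexCache`, `IndexEntry`, and the `TAG_*` constants are hypothetical, and a Python dict of lists stands in for the 1024*1024 bucket array and its linked lists (so the Next pointer is implicit).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

TAG_OLD_MISS = 0b0000  # from the previous job, not hit in this backup
TAG_OLD_HIT = 0b1000   # from the previous job, hit in this backup
TAG_NEW = 0b1100       # newly generated in this backup


@dataclass
class IndexEntry:
    tag: int
    fingerprint_tail: int          # last 140 bits of the fingerprint
    offset: Optional[int] = None   # address on the Disk Logs, if known


class IndexCache:
    BUCKET_BITS = 20
    TAIL_BITS = 140

    def __init__(self) -> None:
        # Empty buckets are simply absent from the dict.
        self.buckets: Dict[int, List[IndexEntry]] = {}

    @classmethod
    def split(cls, fingerprint: bytes) -> Tuple[int, int]:
        """Return (bucket number, fingerprint tail) for a 20-byte fingerprint."""
        if len(fingerprint) * 8 != cls.BUCKET_BITS + cls.TAIL_BITS:
            raise ValueError("expected a 160-bit fingerprint")
        value = int.from_bytes(fingerprint, "big")
        return value >> cls.TAIL_BITS, value & ((1 << cls.TAIL_BITS) - 1)

    def find(self, fingerprint: bytes) -> Optional[IndexEntry]:
        bucket_no, tail = self.split(fingerprint)
        for entry in self.buckets.get(bucket_no, []):
            if entry.fingerprint_tail == tail:
                return entry
        return None

    def add(self, fingerprint: bytes, tag: int,
            offset: Optional[int] = None) -> IndexEntry:
        bucket_no, tail = self.split(fingerprint)
        entry = IndexEntry(tag, tail, offset)
        self.buckets.setdefault(bucket_no, []).append(entry)
        return entry
```

For a fingerprint whose hexadecimal form starts with 7E54F, `IndexCache.split` returns bucket number 0x7E54F, matching step (1) of the Fig. 7 walk-through.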
After a backup job Job_x(t_{n-1}) finishes, all the fingerprints it contains are stored in the file Job_x(t_{n-1}).FF as pairs <fingerprint, offset> (where fingerprint is the fingerprint of a chunk and offset is the chunk's storage address on the Disk Logs). The file Job_x(t_{n-1}).FF is stored in the job record Job_x(t_{n-1}).Record of the catalog database and is used to initialize the index buffer of job Job_x(t_n). Because adjacent jobs in the same job chain usually share a large number of files or data, initializing the index buffer of Job_x(t_n) from Job_x(t_{n-1}).FF improves the buffer's fingerprint hit rate.
5. Backup procedure
For brevity, the following notation is used:
BS: backup server job worker thread;
BA: backup agent job worker thread;
SS: storage server job worker thread;
F_a: a file;
H_a: a fingerprint;
M(F): the metadata of file F;
R(F): the root block of file F;
H(D): the fingerprint of chunk D;
D(H): the data block or index block corresponding to fingerprint H;
F.Index: the memory buffer in which the index tree of file F is built;
index cache: the index buffer;
chunk cache: the chunk buffer;
hash table: the chunk hash table;
Job_x(t_n).FileSet: the file set of job object Job_x(t_n);
I(F, level): the set of index blocks at layer level of index tree F.Index. The leaves of the index tree are defined as layer 0, the parents of the leaves as layer 1, and so on;
I_w(F, level): the working node in I(F, level) that is currently used to store triples <H, offset, type>;
<H, offset, type>: a triple, where H is a fingerprint, offset is the storage address of chunk D(H) on the Disk Logs, and type is the type of chunk D(H).
5.1 First backup procedure
The first backup procedure is carried out mainly by the backup agent job worker thread in cooperation with the storage server job worker thread. Its steps are:
(1) SS: initialize the index cache from Job_x(t_{n-1}).FF;
(2) BA: if (Job_x(t_n).FileSet is empty) go to (20); else read a file F_i from Job_x(t_n).FileSet;
(3) BA: send M(F_i) to SS;
(4) SS: cache M(F_i) in the chunk cache;
(5) BA: perform anchor-based chunking on F_i;
(6) BA: compute the fingerprint of each chunk and send the resulting fingerprint set to SS;
(7) SS: if (the fingerprint set is empty) go to (17); else take a fingerprint H_j from the fingerprint set and look it up in the index cache;
(8) SS: if (H_j is found in the index cache) {
(9) SS: if (tag == 0000) { tag = 1000; cache <H_j, offset> in the chunk cache; }
(10) SS: else if (tag == 1000) cache <H_j, offset> in the chunk cache;
(11) SS: else if (tag == 1100) cache <H_j, null> in the chunk cache; }
(12) SS: else { cache H_j in the index cache with tag = 1100 and offset = null;
(13) SS: request BA to send D(H_j);
(14) BA: send D(H_j) to SS;
(15) SS: cache <H_j, D(H_j)> in the chunk cache; }
(16) SS: return to step (7);
(17) SS: notify BA to back up the next file;
(18) BA: return to step (2);
(19) BA: report the end status of job Job_x(t_n) to BS and SS, then exit;
(20) SS: on receiving the end-of-job signal from BA, finish the first backup procedure and enter the second backup procedure;
(21) BS: on receiving the end-of-job signal from BA, disconnect from BA and wait for SS to carry out the second backup procedure.
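The storage server side of steps (7) to (16) can be sketched as below for a single file. This is a simplified model under stated assumptions: the index cache is a plain dict keyed by fingerprint, tags are kept as the strings "0000", "1000", and "1100", and `fetch_chunk` is a hypothetical callback standing in for the network round trip of steps (13) and (14).

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

TAG_OLD_MISS, TAG_OLD_HIT, TAG_NEW = "0000", "1000", "1100"

# A chunk-cache tuple is <H, offset>, <H, D(H)>, or <H, null>.
CacheTuple = Tuple[bytes, Union[int, bytes, None]]


def first_pass(fingerprints: List[bytes],
               index_cache: Dict[bytes, dict],
               fetch_chunk: Callable[[bytes], bytes]) -> List[CacheTuple]:
    """Return the tuples cached in the chunk cache for one file."""
    chunk_cache: List[CacheTuple] = []
    for h in fingerprints:
        entry: Optional[dict] = index_cache.get(h)
        if entry is not None:
            if entry["tag"] == TAG_OLD_MISS:
                entry["tag"] = TAG_OLD_HIT            # step (9): first hit in this job
                chunk_cache.append((h, entry["offset"]))
            elif entry["tag"] == TAG_OLD_HIT:
                chunk_cache.append((h, entry["offset"]))  # step (10)
            else:
                # step (11): seen earlier in this job, address not known yet
                chunk_cache.append((h, None))
        else:
            # steps (12)-(15): new fingerprint, so the chunk data must be transferred
            index_cache[h] = {"tag": TAG_NEW, "offset": None}
            chunk_cache.append((h, fetch_chunk(h)))
    return chunk_cache
```

Only fingerprints that miss the index cache trigger a data transfer, which is where the network saving of the first backup procedure comes from.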
5.1.1 Anchor-based file chunking
In step (5) of the first backup procedure, anchor-based file chunking is performed by the backup agent job worker thread calling the backup agent's file chunking module. Its steps are:
(1) Take the first 48 bytes of the file, $b_1, b_2, \dots, b_{48}$, as a window and compute the hash of this first window with the formula $H_1 = (b_1 p^{47} + b_2 p^{46} + \dots + b_{48}) \bmod M$. Here $p$ is a prime, for example 17, and $M$ is a constant, for example $2^{32}$. The hash is stored in variable $H_1$.
(2) Slide the window forward by one byte and compute the hash of the second window $b_2, b_3, \dots, b_{49}$ with the formula $H_2 = (p H_1 + b_{49} - b_1 p^{48}) \bmod M$, storing it in variable $H_2$.
(3) Continue in the same way to compute the hash of every window of the file.
(4) For each window's hash, take its low 13 bits as a binary number; if this number equals a predetermined value (for example 61), the corresponding window is an anchor.
(5) Split the file into data blocks of varying size, using the anchors as boundaries.
Anchor-based file chunking observes the following three conventions: a) if the file is smaller than 48 bytes, the anchor-based chunking algorithm exits and the whole file is one data block; b) if a stretch of the byte stream contains too many anchors, some anchors are discarded so that no chunk is smaller than 2 KB (the chunk at the end of the file is the only one that may be smaller than 2 KB); c) if 64 KB of consecutive bytes contain no anchor, those 64 KB are taken as one chunk.
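The chunking steps and the three conventions can be sketched as follows, using the example constants from the text (48-byte window, p = 17, M = 2^32, low 13 bits equal to 61, 2 KB minimum, 64 KB maximum). Two points are assumptions not fixed by the text: a chunk boundary is placed at the end of an anchor window, and convention c) is modelled as cutting whenever the current chunk reaches 64 KB. The function names are hypothetical.

```python
from typing import List

WINDOW = 48                  # window size in bytes
P = 17                       # prime multiplier
M = 1 << 32                  # modulus
ANCHOR_MASK = (1 << 13) - 1  # low 13 bits of the window hash
ANCHOR_VALUE = 61            # predetermined anchor value
MIN_CHUNK = 2 * 1024
MAX_CHUNK = 64 * 1024


def chunk_boundaries(data: bytes) -> List[int]:
    """Return the exclusive end offset of every chunk of data."""
    n = len(data)
    if n == 0:
        return []
    if n < WINDOW:
        return [n]                       # convention a): whole file is one block
    p_top = pow(P, WINDOW, M)            # p^48 mod M, used to drop the oldest byte
    h = 0
    for b in data[:WINDOW]:              # H_1 by Horner's rule
        h = (h * P + b) % M
    boundaries: List[int] = []
    start, end = 0, WINDOW               # the window covers data[end-WINDOW:end]
    while True:
        length = end - start
        is_anchor = (h & ANCHOR_MASK) == ANCHOR_VALUE
        # convention b): ignore anchors that would make a chunk shorter than 2 KB
        # convention c): force a cut once the chunk reaches 64 KB
        if (is_anchor and length >= MIN_CHUNK) or length >= MAX_CHUNK:
            boundaries.append(end)
            start = end
        if end == n:
            break
        # Rolling update: H_next = (p*H + b_new - b_old*p^48) mod M
        h = (h * P + data[end] - data[end - WINDOW] * p_top) % M
        end += 1
    if start < n:
        boundaries.append(n)             # trailing chunk, may be shorter than 2 KB
    return boundaries


def split_file(data: bytes) -> List[bytes]:
    chunks, start = [], 0
    for end in chunk_boundaries(data):
        chunks.append(data[start:end])
        start = end
    return chunks
```

Each byte is visited once and the hash update is constant-time, so the sketch runs in time linear in the file size.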
Anchor-based file chunking in the present invention has two characteristics. (1) Modification stability: a modification to a file affects only the data blocks adjacent to the modified region, and the boundaries of the other data blocks do not move. In an incremental backup of a file, therefore, only the few modified data blocks need to be backed up, and the other data blocks are shared with the previously backed-up file. Modification stability also ensures that similar data within a file and between files is not missed because of shifted byte offsets, so duplicate data is detected to the greatest possible extent. (2) The sliding window is cheap to compute: the hash of the next window is easily derived from the hash of the previous window, which gives anchor-based chunking a low computing cost. The time complexity of the whole algorithm is O(n), where n is the number of bytes in the file.
Figure 8 shows how the chunks of a file change when the file is edited after it has been chunked. As the figure shows, anchor-based file chunking has modification stability: a modification to a file affects only the data blocks adjacent to the modified region, and the boundaries of the other data blocks do not move. Row a shows a file divided by anchors into 8 blocks of varying size, B_1 to B_8; the serrated part at each block boundary is a 48-byte anchor. Rows b, c, and d show how the chunks change after the 1st, 2nd, and 3rd modifications of the file; the shaded parts are the modified regions. Row b: the 1st modification occurs inside block B_4. No new block is produced; B_4 simply becomes B_9, and no other block changes. Backing up the file at this point only requires backing up block B_9 in place of the original B_4. Row c: the 2nd modification occurs inside block B_5 and produces a new anchor, which splits B_5 into two blocks, B_10 and B_11; no other block changes. Backing up the file at this point only requires backing up blocks B_10 and B_11 in place of the original B_5. Row d: the 3rd modification occurs at the boundary between B_2 and B_3 and destroys the anchor between them, so the two blocks merge into one block B_12. Backing up the file at this point only requires backing up block B_12 in place of the original B_2 and B_3.
5.2 Second backup procedure
The second backup procedure is carried out mainly by the storage server job worker thread when the system is relatively idle. Its steps are:
(1) SS: if (Job_x(t_n).FileSet is empty) go to (19); else take a file name F_i from Job_x(t_n).FileSet;
(2) SS: create the memory buffer F_i.Index for file F_i, create R(F_i) in F_i.Index, then store the M(F_i) held in the chunk cache into R(F_i);
(3) SS: if (the chunk cache holds no tuple related to F_i) go to (14); else read one tuple related to F_i from the chunk cache;
(4) SS: if (the tuple is <H_j, offset>) go to step (12);
(5) SS: else if (the tuple is <H_j, D(H_j)>) {
(6) SS: look up H_j in the hash table;
(7) SS: if (found) { write the "offset" value into the entry for H_j in the index cache; go to step (12); }
(8) SS: else { append D(H_j) to the Disk Logs and update the hash table;
(9) SS: write the "offset" value into the entry for H_j in the index cache; go to step (12); } }
(10) SS: else if (the tuple is <H_j, null>)
(11) SS: read the "offset" value from the entry for H_j in the index cache;
(12) SS: insert(<H_j, offset, dc>, 0, F_i.Index);
(13) SS: return to step (3);
(14) SS: storeRemain(F_i.Index, R(F_i));
(15) SS: append R(F_i) to the Disk Logs and update the hash table;
(16) SS: send R(F_i) to BS;
(17) BS: send R(F_i) to the catalog database, where it is stored in Job_x(t_n).Record;
(18) SS: return to step (1);
(19) SS: create the file Job_x(t_n).FF;
(20) SS: read the index cache and, for each entry satisfying (tag == 1000 or tag == 1100), write <H, offset> to the file Job_x(t_n).FF;
(21) SS: send the file Job_x(t_n).FF to BS;
(22) BS: send the file Job_x(t_n).FF to the catalog database, where it is stored in Job_x(t_n).Record;
(23) SS: report the end status of job Job_x(t_n) to BS;
(24) BS: disconnect from SS, write the end status of job Job_x(t_n) to Job_x(t_n).Record in the catalog database, and end job Job_x(t_n).
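Steps (3) to (13), which resolve each chunk-cache tuple of one file to a storage address, can be sketched as below. The model is deliberately simplified and its data shapes are assumptions: the chunk cache is a list of (H, payload) pairs where an int payload means <H, offset>, a bytes payload means <H, D(H)>, and None means <H, null>; the hash table is a dict from fingerprint to offset; and the Disk Logs are a bytearray that is only ever appended to.

```python
from typing import Dict, List, Tuple, Union

Triple = Tuple[bytes, int, str]  # <H, offset, type>


def second_pass(chunk_cache: List[Tuple[bytes, Union[int, bytes, None]]],
                index_cache: Dict[bytes, dict],
                hash_table: Dict[bytes, int],
                disk_log: bytearray) -> List[Triple]:
    """Resolve one file's chunk-cache tuples to <H, offset, dc> triples."""
    triples: List[Triple] = []
    for h, payload in chunk_cache:
        if isinstance(payload, int):
            offset = payload                      # step (4): address already known
        elif isinstance(payload, (bytes, bytearray)):
            if h in hash_table:                   # steps (6)-(7): chunk already stored
                offset = hash_table[h]
            else:                                 # step (8): append the new chunk
                offset = len(disk_log)
                disk_log.extend(payload)
                hash_table[h] = offset
            index_cache[h]["offset"] = offset     # steps (7)/(9): remember the address
        else:
            offset = index_cache[h]["offset"]     # steps (10)-(11): resolved earlier
        triples.append((h, offset, "dc"))         # step (12): goes into the index tree
    return triples
```

In the procedure above each resulting triple is passed to insert(..., 0, F_i.Index); the sketch just collects them in a list.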
In the above algorithm, the two functions used in steps (12) and (14) are as follows:
Algorithm for step (12)
insert(<H, offset, type>, level, F.Index)
{ // Store the triple <H, offset, type> in F.Index.
// level: the layer of index tree F.Index that holds the index node in which the triple <H, offset, type> is stored.
if (I(F, level) = ∅)
{ create I_w(F, level); store <H, offset, type> in I_w(F, level); return; }
else if (I_w(F, level) is not full)
{ store <H, offset, type> in I_w(F, level); return; }
else if (I_w(F, level) is full)
{ compute H(I_w(F, level));
look up H(I_w(F, level)) in the hash table;
if (not found)
append I_w(F, level) to the Disk Logs and update the hash table;
insert(<H(I_w(F, level)), offset, ic>, level+1, F.Index);
create a new index node I_w(F, level);
store <H, offset, type> in I_w(F, level); return;
}
}
Algorithm for step (14)
storeRemain(F.Index, R(F))
{ // Store the working index node of every layer of F.Index on the Disk Logs.
int level := 0;
loop: compute H(I_w(F, level));
look up H(I_w(F, level)) in the hash table;
if (not found)
append I_w(F, level) to the Disk Logs and update the hash table;
if (|I(F, level)| = 1)
{ store <H(I_w(F, level)), offset, ic> in R(F); return; }
else
{ insert(<H(I_w(F, level)), offset, ic>, level+1, F.Index);
level := level+1; goto loop;
}
}
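The two functions above can be sketched in Python as follows. Several details are assumptions made only for illustration: SHA-1 is used as the fingerprint function (the text only implies a 160-bit fingerprint), an index block is serialized with `repr`, the node capacity is a small entry count rather than the 16 KB size limit, and the classes `DiskLog` and `IndexTreeBuilder` are hypothetical names. `store_remain` returns the root index entry that would be placed in R(F).

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Triple = Tuple[bytes, int, str]  # <H, offset, type>


class DiskLog:
    """Append-only log plus a fingerprint -> offset table (both in memory here)."""

    def __init__(self) -> None:
        self.log = bytearray()
        self.hash_table: Dict[bytes, int] = {}

    def store(self, block: bytes) -> Tuple[bytes, int]:
        fp = hashlib.sha1(block).digest()      # assumed fingerprint function
        if fp not in self.hash_table:          # append only blocks not stored before
            self.hash_table[fp] = len(self.log)
            self.log.extend(block)
        return fp, self.hash_table[fp]


class IndexTreeBuilder:
    def __init__(self, disk_log: DiskLog, capacity: int) -> None:
        if capacity < 2:
            raise ValueError("an index node must hold at least two entries")
        self.disk_log = disk_log
        self.capacity = capacity
        self.work: List[List[Triple]] = []     # work[level] is I_w(F, level)
        self.count: List[int] = []             # count[level] is |I(F, level)|

    def _flush(self, level: int) -> Triple:
        """Store the working node of a layer and return its index entry."""
        block = repr(self.work[level]).encode()
        fp, offset = self.disk_log.store(block)
        return (fp, offset, "ic")

    def insert(self, triple: Triple, level: int = 0) -> None:
        if level == len(self.work):            # layer is empty: create its working node
            self.work.append([])
            self.count.append(1)
        elif len(self.work[level]) == self.capacity:
            # Working node is full: store it, index it one layer up, start a new node.
            self.insert(self._flush(level), level + 1)
            self.work[level] = []
            self.count[level] += 1
        self.work[level].append(triple)

    def store_remain(self) -> Optional[Triple]:
        if not self.work:
            return None                        # no entries at all (case not covered by the text)
        level = 0
        while True:
            entry = self._flush(level)
            if self.count[level] == 1:
                return entry                   # root of the index tree, goes into R(F)
            self.insert(entry, level + 1)
            level += 1
```

Because every index block is looked up by fingerprint before being appended, rebuilding the index of an unchanged file in this sketch adds nothing to the log.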

Claims (4)

1. A fingerprint-based data backup system, comprising a backup server, a backup agent, a storage server, and a Web server, which communicate with one another over a network to perform data backup and recovery, characterized in that:
the backup server is provided with a configuration file and a catalog database; the configuration file of the backup server records user-defined job objects, each job object containing the attributes that direct the system to run a job task, and the backup server controls the entire data backup and recovery process through the job objects; the catalog database stores job records, and a job record holds the management information of a job object's run;
the backup agent is installed on every host in the network whose data needs to be backed up; during backup, the backup agent reads the files to be backed up from the file system of its host, performs anchor-based chunking on each file, computes the fingerprints of the chunks, and sends the fingerprints and those data chunks that are required to the storage server over the network; during recovery, the backup agent receives file data from the storage server over the network and writes it to the specified directory in the file system of its host;
the storage server is provided with a large-capacity disk array, which is the destination of data backup; during backup, the storage server receives fingerprints or data chunks from the corresponding backup agent over the network, stores the data chunks on disk, and builds the file index; during recovery, it reconstructs files from the large-capacity disk array according to the file index and sends the file data to the corresponding backup agent over the network;
the Web server provides the browser/server (B/S) web-based user management interface of the system; by logging in to the Web server, a user can direct the system to perform interactive backup or recovery jobs, monitor the running status of automatically scheduled jobs, modify the configuration file of the backup server, customize job objects, and perform device management.
2. The fingerprint-based data backup system as claimed in claim 1, characterized in that the backup server comprises a backup server initialization module, a command listening module, a command processing module, a job processing module, and a network communication module;
the backup server initialization module performs initialization, which includes reading the configuration file, building the resource linked lists in memory, checking the state of the catalog database, ensuring the consistency and integrity of the data in the configuration file and the catalog database, starting the command listening port to accept user commands from the Web server, initializing the job queue and the user command queue, loading job objects into the job queue, and starting the job and network listening services;
the command listening module is a network listening thread created by the system; it authenticates connection requests from the Web server, ensuring that only a Web server authorized by the system can connect to the system, and listens for command requests sent by an authenticated Web server; when a command request is received, it is added to the user command queue to await processing by the system;
the command processing module comprises a user command queue and N command job threads; when the user command queue overflows, the command listening module goes to sleep; the command job threads continually read commands from the user command queue and execute them, performing different functions according to the command executed; when the command listening module adds a command to the user command queue, a new command job thread is created if there is currently no idle command job thread and the number of active command job threads has not reached N; each time a command job thread reads a command from the user command queue, it checks the state of the command listening module and wakes it if it is asleep;
the job processing module comprises a job queue, L job worker threads, and a job queue loading thread; when the job queue overflows, the job queue loading thread goes to sleep; the job worker threads continually take job objects from the job queue and run them, invoking different resources and performing different functions according to the attributes of the job object; the job queue loading thread performs job scheduling: it checks the scheduling policy attribute of each job object in the job resource list and adds the job objects that are due to run to the job queue, and a new job worker thread is created if there is currently no idle job worker thread and the number of active job worker threads has not reached L; each time a job worker thread reads a job object from the job queue, it checks the state of the job queue loading thread and wakes it if it is asleep;
the network communication module encapsulates the standard network communication application programming interface and provides a network communication interface to the command job threads and the job worker threads; the network communication interface implements the data transfer protocol among the backup server, the backup agent, and the storage server.
3. The fingerprint-based data backup system as claimed in claim 1, characterized in that the backup agent comprises a backup agent initialization module, a request listening module, a job processing module, a file chunking module, and a network communication module;
the backup agent initialization module performs initialization, which includes reading the backup agent configuration file, building the in-memory resource linked lists, initializing the job queue, and starting the module that listens for backup server requests;
the request listening module listens for connection requests from backup servers on the network and authenticates each connecting backup server; once authentication succeeds, it creates a network connection socket for communicating with that backup server and adds the socket to the job queue;
the job processing module comprises a job queue and M job worker threads; when the job queue overflows, the request listening module goes to sleep; after taking a network connection socket from the job queue, a job worker thread first creates a job control record for the job and links the socket into a member variable of the job control record; it then interacts with the backup server through this socket, converting the relevant attributes of the backup server's job object and assigning them to the corresponding member variables of the job control record; next, it uses the job ticket obtained from the backup server to connect to the corresponding storage server, creating a network connection socket for communicating with the storage server and linking it into a member variable of the job control record; when the request listening module adds a network connection socket to the job queue, a new job worker thread is created if there is currently no idle job worker thread and the number of active job worker threads has not reached M; each time a job worker thread takes a network connection socket from the job queue, it checks the state of the request listening module and wakes it if it is asleep;
the file chunking module, at the command of a job worker thread in the job processing module, performs the file chunking task of a backup job: it opens each file of the file set on the client file system, performs anchor-based chunking on the file, computes the chunk fingerprints, and cooperates with the corresponding storage server to carry out the backup algorithm of the first backup procedure;
the network communication module consists of the jobs' network connection sockets; each job of the backup agent has two network connection sockets, used respectively to communicate with the backup server job and the storage server job corresponding to this job.
4. The fingerprint-based data backup system as claimed in claim 1, characterized in that the storage server comprises a storage server initialization module, a connection listening module, a job ticket table, a job processing module, and a network communication module, as well as an index buffer, a chunk buffer, a chunk hash table, and Disk Logs;
the storage server initialization module performs initialization, which includes parsing the storage server configuration file, building the in-memory resource linked lists, and starting the related service threads;
the connection listening module listens for connection requests from backup servers and backup agents; it authenticates a connecting backup server and, once authentication succeeds, creates a network connection socket for communicating with that backup server and adds the socket to the job queue; for a connecting backup agent, it checks the job ticket table to authenticate the agent by the job ticket it presents and, once authentication succeeds, creates a network connection socket for communicating with that backup agent and links the socket into a member variable of the corresponding job control record;
the job ticket table stores the tickets with which jobs authenticate backup agents;
the job processing module comprises a job queue and W job worker threads; when the job queue overflows, the connection listening module enters the "refuse backup server connection requests" state; after taking a network connection socket from the job queue, a job worker thread first creates a job control record for the job and links the socket into a member variable of the job control record; it then interacts with the backup server through this socket, converting the relevant attributes of the backup server's job object and assigning them to the corresponding member variables of the job control record, and it generates a random job ticket, registers the ticket in the job ticket table, and sends the ticket to the backup server job object; when the connection listening module adds a network connection socket to the job queue, a new job worker thread is created if there is currently no idle job worker thread and the number of active job worker threads has not reached W; each time a job worker thread takes a network connection socket from the job queue, it checks the state of the connection listening module and, if it is in the "refuse backup server connection requests" state, cancels that state so that it accepts backup server connection requests again;
the network communication module consists of the jobs' network connection sockets; each job of the storage server has two network connection sockets, used respectively to communicate with the backup server job and the backup agent job corresponding to this job;
the index buffer is infrastructure with which a storage server job carries out the first backup procedure and the second backup procedure; it is implemented as an in-memory hash table and stores all the fingerprints contained in Job_x(t_{n-1}), the job instance preceding the current job instance Job_x(t_n) in the same job chain, together with the fingerprints newly generated while the current job runs;
the chunk buffer is infrastructure with which a storage server job carries out the first backup procedure and the second backup procedure; it is implemented on a separate disk array and temporarily stores the data chunks whose fingerprints were not found in the index buffer during the first backup procedure;
the chunk hash table is infrastructure with which a storage server job carries out the second backup procedure; it is implemented on a separate disk array and maps a chunk's fingerprint to the chunk's storage address on the Disk Logs;
the Disk Logs are infrastructure with which a storage server job carries out the second backup procedure; they are implemented on a separate disk array and store the data chunks and the file indexes, which are themselves stored in chunk form.
CNB2007101687158A 2007-12-10 2007-12-10 A kind of data backup system based on fingerprint Expired - Fee Related CN100547555C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007101687158A CN100547555C (en) 2007-12-10 2007-12-10 A kind of data backup system based on fingerprint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007101687158A CN100547555C (en) 2007-12-10 2007-12-10 A kind of data backup system based on fingerprint

Publications (2)

Publication Number Publication Date
CN101183323A true CN101183323A (en) 2008-05-21
CN100547555C CN100547555C (en) 2009-10-07

Family

ID=39448610

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007101687158A Expired - Fee Related CN100547555C (en) 2007-12-10 2007-12-10 A kind of data backup system based on fingerprint

Country Status (1)

Country Link
CN (1) CN100547555C (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009152716A1 (en) * 2008-06-19 2009-12-23 华为技术有限公司 Method, device for storing data fingerprint and method for synchronizing data of plurality of devices
CN101814045A (en) * 2010-04-22 2010-08-25 华中科技大学 Data organization method for backup services
CN101887388A (en) * 2010-06-18 2010-11-17 中兴通讯股份有限公司 Data backup system and method based on memory database
CN102169453A (en) * 2011-03-08 2011-08-31 杭州电子科技大学 File online backup method
CN101599079B (en) * 2009-07-22 2011-08-31 中国科学院计算技术研究所 Backup data centralized storage management method
CN102436408A (en) * 2011-10-10 2012-05-02 上海交通大学 Data storage cloud and cloud backup method based on Map/Dedup
CN102456059A (en) * 2010-10-21 2012-05-16 英业达股份有限公司 Data deduplication processing system
CN102510340A (en) * 2011-10-11 2012-06-20 浪潮电子信息产业股份有限公司 Method for realizing remote rapid backup by utilizing common Internet network
CN102714789A (en) * 2011-04-19 2012-10-03 华为终端有限公司 Method for backuping and recovering data of mobile terminal and mobile terminal thereof
CN102915325A (en) * 2012-08-11 2013-02-06 深圳市极限网络科技有限公司 Md5 Hash list-based file decomposing and combining technique
CN103119590A (en) * 2010-09-24 2013-05-22 日立数据系统有限公司 System and method for managing integrity in a distributed database
CN103200169A (en) * 2013-01-30 2013-07-10 中国科学院自动化研究所 Method and system of user data protection based on proxy
WO2013114230A1 (en) * 2012-02-02 2013-08-08 International Business Machines Corporation Erasure correcting codes for storage arrays
CN103384270A (en) * 2013-06-28 2013-11-06 环境保护部华南环境科学研究所 Method and system for data backup of internal and external network penetrating remote data transmission
CN103500120A (en) * 2013-09-17 2014-01-08 北京思特奇信息技术股份有限公司 Distributed cache high-availability processing method and system based on multithreading asynchronous double writing
CN103677973A (en) * 2013-09-01 2014-03-26 西安重装渭南光电科技有限公司 Distributed multi-task scheduling management system
CN103870362A (en) * 2014-03-21 2014-06-18 华为技术有限公司 Data recovery method, data recovery device and backup system
WO2014107845A1 (en) * 2013-01-09 2014-07-17 华为技术有限公司 Data processing method and device
US8918701B2 (en) 2011-02-28 2014-12-23 SK Hynix Inc. Nested multiple erasure correcting codes for storage arrays
CN104331525A (en) * 2014-12-01 2015-02-04 国家计算机网络与信息安全管理中心 Sharing method based on repeating data deletion
CN104408141A (en) * 2014-12-01 2015-03-11 国家计算机网络与信息安全管理中心 Redundancy removal file system and data deployment method thereof
TWI477981B (en) * 2012-12-12 2015-03-21 Hon Hai Prec Ind Co Ltd System and method for avoiding data parts stored in servers repeatedly
CN104508666A (en) * 2012-10-31 2015-04-08 惠普发展公司,有限责任合伙企业 Cataloging backup data
US9058291B2 (en) 2011-02-28 2015-06-16 International Business Machines Corporation Multiple erasure correcting codes for storage arrays
CN106922187A (en) * 2014-08-14 2017-07-04 安东·戈斯德夫 System, method and computer program product for low influence backup
CN107959658A (en) * 2016-10-17 2018-04-24 北京视联动力国际信息技术有限公司 A kind of Web conference method of data synchronization and its system
CN109309706A (en) * 2017-07-27 2019-02-05 Emc知识产权控股有限公司 The method and system of the fingerprint precalculated and data block is shared between the storage system on cloud local area network
CN109347899A (en) * 2018-08-22 2019-02-15 北京百度网讯科技有限公司 The method of daily record data is written in distributed memory system
CN110945496A (en) * 2017-07-27 2020-03-31 思科技术公司 System and method for state object data store
CN111382012A (en) * 2020-03-03 2020-07-07 广州鼎甲计算机科技有限公司 Backup method and device for MySQL cloud database, computer equipment and storage medium
CN114157674A (en) * 2020-08-17 2022-03-08 中移(上海)信息通信科技有限公司 Wireless communication method, device, system, server and medium
CN116566973A (en) * 2023-06-20 2023-08-08 北京中宏立达科技发展有限公司 File transmission system based on peer-to-peer network

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101610281B (en) * 2008-06-19 2012-11-21 华为技术有限公司 Method and device for storing data fingerprints
WO2009152716A1 (en) * 2008-06-19 2009-12-23 华为技术有限公司 Method, device for storing data fingerprint and method for synchronizing data of plurality of devices
CN101599079B (en) * 2009-07-22 2011-08-31 中国科学院计算技术研究所 Backup data centralized storage management method
CN101814045A (en) * 2010-04-22 2010-08-25 华中科技大学 Data organization method for backup services
CN101887388A (en) * 2010-06-18 2010-11-17 中兴通讯股份有限公司 Data backup system and method based on memory database
CN103119590B (en) * 2010-09-24 2016-08-17 日立数据系统有限公司 System and method for managing integrity in a distributed database
CN103119590A (en) * 2010-09-24 2013-05-22 日立数据系统有限公司 System and method for managing integrity in a distributed database
CN102456059A (en) * 2010-10-21 2012-05-16 英业达股份有限公司 Data deduplication processing system
US8918701B2 (en) 2011-02-28 2014-12-23 SK Hynix Inc. Nested multiple erasure correcting codes for storage arrays
US9058291B2 (en) 2011-02-28 2015-06-16 International Business Machines Corporation Multiple erasure correcting codes for storage arrays
CN102169453A (en) * 2011-03-08 2011-08-31 杭州电子科技大学 File online backup method
US10095715B2 (en) 2011-04-19 2018-10-09 Huawei Device (Dongguan) Co., Ltd. Data backup and recovery method for mobile terminal and mobile terminal
CN102714789B (en) * 2011-04-19 2014-04-02 华为终端有限公司 Method for backing up and recovering data of mobile terminal and mobile terminal thereof
CN102714789A (en) * 2011-04-19 2012-10-03 华为终端有限公司 Method for backing up and recovering data of mobile terminal and mobile terminal thereof
CN102436408B (en) * 2011-10-10 2014-02-19 上海交通大学 Data storage cloud and cloud backup method based on Map/Dedup
CN102436408A (en) * 2011-10-10 2012-05-02 上海交通大学 Data storage cloud and cloud backup method based on Map/Dedup
CN102510340A (en) * 2011-10-11 2012-06-20 浪潮电子信息产业股份有限公司 Method for realizing remote rapid backup by utilizing common Internet network
WO2013114230A1 (en) * 2012-02-02 2013-08-08 International Business Machines Corporation Erasure correcting codes for storage arrays
US8869006B2 (en) 2012-02-02 2014-10-21 International Business Machines Corporation Partial-maximum distance separable (PMDS) erasure correcting codes for storage arrays
US8874995B2 (en) 2012-02-02 2014-10-28 International Business Machines Corporation Partial-maximum distance separable (PMDS) erasure correcting codes for storage arrays
CN102915325A (en) * 2012-08-11 2013-02-06 深圳市极限网络科技有限公司 MD5 hash list-based file decomposing and combining technique
CN104508666A (en) * 2012-10-31 2015-04-08 惠普发展公司,有限责任合伙企业 Cataloging backup data
US9552161B2 (en) 2012-12-12 2017-01-24 Shenzhen Airdrawing Technology Service Co., Ltd Repetitive data block deleting system and method
TWI477981B (en) * 2012-12-12 2015-03-21 Hon Hai Prec Ind Co Ltd System and method for avoiding data parts stored in servers repeatedly
WO2014107845A1 (en) * 2013-01-09 2014-07-17 华为技术有限公司 Data processing method and device
CN103200169A (en) * 2013-01-30 2013-07-10 中国科学院自动化研究所 Method and system of user data protection based on proxy
CN103384270A (en) * 2013-06-28 2013-11-06 环境保护部华南环境科学研究所 Method and system for data backup of internal and external network penetrating remote data transmission
CN103677973A (en) * 2013-09-01 2014-03-26 西安重装渭南光电科技有限公司 Distributed multi-task scheduling management system
CN103500120A (en) * 2013-09-17 2014-01-08 北京思特奇信息技术股份有限公司 Distributed cache high-availability processing method and system based on multithreading asynchronous double writing
CN103870362B (en) * 2014-03-21 2017-08-04 华为技术有限公司 Data recovery method, device and backup system
CN103870362A (en) * 2014-03-21 2014-06-18 华为技术有限公司 Data recovery method, data recovery device and backup system
CN106922187A (en) * 2014-08-14 2017-07-04 安东·戈斯德夫 System, method and computer program product for low influence backup
CN106922187B (en) * 2014-08-14 2020-10-30 卫盟软件股份公司 System, method, and computer program product for low impact backup
CN104331525B (en) * 2014-12-01 2018-01-16 国家计算机网络与信息安全管理中心 Sharing method based on data de-duplication
CN104408141B (en) * 2014-12-01 2018-04-17 国家计算机网络与信息安全管理中心 Redundancy removal file system and data deployment method thereof
CN104331525A (en) * 2014-12-01 2015-02-04 国家计算机网络与信息安全管理中心 Sharing method based on repeating data deletion
CN104408141A (en) * 2014-12-01 2015-03-11 国家计算机网络与信息安全管理中心 Redundancy removal file system and data deployment method thereof
CN107959658B (en) * 2016-10-17 2019-04-26 视联动力信息技术股份有限公司 Web conference data synchronization method and system
CN107959658A (en) * 2016-10-17 2018-04-24 北京视联动力国际信息技术有限公司 Web conference data synchronization method and system
CN110945496A (en) * 2017-07-27 2020-03-31 思科技术公司 System and method for state object data store
CN109309706A (en) * 2017-07-27 2019-02-05 Emc知识产权控股有限公司 Method and system for sharing precomputed fingerprints and data blocks between storage systems on a cloud local area network
CN109309706B (en) * 2017-07-27 2022-03-04 Emc知识产权控股有限公司 Method and system for sharing fingerprints and data blocks among storage systems of cloud local area network
CN109347899A (en) * 2018-08-22 2019-02-15 北京百度网讯科技有限公司 Method for writing log data in a distributed storage system
CN109347899B (en) * 2018-08-22 2022-03-25 北京百度网讯科技有限公司 Method for writing log data in distributed storage system
CN111382012A (en) * 2020-03-03 2020-07-07 广州鼎甲计算机科技有限公司 Backup method and device for MySQL cloud database, computer equipment and storage medium
CN111382012B (en) * 2020-03-03 2020-12-29 广州鼎甲计算机科技有限公司 Backup method and device for MySQL cloud database, computer equipment and storage medium
CN114157674A (en) * 2020-08-17 2022-03-08 中移(上海)信息通信科技有限公司 Wireless communication method, device, system, server and medium
CN116566973A (en) * 2023-06-20 2023-08-08 北京中宏立达科技发展有限公司 File transmission system based on peer-to-peer network
CN116566973B (en) * 2023-06-20 2023-11-07 北京中宏立达科技发展有限公司 File transmission system based on peer-to-peer network

Also Published As

Publication number Publication date
CN100547555C (en) 2009-10-07

Similar Documents

Publication Publication Date Title
CN100547555C (en) Data backup system based on fingerprints
US8065273B2 (en) Automated priority restores
US8165221B2 (en) System and method for sampling based elimination of duplicate data
US9239687B2 (en) Systems and methods for retaining and using data block signatures in data protection operations
CN101814045B (en) Data organization method for backup services
US7334098B1 (en) Producing a mass storage backup using a log of write commands and time information
US9405776B2 (en) Remote backup and restore
CN102521072B (en) Virtual tape library equipment and data recovery method
Tan et al. CABdedupe: A causality-based deduplication performance booster for cloud backup services
CN102012851B (en) Continuous data protection method and server
US20060036904A1 (en) Data replication method over a limited bandwidth network by mirroring parities
US20060036901A1 (en) Data replication method over a limited bandwidth network by mirroring parities
CN103959256A (en) Fingerprint-based data deduplication
CN106708653B (en) Mixed tax big data security protection method based on erasure code and multiple copies
US9002800B1 (en) Archive and backup virtualization
US10261696B2 (en) Performance during playback of logged data storage operations
CN110109778A (en) Backup method and restoration method for large numbers of small data files
CN113886143B (en) Virtual machine continuous data protection method and device and data recovery method and device
CN106708662A (en) Method for restoring database and equipment
CN112800019A (en) Data backup method and system based on Hadoop distributed file system
CN112380067A (en) Metadata-based big data backup system and method under Hadoop environment
WO2023197937A1 (en) Data processing method and apparatus, storage medium, and computer program product
US20190073131A1 (en) Dedupe file system for bulk data migration to cloud platform
Nataraj et al. A Survey on Object Storage System in Cloud Computing
CN117971382A (en) High-performance virtual machine hot migration method for local file system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee Granted publication date: 2009-10-07. Termination date: 2020-12-10.