CN103902735A - Application perception data routing method oriented to large-scale cluster deduplication and system - Google Patents

Publication number
CN103902735A
CN103902735A CN201410158590.0A CN201410158590A CN103902735A CN 103902735 A CN103902735 A CN 103902735A CN 201410158590 A CN201410158590 A CN 201410158590A CN 103902735 A CN103902735 A CN 103902735A
Authority
CN
China
Prior art keywords
file
backup
node
deduplication
application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410158590.0A
Other languages
Chinese (zh)
Other versions
CN103902735B (en)
Inventor
付印金
胡谷雨
倪桂强
谢钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PLA University of Science and Technology
Original Assignee
PLA University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PLA University of Science and Technology
Priority to CN201410158590.0A
Publication of CN103902735A
Application granted
Publication of CN103902735B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1448: Management of the data involved in backup or backup restore
    • G06F11/1453: Management of the data involved in backup or backup restore using de-duplication of the data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/174: Redundancy elimination performed by the file system
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/44: Distributed routing

Abstract

The invention discloses an application-aware data routing method for large-scale cluster deduplication and a large-scale backup storage cluster system. The method comprises the steps of (S10) obtaining backup file meta-information, (S20) perceiving the file application type, (S30) calculating deduplication storage node loads, (S40) selecting the file routing node, (S50) sending files to the target node, and (S60) deduplicating the files within the node. The large-scale backup storage cluster system comprises a plurality of backup clients, a backup server and a plurality of deduplication storage servers. The method and the system offer a high data deduplication ratio, high node throughput, low system communication overhead, and balanced system load.

Description

Application-aware data routing method and system for large-scale cluster deduplication
Technical field
The invention belongs to the field of information storage and cluster computing, and in particular relates to an application-aware data routing method for large-scale cluster deduplication and to a large-scale backup storage cluster system.
Background art
Data in the many backup storage systems that manage massive data sets is highly redundant. Cluster deduplication is a technique that performs distributed, parallel data deduplication on a cluster of backup storage servers, and it can meet the capacity and performance scalability demands of managing massive backup data. For building energy-saving, environmentally friendly, and efficient green data centers, cluster deduplication has become a core technology of current data center storage management.
To limit system overhead, cluster deduplication usually adopts a loosely coupled design and does not deduplicate data across nodes. Data sent by a backup client is first assigned to the individual deduplication storage server nodes by data routing, and each deduplication storage server then independently, and in parallel, deletes the duplicate data content within its own node. Data routing directly affects the storage space utilization of the backup data, the throughput of the deduplication storage server nodes, and the load balance and communication overhead of the deduplication storage server cluster. The data routing method is therefore critical to improving cluster deduplication efficiency.
At present, there are three main data routing methods for cluster deduplication: chunk-level data routing based on a distributed hash table, super-chunk-level data routing based on state information, and file-level data routing based on similarity.
Chunk-level data routing based on a distributed hash table, as in the USENIX FAST '09 conference paper "HYDRAstor: a Scalable Secondary Storage" (published 2009-02-23) and the Chinese invention patent application "Distributed data deduplication system and method thereof" (application number 201110461322.2, published 2011-12-28), assigns data chunks to different deduplication nodes by their feature values through a distributed hash table. Although this method effectively improves space utilization and reduces communication overhead, it cannot preserve data locality within a node, which hurts system throughput.
Super-chunk-level data routing based on state information, as in the USENIX FAST '11 conference paper "Tradeoffs in Scalable Data Routing for Deduplication Clusters" (published 2011-02-14), merges multiple consecutive data chunks produced by chunking into super-chunks of uniform granularity. Before routing each super-chunk, it queries every node for how many of the super-chunk's data chunks duplicate chunks already stored there, and then, subject to load balance, routes the super-chunk to the node with the most duplicate chunks where possible. This strategy achieves a high data reduction ratio while keeping the load balanced, but its broadcast-style communication overhead and the frequent chunk fingerprint lookups within nodes seriously degrade system performance.
File-level data routing based on similarity, as in the IEEE/ACM MASCOTS '09 conference paper "Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup" (published 2009-09-21), relies on Broder's min-wise independent permutations theorem: it selects the minimum data chunk fingerprint in a file as the file's similarity feature and routes similar files to the same deduplication storage server node through a distributed hash mechanism. When similarity in the data stream is low, however, file similarity cannot be detected and the cluster deduplication of the backup data performs poorly.
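The similarity-based file-level routing described above can be illustrated with a minimal sketch. It is not the implementation from the cited paper: the fixed-size chunking, the SHA-1 fingerprints, the modulo placement, and all function names are assumptions chosen to keep the example short.

```python
import hashlib


def chunk_fingerprints(data: bytes, chunk_size: int = 4096) -> list[bytes]:
    """Fingerprint each fixed-size chunk (assumption: real systems usually
    use content-defined chunking rather than fixed-size chunks)."""
    return [hashlib.sha1(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]


def representative_fingerprint(data: bytes, chunk_size: int = 4096) -> bytes:
    """Use the minimum chunk fingerprint as the file's similarity feature."""
    fingerprints = chunk_fingerprints(data, chunk_size)
    # An empty file has no chunks; fall back to the fingerprint of empty data.
    return min(fingerprints) if fingerprints else hashlib.sha1(b"").digest()


def route_by_similarity(data: bytes, num_nodes: int, chunk_size: int = 4096) -> int:
    """Map the representative fingerprint to a node index (hypothetical
    stand-in for a distributed hash mechanism)."""
    rep = representative_fingerprint(data, chunk_size)
    return int.from_bytes(rep, "big") % num_nodes
```

Two files that share the chunk with the smallest fingerprint land on the same node. Files with little shared content are unlikely to share that chunk, which is the weakness the background section points out.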
In summary, the problem with the prior art is that, for cluster deduplication at the data center scale of hundreds to thousands of nodes, it suffers from defects such as a low data deduplication ratio, low node throughput, high system communication overhead, and unbalanced system load.
Summary of the invention
The object of the present invention is to provide an application-aware data routing method and system for large-scale cluster deduplication that offer a high data deduplication ratio, high node throughput, low system communication overhead, and balanced system load.
The technical solution that achieves the object of the invention is an application-aware data routing method for large-scale cluster deduplication, implemented in a large-scale backup storage cluster system comprising multiple backup clients (100), a backup server (200) and multiple deduplication storage servers (300), characterized in that it comprises the following steps:
S10) Obtain backup file meta-information: a backup client (100) sends to the backup server (200) a file backup request message containing file meta-information such as the file's name, user, and size;
S20) Perceive the file application type: the backup server (200) classifies the application type of the backup file according to the file meta-information and queries the application index structure to obtain a list of candidate deduplication storage server (300) nodes that can store application files of that type;
S30) Calculate deduplication storage node load: the backup server (200) obtains the real-time dynamic load information of each deduplication storage server (300) node by querying the application-aware index structure, and from this node load information and the backup file meta-information computes a list of low-load deduplication storage server (300) nodes that can keep the load balanced;
S40) Select the file routing node: the backup server (200) analyzes the candidate deduplication storage server node list and the low-load deduplication storage server node list, selects a low-load candidate node that stores application data of the same type as the file's routing target node, and returns the result to the backup client (100);
S50) Send the file to the target node: according to the file routing decision returned by the backup server (200), the backup client (100) sends each file in the backup session to its routing target deduplication storage server (300) node;
S60) Deduplicate files within the node: each deduplication storage server (300) node deduplicates application files of different types independently, according to the differences in their data formats and content.
A large-scale backup storage cluster system for implementing the application-aware data routing method for large-scale cluster deduplication comprises multiple backup clients (100), a backup server (200) and multiple deduplication storage servers (300), characterized in that:
The backup client (100) is configured to send to the backup server (200) a file backup request message containing file meta-information such as the file's name, user, and size;
The backup server (200) is configured to perceive the application type of the backup file according to the file meta-information and to query the application index structure to obtain a list of the node numbers of candidate deduplication storage servers (300) that can store application files of that type;
The backup server (200) is configured to obtain the real-time dynamic load information of each deduplication storage server (300) node by querying the application-aware index structure, and to compute from this node load information and the backup file meta-information a list of low-load deduplication storage server (300) nodes that can keep the load balanced;
The backup server (200) is configured to analyze the candidate deduplication storage server node list and the low-load deduplication storage server node list, to select a low-load candidate node that stores application data of the same type as the file's routing target node, and to return the result to the backup client (100);
The backup client (100) sends each file in the backup session to its routing target deduplication storage server (300) node according to the file routing decision returned by the backup server (200);
Each deduplication storage server (300) node is configured to deduplicate application files of different types independently, according to the differences in their data formats and content.
Compared with the prior art, the present invention has the following notable advantages:
1. High data deduplication ratio: the application-aware data routing policy assigns similar data to the same deduplication storage server node, reducing data overlap between nodes, and the files within a deduplication storage server node are deduplicated independently per application;
2. High node throughput: data is distributed at file granularity, which maintains good data access locality;
3. Balanced system load: storage resources are allocated dynamically according to the actual physical storage capacity of each deduplication storage server node, ensuring the load balance of the whole backup storage cluster system;
4. Low communication overhead: data routing decisions are made at application granularity, which greatly reduces the system's message communication overhead.
In summary, the invention provides an application-aware data routing method that supports cluster deduplication in backup storage cluster systems at the scale of hundreds to thousands of nodes. It not only greatly reduces the storage space used by backup data, but also optimizes the deduplication throughput of the deduplication storage server nodes, reduces communication overhead within the cluster system, and keeps the load of the deduplication storage server nodes balanced.
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the large-scale backup storage cluster system of the present invention.
Fig. 2 is the main flow chart of the application-aware data routing method for large-scale cluster deduplication of the present invention.
Fig. 3 is a schematic diagram of perceiving the file application type.
Fig. 4 is a flow chart of the file routing node selection step in Fig. 2.
Detailed description of the embodiments
As shown in Fig. 1, the large-scale backup storage cluster system of the present invention comprises multiple backup clients 100, a backup server 200 and multiple deduplication storage servers 300;
The backup client 100 is configured to send to the backup server 200 a file backup request message containing file meta-information such as the file's name, user, and size; according to the file routing decision returned by the backup server 200, the backup client 100 sends each file in the backup session to its routing target deduplication storage server 300 node;
Each backup client 100 comprises a file I/O module 101 and a backup request module 102. The backup request module 102 conducts the file backup session with the backup server 200, and the file I/O module 101 backs up each file to the corresponding deduplication storage server 300 according to the file routing decision returned by the backup server 200;
The backup server 200 is configured to perceive the application type of the backup file according to the file meta-information and to query the application index structure to obtain a list of the node numbers of candidate deduplication storage servers 300 that can store application files of that type; the backup server 200 is the core of implementing the method of the invention.
The backup server 200 obtains the real-time dynamic load information of each deduplication storage server 300 node by querying the application-aware index structure, and computes from this node load information and the backup file meta-information a list of low-load deduplication storage server 300 nodes that can keep the load balanced;
The backup server 200 analyzes the candidate deduplication storage server node list and the low-load deduplication storage server node list, selects a low-load candidate node that stores application data of the same type as the file's routing target node, and returns the result to the backup client 100;
The backup server 200 comprises a backup session management module 201, an application awareness module 202, a file routing decision module 203 and a load balance module 204. The backup session management module 201 receives backup requests from the backup clients 100, manages files in groups by backup session from the same user, and feeds the file routing decision back to the backup client 100. The application awareness module 202 classifies files by application type. The load balance module 204 keeps the system load of the deduplication storage server cluster balanced. The file routing decision module 203 assigns application files of the same type to the same low-load deduplication storage server node, feeds the file's routing target node information back to the backup client 100, and establishes the mapping from application files to deduplication storage server nodes for use when files are restored.
Each deduplication storage server 300 node deduplicates application files of different types independently, according to the differences in their data formats and content.
The deduplication storage server 300 comprises a data deduplication engine 301, a file metadata management module 302 and a data chunk management module 303. The data deduplication engine 301 deduplicates backup files and, according to the characteristics of the different applications, deduplicates the files of each application type independently; the file metadata management module 302 manages the metadata and chunk fingerprint index information of the files stored on the node; and the data chunk management module 303 manages the unique, non-duplicate data chunks remaining after deduplication.
As shown in Fig. 2, the application-aware data routing method for large-scale cluster deduplication of the present invention is based on a scalable cluster deduplication system architecture. The large-scale backup storage cluster system, shown in Fig. 1, comprises multiple backup clients 100, a backup server 200 and multiple deduplication storage servers 300.
The application-aware data routing method for large-scale cluster deduplication of the present invention comprises the following steps:
S10) Obtain backup file meta-information: a backup client 100 sends to the backup server 200 a file backup request message containing file meta-information such as the file's name, user, and size.
S20) Perceive the file application type: the application awareness module 202 of the backup server 200 classifies the application type of the backup file according to the file meta-information obtained by the backup session management module 201, and queries the application index structure to obtain a list of candidate deduplication storage server 300 nodes that can store application files of that type.
As shown in Fig. 3, the step of perceiving the file application type (S20) comprises:
S21) Obtain file meta-information: the backup server 200 obtains the file meta-information in the backup request, including the file's name 230, user 231 and size 232. The file name 230 consists of a prefix and a suffix, and the suffix determines the application type. For example, Test.doc has the prefix Test and the suffix doc, and its application type is a Word document in doc format.
S22) Query the application index structure: the application index structure is queried by the application type determined from the file name; it contains index entries such as application type 233, node number 234 and data volume 235;
Here, the application type 233 is the file-name suffix of the backup file, the node number 234 is the number of the deduplication storage server node that stores application files of that type, and the data volume 235 is the physical amount of data of application files of the same type stored on the same node. In the example application index structure, the entries matching the doc type are the first and third rows.
S23) Obtain the candidate deduplication storage server node numbers: the numbers of the deduplication storage server nodes that store files of the same application type are looked up in the application index structure, and the result is saved in the candidate deduplication storage server node list 236 ($LIST_1$). As shown in Fig. 3, node 1 and node 2 are both found to store application files of the doc type.
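Steps S21 to S23 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `IndexEntry` class, the function names, and the sample data volumes are hypothetical, and only the three index fields (application type, node number, data volume) and the Test.doc example come from the description.

```python
import os
from dataclasses import dataclass


@dataclass
class IndexEntry:
    app_type: str     # file-name suffix, e.g. "doc"
    node_id: int      # deduplication storage server node number
    data_volume: int  # physical amount of this type stored on the node


def application_type(file_name: str) -> str:
    """S21: the file-name suffix determines the application type."""
    _prefix, suffix = os.path.splitext(file_name)
    return suffix.lstrip(".").lower()


def candidate_nodes(file_name: str, index: list[IndexEntry]) -> list[int]:
    """S22-S23: nodes that already store files of the same application type."""
    app = application_type(file_name)
    return sorted({entry.node_id for entry in index if entry.app_type == app})


# Hypothetical index contents; rows 1 and 3 match the doc type, as in Fig. 3.
example_index = [
    IndexEntry("doc", 1, 500),
    IndexEntry("pdf", 1, 300),
    IndexEntry("doc", 2, 200),
]
```

With this sample index, `candidate_nodes("Test.doc", example_index)` yields nodes 1 and 2, matching the example in the text. A file type that no node stores yet produces an empty candidate list, which the node selection step has to handle.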
S30) Calculate deduplication storage node load: the load balance module 204 of the backup server 200 obtains the real-time dynamic load information of each deduplication storage server 300 node by querying the application-aware index structure, and computes from this node load information and the backup file meta-information the list $LIST_2$ of low-load deduplication storage server 300 nodes that can keep the load balanced.
The step of calculating deduplication storage node load (S30) comprises:
S31) Calculate the physical capacity already used by each deduplication storage server node: the physical capacity $C_i$ used on deduplication storage server node $i$ can be expressed as:
$$C_i = \sum_{j=1}^{K} C_{ij}, \quad i = 1, 2, \ldots, N$$
where $N$ is the number of server nodes in the deduplication storage server cluster, $K$ is the number of application file types stored on node $i$, and $C_{ij}$ is the physical capacity occupied by application type $j$ on deduplication storage server node $i$, obtained by querying the application index structure;
S32) Find the low-load deduplication storage server nodes: when $C_i + S < T_i$, node $i$ is judged to be a low-load node and its node number $i$ is added to $LIST_2$,
where $T_i$ is the load threshold of deduplication storage server node $i$, $S$ is the size of the backup file, and $LIST_2$ is the low-load deduplication storage server node list.
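The load calculation in S31 and S32 amounts to a per-node sum followed by a threshold test. The sketch below assumes the per-application capacities have already been read from the application index structure into a nested dictionary; the function names, the dictionary layout, and the sample numbers are hypothetical.

```python
def used_capacity(per_app_capacity: dict[str, int]) -> int:
    """S31: C_i is the sum of C_ij over the application types j on node i."""
    return sum(per_app_capacity.values())


def low_load_nodes(capacity_by_node: dict[int, dict[str, int]],
                   thresholds: dict[int, int],
                   file_size: int) -> list[int]:
    """S32: node i is low-load when C_i + S < T_i."""
    return [node
            for node, per_app in sorted(capacity_by_node.items())
            if used_capacity(per_app) + file_size < thresholds[node]]


# Hypothetical cluster state: capacities and thresholds in the same unit.
example_capacity = {
    1: {"doc": 500, "pdf": 300},
    2: {"doc": 200},
    3: {"mp3": 950},
}
example_thresholds = {1: 1000, 2: 1000, 3: 1000}
```

For a file of size 100, nodes 1 and 2 pass the test (900 and 300 are below 1000) while node 3 does not (1050). For a file of size 200, node 1 drops out as well because the inequality is strict.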
S40) Select the file routing node: the file routing decision module 203 of the backup server 200 analyzes the candidate deduplication storage server node list $LIST_1$ and the low-load deduplication storage server node list $LIST_2$, selects a low-load candidate node that stores application data of the same type as the file's routing target node, and returns the result to the backup client 100.
As shown in Fig. 4, the step of selecting the file routing node (S40) comprises:
S41) Input the list $LIST_1$ of candidate deduplication storage server nodes that hold files of the same application and the low-load deduplication storage server node list $LIST_2$;
S42) Determine whether the intersection $LIST_1 \cap LIST_2$ of the two node lists is empty; if so, go to step S43, otherwise go to step S46;
S43) Determine whether the low-load deduplication storage server node list $LIST_2$ is empty; if so, go to step S44, otherwise go to step S45;
S44) Issue a warning that the load of the deduplication storage server cluster is too high, and terminate processing;
S45) Select a node from the low-load deduplication storage server node list $LIST_2$;
S46) Select a node from the low-load candidate subset $LIST_1 \cap LIST_2$ and return it as the target node.
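The decision flow of S41 to S46 can be sketched as below. The description only says that one node is chosen from the relevant list, so picking the first element is an assumption, as are the function name and the exception used to represent the overload warning of S44.

```python
class ClusterOverloadedError(RuntimeError):
    """Hypothetical stand-in for the overload warning issued in S44."""


def select_routing_node(list1: list[int], list2: list[int]) -> int:
    """Pick the routing target from LIST_1 (candidates) and LIST_2 (low-load)."""
    low_load = set(list2)
    low_load_candidates = [node for node in list1 if node in low_load]
    if low_load_candidates:
        # S42 -> S46: a low-load node already stores this application type.
        return low_load_candidates[0]
    if not list2:
        # S43 -> S44: no low-load node exists anywhere in the cluster.
        raise ClusterOverloadedError("deduplication storage cluster load too high")
    # S43 -> S45: fall back to any low-load node (assumption: take the first).
    return list2[0]
```

A production policy might instead pick the least-loaded node in the chosen list. The description leaves that choice open.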
S50) Send the file to the target node: according to the file routing decision returned by the backup server 200, the backup client 100 sends each file in the backup session to its routing target deduplication storage server 300 node.
S60) Deduplicate files within the node: each deduplication storage server 300 node deduplicates application files of different types independently, according to the differences in their data formats and content.
The data deduplication engine 301 of the deduplication storage server 300 node deduplicates application files of different types independently, with optimizations based on the differences in their data formats and content, and feeds back the physical capacity added by storing each file after deduplication as a message, which is used to update the application index structure of the backup server 200. The file metadata management module 302 and the data chunk management module 303 respectively manage the metadata of the files stored on the node (including chunk fingerprint index information) and the unique, non-duplicate data chunks remaining after deduplication.
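The in-node processing of S60 can be illustrated with a small sketch. The description does not specify the chunking or fingerprinting scheme, so the fixed-size chunks, the SHA-1 fingerprints, and the `DedupNode` class with its method names are all assumptions. What the sketch takes from the text is the separate fingerprint index per application type and the reporting of the physical capacity added by each stored file.

```python
import hashlib
from collections import defaultdict


class DedupNode:
    """Hypothetical deduplication storage node with one chunk store per
    application type, so each type is deduplicated independently."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        # app_type -> {chunk fingerprint -> unique chunk data}
        self.chunk_stores: dict[str, dict[str, bytes]] = defaultdict(dict)
        # file name -> (app_type, ordered chunk fingerprints)
        self.file_recipes: dict[str, tuple[str, list[str]]] = {}

    def store_file(self, name: str, app_type: str, data: bytes) -> int:
        """Deduplicate a file and return the physical bytes it added, the
        value that would be reported back to the application index."""
        store = self.chunk_stores[app_type]
        recipe, added = [], 0
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha1(chunk).hexdigest()
            if fingerprint not in store:  # only unique chunks are stored
                store[fingerprint] = chunk
                added += len(chunk)
            recipe.append(fingerprint)
        self.file_recipes[name] = (app_type, recipe)
        return added

    def restore_file(self, name: str) -> bytes:
        """Rebuild a file from its recipe of chunk fingerprints."""
        app_type, recipe = self.file_recipes[name]
        store = self.chunk_stores[app_type]
        return b"".join(store[fingerprint] for fingerprint in recipe)
```

Because the stores are kept per application type, a chunk shared between a doc file and a pdf file is stored twice in this sketch. That trade-off keeps each fingerprint index small, which is consistent with the independent per-application processing the description calls for.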
The present invention optimizes cluster deduplication through application awareness, providing a data routing technique that both saves backup storage space and improves the scalability of the cluster system. The invention can be applied in network backup software, distributed file systems and cloud storage system software to readily achieve efficient parallel data deduplication.
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the invention, those of ordinary skill in the art may make various corresponding changes and modifications according to the invention, but all such changes and modifications shall fall within the protection scope of the appended claims.

Claims (6)

1. An application-aware data routing method for large-scale cluster deduplication, the method being implemented in a large-scale backup storage cluster system comprising multiple backup clients (100), a backup server (200) and multiple deduplication storage servers (300), characterized in that it comprises the following steps:
S10) Obtain backup file meta-information: a backup client (100) sends to the backup server (200) a file backup request message containing file meta-information such as the file's name, user, and size;
S20) Perceive the file application type: the backup server (200) classifies the application type of the backup file according to the file meta-information and queries the application index structure to obtain a list of candidate deduplication storage server (300) nodes that can store application files of that type;
S30) Calculate deduplication storage node load: the backup server (200) obtains the real-time dynamic load information of each deduplication storage server (300) node by querying the application-aware index structure, and from this node load information and the backup file meta-information computes a list of low-load deduplication storage server (300) nodes that can keep the load balanced;
S40) Select the file routing node: the backup server (200) analyzes the candidate deduplication storage server node list and the low-load deduplication storage server node list, selects a low-load candidate node that stores application data of the same type as the file's routing target node, and returns the result to the backup client (100);
S50) Send the file to the target node: according to the file routing decision returned by the backup server (200), the backup client (100) sends each file in the backup session to its routing target deduplication storage server (300) node;
S60) Deduplicate files within the node: each deduplication storage server (300) node deduplicates application files of different types independently, according to the differences in their data formats and content.
2. The application-aware data routing method according to claim 1, characterized in that the step of perceiving the file application type (S20) comprises:
S21) Obtain file meta-information: the backup server (200) obtains the file meta-information in the backup request, including the file's name, user, and size; the file name consists of a prefix and a suffix, and the suffix determines the application type;
S22) Query the application index structure: the application index structure is queried by the application type determined from the file name; the application index contains index entries such as application type, node number, and data volume;
S23) Obtain the candidate deduplication storage server node numbers: the numbers of the deduplication storage server nodes that store files of the same application type are looked up in the application index structure, and the result is saved in the candidate deduplication storage server node list.
3. The application-aware data routing method according to claim 1, characterized in that the step of calculating deduplication storage node load (S30) comprises:
S31) Calculate the physical capacity already used by each deduplication storage server node: the physical capacity $C_i$ used on deduplication storage server node $i$ can be expressed as
$$C_i = \sum_{j=1}^{K} C_{ij}, \quad i = 1, 2, \ldots, N$$
where $N$ is the number of server nodes in the deduplication storage server cluster, $K$ is the number of application file types stored on node $i$, and $C_{ij}$ is the physical capacity occupied by application type $j$ on deduplication storage server node $i$, obtained by querying the application index structure;
S32) Find the low-load deduplication storage server nodes: when $C_i + S < T_i$, node $i$ is judged to be a low-load node and its node number $i$ is added to the low-load deduplication storage server node list;
where $T_i$ is the load threshold of deduplication storage server node $i$ and $S$ is the size of the backup file.
4. The application-aware data routing method according to claim 1, characterized in that the step of selecting the file routing node (S40) comprises:
S41) Input the list $LIST_1$ of candidate deduplication storage server nodes that hold files of the same application and the low-load deduplication storage server node list $LIST_2$;
S42) Determine whether the intersection $LIST_1 \cap LIST_2$ of the two node lists is empty; if so, go to step S43, otherwise go to step S46;
S43) Determine whether the low-load deduplication storage server node list $LIST_2$ is empty; if so, go to step S44, otherwise go to step S45;
S44) Issue a warning that the load of the deduplication storage server cluster is too high, and terminate processing;
S45) Select a node from the low-load deduplication storage server node list $LIST_2$;
S46) Select a node from the low-load candidate subset $LIST_1 \cap LIST_2$ and return it as the target node.
5. A large-scale backup storage cluster system for implementing the application-aware data routing method according to claim 1, comprising a plurality of backup clients (100), a backup server (200) and a plurality of deduplication storage servers (300), characterized in that:
The backup client (100) is configured to send to the backup server (200) a file backup request message containing file metadata such as the file name, user and size;
The backup server (200) is configured to perceive the application type of the backup file from the file metadata and to query the application index structure to obtain a list of node numbers of candidate deduplication storage servers (300) that can store application files of the corresponding type;
The backup server (200) is configured to obtain real-time dynamic load information for each deduplication storage server (300) node by querying the application-aware index structure, and to compute, from this node load information and the backup file metadata, a list of low-load deduplication storage server (300) nodes that can maintain load balance;
The backup server (200) is configured to analyze the candidate deduplication storage server node list and the low-load deduplication storage server node list, select a low-load candidate node that stores application data of the same type as the file routing destination node, and return the result to the backup client (100);
The backup client (100) sends each file in the backup session to the corresponding routing destination deduplication storage server (300) node, according to the file routing decision returned by the backup server (200);
The deduplication storage server (300) node is configured to deduplicate application files of different types independently, according to differences in the data format and content of the application files.
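The first two duties of the backup server (200), perceiving the application type and looking up candidate nodes, can be sketched in Python. This section of the claims does not state how the application type is derived from the file metadata, so the sketch assumes, purely for illustration, that the file name extension is used. The functions `perceive_application_type` and `candidate_nodes` and the dictionary layout of the application index are hypothetical.

```python
import os
from typing import Dict, List


def perceive_application_type(file_name: str) -> str:
    """Classify a file by its extension (an assumed rule, not taken from the claims)."""
    extension = os.path.splitext(file_name)[1].lower()
    return extension.lstrip(".") or "unknown"


def candidate_nodes(app_index: Dict[int, Dict[str, int]], app_type: str) -> List[int]:
    """Return the nodes whose index entry already holds files of app_type.

    app_index maps node number -> {application type: physical capacity used}.
    """
    return [
        node_id
        for node_id, per_app_capacity in app_index.items()
        if app_type in per_app_capacity
    ]
```

The resulting list is the candidate list that the routing decision then intersects with the low-load node list.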
6. The large-scale backup storage cluster system according to claim 5, characterized in that:
Each backup client (100) comprises a file I/O module (101) and a backup request module (102); the backup request module (102) is configured to conduct a file backup session with the backup server (200), and the file I/O module (101) is configured to back up each file to the corresponding deduplication storage server (300) according to the file routing decision returned by the backup server (200);
The backup server (200) comprises a backup session management module (201), an application awareness module (202), a file routing decision module (203) and a load balance module (204); the backup session management module (201) is configured to receive backup requests from the backup clients (100), group files that belong to the same backup session of the same user, and feed the file routing decision back to the backup client (100); the application awareness module (202) is configured to classify files by application type; the load balance module (204) is configured to maintain system load balance across the deduplication storage server cluster; the file routing decision module (203) is configured to assign application files of the same type to the same low-load deduplication storage server node, feed the file routing destination node information back to the backup client (100), and establish a mapping from application files to deduplication storage server nodes for use during file recovery.
The deduplication storage server (300) comprises a data deduplication engine (301), a file metadata management module (302) and a data block management module (303); the data deduplication engine (301) is configured to deduplicate backup files, deduplicating the files of each application type independently according to the characteristics of the different applications; the file metadata management module (302) is configured to manage the metadata and block fingerprint index information of the files stored on the node; and the data block management module (303) is configured to manage the unique data blocks, with no duplicated content, that remain after deduplication.
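The cooperation of the three storage server modules can be illustrated with a small Python sketch that keeps a separate fingerprint index and block store for each application type. The claims do not fix a chunking method or hash function here, so fixed-size chunks and SHA-1 fingerprints are assumptions, as are the class name `DedupNode` and its methods.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


class DedupNode:
    """Toy deduplication storage node with one independent index per application type."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        # Data block management: application type -> fingerprint -> unique block.
        self.block_store: Dict[str, Dict[str, bytes]] = defaultdict(dict)
        # File metadata management: (application type, file name) -> ordered fingerprints.
        self.file_recipes: Dict[Tuple[str, str], List[str]] = {}

    def backup(self, app_type: str, file_name: str, data: bytes) -> int:
        """Deduplicate one file and return the number of newly stored bytes."""
        index = self.block_store[app_type]
        recipe: List[str] = []
        new_bytes = 0
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha1(chunk).hexdigest()
            if fingerprint not in index:
                # Only blocks unseen within this application type are stored.
                index[fingerprint] = chunk
                new_bytes += len(chunk)
            recipe.append(fingerprint)
        self.file_recipes[(app_type, file_name)] = recipe
        return new_bytes

    def restore(self, app_type: str, file_name: str) -> bytes:
        """Rebuild a file from its recipe of block fingerprints."""
        index = self.block_store[app_type]
        return b"".join(index[fp] for fp in self.file_recipes[(app_type, file_name)])
```

Because each application type has its own index, identical blocks in files of different types are not matched against each other, which keeps every index small at the cost of missing cross-application duplicates.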
CN201410158590.0A 2014-04-18 2014-04-18 Application perception data routing method oriented to large-scale cluster deduplication and system Active CN103902735B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410158590.0A CN103902735B (en) 2014-04-18 2014-04-18 Application perception data routing method oriented to large-scale cluster deduplication and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410158590.0A CN103902735B (en) 2014-04-18 2014-04-18 Application perception data routing method oriented to large-scale cluster deduplication and system

Publications (2)

Publication Number Publication Date
CN103902735A true CN103902735A (en) 2014-07-02
CN103902735B CN103902735B (en) 2017-02-22

Family

ID=50994057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410158590.0A Active CN103902735B (en) 2014-04-18 2014-04-18 Application perception data routing method oriented to large-scale cluster deduplication and system

Country Status (1)

Country Link
CN (1) CN103902735B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6065018A (en) * 1998-03-04 2000-05-16 International Business Machines Corporation Synchronizing recovery log having time stamp to a remote site for disaster recovery of a primary database having related hierarchial and relational databases
CN101751394A (en) * 2008-12-16 2010-06-23 青岛海信传媒网络技术有限公司 Method and system for synchronizing data

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216803A (en) * 2014-09-29 2014-12-17 北京奇艺世纪科技有限公司 Data backup method and device for out-of-service nodes
CN106202134A (en) * 2015-05-30 2016-12-07 中国石油化工股份有限公司 Data redundancy inspection method
CN105159925A (en) * 2015-08-04 2015-12-16 北京京东尚科信息技术有限公司 Database cluster data distribution method and system
CN108476243A (en) * 2016-01-21 2018-08-31 华为技术有限公司 For the distributed load equalizing of network service function link
CN107666495A (en) * 2016-07-27 2018-02-06 平安科技(深圳)有限公司 The disaster recovery method and terminal of a kind of application
CN109214206A (en) * 2018-08-01 2019-01-15 武汉普利商用机器有限公司 cloud backup storage system and method
CN110213319A (en) * 2018-10-08 2019-09-06 腾讯科技(深圳)有限公司 Cut-in method and device, terminal, server and storage medium
CN112685223A (en) * 2019-10-17 2021-04-20 伊姆西Ip控股有限责任公司 File type based file backup
CN111400105A (en) * 2020-03-27 2020-07-10 北京拓世寰宇网络技术有限公司 Database backup method and device
CN111858494A (en) * 2020-07-23 2020-10-30 珠海豹趣科技有限公司 File acquisition method and device, storage medium and electronic equipment
CN113590535A (en) * 2021-09-30 2021-11-02 中国人民解放军国防科技大学 Efficient data migration method and device for deduplication storage system
CN113590535B (en) * 2021-09-30 2021-12-17 中国人民解放军国防科技大学 Efficient data migration method and device for deduplication storage system

Also Published As

Publication number Publication date
CN103902735B (en) 2017-02-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant