CN103473278A - Repeating data processing technology - Google Patents
- Publication number
- CN103473278A
- Authority
- CN
- China
- Prior art keywords
- data
- file
- fingerprint
- data blocks
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a repeating data processing technology comprising two methods: static file segmenting and dynamic file segmenting. Static file segmenting splits files into blocks of a fixed size. Dynamic file segmenting includes the following steps: locating the boundary positions of data blocks according to a given algorithm; computing data fingerprints; using the data fingerprints to judge whether two data blocks are the same; and storing a single copy of identical data blocks, together with their index values, for use during recovery. The technical scheme reduces the storage capacity that data requires. Building on an in-depth study of storage capacity optimization in disaster recovery backup, it makes certain technical improvements to data de-duplication technology and achieves high-quality storage.
Description
Technical field
The present invention relates to warning systems, and in particular to a repeating data processing technology.
Background technology
Enterprises' demand for information storage is growing by leaps and bounds, and the collection and processing of information has become one of the key technical factors that determine whether an enterprise survives and develops. Meanwhile, the reliability and security of the data in information systems are receiving more and more attention, and a data disaster recovery system is an effective technical means of keeping data safe. In particular, the September 11 attacks and the Southeast Asian tsunami, as well as the snow disaster in southern China and the Wenchuan earthquake not long ago, have led enterprises to a common conclusion: a remote disaster recovery system must be set up to guarantee business continuity. Disaster recovery systems were proposed in line with current technology trends and with the need to guarantee data security and business continuity. The most direct problem that rapidly growing data volumes bring to a disaster recovery backup system is insufficient storage space, and they also put great pressure on the system's processing power and data transfer bandwidth. To keep a disaster recovery system running efficiently and stably, a storage capacity optimization mechanism is therefore needed to reduce the storage capacity that data requires. Building on an in-depth study of storage capacity optimization in disaster recovery backup, the present invention makes certain technical improvements to data de-duplication technology and achieves high-quality storage.
Summary of the invention
The object of the present invention is to overcome the problems of the prior art by providing a repeating data processing technology.
To achieve the above technical purpose and technical effect, the present invention is realized through the following technical solution:
A repeating data processing technology comprises two methods: static file segmenting and dynamic file segmenting. Static file segmenting splits a file into blocks of a fixed size. Dynamic file segmenting comprises the following steps:
Step 1) locate the boundary positions of the data blocks according to a given algorithm;
Step 2) compute the data fingerprints: after the file has been split into a number of small blocks, a data fingerprint is calculated for each small data block;
Step 3) use the data fingerprints to judge whether two data blocks are identical, and look up data blocks; because the number of data blocks is very large, a lookup method based on a hash function is adopted, which effectively shortens the lookup time;
Step 4) store a single copy of identical data blocks, and store the index values of the identical data blocks for use during recovery.
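The difference between static segmenting and the boundary search of step 1 can be illustrated with a minimal sketch. The patent does not name the boundary algorithm, so the rolling byte-sum test below is only an assumption chosen for brevity, and the function names `split_fixed` and `split_dynamic` and all parameter values (`window`, `mask`, `min_size`, `max_size`) are hypothetical.

```python
def split_fixed(data: bytes, size: int = 4096) -> list[bytes]:
    """Static segmenting: cut the input into blocks of a fixed size."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def split_dynamic(data: bytes, window: int = 16, mask: int = 0x3F,
                  min_size: int = 32, max_size: int = 1024) -> list[bytes]:
    """Dynamic segmenting: cut where a checksum of the last `window` bytes
    matches a bit pattern, so boundaries depend on the content itself.

    The byte-sum checksum is an illustrative assumption, not the patent's
    algorithm.
    """
    blocks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        length = i - start + 1
        at_pattern = length >= min_size and (rolling & mask) == mask
        if at_pattern or length >= max_size:
            blocks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        blocks.append(data[start:])  # trailing partial block
    return blocks
```

Inserting a few bytes near the start of a file shifts every fixed-size block that follows, whereas content-defined boundaries tend to realign after the edit, which is why dynamic segmenting usually finds more duplicate blocks.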
Further, the data block fingerprints in the dynamic file segmenting are computed using a weak check value and the SHA1 algorithm.
Further, the weak check value is the cyclic redundancy check (CRC) value computed for each data block. This algorithm is relatively simple. When the CRC values differ, the two data blocks can be judged to be different. When the CRC values are the same, it cannot yet be determined whether the two data blocks are identical, and the SHA1 algorithm is then used to compute a value for each of them: when the two data blocks are identical, the 160-bit values produced by SHA1 are the same; otherwise they differ.
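The two-level comparison can be sketched as follows. This is a minimal illustration that assumes CRC-32 (via Python's `zlib.crc32`) as the cyclic redundancy value, since the patent does not state the CRC width; the function names are hypothetical. Treating equal SHA1 digests as equal blocks holds only up to hash collisions.

```python
import hashlib
import zlib


def weak_check(block: bytes) -> int:
    """Cheap weak check value: CRC-32 of the block (assumed CRC variant)."""
    return zlib.crc32(block)


def strong_fingerprint(block: bytes) -> bytes:
    """Strong fingerprint: the 160-bit (20-byte) SHA1 digest of the block."""
    return hashlib.sha1(block).digest()


def blocks_match(a: bytes, b: bytes) -> bool:
    """Compare two blocks by fingerprint, cheapest test first."""
    if weak_check(a) != weak_check(b):
        return False  # different CRC values: the blocks certainly differ
    # Same CRC value proves nothing yet, so fall back to SHA1.
    return strong_fingerprint(a) == strong_fingerprint(b)
```

The CRC test rejects most non-matching pairs cheaply, so the more expensive SHA1 computation is only needed for the pairs that survive it.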
Beneficial effects of the present invention:
The technical solution of the present invention reduces the storage capacity that data requires. Building on an in-depth study of storage capacity optimization in disaster recovery backup, it also makes certain technical improvements to data de-duplication technology and achieves high-quality storage.
Brief description of the drawings
Fig. 1 is a comparison of the data before and after optimization according to the present invention;
Fig. 2 is a schematic diagram of a specific implementation of the present invention.
Embodiment
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
As shown in Fig. 2, a repeating data processing technology comprises two methods: static file segmenting and dynamic file segmenting. Static file segmenting splits a file into blocks of a fixed size. Dynamic file segmenting comprises the following steps:
Step 1) locate the boundary positions of the data blocks according to a given algorithm;
Step 2) compute the data fingerprints: after the file has been split into a number of small blocks, a data fingerprint is calculated for each small data block;
Step 3) use the data fingerprints to judge whether two data blocks are identical, and look up data blocks; because the number of data blocks is very large, a lookup method based on a hash function is adopted, which effectively shortens the lookup time;
Step 4) store a single copy of identical data blocks, and store the index values of the identical data blocks for use during recovery.
Further, the data block fingerprints in the dynamic file segmenting are computed using a weak check value and the SHA1 algorithm.
Further, the weak check value is the cyclic redundancy check (CRC) value computed for each data block. This algorithm is relatively simple. When the CRC values differ, the two data blocks can be judged to be different. When the CRC values are the same, it cannot yet be determined whether the two data blocks are identical, and the SHA1 algorithm is then used to compute a value for each of them: when the two data blocks are identical, the 160-bit values produced by SHA1 are the same; otherwise they differ.
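The hash-based lookup of step 3 can be pictured as a table keyed by fingerprint, so that finding an earlier identical block takes roughly constant time on average instead of a scan over every stored block. The sketch below is an assumption about how such a table might be organized; the class `FingerprintIndex` and its methods are hypothetical, and CRC-32 stands in for the unspecified cyclic redundancy value.

```python
import hashlib
import zlib
from typing import Optional


class FingerprintIndex:
    """Hash-table index from block fingerprints to the id of a stored block."""

    def __init__(self) -> None:
        # Outer key: CRC-32 value. Inner key: SHA1 digest. Value: block id.
        self._buckets: dict[int, dict[bytes, int]] = {}

    def find(self, block: bytes) -> Optional[int]:
        """Return the id of an identical stored block, or None if unseen."""
        bucket = self._buckets.get(zlib.crc32(block))
        if bucket is None:
            return None  # no stored block shares the weak check value
        # Weak values agree, so confirm with the strong fingerprint.
        return bucket.get(hashlib.sha1(block).digest())

    def add(self, block: bytes, block_id: int) -> None:
        """Record a newly stored block under both fingerprints."""
        bucket = self._buckets.setdefault(zlib.crc32(block), {})
        bucket[hashlib.sha1(block).digest()] = block_id
```

In this layout `find` computes SHA1 only when some stored block already shares the CRC value, which mirrors the weak-then-strong order described above.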
Principle of the present invention:
A file is split into a number of small data blocks, and a given algorithm is used to calculate the data fingerprint of each of these blocks. If two data fingerprints are the same, the contents of the two data blocks are the same; otherwise the contents of the two blocks differ. For storage, only one copy of each set of identical data blocks needs to be kept, and the stored block is called a meta data block. In order to restore the original data, the index values of the identical data blocks within the original data must also be stored.
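The store-once-and-index principle can be sketched as a round trip. This is a minimal illustration under assumptions: blocks are identified by their SHA1 digest alone, and the names `deduplicate`, `restore`, `stored`, and `recipe` are hypothetical rather than taken from the patent.

```python
import hashlib


def deduplicate(blocks: list[bytes]) -> tuple[list[bytes], list[int]]:
    """Keep one copy of each distinct block plus index values for recovery."""
    stored: list[bytes] = []         # unique blocks, in first-seen order
    position: dict[bytes, int] = {}  # SHA1 digest -> index into `stored`
    recipe: list[int] = []           # per original block: its index in `stored`
    for block in blocks:
        digest = hashlib.sha1(block).digest()
        if digest not in position:
            position[digest] = len(stored)
            stored.append(block)
        recipe.append(position[digest])
    return stored, recipe


def restore(stored: list[bytes], recipe: list[int]) -> bytes:
    """Rebuild the original data by following the stored index values."""
    return b"".join(stored[i] for i in recipe)
```

For the input blocks `aa, bb, aa, cc, bb`, only three blocks are stored, and the five index values are enough to rebuild the original byte sequence.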
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. For a person skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (3)
1. A repeating data processing technology, characterized in that it comprises two methods: static file segmenting and dynamic file segmenting, wherein the static file segmenting splits a file into blocks of a fixed size, and the dynamic file segmenting comprises the following steps:
Step 1) locate the boundary positions of the data blocks according to a given algorithm;
Step 2) compute the data fingerprints: after the file has been split into a number of small blocks, a data fingerprint is calculated for each small data block;
Step 3) use the data fingerprints to judge whether two data blocks are identical, and look up data blocks; because the number of data blocks is very large, a lookup method based on a hash function is adopted, which effectively shortens the lookup time;
Step 4) store a single copy of identical data blocks, and store the index values of the identical data blocks for use during recovery.
2. The repeating data processing technology according to claim 1, characterized in that the data block fingerprints in the dynamic file segmenting are computed using a weak check value and the SHA1 algorithm.
3. The repeating data processing technology according to claim 2, characterized in that the weak check value is the cyclic redundancy check (CRC) value computed for each data block; this algorithm is relatively simple; when the CRC values differ, the two data blocks can be judged to be different; when the CRC values are the same, it cannot yet be determined whether the two data blocks are identical, and the SHA1 algorithm is then used to compute a value for each of them; when the two data blocks are identical, the 160-bit values produced by SHA1 are the same; otherwise they differ.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2013103789166A CN103473278A (en) | 2013-08-28 | 2013-08-28 | Repeating data processing technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103473278A true CN103473278A (en) | 2013-12-25 |
Family
ID=49798126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2013103789166A Pending CN103473278A (en) | 2013-08-28 | 2013-08-28 | Repeating data processing technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103473278A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20030083292A (en) * | 2002-04-20 | 2003-10-30 | 주식회사 퓨쳐시스템 | Apparatus and method for providing a cipher accelerator using a hash function |
CN101706825A (en) * | 2009-12-10 | 2010-05-12 | 华中科技大学 | Replicated data deleting method based on file content types |
CN101916171A (en) * | 2010-07-16 | 2010-12-15 | 中国科学院计算技术研究所 | Concurrent hierarchy type replicated data eliminating method and system |
CN101989929A (en) * | 2010-11-17 | 2011-03-23 | 中兴通讯股份有限公司 | Disaster recovery data backup method and system |
US20120005144A1 (en) * | 2010-06-30 | 2012-01-05 | Alcatel-Lucent Canada, Inc. | Optimization of rule entities |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103955530A (en) * | 2014-05-12 | 2014-07-30 | 暨南大学 | Data reconstruction and optimization method of on-line repeating data deletion system |
CN103955530B (en) * | 2014-05-12 | 2017-02-22 | 暨南大学 | Data reconstruction and optimization method of on-line repeating data deletion system |
CN104317823A (en) * | 2014-09-30 | 2015-01-28 | 北京合力思腾科技股份有限公司 | Method for carrying out data detection by utilizing data fingerprints |
CN104317823B (en) * | 2014-09-30 | 2016-03-16 | 北京艾秀信安科技有限公司 | A kind of method utilizing data fingerprint to carry out Data Detection |
CN104407928A (en) * | 2014-11-18 | 2015-03-11 | 杭州华为企业通信技术有限公司 | Data transmission method and device |
CN104408154A (en) * | 2014-12-04 | 2015-03-11 | 华为技术有限公司 | Repeated data deletion method and device |
CN104408154B (en) * | 2014-12-04 | 2018-05-29 | 华为技术有限公司 | Data de-duplication method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200117385A1 (en) | System and method for reference tracking garbage collector | |
US10162552B2 (en) | System and method for quasi-compacting garbage collection | |
US8898120B1 (en) | Systems and methods for distributed data deduplication | |
EP3519965B1 (en) | Systems and methods for healing images in deduplication storage | |
CN103095843B (en) | A kind of data back up method and client based on version vector | |
CN104077380B (en) | A kind of data de-duplication method, apparatus and system | |
US20150293817A1 (en) | Secure Relational File System With Version Control, Deduplication, And Error Correction | |
CN102722583A (en) | Hardware accelerating device for data de-duplication and method | |
US9785643B1 (en) | Systems and methods for reclaiming storage space in deduplicating data systems | |
CN101989929A (en) | Disaster recovery data backup method and system | |
CN101968796B (en) | Method for segmenting bidirectionally and concurrently executed file level variable-length data | |
CN106611035A (en) | Retrieval algorithm for deleting repetitive data in cloud storage | |
CN103473278A (en) | Repeating data processing technology | |
US10409497B2 (en) | Systems and methods for increasing restore speeds of backups stored in deduplicated storage systems | |
CN106469152A (en) | A kind of document handling method based on ETL and system | |
CN104317676A (en) | Data backup disaster tolerance method | |
CN103617260A (en) | Index generation method and device for repeated data deletion | |
CN104461773A (en) | Backup deduplication method of virtual machine | |
CN104965835B (en) | A kind of file read/write method and device of distributed file system | |
CN105917304A (en) | Apparatus and method for de-duplication of data | |
CN105095027A (en) | Data backup method and apparatus | |
RU2016124319A (en) | METHOD AND DEVICE FOR RESTORING DEDUPLICATED DATA | |
CN103403709B (en) | A kind of methods, devices and systems of reading and writing data | |
CN104486387A (en) | Data synchronization processing method and system | |
CN103176867A (en) | Fast file differential backup method |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20131225 |