CN103473278A - Duplicate data processing technique

Info

Publication number: CN103473278A
Application number: CN2013103789166A
Authority: CN
Inventors: 曹峰
Original assignee: SUZHOU TIANYONGBEI NETWORK TECHNOLOGY Co Ltd
Current assignee: SUZHOU TIANYONGBEI NETWORK TECHNOLOGY Co Ltd
Priority date: 2013-08-28
Filing date: 2013-08-28
Publication date: 2013-12-25

Abstract

The invention discloses a duplicate data processing technique comprising two methods: static file splitting and dynamic file splitting. Static file splitting divides a file into blocks of a fixed size. Dynamic file splitting comprises the following steps: locating the boundary positions of data blocks according to a given algorithm; computing a data fingerprint for each block; using the data fingerprints to judge whether two data blocks are identical; and storing a single copy of identical data blocks, together with their index values, for use during recovery. This technical scheme reduces the storage capacity that the data requires. Building on an in-depth study of storage capacity optimization in disaster recovery backup, it makes certain technical improvements to data deduplication and achieves high-quality storage.

Description

A duplicate data processing technique

Technical field

The present invention relates to warning systems, and in particular to a duplicate data processing technique.

Background

Enterprise demand for information storage is growing by leaps and bounds, and the collection and processing of information has become one of the key technical factors that determine whether an enterprise survives and develops. At the same time, the reliability and security of the data in information systems are receiving increasing attention, and a data disaster recovery system is one effective technical means of keeping data secure. Catastrophic events, in particular the September 11 attacks and the Southeast Asian tsunami, as well as the recent snow disaster in southern China and the Wenchuan earthquake, have led enterprises to a common conclusion: a remote disaster recovery system must be set up to guarantee business continuity. Disaster recovery systems were proposed in response to current technology trends and the need to guarantee data security and business continuity. The most direct problem that rapidly growing data volumes bring to a disaster recovery backup system is insufficient storage space, and they also put great pressure on the system's processing capacity and data transfer bandwidth. To keep a disaster recovery system running efficiently and stably, a storage capacity optimization mechanism is therefore needed to reduce the storage capacity that the data requires. Building on an in-depth study of storage capacity optimization in disaster recovery backup, the present invention makes certain technical improvements to data deduplication and achieves high-quality storage.

Summary of the invention

The object of the invention is to overcome the problems of the prior art by providing a duplicate data processing technique.

To achieve the above technical purpose and technical effect, the invention is realized through the following technical solution:

A duplicate data processing technique comprises two methods: static file splitting and dynamic file splitting. Static file splitting divides a file into blocks of a fixed size. Dynamic file splitting comprises the following steps:

Step 1) Locate the boundary positions of the data blocks according to a given algorithm.

Step 2) Compute the data fingerprints: once the file has been split into many small blocks, a data fingerprint is calculated for each small data block.

Step 3) Use the data fingerprints to judge whether two data blocks are identical. Because the number of data blocks is very large, a hash-function-based lookup is used to find data blocks, which effectively shortens the lookup time.

Step 4) Store a single copy of identical data blocks, together with the index values of those blocks, for use during recovery.
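
Steps 1 to 4 can be sketched roughly as follows. The patent does not name the boundary-finding algorithm, so the sliding-window byte sum used here, and the constants WINDOW, MASK, MIN_SIZE and MAX_SIZE, are assumptions chosen only for illustration. The function names are likewise hypothetical, and a Python dict stands in for the hash-based lookup of Step 3.

```python
import hashlib

# All four constants are illustrative assumptions, not values from the patent.
WINDOW = 16      # number of trailing bytes that decide a boundary
MASK = 0x3F      # boundary when (window sum & MASK) == MASK
MIN_SIZE = 32    # never cut a block shorter than this
MAX_SIZE = 256   # always cut once a block reaches this size


def find_boundaries(data):
    """Step 1: return the end offset of every block (assumed algorithm)."""
    boundaries = []
    start = 0
    window_sum = 0
    for i, byte in enumerate(data):
        window_sum += byte
        if i - start >= WINDOW:
            # Drop the byte that just left the sliding window.
            window_sum -= data[i - WINDOW]
        size = i - start + 1
        at_content_boundary = size >= MIN_SIZE and (window_sum & MASK) == MASK
        if at_content_boundary or size >= MAX_SIZE:
            boundaries.append(i + 1)
            start = i + 1
            window_sum = 0
    if start < len(data):
        boundaries.append(len(data))  # trailing partial block
    return boundaries


def deduplicate(data):
    """Steps 2-4: fingerprint each block, keep one copy, record the index."""
    store = {}   # fingerprint -> block; the dict is the hash lookup of Step 3
    index = []   # ordered fingerprints, kept for recovery (Step 4)
    start = 0
    for end in find_boundaries(data):
        block = data[start:end]
        fingerprint = hashlib.sha1(block).hexdigest()  # Step 2
        if fingerprint not in store:                   # Step 3
            store[fingerprint] = block                 # Step 4: single copy
        index.append(fingerprint)
        start = end
    return store, index
```

Because a boundary in this sketch depends only on the last WINDOW bytes of content, inserting data near the start of a file tends to shift block boundaries along with the content instead of invalidating every later block, which is the usual motivation for dynamic splitting over fixed-size splitting.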

Further, the data block fingerprint in the dynamic file splitting is computed using a weak check value and the SHA1 algorithm.

Further, the weak check value is the cyclic redundancy value computed for each data block, a comparatively simple algorithm. When the cyclic redundancy values differ, the two data blocks can be judged to be different. When the cyclic redundancy values are the same, it cannot be determined whether the two data blocks are identical, so the SHA1 algorithm is used to compute the values of the two blocks: if the two data blocks are identical, the 160-bit values produced by SHA1 are the same; otherwise they differ.
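
The two-stage comparison described above might look like the following minimal sketch. The text only says "cyclic redundancy value", so the use of CRC-32 (via Python's zlib.crc32) is an assumption, and blocks_identical is a hypothetical helper name.

```python
import hashlib
import zlib


def blocks_identical(block_a, block_b):
    """Compare two blocks with a cheap weak check first, then SHA-1."""
    # Weak check (CRC-32 assumed): different values prove the blocks differ.
    if zlib.crc32(block_a) != zlib.crc32(block_b):
        return False
    # Equal CRC values are inconclusive, so compare the 160-bit SHA-1 digests.
    return hashlib.sha1(block_a).digest() == hashlib.sha1(block_b).digest()
```

The point of the ordering is cost: the cyclic redundancy check is cheap and rules out most non-matching pairs, so the more expensive SHA-1 computation runs only when the weak values coincide.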

Beneficial effects of the invention:

The technical solution of the invention reduces the storage capacity that the data requires. Building on an in-depth study of storage capacity optimization in disaster recovery backup, it also makes certain technical improvements to data deduplication and achieves high-quality storage.

Brief description of the drawings

Fig. 1 compares the data before and after optimization according to the invention;

Fig. 2 is a schematic diagram of a specific implementation of the invention.

Embodiment

The invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

As shown in Fig. 2, a duplicate data processing technique comprises two methods: static file splitting and dynamic file splitting. Static file splitting divides a file into blocks of a fixed size; dynamic file splitting comprises the following steps:

Principle of the present invention:

A file is split into many small data blocks, and a given algorithm is used to compute the data fingerprint of each block. If two data fingerprints are the same, the contents of the two data blocks are identical; otherwise the contents differ. For storage, only one copy of identical data blocks needs to be kept, and the stored block is called a metadata block. To allow the original data to be restored, the index values of the identical blocks within the original data must also be stored.
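
As a small illustration of this principle, the sketch below uses static (fixed-size) splitting, stores one copy of each distinct block, and keeps an ordered index so the original data can be rebuilt. BLOCK_SIZE and the function names are assumptions made for the example, and SHA-1 alone is used as the fingerprint for brevity.

```python
import hashlib

BLOCK_SIZE = 4  # assumed fixed block size, kept tiny for the example


def split_fixed(data, size=BLOCK_SIZE):
    """Static splitting: cut the data into blocks of a fixed size."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store_file(data, block_store):
    """Store each distinct block once and return the file's block index."""
    index = []
    for block in split_fixed(data):
        fingerprint = hashlib.sha1(block).hexdigest()
        block_store.setdefault(fingerprint, block)  # keep a single copy
        index.append(fingerprint)                   # position in the original
    return index


def recover_file(index, block_store):
    """Rebuild the original data by following the stored index."""
    return b"".join(block_store[fingerprint] for fingerprint in index)
```

A file whose content repeats then occupies one stored block per distinct block plus its index, and recover_file returns the original bytes as long as both the block store and the index are retained.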

The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention. For a person skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. A duplicate data processing technique, characterized in that it comprises two methods: static file splitting and dynamic file splitting, wherein the static file splitting divides a file into blocks of a fixed size, and the dynamic file splitting comprises the following steps:

2. The duplicate data processing technique according to claim 1, characterized in that the data block fingerprint in the dynamic file splitting is computed using a weak check value and the SHA1 algorithm.

3. The duplicate data processing technique according to claim 2, characterized in that the weak check value is the cyclic redundancy value computed for each data block, a comparatively simple algorithm; when the cyclic redundancy values differ, the two data blocks can be judged to be different; when the cyclic redundancy values are the same, it cannot be determined whether the two data blocks are identical, so the SHA1 algorithm is used to compute the values of the two blocks; if the two data blocks are identical, the 160-bit values produced by SHA1 are the same, and otherwise they differ.