CN102156703A - Low-power-consumption, high-performance data de-duplication system - Google Patents

Info

Publication number
CN102156703A
CN102156703A
Authority
CN
China
Prior art keywords
data
gpu
thread
compression
bloomfilter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011100247439A
Other languages
Chinese (zh)
Inventor
刘晓光 (Liu Xiaoguang)
王刚 (Wang Gang)
赵彬 (Zhao Bin)
马井玮 (Ma Jingwei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN2011100247439A
Publication of CN102156703A
Pending legal-status Critical Current


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a low-power-consumption, high-performance data de-duplication system comprising a production center, a computation center, and a backup center. The production center copies user request data and sends it to the computation center; the computation center deletes duplicate data and sends the non-duplicate data to the backup center; and the backup center stores the received data. The computation center uses a low-power VIA processor to reduce the system's operating power consumption. System performance is improved through the following policies: (1) the special assembly instructions of the coprocessor module provided by the VIA processor are used for digest calculation and data encryption, improving performance in hardware; (2) the computation center uses a Graphics Processing Unit (GPU) to accelerate the data-compression procedure and the Bloomfilter computation in the de-duplication system, and the GPU's concurrent processing raises the system's operating efficiency; and (3) system performance is further improved by using two pipeline mechanisms.

Description

A low-power-consumption, high-performance data de-duplication system
Technical field
The present invention relates to the field of data de-duplication, and in particular to the optimization of energy consumption and computational performance in data de-duplication systems.
Background technology
With the development of computing, informatization has made data protection ever more important. Scheduled backup of data to a repository was one of the first primary means people used to protect data, and it protects users' data to a certain extent. Over time, however, it became clear that this level of protection could no longer meet existing demands: a considerable amount of data remains at risk of loss between two backups. Continuous data protection (CDP) technology therefore arose. As its name suggests, continuous data protection provides finer-grained protection and can even back up and protect every write request. Naturally, such fine-grained protection costs more storage, and as data volumes keep growing, the storage cost of backup data becomes increasingly hard to accept. According to statistics from the Enterprise Strategy Group (ESG), the total amount of data needing protection is currently growing at 60% per year, and the total amount needing storage has reached the petabyte order of magnitude. In the face of such a huge data volume, the storage overhead caused by traditional backup methods is enormous. Yet most of the backup data produced by traditional backup techniques, particularly full backup and incremental backup, is duplicated: data that has already been backed up need not be backed up again. Data de-duplication technology was therefore introduced. At present, continuous data protection based on de-duplication is a research focus: by discarding the duplicate data produced during backup, the overhead of the storage system can be reduced effectively. For example, the Data Domain Deduplication File System (DDFS) applies de-duplication to a month of backup data and finally achieves a compression ratio of 38.54:1, greatly reducing storage overhead as expected.
Referring to Fig. 1, which shows the data-deletion flow of a typical de-duplication system, the concrete steps are as follows:
Step S101. Receive the incoming data block to be backed up.
Step S102. Calculate the digest of the received data block.
Step S103. Judge by digest comparison whether the received data block is a duplicate; filter out duplicate data and create corresponding metadata for all data.
Step S104. Compress the de-duplicated data and the metadata.
Step S105. Encrypt the compressed data.
Step S106. Send the encrypted data to the backup center over the network for backup.
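Steps S101 through S106 can be sketched end to end in a few lines. This is a minimal illustrative model, not the patent's implementation: the names are hypothetical, zlib stands in for the lzjb compression named later in the text, and a toy XOR "cipher" stands in for AES.

```python
import hashlib
import zlib

# Illustrative sketch of steps S101-S106 (all names are hypothetical).
seen_digests = set()          # stands in for the Bloom filter + disk index
backup_store = {}             # stands in for the backup center

def xor_encrypt(data: bytes, key: int = 0x5A) -> bytes:
    # Toy stand-in for the AES encryption of step S105.
    return bytes(b ^ key for b in data)

def process_block(block: bytes) -> bool:
    """Return True if the block was new and backed up, False if duplicate."""
    digest = hashlib.sha1(block).hexdigest()        # S102: compute digest
    if digest in seen_digests:                      # S103: duplicate check
        return False
    seen_digests.add(digest)
    compressed = zlib.compress(block)               # S104: compress
    encrypted = xor_encrypt(compressed)             # S105: encrypt
    backup_store[digest] = encrypted                # S106: "send" to backup
    return True
```

Only the first copy of a block ever reaches the backup store; later copies are filtered out at step S103.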
In the steps above, de-duplication of data blocks is achieved by comparing block digests, which requires the digest algorithm to have a very low collision rate; the digest algorithms adopted in existing de-duplication systems are generally MD5, SHA-1, and SHA-256. When comparing digests, a preliminary judgment is generally made first with the Bloomfilter algorithm, followed by a further judgment through comparison against a disk-based index. Usually, to speed up the disk-index comparison, part of the index is kept in memory as a cache; this reduces disk reads and writes and thereby accelerates the comparison process.
Referring to Fig. 2, which shows the comparison flow by which current de-duplication systems remove duplicate data blocks, the concrete steps are as follows:
Step S201. Compute the data block's digest.
Step S202. Perform the Bloomfilter calculation and check whether the corresponding bits are set. If they are not all set, the data block is unique: create its metadata. Otherwise, a further judgment is needed.
Step S203. Compare the new digest with the indexes stored in the cache. If a match exists, the data block is a duplicate: record the position where the duplicate is stored at the backup center and create metadata. Otherwise, compare against the on-disk index; if there is still no match, the data block is unique and metadata is created; otherwise the data block is a duplicate.
Step S204. If the data block is a duplicate, read the indexes of the several sectors following the disk position recorded by its index and refresh the cache, exploiting locality of reference in the data to reduce the number of disk accesses.
In past research on de-duplication technology, attention has usually focused on how to reduce index comparisons (because traditional-disk access time is the most time-consuming link in the flow above), while the large amount of computation in the de-duplication process has drawn little attention. With rising network transmission speeds and the adoption of large, fast storage (for example solid-state drives), the factors that currently limit de-duplication throughput (network transmission, disk access speed, and so on) will soon cease to be the problem; meanwhile, the CPU pressure created by the many complex computation tasks in the de-duplication process will become the new system bottleneck.
Summary of the invention
The objective of the invention is to solve the low performance and high power consumption of existing data de-duplication systems by providing a high-performance de-duplication system based on a low-power processor. The system markedly improves de-duplication performance, while the use of a low-power processor greatly reduces the energy consumption of the whole system.
The present invention studies how to accelerate the computation process by distributing complex computation tasks to general-purpose coprocessors (for example the coprocessor of the VIA processor and the graphics processor, GPU), thereby raising system throughput; at the same time, the low-power coprocessors' advantage in energy consumption reduces the power consumption of the whole system, finally realizing a low-power, high-throughput data de-duplication system.
The low-power, high-performance de-duplication system provided by the invention consists of three parts (see Fig. 9): a production center, a computation center, and a backup center (also called the disaster-recovery center), deployed respectively on three nodes (computers). The production center and the backup center are each deployed on a single-core 2.66 GHz Intel Xeon node; each machine has 4 GB of memory and a hardware RAID-0 array of six disks. The computation node is deployed on a 1.6 GHz VIA Nano platform with 2 GB of memory and a RAID-0 array of two disks.
The production center produces the data to be backed up. The present invention implements data protection at the block-device layer: the user's write requests are intercepted at the block-device layer, the write-request data is copied, and the copy is sent to the computation-center node within the same local area network.
The computation center receives data-processing requests from the production-center node, judges data repetitiveness, deletes duplicate data, encrypts the non-duplicate data, and then sends the encrypted non-duplicate data to the backup center for permanent storage. The backup center is responsible for receiving the metadata and non-duplicate data sent by the computation center and storing them with a reasonable layout.
To improve the compressibility of the data, saving storage and reducing the time needed to compute data digests, the data is first compressed. To reduce the amount of data being compared, a digest is computed for each pending data block, converting comparisons of block contents into comparisons of block digests; this shortens comparison time and improves the system's running time. The Bloomfilter technique is used to further reduce the number of disk comparisons. Finally, to protect data security, the non-duplicate data is encrypted before being sent to the remote backup center.
Clearly the computation center carries a large computational load, and this heavy, complex computation puts great pressure on the system and seriously affects its performance. To accelerate these computations and raise the system's operating efficiency, the computation center of the present invention adopts the following methods to speed up computation and reduce power consumption:
1. Use coprocessors to accelerate the system's computation
1.1. Use the VIA processor's coprocessor to accelerate digest calculation and data encryption;
1.2. Use the graphics card (GPU) to accelerate the compression process in the system;
1.3. Use the GPU to accelerate the data-digest calculation in the system;
1.4. Use the GPU to accelerate the Bloomfilter process;
1.5. Use the GPU to handle the whole preliminary duplicate-checking process;
2. Pipeline mechanisms with multi-threaded parallel processing
2.1. Multithreaded pipeline mechanism;
2.2. Pipeline with compression first;
2.3. Pipeline with compression after de-duplication;
2.4. Pipeline with the main computation on the GPU.
Optionally, the VIA processor's coprocessor is used to accelerate digest calculation. Specifically, the SHA-1 digest computation is accelerated with the PadLock coprocessor carried on the VIA platform. PadLock provides an accelerating engine, PHE, which can speed up the SHA-1 digest computation through the VIA processor's special assembly instructions. These assembly instructions are wrapped by the C language into unified C function calls such as VIA_SHA1, which uses the inline-assembly feature of C to encapsulate the assembly-language programming interface PadLock provides; the digest computation then invokes these C functions, achieving the acceleration.
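What the PHE engine accelerates is the standard SHA-1 computation itself: a software path and the PadLock path must produce the same 160-bit digest. A pure-software stand-in (the function name here is hypothetical, not the patent's VIA_SHA1 wrapper) looks like this:

```python
import hashlib

# Pure-software stand-in for the PadLock-accelerated SHA-1 path.
# The patent's VIA_SHA1 is a C wrapper around PHE instructions; this
# hypothetical sha1_digest computes the same 160-bit result in software.
def sha1_digest(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

digest = sha1_digest(b"block to fingerprint")
```

Any accelerated implementation can be validated against such a reference, since SHA-1 is fully deterministic.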
Preferably, the VIA processor's coprocessor is used to accelerate data encryption. Specifically, the AES encryption computation is accelerated with the PadLock coprocessor carried on the VIA platform; PadLock provides the hardware acceleration engines ACE and ACE2 to speed up the encryption computation. PadLock's accelerating engine is invoked by calling the processor's special assembly instructions; these instructions are wrapped by the C language into a unified C function call, VIA_AES, which uses C's inline-assembly feature to encapsulate the assembly-language programming interface PadLock provides. The encryption computation then calls the wrapped routine, achieving the acceleration.
Preferably, the GPU is used to accelerate the compression process in the system. Specifically, the GPU's unified computing architecture, the CUDA programming model, is used to write a compression library, CompressionGPU, which comprises the function interfaces ComGPUInit, ComGPUKernel, and ComGPUDestroy. ComGPUInit is responsible for memory allocation on the GPU side. ComGPUKernel fetches data from the global data buffer and transfers it into the video memory of the compute card; the card launches enough execution threads that each thread compresses one pending data block. Each thread determines the start position of the data it will compress from its own thread ID, reads from that start position the length of the block to compress (the per-block lengths are fixed in place when the data is prepared by the reader thread), and then compresses the data with the lzjb compression algorithm. ComGPUDestroy is responsible for destroying the memory allocated by ComGPUInit. Processing all the data blocks concurrently in this way reduces the time compression needs and improves system performance.
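The per-thread addressing scheme ComGPUKernel relies on can be simulated on the CPU: blocks are packed into one flat buffer, each prefixed by its length, and "thread" i locates its block purely from an offset derived from its ID. The names below are hypothetical, and zlib stands in for lzjb.

```python
import zlib

# CPU-side simulation of the ComGPUKernel data layout (names hypothetical):
# blocks packed into one flat buffer, each prefixed by a 4-byte length;
# simulated thread `tid` finds its input from a host-prepared offset table,
# mirroring how a CUDA thread locates its block from its thread ID.
def pack_blocks(blocks):
    buf, offsets = bytearray(), []
    for b in blocks:
        offsets.append(len(buf))
        buf += len(b).to_bytes(4, "little") + b
    return bytes(buf), offsets

def compress_thread(buf, offsets, tid):
    """Work of one simulated GPU thread, identified by tid."""
    off = offsets[tid]
    n = int.from_bytes(buf[off:off + 4], "little")   # read the block length
    return zlib.compress(buf[off + 4:off + 4 + n])   # compress just that block

blocks = [b"a" * 100, b"b" * 200, b"hello"]
buf, offsets = pack_blocks(blocks)
compressed = [compress_thread(buf, offsets, t) for t in range(len(blocks))]
```

On a real GPU the list comprehension becomes one kernel launch in which all threads run concurrently; the addressing logic is unchanged.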
Optionally, the GPU is used to accelerate the data-digest calculation in the system. Digest calculation is typically data-intensive, and the GPU's single-instruction-multiple-data computation model is well suited to it. Specifically, the CUDA programming model is used to package a digest-calculation library, SHA1GPU, which comprises the function interfaces SHA1GPUInit, SHA1GPUKernel, and SHA1GPUDestroy. SHA1GPUInit is responsible for GPU-side memory allocation; SHA1GPUKernel copies the data produced by compression into GPU video memory, the GPU launches enough execution threads that each thread computes the digest of one pending data block, and the results are returned to the CPU; SHA1GPUDestroy is responsible for destroying the memory allocated by SHA1GPUInit.
Preferably, the GPU is used to accelerate the Bloomfilter process. The Bloomfilter calculation is typically data-intensive, and the GPU's single-instruction-multiple-data computation model is well suited to it. Specifically, the CUDA programming model is first used to encapsulate the Bloomfilter computation into a library, BloomfilterGPU, which comprises the function interfaces BFGPUInit, BFGPUKernel, and BFGPUDestroy. BFGPUInit opens a large buffer in the GPU's video memory to serve as the bit vector for the Bloomfilter calculation. The Bloomfilter thread processes the results of digest calculation: BFGPUKernel is called to launch enough execution threads on the compute card that each thread performs the Bloomfilter calculation for one pending data block. All threads execute concurrently; each determines, from its own thread ID, the start position of the data on which to perform the Bloomfilter calculation, then reads data of fixed length (a digest-calculation result, 160 bits long) and performs the Bloomfilter calculation.
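The fixed 160-bit (20-byte) input length is what lets each GPU thread address its digest purely from its thread ID. A CPU-side sketch of that access pattern, with illustrative names and a shared bit vector standing in for BFGPUInit's video-memory buffer:

```python
import hashlib

# Simulation of BFGPUKernel's per-thread access pattern (names illustrative):
# digests are a flat array of fixed 20-byte (160-bit) SHA-1 results, and
# simulated thread t reads digests[t*20 : (t+1)*20].
DIGEST_LEN = 20                   # 160 bits, as stated in the patent
BF_BITS = 1 << 20
bit_vector = bytearray(BF_BITS // 8)

def bf_thread(digest_buf: bytes, tid: int, k: int = 4) -> bool:
    """Return True if the digest may already be present (all k bits set)."""
    d = digest_buf[tid * DIGEST_LEN:(tid + 1) * DIGEST_LEN]
    maybe_seen = True
    for i in range(k):
        p = int.from_bytes(d[4 * i:4 * i + 4], "big") % BF_BITS
        if not bit_vector[p // 8] >> (p % 8) & 1:
            maybe_seen = False
        bit_vector[p // 8] |= 1 << (p % 8)
    return maybe_seen

digests = b"".join(hashlib.sha1(b).digest() for b in [b"x", b"y", b"x"])
results = [bf_thread(digests, t) for t in range(3)]
```

The first occurrence of a digest reports "not seen" and sets its bits; the repeated digest at index 2 finds all its bits set and is flagged for the further disk-index comparison of step S203.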
Preferably, the system uses pipeline processing. The whole de-duplication process can be divided into the following steps: data reception, data compression, data-digest calculation, Bloomfilter calculation and data encryption, and transmission of the backup data. Each step completes one processing task on the data; these tasks can run on different hardware, their mutual data dependencies are very weak, and the different computation tasks can execute concurrently. Combined with the optimized algorithm for each computation task, using a pipeline to raise the program's concurrency can further improve the system's performance.
Optionally, a compression-first pipeline mechanism is used. During de-duplication, the data stream, after being received and pre-processed by the main thread, first flows into the data-compression thread, then in turn through the digest-calculation thread, the BloomFilter thread, and the data-encryption thread; finally a data-handling thread writes the metadata and de-duplicated data to disk for backup. Compressing first reduces the data volume that digest calculation must process and hence the digest-calculation time. For systems in which compression takes less time than digest calculation and encryption, this pipeline reduces the running time of the whole system.
Optionally, a pipeline with compression after de-duplication is used. After being received and pre-processed by the main thread, the data stream first flows into the digest-calculation thread, then in turn through the BloomFilter thread and the data-compression and data-encryption threads; finally a data-handling thread writes the metadata and de-duplicated data to disk for backup. For data sets whose duplication ratio exceeds 2:1, compressing only the non-duplicate data instead of all data greatly reduces the volume to be compressed, saves computation time, and raises system throughput.
Optionally, a pipeline with the main computation on the GPU is used. As noted, the compression calculation, the digest calculation in the preliminary duplicate check, and the Bloomfilter calculation can all be handled by the GPU. Because data transfer between the CPU and GPU takes time, putting these computations onto the GPU for sequential processing reduces the number of CPU-GPU transfers (from six to two) and the amount of data transferred (the two intermediate results need no transfer). Such a pipeline obtains fairly good results for systems that accelerate only with the GPU.
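The transfer-count argument (three separate GPU calls cost six copies, one fused call costs two) can be illustrated with a small host-side simulation. The counter and helper names are illustrative, and zlib/hashlib stand in for the GPU kernels:

```python
import hashlib
import zlib

# Sketch of the fused-pipeline transfer accounting (names illustrative):
# one upload and one download replace the six copies that three separate
# "GPU" stages would need, and the intermediates never cross the bus.
transfers = 0

def to_gpu(x):
    global transfers
    transfers += 1          # one host -> device copy
    return x

def to_host(x):
    global transfers
    transfers += 1          # one device -> host copy
    return x

def fused_gpu_stage(block: bytes):
    dev = to_gpu(block)                       # single upload
    compressed = zlib.compress(dev)           # stays on the "device"
    digest = hashlib.sha1(compressed).digest()  # intermediate not copied back
    return to_host((compressed, digest))      # single download

out = fused_gpu_stage(b"some backup data" * 10)
```

Running compression, digesting, and the Bloomfilter check as three independent calls would increment the counter six times; the fused version increments it twice.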
Compared with prior art, the present invention has following advantage:
The objective of the invention is to address the system computation that will soon become the bottleneck of de-duplication systems: by moving the computational problems onto coprocessors, the pressure on the main processor is relieved, and the coprocessors' powerful computing ability accelerates the computation, raising system performance; at the same time, the coprocessors' low power consumption reduces the power consumption of the system.
(1) The computational optimizations disclosed by the invention are realized with a general-purpose coprocessor and a GPU; compared with existing dedicated-coprocessor acceleration, they offer better generality and lower implementation cost.
(2) The de-duplication system disclosed by the invention is realized on a low-power platform; through optimization of the workflow and of the computation, its performance compares favorably with a server and can fully replace one.
(3) The de-duplication system disclosed by the invention has very high throughput, lower cost, and lower energy consumption, better matching the currently advocated spirit of energy saving and environmental protection.
Description of drawings:
Fig. 1 is the basic flowchart of data de-duplication.
Fig. 2 is the duplicate-checking flowchart for data in de-duplication, i.e. steps S102 and S103 of the basic flowchart.
Fig. 3 shows the operating principle of batch packing in the de-duplication system.
Fig. 4 is a schematic diagram of the pipeline mechanism with multi-threaded parallel processing.
Fig. 5 is the flowchart of the GPU-accelerated compression algorithm.
Fig. 6 is the flowchart of the compression-first pipeline mechanism.
Fig. 7 is the flowchart of the digest-calculation-first pipeline mechanism.
Fig. 8 is the flowchart of the pipeline mechanism that uses the GPU for the preliminary duplicate check.
Fig. 9 is the architecture diagram of the de-duplication system.
Embodiment
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, further description is given below in conjunction with the drawings and specific embodiments.
Embodiment 1:
Referring to Fig. 3, which shows the batch-packing algorithm of the present invention, the concrete principle and operation steps are as follows:
Step S301. For each data block received from the network, generate its corresponding metadata.
Step S302. Mount the metadata generated in step S301 onto the corresponding global metadata linked list, for use by later threads.
Step S303. Obtain the size of the data block and save it at the appropriate position in the global data buffer.
Step S304. Put the data block itself at the corresponding position in the global data buffer.
Batch encapsulation of the data stream is mainly pre-processing done so that the traditionally serial de-duplication flow can be converted into a pipeline mechanism. At the same time, batching the data stream and storing the data and metadata separately also prepares the ground for the subsequent GPU parallel computation.
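Steps S301 through S304 amount to building two parallel structures: a metadata list and a flat, contiguous data buffer. A minimal sketch, with hypothetical structure names:

```python
# Sketch of batch-packing steps S301-S304 (structure names hypothetical):
# per-block metadata goes on a list while the raw data is packed
# contiguously -- the layout the later GPU kernels rely on.
metadata_list = []          # S302: global metadata chain
sizes = []                  # S303: per-block sizes
data_buffer = bytearray()   # S304: global data buffer

def pack_block(block: bytes):
    # S301/S302: generate metadata and mount it on the global list.
    metadata_list.append({"offset": len(data_buffer), "len": len(block)})
    sizes.append(len(block))        # S303: record the block size
    data_buffer.extend(block)       # S304: place the data in the buffer

for b in (b"alpha", b"bravo!", b"c"):
    pack_block(b)
```

Because each block's offset and length are recorded at pack time, a later worker (or GPU thread) can locate any block without scanning the buffer.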
Embodiment 2:
Referring to Fig. 4, which shows the flow of the GPU compression algorithm of the present invention, the concrete principle and operation steps are as follows:
Step S401. Obtain the ID of the current thread.
Step S402. Because the data was organized in advance during batching, the position of this thread's pending data within the overall data block can be obtained from the thread ID.
Step S403. Read the size of the thread's pending data block from the block's start position, i.e. the pointer position obtained in the previous step.
Step S404. Compress the data block with the chosen compression algorithm.
Embodiment 3:
Referring to Fig. 5, which shows the GPU Bloomfilter flow of the present invention, the concrete principle and operation steps are as follows:
Step S501. Obtain the ID of the current thread.
Step S502. Obtain this thread's pending data: because the data was organized in advance, the first address of the data within the overall data block follows from the current thread ID. The data length is fixed (160 bits).
Step S503. Perform the Bloomfilter calculation on this block of data with the chosen algorithm.
Embodiment 4:
Referring to Fig. 6, which shows the flow of the compression-first pipeline mechanism of the present invention, the concrete principle and operation steps are as follows:
Step S601. The main thread obtains data from the network, extracts the metadata, performs batch encapsulation, and finally mounts the packed data block in the processing queue of the data-compression thread, awaiting that thread's processing.
Step S602. The data-compression thread takes a node from its own task-queue linked list and, according to the metadata, compresses each data block in the data area, storing the compressed length back into the corresponding metadata. The compressed data is re-formed into a data block and packed together with the metadata into a new node, which is hung on the task-queue linked list of the digest-calculation thread; the compression thread then takes the next node from its task queue in order and processes the next batch.
Step S603. The digest-calculation thread takes a node from its own task-queue linked list and, according to the metadata, computes the digest of each data block in the data area, saves the results into the metadata, and hangs the node on the task-queue linked list of the Bloomfilter thread. It then takes the next node in order and performs the digest calculation for the next batch.
Step S604. The Bloomfilter thread takes a node from its own task queue and performs the Bloomfilter calculation on the digests stored in the metadata as a first judgment: if the Bloomfilter judges that a digest may exist, the digest must be compared one by one against the metadata stored on disk to see whether it really repeats. If it repeats, the data is discarded and the corresponding metadata is updated and saved; otherwise the digest does not repeat and, correspondingly, its data block is unique, so both data and metadata must be kept. At this point the data-filtering task is complete; the filtered data and metadata are packed into a new node, which is hung on the task-queue linked list of the encryption thread. The Bloomfilter thread then takes the next node in order and performs duplicate filtering for the next batch.
Step S605. The encryption thread takes a node from its own task queue, encrypts the metadata and data respectively, and sends the encrypted data over the network to the backup node for storage.
In the data-compression, digest-calculation, Bloomfilter, and encryption threads, each computational thread operates on every data block in a batch. To raise efficiency, the GPU's powerful parallel computing capability can be exploited to run the computation for all the data blocks simultaneously. The time to compute a whole batch then stays roughly the same as the time to compute a single block, greatly improving computation speed.
Because the time spent on digest calculation and data encryption is proportional to the size of the data block, performing compression before the other computation tasks shortens the blocks and thereby effectively reduces the time digest calculation and encryption take. For de-duplication workloads in which compression is relatively fast and digest calculation and encryption are relatively time-consuming, the compression-first pipeline mechanism effectively improves the efficiency of the system.
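The hand-off pattern of steps S601 through S605, one thread per stage, each with its own task queue, can be sketched with two stages. The names are illustrative, zlib again stands in for lzjb, and a `None` sentinel marks the end of the stream:

```python
import hashlib
import queue
import threading
import zlib

# Two-stage sketch of the compression-first pipeline (names illustrative):
# each stage is a thread with its own queue; it takes a batch node,
# processes it, and hands it to the next stage's queue.
compress_q, digest_q, done_q = queue.Queue(), queue.Queue(), queue.Queue()

def compress_stage():
    while True:
        node = compress_q.get()
        if node is None:                     # sentinel: propagate and stop
            digest_q.put(None)
            return
        node["data"] = zlib.compress(node["data"])   # S602
        digest_q.put(node)

def digest_stage():
    while True:
        node = digest_q.get()
        if node is None:
            done_q.put(None)
            return
        node["digest"] = hashlib.sha1(node["data"]).hexdigest()  # S603
        done_q.put(node)

threads = [threading.Thread(target=f) for f in (compress_stage, digest_stage)]
for t in threads:
    t.start()
for payload in (b"batch-1" * 50, b"batch-2" * 50):
    compress_q.put({"data": payload})
compress_q.put(None)                         # no more batches
results = []
while (node := done_q.get()) is not None:
    results.append(node)
for t in threads:
    t.join()
```

While the digest stage works on batch 1, the compression stage is already working on batch 2, which is exactly the overlap the pipeline mechanism exploits.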
Embodiment 5:
Consult Fig. 7, the preferential streamline mechanism treatment scheme of summary of the present invention is shown.Concrete principle and operation steps are as described below:
Step S701: the main thread receives data from the network, extracts the metadata, packages the data into batches, and finally appends each packaged data block to the processing queue of the digest-calculation thread, where it waits to be processed.
Step S702: the digest-calculation thread removes a node from its own task-queue linked list, computes a digest for each data block in the data area according to the metadata, saves the results into the metadata, and appends the node to the task-queue linked list of the Bloomfilter thread. It then removes the next node from its task queue in order and performs the digest calculation for the next batch.
Step S703: the Bloomfilter thread removes a node from its own task queue and performs the Bloomfilter calculation on the digests stored in the metadata. This is a first-pass judgment: if the Bloomfilter reports that a digest may already exist, the digest must be compared one by one against the metadata stored on disk to determine whether it really is a duplicate. If it is, the data is discarded and the corresponding metadata is updated and saved; otherwise the digest is unique, and so is its corresponding data block, so both the data and the metadata must be saved. At this point the data-filtering task is complete: the filtered data and metadata are packaged into a new node, which is appended to the task-queue linked list of the data-compression thread. The Bloomfilter thread then removes the next node from its task queue in order and performs duplicate filtering for the next batch.
Step S704: the data-compression thread removes a node from its own task-queue linked list, compresses each data block in the data area according to the metadata, and stores the compressed length in the corresponding metadata. The compressed data is reassembled into data blocks, packaged together with the metadata into a new node, and appended to the task-queue linked list of the data-encryption thread. The compression thread then removes the next node from its task queue in order and processes the next batch.
Step S705: the encryption thread removes a node from its own task queue, encrypts the metadata and the data separately, and sends the encrypted data over the network to the backup node for storage.
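Steps S701 through S705 describe a hand-off pipeline in which each stage removes a node from its own task queue, processes one batch, and appends the result to the next stage's queue. A minimal Python sketch of that structure follows, with `queue.Queue` standing in for the linked lists, a set standing in for the Bloomfilter plus on-disk metadata, and `hashlib`/`zlib` standing in for the digest and compression stages; all names are illustrative, not the patent's implementation:

```python
import hashlib
import queue
import threading
import zlib

def pipeline(batches):
    """Run batches of blocks through digest -> dedup -> compress stages."""
    q_digest, q_filter, q_compress = queue.Queue(), queue.Queue(), queue.Queue()
    seen = set()      # stands in for the Bloomfilter + on-disk metadata check
    results = []      # (digest, compressed block) for each unique block

    def digest_stage():                      # step S702
        while True:
            node = q_digest.get()
            if node is None:                 # end-of-stream marker
                q_filter.put(None)
                return
            # attach a digest to each block's "metadata"
            q_filter.put([(blk, hashlib.sha1(blk).hexdigest()) for blk in node])

    def filter_stage():                      # step S703
        while True:
            node = q_filter.get()
            if node is None:
                q_compress.put(None)
                return
            unique = []
            for blk, dig in node:
                if dig not in seen:          # duplicates are dropped here
                    seen.add(dig)
                    unique.append((blk, dig))
            q_compress.put(unique)

    def compress_stage():                    # step S704
        while True:
            node = q_compress.get()
            if node is None:
                return
            for blk, dig in node:
                results.append((dig, zlib.compress(blk)))

    threads = [threading.Thread(target=t)
               for t in (digest_stage, filter_stage, compress_stage)]
    for t in threads:
        t.start()
    for batch in batches:                    # main thread: batch and enqueue (S701)
        q_digest.put(batch)
    q_digest.put(None)
    for t in threads:
        t.join()
    return results

out = pipeline([[b"aaa", b"bbb"], [b"aaa", b"ccc"]])  # "aaa" arrives twice
```

Because each stage runs in a single thread and the queues are FIFO, batches keep their arrival order, matching the "remove the next node in order" behavior described above.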
Data compression is performed after deduplication, which targets the following two application environments:
1. When the selected compression algorithm is more time-consuming than the other algorithms, placing compression after deduplication reduces the amount of data to be compressed and therefore the total running time of the system.
2. In an environment where digest calculation and encryption run on a coprocessor and the other calculations run on the GPU, placing compression after deduplication reduces the number of data copies between main memory and video memory: since both the Bloomfilter calculation and the compression run on the GPU, the data can remain in video memory after the Bloomfilter calculation finishes and be compressed directly.
Embodiment 7:
Referring to Fig. 8, which shows the pipeline mechanism of the present invention in which the GPU performs the preliminary duplicate check, the principle and operation steps are as follows:
Step S801: the main thread receives data from the network, extracts the metadata, packages the data into batches, and finally appends each packaged data block to the processing queue of the preliminary duplicate-check thread, where it waits to be processed.
Step S802: the preliminary duplicate-check thread removes a node from its own task-queue linked list and, for each data block in the data area and according to the metadata, first compresses the block, then computes its digest, and finally performs the preliminary Bloomfilter duplicate check; the three results are then saved. The thread then removes the next node from its task queue in order and performs the preliminary duplicate check for the next batch.
Step S803: the encryption thread removes a node from its own task queue, encrypts the metadata and the data separately, and sends the encrypted data over the network to the backup node for storage.
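Step S802 fuses compression, digest calculation, and the Bloomfilter check into a single stage, so that on the GPU the data makes only one round trip to video memory. A CPU-side Python sketch of the fused stage is below; a plain set stands in for the Bloomfilter bit array (so, unlike a real Bloomfilter, there are no false positives), and `zlib`/`hashlib` stand in for the GPU kernels:

```python
import hashlib
import zlib

seen = set()  # stands in for the Bloomfilter bit array in this sketch

def preliminary_check(block):
    """Fused stage of step S802: one pass performs all three calculations."""
    compressed = zlib.compress(block)        # 1) compression
    digest = hashlib.sha1(block).digest()    # 2) digest calculation
    maybe_dup = digest in seen               # 3) preliminary duplicate check
    seen.add(digest)
    return compressed, digest, maybe_dup

c1, d1, dup1 = preliminary_check(b"block-1")
c2, d2, dup2 = preliminary_check(b"block-1")  # same content arrives again
```

The point of the fusion is that all three results come back together, so only one hand-off to the encryption thread is needed per batch.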

Claims (10)

1. A low-power, high-performance data-deduplication system, composed of three parts: a production center, a computing center, and a backup center. The production center generates the data that needs to be backed up and sends it to the computing center, and is also responsible for user interaction. The computing center receives data from the production center, performs the duplicate-detection judgment, deletes the duplicated data, encrypts the non-duplicated data, and sends the encrypted data to the backup center for permanent storage. The backup center is responsible for receiving and storing the encrypted non-duplicated data sent by the computing center. The deduplication system generally performs duplicate checking by comparing data digests, and introduces the Bloomfilter technique for a preliminary duplicate check that further reduces the number of data comparisons. The system is characterized in that the computing center accelerates the computation and reduces power consumption by the following methods:
1st, using coprocessors to accelerate the system's computational tasks:
1.1st, using the coprocessor of the VIA processor to accelerate the digest-calculation and data-encryption tasks;
1.2nd, using the graphics card (GPU) to accelerate the compression process in the system;
1.3rd, using the GPU to accelerate the data-digest calculation in the system;
1.4th, using the GPU to accelerate the Bloomfilter process;
1.5th, using the GPU to handle the preliminary duplicate check of the whole system;
2nd, a pipeline mechanism for multi-threaded parallel processing:
2.1st, the multi-threaded pipeline mechanism;
2.2nd, the pipeline with compression first;
2.3rd, the pipeline with compression after deduplication;
2.4th, the pipeline in which the GPU performs the main calculations.
2. The system according to claim 1, characterized in that the method of using the VIA coprocessor described in step 1.1 to accelerate the digest-calculation and data-encryption tasks is as follows:
the digest-calculation and data-encryption tasks are assigned to the PadLock coprocessor module of the VIA processor;
the PadLock coprocessor of the VIA processor has five acceleration engines: the random-number-generator engine RNG, the advanced-encryption engine ACE, the advanced-encryption engine version 2 ACE2, the digest-calculation engine PHE, and the Montgomery multiplier PMM. PHE can be used to accelerate digest calculation, while ACE and ACE2 can be used to accelerate encryption. PadLock only provides a few sets of assembly instructions; we wrap these assembly instructions in the C language to produce two easy-to-use C functions, VIA_SHA1 and VIA_AES, which the system calls to accelerate the digest-calculation and encryption processes.
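The claim wraps PadLock's assembly instructions behind two stable C entry points, VIA_SHA1 and VIA_AES. The wrapper idea — one fixed function signature that dispatches to a hardware engine when present and to a software routine otherwise — can be sketched in Python, with `hashlib` standing in for both the PHE engine and the software fallback (all names and the probe are hypothetical; a real wrapper would detect the engines via CPUID and issue the PadLock instructions in assembly):

```python
import hashlib

def _padlock_available():
    """Stand-in probe; a real wrapper would query CPUID for the PHE engine."""
    return False  # assume no PadLock hardware in this sketch

def via_sha1(data: bytes) -> bytes:
    """Sketch of the VIA_SHA1 wrapper: same interface on either path."""
    if _padlock_available():
        # here the real wrapper would execute the PadLock hashing instruction
        raise NotImplementedError("hardware path not modeled in this sketch")
    return hashlib.sha1(data).digest()  # software fallback

digest = via_sha1(b"backup block")
```

The VIA_AES wrapper described in the claim would follow the same shape, dispatching to the ACE/ACE2 engines instead of PHE.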
3. The system according to claim 1, characterized in that the method of using the GPU described in step 1.2 to accelerate the compression process is as follows: the single-program-multiple-data computation model provided by the GPU is well suited to batch, computation-intensive tasks. Using the GPU's CUDA (Compute Unified Device Architecture) programming model, compression is written as a compression library, CompressionGPU, which provides the following function interfaces: ComGPUInit, ComGPUKernel, and CompressionGPUDestroy. The system first prepares the data on the CPU and then calls the CompressionGPU function interfaces to distribute the compression tasks to the GPU, accelerating the compression process.
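The Init/Kernel/Destroy split described in this claim — allocate device resources once, launch the kernel per batch, free everything at shutdown — is a common shape for CUDA libraries. A CPU-only Python sketch of that three-call lifecycle follows, with `zlib` standing in for the device kernel; the class and its behavior are illustrative, not the patent's CompressionGPU implementation:

```python
import zlib

class CompressionLib:
    """Sketch of the ComGPUInit / ComGPUKernel / ComGPUDestroy lifecycle."""

    def init(self, max_batch_bytes):
        # a real ComGPUInit would allocate device buffers of this capacity
        self.capacity = max_batch_bytes
        self.ready = True

    def kernel(self, blocks):
        # a real ComGPUKernel would copy the batch to the GPU, compress each
        # block in parallel, and copy the results back to host memory
        assert self.ready and sum(map(len, blocks)) <= self.capacity
        return [zlib.compress(b) for b in blocks]

    def destroy(self):
        # a real ComGPUDestroy would free the device buffers
        self.ready = False

lib = CompressionLib()
lib.init(1 << 20)                          # once, at system start-up
out = lib.kernel([b"x" * 100, b"y" * 100])  # per batch
lib.destroy()                              # once, at shutdown
```

The SHA1GPU and BloomfilterGPU libraries of claims 4 and 5 follow this same lifecycle with their own kernels.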
4. The system according to claim 1, characterized in that the method of using the GPU described in step 1.3 to accelerate the digest-calculation process is as follows: digest calculation is also a typical computation-intensive process and is likewise suited to GPU acceleration. The digest calculation is first packaged, using the CUDA programming model, into a digest-calculation library, SHA1GPU, which provides the following function interfaces: SHA1GPUInit, SHA1GPUKernel, and SHA1GPUDestroy. After the system's data-compression thread pre-processes the compressed data and returns it to the CPU, the CPU calls the SHA1GPU function interfaces to distribute the digest-calculation tasks to the GPU.
5. The system according to claim 1, characterized in that the method of using the GPU described in step 1.4 to accelerate the Bloomfilter calculation is as follows: in the system, the Bloomfilter calculation is the most computation-intensive process of the deduplication system and is therefore the most suitable for GPU acceleration. We first package the Bloomfilter calculation, using the CUDA programming model, into a library, BloomfilterGPU, which provides the following function interfaces: BFGPUInit, BFGPUKernel, and BFGPUDestroy. We then call the BloomfilterGPU function interfaces to distribute the digest results and the Bloomfilter calculation to the GPU for processing.
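The batch check that BFGPUKernel performs over digests can be sketched on the CPU as a Bloom filter: each digest sets k bit positions, and a digest whose positions are all already set is only a *possible* duplicate (to be verified against the on-disk metadata per step S703), while an unset position proves the block is new. This Python sketch derives the k positions from slices of the SHA-1 digest; the class and parameters are illustrative, not the patent's kernel:

```python
import hashlib

class BloomFilter:
    """Sketch of the batch duplicate pre-check done by BFGPUKernel."""

    def __init__(self, bits=1 << 20, k=4):
        self.bits, self.k = bits, k
        self.array = bytearray(bits // 8)

    def _positions(self, digest):
        # derive k bit positions from independent 4-byte slices of the digest
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits
                for i in range(self.k)]

    def maybe_contains(self, digest):
        # True  -> maybe a duplicate: compare against on-disk metadata
        # False -> definitely new: skip the disk comparison entirely
        return all(self.array[p >> 3] & (1 << (p & 7))
                   for p in self._positions(digest))

    def add(self, digest):
        for p in self._positions(digest):
            self.array[p >> 3] |= 1 << (p & 7)

    def check_batch(self, digests):
        """Return one verdict per digest, inserting each as it is checked."""
        verdicts = []
        for d in digests:
            verdicts.append(self.maybe_contains(d))
            self.add(d)
        return verdicts

bf = BloomFilter()
digs = [hashlib.sha1(s).digest() for s in (b"a", b"b", b"a")]
verdicts = bf.check_batch(digs)   # the repeated "a" is flagged
```

On the GPU, each digest's k position lookups are independent, which is what makes the check map well onto the single-program-multiple-data model.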
6. The system according to claim 1, characterized in that the method of using the GPU described in step 1.5 to perform the preliminary duplicate check on all the data is as follows: we first use the CUDA programming model to build a preliminary duplicate-check library, DuplicateGPU, which provides the following function interfaces: DuplicateGPUInit, DuplicateGPUKernel, and DuplicateGPUDestroy. DuplicateGPU wraps the compression, digest, and Bloomfilter calculations in a unified interface: for the input data, the GPU performs compression, digest calculation, and Bloomfilter calculation in succession and then returns all the results to the CPU at once.
Using the DuplicateGPU library, the data input and the result output of the three calculations are handled together, reducing both the number of data transfers and the amount of data transferred.
7. The system according to claim 1, characterized in that the multi-threaded pipeline mechanism described in step 2.1 is as follows:
the system creates the following independent threads: a data-receiving thread, a data-compression thread, a digest-calculation thread, a Bloomfilter processing thread, and a data-encryption-and-transmission thread. The data-receiving thread is responsible for receiving data, generating the metadata corresponding to the data, appending the metadata to a metadata linked list for later processing, and placing the data into a global data buffer in the order in which it arrives at the computing center.
The system pre-sets a buffer-size value, BUFSIZE. When the amount of received data reaches BUFSIZE, the receiving thread passes the data to the data-compression thread. The data-compression thread compresses the batch of data and then passes the compressed data to the digest-calculation thread. The digest-calculation thread computes the digests of the compressed batch and then passes the digests to the Bloomfilter thread. The Bloomfilter thread performs the Bloomfilter calculation on the batch of digests and passes the preliminary duplicate-check result to the encryption-and-transmission thread. The encryption-and-transmission thread performs the further duplicate verification, encrypts the non-duplicated data blocks, writes the metadata to the local disk, and writes the metadata and the encrypted backup data to the remote backup server through the nbd module.
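The BUFSIZE rule in this claim is a simple accumulate-and-flush policy: the receiving thread buffers incoming blocks in arrival order and hands a full batch downstream whenever the buffer reaches BUFSIZE. A minimal Python sketch of that rule (the value of BUFSIZE and all names are illustrative):

```python
BUFSIZE = 4  # illustrative batch size; the patent leaves the value configurable

class Receiver:
    """Sketch of the receiving thread's batching rule from claim 7."""

    def __init__(self, downstream):
        self.buffer = []
        self.downstream = downstream   # called once per full batch

    def receive(self, block):
        self.buffer.append(block)      # arrival order is preserved
        if len(self.buffer) >= BUFSIZE:
            self.downstream(self.buffer)   # hand the batch to compression
            self.buffer = []

batches = []
rx = Receiver(batches.append)
for i in range(10):                    # 10 arrivals -> 2 full batches + 2 pending
    rx.receive(f"block-{i}".encode())
```

Batching amortizes the per-hand-off cost (queue operations, and on the GPU path the host-to-device copies) over BUFSIZE blocks.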
8. The system according to claim 1, characterized in that the compression-first pipeline described in step 2.2 is suitable for deduplication systems in which the compression time is less than both the digest-calculation time and the data-encryption time. The method is as follows: the execution order of the computation threads is adjusted so that the compression thread becomes the first computation step of the pipeline and passes the compressed data to the digest-calculation thread. This pipeline order reduces the amount of data to be digested and therefore the digest-calculation time.
9. The system according to claim 1, characterized in that the compression-after-deduplication pipeline described in step 2.3 is suitable when the duplication rate of the data is greater than 50%, the amount of data that needs to be backed up is less than 25% of the total, and the compression time exceeds the sum of the digest-calculation time and the encryption time; compressing only the non-duplicated data then reduces the compression time and improves the throughput of the whole system. The method is as follows: the compression thread is removed and the compression process is merged into the encryption-and-transmission thread, which compresses the non-duplicated data after the further duplicate verification of the data is finished, and then performs the data encryption and data transmission.
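The applicability conditions of this claim can be checked with a small cost model. The numbers below are assumed for illustration only (they are not measurements from the patent): compressing the full stream costs 60 s, digesting and encrypting cost 20 s and 15 s, and 80% of the blocks are duplicates, so only 20% of the data survives to be compressed:

```python
# Illustrative timings in seconds for one backup run (assumed, not measured)
t_compress_all = 60.0   # compressing the whole incoming stream
t_digest = 20.0         # digest calculation
t_encrypt = 15.0        # encryption
dup_rate = 0.80         # 80% of blocks are duplicates

backup_fraction = 1.0 - dup_rate     # fraction that must actually be backed up

# Claim 9's three applicability conditions
condition = (dup_rate > 0.5
             and backup_fraction < 0.25
             and t_compress_all > t_digest + t_encrypt)

# When only non-duplicate blocks are compressed, the compression work
# shrinks in proportion to the surviving fraction of data.
t_compress_after = t_compress_all * backup_fraction
saving = t_compress_all - t_compress_after
```

Under these assumptions the conditions hold and moving compression after deduplication cuts the compression work from 60 s to about 12 s.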
10. The system according to claim 1, characterized in that, in the pipeline described in step 2.4, the GPU performs the main calculations. According to the characteristics of the CUDA programming model, the data to be processed must be transferred to the graphics card before the powerful parallel computing ability of the GPU can be used to process it. The method is as follows: the data-compression thread, the digest-calculation thread, and the Bloomfilter thread are merged into a single computation thread. This pipeline order reduces both the number of data interactions between the CPU and the GPU and the amount of data transferred, improving the performance of the system.
CN2011100247439A 2011-01-24 2011-01-24 Low-power consumption high-performance repeating data deleting system Pending CN102156703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100247439A CN102156703A (en) 2011-01-24 2011-01-24 Low-power consumption high-performance repeating data deleting system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011100247439A CN102156703A (en) 2011-01-24 2011-01-24 Low-power consumption high-performance repeating data deleting system

Publications (1)

Publication Number Publication Date
CN102156703A true CN102156703A (en) 2011-08-17

Family

ID=44438202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100247439A Pending CN102156703A (en) 2011-01-24 2011-01-24 Low-power consumption high-performance repeating data deleting system

Country Status (1)

Country Link
CN (1) CN102156703A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034564A (en) * 2012-12-05 2013-04-10 华为技术有限公司 Data disaster tolerance demonstration and practicing method and data disaster tolerance demonstration and practicing device and system
CN103488734A (en) * 2013-09-17 2014-01-01 华为技术有限公司 Data processing method and deduplication engine
CN103544729A (en) * 2013-09-24 2014-01-29 Tcl集团股份有限公司 Animation data processing method and system
CN103593264A (en) * 2013-11-28 2014-02-19 中国南方电网有限责任公司超高压输电公司南宁局 System and method for remote wide area network disaster recovery backup
CN105389387A (en) * 2015-12-11 2016-03-09 上海爱数信息技术股份有限公司 Compression based deduplication performance and deduplication rate improving method and system
WO2016041127A1 (en) * 2014-09-15 2016-03-24 华为技术有限公司 Data duplication method and storage array
CN105681273A (en) * 2015-12-17 2016-06-15 西安电子科技大学 Client data deduplication method
CN104199637B (en) * 2014-07-16 2017-02-08 珠海金山网络游戏科技有限公司 Method for comparing packaged files and device and system thereof
CN106934293A (en) * 2015-12-29 2017-07-07 航天信息股份有限公司 The collision calculation device and collision calculation method of digital digest
CN109040653A (en) * 2018-06-28 2018-12-18 苏州科达科技股份有限公司 Data encrypting and deciphering expense determines method, apparatus and electronic equipment
WO2021033072A1 (en) * 2019-08-19 2021-02-25 International Business Machines Corporation Opaque encryption for data deduplication
CN113242593A (en) * 2015-08-26 2021-08-10 手持产品公司 Queue power management through information storage sharing
WO2022143405A1 (en) * 2020-12-30 2022-07-07 欧普照明股份有限公司 Energy consumption data processing method, cloud server, and energy consumption data processing system
CN114840515A (en) * 2022-06-30 2022-08-02 中科声龙科技发展(北京)有限公司 Method, device and chip for realizing batch data duplicate checking

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609466A (en) * 2009-07-01 2009-12-23 中兴通讯股份有限公司 Mass data is looked into heavy method and system
CN101751423A (en) * 2008-12-08 2010-06-23 北大方正集团有限公司 Article duplicate checking method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751423A (en) * 2008-12-08 2010-06-23 北大方正集团有限公司 Article duplicate checking method and system
CN101609466A (en) * 2009-07-01 2009-12-23 中兴通讯股份有限公司 Mass data is looked into heavy method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sciencepaper Online (China), 2010-04-17, Ma Liang, Zhen Caijun, Zhao Bin, Ma Jingwei, Wang Gang, Liu Xiaoguang, "Fast data-deduplication system using a low-power coprocessor", pp. 2-13 of the paper, relevant to claims 1-10, *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034564B (en) * 2012-12-05 2016-06-15 华为技术有限公司 Data disaster tolerance drilling method, data disaster tolerance practice device and system
CN103034564A (en) * 2012-12-05 2013-04-10 华为技术有限公司 Data disaster tolerance demonstration and practicing method and data disaster tolerance demonstration and practicing device and system
CN103488734A (en) * 2013-09-17 2014-01-01 华为技术有限公司 Data processing method and deduplication engine
CN103544729A (en) * 2013-09-24 2014-01-29 Tcl集团股份有限公司 Animation data processing method and system
CN103593264A (en) * 2013-11-28 2014-02-19 中国南方电网有限责任公司超高压输电公司南宁局 System and method for remote wide area network disaster recovery backup
CN103593264B (en) * 2013-11-28 2017-07-07 中国南方电网有限责任公司超高压输电公司南宁局 Remote Wide Area Network disaster tolerant backup system and method
CN104199637B (en) * 2014-07-16 2017-02-08 珠海金山网络游戏科技有限公司 Method for comparing packaged files and device and system thereof
CN105612489B (en) * 2014-09-15 2017-08-29 华为技术有限公司 Data de-duplication method and storage array
CN105612489A (en) * 2014-09-15 2016-05-25 华为技术有限公司 Data duplication method and storage array
WO2016041127A1 (en) * 2014-09-15 2016-03-24 华为技术有限公司 Data duplication method and storage array
CN113242593A (en) * 2015-08-26 2021-08-10 手持产品公司 Queue power management through information storage sharing
CN105389387A (en) * 2015-12-11 2016-03-09 上海爱数信息技术股份有限公司 Compression based deduplication performance and deduplication rate improving method and system
CN105389387B (en) * 2015-12-11 2018-12-14 上海爱数信息技术股份有限公司 A kind of data de-duplication performance based on compression and the method and system for deleting rate promotion again
CN105681273A (en) * 2015-12-17 2016-06-15 西安电子科技大学 Client data deduplication method
CN105681273B (en) * 2015-12-17 2018-11-20 西安电子科技大学 Client-side deduplication method
CN106934293B (en) * 2015-12-29 2020-04-24 航天信息股份有限公司 Collision calculation device and method for digital abstract
CN106934293A (en) * 2015-12-29 2017-07-07 航天信息股份有限公司 The collision calculation device and collision calculation method of digital digest
CN109040653A (en) * 2018-06-28 2018-12-18 苏州科达科技股份有限公司 Data encrypting and deciphering expense determines method, apparatus and electronic equipment
CN109040653B (en) * 2018-06-28 2020-09-29 苏州科达科技股份有限公司 Data encryption and decryption overhead determining method and device and electronic equipment
WO2021033072A1 (en) * 2019-08-19 2021-02-25 International Business Machines Corporation Opaque encryption for data deduplication
GB2602216A (en) * 2019-08-19 2022-06-22 Ibm Opaque encryption for data deduplication
GB2602216B (en) * 2019-08-19 2022-11-02 Ibm Opaque encryption for data deduplication
US11836267B2 (en) 2019-08-19 2023-12-05 International Business Machines Corporation Opaque encryption for data deduplication
WO2022143405A1 (en) * 2020-12-30 2022-07-07 欧普照明股份有限公司 Energy consumption data processing method, cloud server, and energy consumption data processing system
CN114840515A (en) * 2022-06-30 2022-08-02 中科声龙科技发展(北京)有限公司 Method, device and chip for realizing batch data duplicate checking
CN114840515B (en) * 2022-06-30 2022-09-02 中科声龙科技发展(北京)有限公司 Method, device and chip for realizing batch data duplicate checking

Similar Documents

Publication Publication Date Title
CN102156703A (en) Low-power consumption high-performance repeating data deleting system
Jiang et al. Scaling up MapReduce-based big data processing on multi-GPU systems
Park et al. Secure hadoop with encrypted HDFS
CN105051695B (en) It is immutable to share zero replicate data and spread defeated
Wu et al. ParaStream: A parallel streaming Delaunay triangulation algorithm for LiDAR points on multicore architectures
Negrevergne et al. Discovering closed frequent itemsets on multicore: Parallelizing computations and optimizing memory accesses
US20120072117A1 (en) System and method for generating images of subsurface structures
Docan et al. Moving the code to the data-dynamic code deployment using activespaces
Gharaibeh et al. A GPU accelerated storage system
CN105103136B (en) Shared and managed memory is unified to be accessed
CN107924327A (en) System and method for multiple threads
Sharma et al. A technical review for efficient virtual machine migration
Docan et al. Activespaces: Exploring dynamic code deployment for extreme scale data processing
CN101341471B (en) Apparatus and method for dynamic cache management
Xia et al. Redundancy-free high-performance dynamic GNN training with hierarchical pipeline parallelism
Tian et al. A heterogeneous CPU-GPU implementation for discrete elements simulation with multiple GPUs
Vo et al. HyperFlow: A Heterogeneous Dataflow Architecture.
Liu et al. Large scale caching and streaming of training data for online deep learning
Liao et al. A new multi-core pipelined architecture for executing sequential programs for parallel geospatial computing
Moreland et al. Visualization for exascale: Portable performance is critical
Pioli et al. Research characterization on I/O improvements of storage environments
Biswas et al. An Efficient Reduced-Memory GPU-based Dynamic Programming Strategy for Bounded Knapsack Problems
Kal et al. AESPA: Asynchronous Execution Scheme to Exploit Bank-Level Parallelism of Processing-in-Memory
Yogatama et al. Accelerating User-Defined Aggregate Functions (UDAF) with Block-wide Execution and JIT Compilation on GPUs
Guzman et al. Towards the inclusion of FPGAs on commodity heterogeneous systems

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110817