CN102156703A - Low-power-consumption, high-performance data de-duplication system - Google Patents

Info

Publication number
CN102156703A
CN102156703A
Authority
CN
China
Prior art keywords
data
gpu
thread
compression
bloomfilter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011100247439A
Other languages
Chinese (zh)
Inventor
刘晓光 (Liu Xiaoguang)
王刚 (Wang Gang)
赵彬 (Zhao Bin)
马井玮 (Ma Jingwei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN2011100247439A
Publication of CN102156703A
Pending legal-status Critical Current


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a low-power-consumption, high-performance data de-duplication system comprising a production center, a computation center, and a backup center. The production center copies user request data and sends it to the computation center; the computation center deletes duplicate data and sends the non-duplicate data to the backup center; and the backup center stores the received data. The computation center uses a low-power VIA processor to reduce the system's operating power consumption. System performance is improved through the following policies: (1) the special assembly instructions of the coprocessor module provided by the VIA processor are used for digest calculation and data encryption, improving performance in hardware; (2) the computation center uses a Graphics Processing Unit (GPU) to accelerate the data-compression procedure and the Bloomfilter computation in the de-duplication system, and the GPU's concurrent processing raises the system's operating efficiency; and (3) system performance is further improved by using two pipeline mechanisms.

Description

A low-power-consumption, high-performance data de-duplication system
Technical field
The present invention relates to the field of data de-duplication, and in particular to the optimization of energy consumption and computational performance in data de-duplication systems.
Background technology
With the development of computing, informatization has made data protection ever more important. Scheduled backup of data to a repository was one of the first primary means people used to protect data, and it protects users' data to a certain extent. Over time, however, it became clear that this level of protection could no longer meet existing demands: a considerable amount of data remains at risk of loss between two backups. Continuous data protection (CDP) technology therefore arose. As its name suggests, continuous data protection provides finer-grained protection and can even back up and protect every write request. Naturally, such fine-grained protection costs more storage, and as data volumes keep growing, the storage cost of backup data becomes increasingly hard to accept. According to statistics from the Enterprise Strategy Group (ESG), the total amount of data needing protection is currently growing at 60% per year, and the total amount needing storage has reached the petabyte order of magnitude. In the face of such a huge data volume, the storage overhead caused by traditional backup methods is enormous. Yet most of the backup data produced by traditional backup techniques, particularly full backup and incremental backup, is duplicated: data that has already been backed up need not be backed up again. Data de-duplication technology was therefore introduced. At present, continuous data protection based on de-duplication is a research focus: by discarding the duplicate data produced during backup, the overhead of the storage system can be reduced effectively. For example, the Data Domain Deduplication File System (DDFS) applies de-duplication to a month of backup data and finally achieves a compression ratio of 38.54:1, greatly reducing storage overhead as expected.
Referring to Fig. 1, which shows the data-deletion flow of a typical de-duplication system, the concrete steps are as follows:
Step S101. Receive the incoming data block to be backed up.
Step S102. Calculate the digest of the received data block.
Step S103. Judge by digest comparison whether the received data block is a duplicate; filter out duplicate data and create corresponding metadata for all data.
Step S104. Compress the de-duplicated data and the metadata.
Step S105. Encrypt the compressed data.
Step S106. Send the encrypted data to the backup center over the network for backup.
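Steps S101 through S106 can be sketched end to end in a few lines. This is a minimal illustrative model, not the patent's implementation: the names are hypothetical, zlib stands in for the lzjb compression named later in the text, and a toy XOR "cipher" stands in for AES.

```python
import hashlib
import zlib

# Illustrative sketch of steps S101-S106 (all names are hypothetical).
seen_digests = set()          # stands in for the Bloom filter + disk index
backup_store = {}             # stands in for the backup center

def xor_encrypt(data: bytes, key: int = 0x5A) -> bytes:
    # Toy stand-in for the AES encryption of step S105.
    return bytes(b ^ key for b in data)

def process_block(block: bytes) -> bool:
    """Return True if the block was new and backed up, False if duplicate."""
    digest = hashlib.sha1(block).hexdigest()        # S102: compute digest
    if digest in seen_digests:                      # S103: duplicate check
        return False
    seen_digests.add(digest)
    compressed = zlib.compress(block)               # S104: compress
    encrypted = xor_encrypt(compressed)             # S105: encrypt
    backup_store[digest] = encrypted                # S106: "send" to backup
    return True
```

Only the first copy of a block ever reaches the backup store; later copies are filtered out at step S103.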
In the steps above, de-duplication of data blocks is achieved by comparing block digests, which requires the digest algorithm to have a very low collision rate; the digest algorithms adopted in existing de-duplication systems are generally MD5, SHA-1, and SHA-256. When comparing digests, a preliminary judgment is generally made first with the Bloomfilter algorithm, followed by a further judgment through comparison against a disk-based index. Usually, to speed up the disk-index comparison, part of the index is kept in memory as a cache; this reduces disk reads and writes and thereby accelerates the comparison process.
Referring to Fig. 2, which shows the comparison flow by which current de-duplication systems remove duplicate data blocks, the concrete steps are as follows:
Step S201. Compute the data block's digest.
Step S202. Perform the Bloomfilter calculation and check whether the corresponding bits are set. If they are not all set, the data block is unique: create its metadata. Otherwise, a further judgment is needed.
Step S203. Compare the new digest with the indexes stored in the cache. If a match exists, the data block is a duplicate: record the position where the duplicate is stored at the backup center and create metadata. Otherwise, compare against the on-disk index; if there is still no match, the data block is unique and metadata is created; otherwise the data block is a duplicate.
Step S204. If the data block is a duplicate, read the indexes of the several sectors following the disk position recorded by its index and refresh the cache, exploiting locality of reference in the data to reduce the number of disk accesses.
In past research on de-duplication technology, attention has usually focused on how to reduce index comparisons (because traditional-disk access time is the most time-consuming link in the flow above), while the large amount of computation in the de-duplication process has drawn little attention. With rising network transmission speeds and the adoption of large, fast storage (for example solid-state drives), the factors that currently limit de-duplication throughput (network transmission, disk access speed, and so on) will soon cease to be the problem; meanwhile, the CPU pressure created by the many complex computation tasks in the de-duplication process will become the new system bottleneck.
Summary of the invention
The objective of the invention is to solve the low performance and high power consumption of existing data de-duplication systems by providing a high-performance de-duplication system based on a low-power processor. The system markedly improves de-duplication performance, while the use of a low-power processor greatly reduces the energy consumption of the whole system.
The present invention studies how to accelerate the computation process by distributing complex computation tasks to general-purpose coprocessors (for example the coprocessor of the VIA processor and the graphics processor, GPU), thereby raising system throughput; at the same time, the low-power coprocessors' advantage in energy consumption reduces the power consumption of the whole system, finally realizing a low-power, high-throughput data de-duplication system.
The low-power, high-performance de-duplication system provided by the invention consists of three parts (see Fig. 9): a production center, a computation center, and a backup center (also called the disaster-recovery center), deployed respectively on three nodes (computers). The production center and the backup center are each deployed on a single-core 2.66 GHz Intel Xeon node; each machine has 4 GB of memory and a hardware RAID-0 array of six disks. The computation node is deployed on a 1.6 GHz VIA Nano platform with 2 GB of memory and a RAID-0 array of two disks.
The production center produces the data to be backed up. The present invention implements data protection at the block-device layer: the user's write requests are intercepted at the block-device layer, the write-request data is copied, and the copy is sent to the computation-center node within the same local area network.
The computation center receives data-processing requests from the production-center node, judges data repetitiveness, deletes duplicate data, encrypts the non-duplicate data, and then sends the encrypted non-duplicate data to the backup center for permanent storage. The backup center is responsible for receiving the metadata and non-duplicate data sent by the computation center and storing them with a reasonable layout.
To improve the compressibility of the data, saving storage and reducing the time needed to compute data digests, the data is first compressed. To reduce the amount of data being compared, a digest is computed for each pending data block, converting comparisons of block contents into comparisons of block digests; this shortens comparison time and improves the system's running time. The Bloomfilter technique is used to further reduce the number of disk comparisons. Finally, to protect data security, the non-duplicate data is encrypted before being sent to the remote backup center.
Clearly the computation center carries a large computational load, and this heavy, complex computation puts great pressure on the system and seriously affects its performance. To accelerate these computations and raise the system's operating efficiency, the computation center of the present invention adopts the following methods to speed up computation and reduce power consumption:
1. Use coprocessors to accelerate the system's computation
1.1. Use the VIA processor's coprocessor to accelerate digest calculation and data encryption;
1.2. Use the graphics card (GPU) to accelerate the compression process in the system;
1.3. Use the GPU to accelerate the data-digest calculation in the system;
1.4. Use the GPU to accelerate the Bloomfilter process;
1.5. Use the GPU to handle the whole preliminary duplicate-checking process;
2. Pipeline mechanisms with multi-threaded parallel processing
2.1. Multithreaded pipeline mechanism;
2.2. Pipeline with compression first;
2.3. Pipeline with compression after de-duplication;
2.4. Pipeline with the main computation on the GPU.
Optionally, the VIA processor's coprocessor is used to accelerate digest calculation. Specifically, the SHA-1 digest computation is accelerated with the PadLock coprocessor carried on the VIA platform. PadLock provides an accelerating engine, PHE, which can speed up the SHA-1 digest computation through the VIA processor's special assembly instructions. These assembly instructions are wrapped by the C language into unified C function calls such as VIA_SHA1, which uses the inline-assembly feature of C to encapsulate the assembly-language programming interface PadLock provides; the digest computation then invokes these C functions, achieving the acceleration.
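What the PHE engine accelerates is the standard SHA-1 computation itself: a software path and the PadLock path must produce the same 160-bit digest. A pure-software stand-in (the function name here is hypothetical, not the patent's VIA_SHA1 wrapper) looks like this:

```python
import hashlib

# Pure-software stand-in for the PadLock-accelerated SHA-1 path.
# The patent's VIA_SHA1 is a C wrapper around PHE instructions; this
# hypothetical sha1_digest computes the same 160-bit result in software.
def sha1_digest(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

digest = sha1_digest(b"block to fingerprint")
```

Any accelerated implementation can be validated against such a reference, since SHA-1 is fully deterministic.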
Preferably, the VIA processor's coprocessor is used to accelerate data encryption. Specifically, the AES encryption computation is accelerated with the PadLock coprocessor carried on the VIA platform; PadLock provides the hardware acceleration engines ACE and ACE2 to speed up the encryption computation. PadLock's accelerating engine is invoked by calling the processor's special assembly instructions; these instructions are wrapped by the C language into a unified C function call, VIA_AES, which uses C's inline-assembly feature to encapsulate the assembly-language programming interface PadLock provides. The encryption computation then calls the wrapped routine, achieving the acceleration.
Preferably, the GPU is used to accelerate the compression process in the system. Specifically, the GPU's unified computing architecture, the CUDA programming model, is used to write a compression library, CompressionGPU, which comprises the function interfaces ComGPUInit, ComGPUKernel, and ComGPUDestroy. ComGPUInit is responsible for memory allocation on the GPU side. ComGPUKernel fetches data from the global data buffer and transfers it into the video memory of the compute card; the card launches enough execution threads that each thread compresses one pending data block. Each thread determines the start position of the data it will compress from its own thread ID, reads from that start position the length of the block to compress (the per-block lengths are fixed in place when the data is prepared by the reader thread), and then compresses the data with the lzjb compression algorithm. ComGPUDestroy is responsible for destroying the memory allocated by ComGPUInit. Processing all the data blocks concurrently in this way reduces the time compression needs and improves system performance.
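The per-thread addressing scheme ComGPUKernel relies on can be simulated on the CPU: blocks are packed into one flat buffer, each prefixed by its length, and "thread" i locates its block purely from an offset derived from its ID. The names below are hypothetical, and zlib stands in for lzjb.

```python
import zlib

# CPU-side simulation of the ComGPUKernel data layout (names hypothetical):
# blocks packed into one flat buffer, each prefixed by a 4-byte length;
# simulated thread `tid` finds its input from a host-prepared offset table,
# mirroring how a CUDA thread locates its block from its thread ID.
def pack_blocks(blocks):
    buf, offsets = bytearray(), []
    for b in blocks:
        offsets.append(len(buf))
        buf += len(b).to_bytes(4, "little") + b
    return bytes(buf), offsets

def compress_thread(buf, offsets, tid):
    """Work of one simulated GPU thread, identified by tid."""
    off = offsets[tid]
    n = int.from_bytes(buf[off:off + 4], "little")   # read the block length
    return zlib.compress(buf[off + 4:off + 4 + n])   # compress just that block

blocks = [b"a" * 100, b"b" * 200, b"hello"]
buf, offsets = pack_blocks(blocks)
compressed = [compress_thread(buf, offsets, t) for t in range(len(blocks))]
```

On a real GPU the list comprehension becomes one kernel launch in which all threads run concurrently; the addressing logic is unchanged.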
Optionally, the GPU is used to accelerate the data-digest calculation in the system. Digest calculation is typically data-intensive, and the GPU's single-instruction-multiple-data computation model is well suited to it. Specifically, the CUDA programming model is used to package a digest-calculation library, SHA1GPU, which comprises the function interfaces SHA1GPUInit, SHA1GPUKernel, and SHA1GPUDestroy. SHA1GPUInit is responsible for GPU-side memory allocation; SHA1GPUKernel copies the data produced by compression into GPU video memory, the GPU launches enough execution threads that each thread computes the digest of one pending data block, and the results are returned to the CPU; SHA1GPUDestroy is responsible for destroying the memory allocated by SHA1GPUInit.
Preferably, the GPU is used to accelerate the Bloomfilter process. The Bloomfilter calculation is typically data-intensive, and the GPU's single-instruction-multiple-data computation model is well suited to it. Specifically, the CUDA programming model is first used to encapsulate the Bloomfilter computation into a library, BloomfilterGPU, which comprises the function interfaces BFGPUInit, BFGPUKernel, and BFGPUDestroy. BFGPUInit opens a large buffer in the GPU's video memory to serve as the bit vector for the Bloomfilter calculation. The Bloomfilter thread processes the results of digest calculation: BFGPUKernel is called to launch enough execution threads on the compute card that each thread performs the Bloomfilter calculation for one pending data block. All threads execute concurrently; each determines, from its own thread ID, the start position of the data on which to perform the Bloomfilter calculation, then reads data of fixed length (a digest-calculation result, 160 bits long) and performs the Bloomfilter calculation.
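The fixed 160-bit (20-byte) input length is what lets each GPU thread address its digest purely from its thread ID. A CPU-side sketch of that access pattern, with illustrative names and a shared bit vector standing in for BFGPUInit's video-memory buffer:

```python
import hashlib

# Simulation of BFGPUKernel's per-thread access pattern (names illustrative):
# digests are a flat array of fixed 20-byte (160-bit) SHA-1 results, and
# simulated thread t reads digests[t*20 : (t+1)*20].
DIGEST_LEN = 20                   # 160 bits, as stated in the patent
BF_BITS = 1 << 20
bit_vector = bytearray(BF_BITS // 8)

def bf_thread(digest_buf: bytes, tid: int, k: int = 4) -> bool:
    """Return True if the digest may already be present (all k bits set)."""
    d = digest_buf[tid * DIGEST_LEN:(tid + 1) * DIGEST_LEN]
    maybe_seen = True
    for i in range(k):
        p = int.from_bytes(d[4 * i:4 * i + 4], "big") % BF_BITS
        if not bit_vector[p // 8] >> (p % 8) & 1:
            maybe_seen = False
        bit_vector[p // 8] |= 1 << (p % 8)
    return maybe_seen

digests = b"".join(hashlib.sha1(b).digest() for b in [b"x", b"y", b"x"])
results = [bf_thread(digests, t) for t in range(3)]
```

The first occurrence of a digest reports "not seen" and sets its bits; the repeated digest at index 2 finds all its bits set and is flagged for the further disk-index comparison of step S203.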
Preferably, the system uses pipeline processing. The whole de-duplication process can be divided into the following steps: data reception, data compression, data-digest calculation, Bloomfilter calculation and data encryption, and transmission of the backup data. Each step completes one processing task on the data; these tasks can run on different hardware, their mutual data dependencies are very weak, and the different computation tasks can execute concurrently. Combined with the optimized algorithm for each computation task, using a pipeline to raise the program's concurrency can further improve the system's performance.
Optionally, a compression-first pipeline mechanism is used. During de-duplication, the data stream, after being received and pre-processed by the main thread, first flows into the data-compression thread, then in turn through the digest-calculation thread, the BloomFilter thread, and the data-encryption thread; finally a data-handling thread writes the metadata and de-duplicated data to disk for backup. Compressing first reduces the data volume that digest calculation must process and hence the digest-calculation time. For systems in which compression takes less time than digest calculation and encryption, this pipeline reduces the running time of the whole system.
Optionally, a pipeline with compression after de-duplication is used. After being received and pre-processed by the main thread, the data stream first flows into the digest-calculation thread, then in turn through the BloomFilter thread and the data-compression and data-encryption threads; finally a data-handling thread writes the metadata and de-duplicated data to disk for backup. For data sets whose duplication ratio exceeds 2:1, compressing only the non-duplicate data instead of all data greatly reduces the volume to be compressed, saves computation time, and raises system throughput.
Optionally, a pipeline with the main computation on the GPU is used. As noted, the compression calculation, the digest calculation in the preliminary duplicate check, and the Bloomfilter calculation can all be handled by the GPU. Because data transfer between the CPU and GPU takes time, putting these computations onto the GPU for sequential processing reduces the number of CPU-GPU transfers (from six to two) and the amount of data transferred (the two intermediate results need no transfer). Such a pipeline obtains fairly good results for systems that accelerate only with the GPU.
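The transfer-count argument (three separate GPU calls cost six copies, one fused call costs two) can be illustrated with a small host-side simulation. The counter and helper names are illustrative, and zlib/hashlib stand in for the GPU kernels:

```python
import hashlib
import zlib

# Sketch of the fused-pipeline transfer accounting (names illustrative):
# one upload and one download replace the six copies that three separate
# "GPU" stages would need, and the intermediates never cross the bus.
transfers = 0

def to_gpu(x):
    global transfers
    transfers += 1          # one host -> device copy
    return x

def to_host(x):
    global transfers
    transfers += 1          # one device -> host copy
    return x

def fused_gpu_stage(block: bytes):
    dev = to_gpu(block)                       # single upload
    compressed = zlib.compress(dev)           # stays on the "device"
    digest = hashlib.sha1(compressed).digest()  # intermediate not copied back
    return to_host((compressed, digest))      # single download

out = fused_gpu_stage(b"some backup data" * 10)
```

Running compression, digesting, and the Bloomfilter check as three independent calls would increment the counter six times; the fused version increments it twice.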
Compared with prior art, the present invention has following advantage:
The objective of the invention is to address the system computation that will soon become the bottleneck of de-duplication systems: by moving the computational problems onto coprocessors, the pressure on the main processor is relieved, and the coprocessors' powerful computing ability accelerates the computation, raising system performance; at the same time, the coprocessors' low power consumption reduces the power consumption of the system.
(1) The computational optimizations disclosed by the invention are realized with a general-purpose coprocessor and a GPU; compared with existing dedicated-coprocessor acceleration, they offer better generality and lower implementation cost.
(2) The de-duplication system disclosed by the invention is realized on a low-power platform; through optimization of the workflow and of the computation, its performance compares favorably with a server and can fully replace one.
(3) The de-duplication system disclosed by the invention has very high throughput, lower cost, and lower energy consumption, better matching the currently advocated spirit of energy saving and environmental protection.
Description of drawings:
Fig. 1 is the basic flowchart of data de-duplication.
Fig. 2 is the duplicate-checking flowchart for data in de-duplication, i.e. steps S102 and S103 of the basic flowchart.
Fig. 3 shows the operating principle of batch packing in the de-duplication system.
Fig. 4 is a schematic diagram of the pipeline mechanism with multi-threaded parallel processing.
Fig. 5 is the flowchart of the GPU-accelerated compression algorithm.
Fig. 6 is the flowchart of the compression-first pipeline mechanism.
Fig. 7 is the flowchart of the digest-calculation-first pipeline mechanism.
Fig. 8 is the flowchart of the pipeline mechanism that uses the GPU for the preliminary duplicate check.
Fig. 9 is the architecture diagram of the de-duplication system.
Embodiment
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, further description is given below in conjunction with the drawings and specific embodiments.
Embodiment 1:
Referring to Fig. 3, which shows the batch-packing algorithm of the present invention, the concrete principle and operation steps are as follows:
Step S301. For each data block received from the network, generate its corresponding metadata.
Step S302. Mount the metadata generated in step S301 onto the corresponding global metadata linked list, for use by later threads.
Step S303. Obtain the size of the data block and save it at the appropriate position in the global data buffer.
Step S304. Put the data block itself at the corresponding position in the global data buffer.
Batch encapsulation of the data stream is mainly pre-processing done so that the traditionally serial de-duplication flow can be converted into a pipeline mechanism. At the same time, batching the data stream and storing the data and metadata separately also prepares the ground for the subsequent GPU parallel computation.
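Steps S301 through S304 amount to building two parallel structures: a metadata list and a flat, contiguous data buffer. A minimal sketch, with hypothetical structure names:

```python
# Sketch of batch-packing steps S301-S304 (structure names hypothetical):
# per-block metadata goes on a list while the raw data is packed
# contiguously -- the layout the later GPU kernels rely on.
metadata_list = []          # S302: global metadata chain
sizes = []                  # S303: per-block sizes
data_buffer = bytearray()   # S304: global data buffer

def pack_block(block: bytes):
    # S301/S302: generate metadata and mount it on the global list.
    metadata_list.append({"offset": len(data_buffer), "len": len(block)})
    sizes.append(len(block))        # S303: record the block size
    data_buffer.extend(block)       # S304: place the data in the buffer

for b in (b"alpha", b"bravo!", b"c"):
    pack_block(b)
```

Because each block's offset and length are recorded at pack time, a later worker (or GPU thread) can locate any block without scanning the buffer.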
Embodiment 2:
Referring to Fig. 4, which shows the flow of the GPU compression algorithm of the present invention, the concrete principle and operation steps are as follows:
Step S401. Obtain the ID of the current thread.
Step S402. Because the data was organized in advance during batching, the position of this thread's pending data within the overall data block can be obtained from the thread ID.
Step S403. Read the size of the thread's pending data block from the block's start position, i.e. the pointer position obtained in the previous step.
Step S404. Compress the data block with the chosen compression algorithm.
Embodiment 3:
Referring to Fig. 5, which shows the GPU Bloomfilter flow of the present invention, the concrete principle and operation steps are as follows:
Step S501. Obtain the ID of the current thread.
Step S502. Obtain this thread's pending data: because the data was organized in advance, the first address of the data within the overall data block follows from the current thread ID. The data length is fixed (160 bits).
Step S503. Perform the Bloomfilter calculation on this block of data with the chosen algorithm.
Embodiment 4:
Referring to Fig. 6, which shows the flow of the compression-first pipeline mechanism of the present invention, the concrete principle and operation steps are as follows:
Step S601. The main thread obtains data from the network, extracts the metadata, performs batch encapsulation, and finally mounts the packed data block in the processing queue of the data-compression thread, awaiting that thread's processing.
Step S602. The data-compression thread takes a node from its own task-queue linked list and, according to the metadata, compresses each data block in the data area, storing the compressed length back into the corresponding metadata. The compressed data is re-formed into a data block and packed together with the metadata into a new node, which is hung on the task-queue linked list of the digest-calculation thread; the compression thread then takes the next node from its task queue in order and processes the next batch.
Step S603. The digest-calculation thread takes a node from its own task-queue linked list and, according to the metadata, computes the digest of each data block in the data area, saves the results into the metadata, and hangs the node on the task-queue linked list of the Bloomfilter thread. It then takes the next node in order and performs the digest calculation for the next batch.
Step S604. The Bloomfilter thread takes a node from its own task queue and performs the Bloomfilter calculation on the digests stored in the metadata as a first judgment: if the Bloomfilter judges that a digest may exist, the digest must be compared one by one against the metadata stored on disk to see whether it really repeats. If it repeats, the data is discarded and the corresponding metadata is updated and saved; otherwise the digest does not repeat and, correspondingly, its data block is unique, so both data and metadata must be kept. At this point the data-filtering task is complete; the filtered data and metadata are packed into a new node, which is hung on the task-queue linked list of the encryption thread. The Bloomfilter thread then takes the next node in order and performs duplicate filtering for the next batch.
Step S605. The encryption thread takes a node from its own task queue, encrypts the metadata and data respectively, and sends the encrypted data over the network to the backup node for storage.
In the data-compression, digest-calculation, Bloomfilter, and encryption threads, each computational thread operates on every data block in a batch. To raise efficiency, the GPU's powerful parallel computing capability can be exploited to run the computation for all the data blocks simultaneously. The time to compute a whole batch then stays roughly the same as the time to compute a single block, greatly improving computation speed.
Because the time spent on digest calculation and data encryption is proportional to the size of the data block, performing compression before the other computation tasks shortens the blocks and thereby effectively reduces the time digest calculation and encryption take. For de-duplication workloads in which compression is relatively fast and digest calculation and encryption are relatively time-consuming, the compression-first pipeline mechanism effectively improves the efficiency of the system.
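The hand-off pattern of steps S601 through S605, one thread per stage, each with its own task queue, can be sketched with two stages. The names are illustrative, zlib again stands in for lzjb, and a `None` sentinel marks the end of the stream:

```python
import hashlib
import queue
import threading
import zlib

# Two-stage sketch of the compression-first pipeline (names illustrative):
# each stage is a thread with its own queue; it takes a batch node,
# processes it, and hands it to the next stage's queue.
compress_q, digest_q, done_q = queue.Queue(), queue.Queue(), queue.Queue()

def compress_stage():
    while True:
        node = compress_q.get()
        if node is None:                     # sentinel: propagate and stop
            digest_q.put(None)
            return
        node["data"] = zlib.compress(node["data"])   # S602
        digest_q.put(node)

def digest_stage():
    while True:
        node = digest_q.get()
        if node is None:
            done_q.put(None)
            return
        node["digest"] = hashlib.sha1(node["data"]).hexdigest()  # S603
        done_q.put(node)

threads = [threading.Thread(target=f) for f in (compress_stage, digest_stage)]
for t in threads:
    t.start()
for payload in (b"batch-1" * 50, b"batch-2" * 50):
    compress_q.put({"data": payload})
compress_q.put(None)                         # no more batches
results = []
while (node := done_q.get()) is not None:
    results.append(node)
for t in threads:
    t.join()
```

While the digest stage works on batch 1, the compression stage is already working on batch 2, which is exactly the overlap the pipeline mechanism exploits.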
Embodiment 5:
Consult Fig. 7, the preferential streamline mechanism treatment scheme of summary of the present invention is shown.Concrete principle and operation steps are as described below:
Step S701: the main thread receives data from the network, extracts the metadata, packages the data into batches, and finally appends each packaged data block to the processing queue of the digest-calculation thread, where it waits to be processed.
Step S702: the digest-calculation thread removes a node from its own task-queue linked list, computes a digest for each data block in the data area according to the metadata, saves the results into the metadata, and appends the node to the task-queue linked list of the Bloomfilter thread. It then removes the next node from its task queue in order and performs the digest calculation for the next batch.
Step S703: the Bloomfilter thread removes a node from its own task queue and performs the Bloomfilter calculation on the digests stored in the metadata. This is a first-pass judgment: if the Bloomfilter reports that a digest may already exist, the digest must be compared one by one against the metadata stored on disk to determine whether it really is a duplicate. If it is, the data is discarded and the corresponding metadata is updated and saved; otherwise the digest is unique, and so is its corresponding data block, so both the data and the metadata must be saved. At this point the data-filtering task is complete: the filtered data and metadata are packaged into a new node, which is appended to the task-queue linked list of the data-compression thread. The Bloomfilter thread then removes the next node from its task queue in order and performs duplicate filtering for the next batch.
Step S704: the data-compression thread removes a node from its own task-queue linked list, compresses each data block in the data area according to the metadata, and stores the compressed length in the corresponding metadata. The compressed data is reassembled into data blocks, packaged together with the metadata into a new node, and appended to the task-queue linked list of the data-encryption thread. The compression thread then removes the next node from its task queue in order and processes the next batch.
Step S705: the encryption thread removes a node from its own task queue, encrypts the metadata and the data separately, and sends the encrypted data over the network to the backup node for storage.
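Steps S701 through S705 describe a hand-off pipeline in which each stage removes a node from its own task queue, processes one batch, and appends the result to the next stage's queue. A minimal Python sketch of that structure follows, with `queue.Queue` standing in for the linked lists, a set standing in for the Bloomfilter plus on-disk metadata, and `hashlib`/`zlib` standing in for the digest and compression stages; all names are illustrative, not the patent's implementation:

```python
import hashlib
import queue
import threading
import zlib

def pipeline(batches):
    """Run batches of blocks through digest -> dedup -> compress stages."""
    q_digest, q_filter, q_compress = queue.Queue(), queue.Queue(), queue.Queue()
    seen = set()      # stands in for the Bloomfilter + on-disk metadata check
    results = []      # (digest, compressed block) for each unique block

    def digest_stage():                      # step S702
        while True:
            node = q_digest.get()
            if node is None:                 # end-of-stream marker
                q_filter.put(None)
                return
            # attach a digest to each block's "metadata"
            q_filter.put([(blk, hashlib.sha1(blk).hexdigest()) for blk in node])

    def filter_stage():                      # step S703
        while True:
            node = q_filter.get()
            if node is None:
                q_compress.put(None)
                return
            unique = []
            for blk, dig in node:
                if dig not in seen:          # duplicates are dropped here
                    seen.add(dig)
                    unique.append((blk, dig))
            q_compress.put(unique)

    def compress_stage():                    # step S704
        while True:
            node = q_compress.get()
            if node is None:
                return
            for blk, dig in node:
                results.append((dig, zlib.compress(blk)))

    threads = [threading.Thread(target=t)
               for t in (digest_stage, filter_stage, compress_stage)]
    for t in threads:
        t.start()
    for batch in batches:                    # main thread: batch and enqueue (S701)
        q_digest.put(batch)
    q_digest.put(None)
    for t in threads:
        t.join()
    return results

out = pipeline([[b"aaa", b"bbb"], [b"aaa", b"ccc"]])  # "aaa" arrives twice
```

Because each stage runs in a single thread and the queues are FIFO, batches keep their arrival order, matching the "remove the next node in order" behavior described above.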
Data compression is performed after deduplication, which targets the following two application environments:
1. When the selected compression algorithm is more time-consuming than the other algorithms, placing compression after deduplication reduces the amount of data to be compressed and therefore the total running time of the system.
2. In an environment where digest calculation and encryption run on a coprocessor and the other calculations run on the GPU, placing compression after deduplication reduces the number of data copies between main memory and video memory: since both the Bloomfilter calculation and the compression run on the GPU, the data can remain in video memory after the Bloomfilter calculation finishes and be compressed directly.
Embodiment 7:
Referring to Fig. 8, which shows the pipeline mechanism of the present invention in which the GPU performs the preliminary duplicate check, the principle and operation steps are as follows:
Step S801: the main thread receives data from the network, extracts the metadata, packages the data into batches, and finally appends each packaged data block to the processing queue of the preliminary duplicate-check thread, where it waits to be processed.
Step S802: the preliminary duplicate-check thread removes a node from its own task-queue linked list and, for each data block in the data area and according to the metadata, first compresses the block, then computes its digest, and finally performs the preliminary Bloomfilter duplicate check; the three results are then saved. The thread then removes the next node from its task queue in order and performs the preliminary duplicate check for the next batch.
Step S803: the encryption thread removes a node from its own task queue, encrypts the metadata and the data separately, and sends the encrypted data over the network to the backup node for storage.
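Step S802 fuses compression, digest calculation, and the Bloomfilter check into a single stage, so that on the GPU the data makes only one round trip to video memory. A CPU-side Python sketch of the fused stage is below; a plain set stands in for the Bloomfilter bit array (so, unlike a real Bloomfilter, there are no false positives), and `zlib`/`hashlib` stand in for the GPU kernels:

```python
import hashlib
import zlib

seen = set()  # stands in for the Bloomfilter bit array in this sketch

def preliminary_check(block):
    """Fused stage of step S802: one pass performs all three calculations."""
    compressed = zlib.compress(block)        # 1) compression
    digest = hashlib.sha1(block).digest()    # 2) digest calculation
    maybe_dup = digest in seen               # 3) preliminary duplicate check
    seen.add(digest)
    return compressed, digest, maybe_dup

c1, d1, dup1 = preliminary_check(b"block-1")
c2, d2, dup2 = preliminary_check(b"block-1")  # same content arrives again
```

The point of the fusion is that all three results come back together, so only one hand-off to the encryption thread is needed per batch.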

Claims (10)

1. A low-power, high-performance data-deduplication system, composed of three parts: a production center, a computing center, and a backup center. The production center generates the data that needs to be backed up and sends it to the computing center, and is also responsible for user interaction. The computing center receives data from the production center, performs the duplicate-detection judgment, deletes the duplicated data, encrypts the non-duplicated data, and sends the encrypted data to the backup center for permanent storage. The backup center is responsible for receiving and storing the encrypted non-duplicated data sent by the computing center. The deduplication system generally performs duplicate checking by comparing data digests, and introduces the Bloomfilter technique for a preliminary duplicate check that further reduces the number of data comparisons. The system is characterized in that the computing center accelerates the computation and reduces power consumption by the following methods:
1st, using coprocessors to accelerate the system's computational tasks:
1.1st, using the coprocessor of the VIA processor to accelerate the digest-calculation and data-encryption tasks;
1.2nd, using the graphics card (GPU) to accelerate the compression process in the system;
1.3rd, using the GPU to accelerate the data-digest calculation in the system;
1.4th, using the GPU to accelerate the Bloomfilter process;
1.5th, using the GPU to handle the preliminary duplicate check of the whole system;
2nd, a pipeline mechanism for multi-threaded parallel processing:
2.1st, the multi-threaded pipeline mechanism;
2.2nd, the pipeline with compression first;
2.3rd, the pipeline with compression after deduplication;
2.4th, the pipeline in which the GPU performs the main calculations.
2. The system according to claim 1, characterized in that the method of using the VIA coprocessor described in step 1.1 to accelerate the digest-calculation and data-encryption tasks is as follows:
the digest-calculation and data-encryption tasks are assigned to the PadLock coprocessor module of the VIA processor;
the PadLock coprocessor of the VIA processor has five acceleration engines: the random-number-generator engine RNG, the advanced-encryption engine ACE, the advanced-encryption engine version 2 ACE2, the digest-calculation engine PHE, and the Montgomery multiplier PMM. PHE can be used to accelerate digest calculation, while ACE and ACE2 can be used to accelerate encryption. PadLock only provides a few sets of assembly instructions; we wrap these assembly instructions in the C language to produce two easy-to-use C functions, VIA_SHA1 and VIA_AES, which the system calls to accelerate the digest-calculation and encryption processes.
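The claim wraps PadLock's assembly instructions behind two stable C entry points, VIA_SHA1 and VIA_AES. The wrapper idea — one fixed function signature that dispatches to a hardware engine when present and to a software routine otherwise — can be sketched in Python, with `hashlib` standing in for both the PHE engine and the software fallback (all names and the probe are hypothetical; a real wrapper would detect the engines via CPUID and issue the PadLock instructions in assembly):

```python
import hashlib

def _padlock_available():
    """Stand-in probe; a real wrapper would query CPUID for the PHE engine."""
    return False  # assume no PadLock hardware in this sketch

def via_sha1(data: bytes) -> bytes:
    """Sketch of the VIA_SHA1 wrapper: same interface on either path."""
    if _padlock_available():
        # here the real wrapper would execute the PadLock hashing instruction
        raise NotImplementedError("hardware path not modeled in this sketch")
    return hashlib.sha1(data).digest()  # software fallback

digest = via_sha1(b"backup block")
```

The VIA_AES wrapper described in the claim would follow the same shape, dispatching to the ACE/ACE2 engines instead of PHE.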
3. The system according to claim 1, characterized in that the method of using the GPU described in step 1.2 to accelerate the compression process is as follows: the single-program-multiple-data computation model provided by the GPU is well suited to batch, computation-intensive tasks. Using the GPU's CUDA (Compute Unified Device Architecture) programming model, compression is written as a compression library, CompressionGPU, which provides the following function interfaces: ComGPUInit, ComGPUKernel, and CompressionGPUDestroy. The system first prepares the data on the CPU and then calls the CompressionGPU function interfaces to distribute the compression tasks to the GPU, accelerating the compression process.
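The Init/Kernel/Destroy split described in this claim — allocate device resources once, launch the kernel per batch, free everything at shutdown — is a common shape for CUDA libraries. A CPU-only Python sketch of that three-call lifecycle follows, with `zlib` standing in for the device kernel; the class and its behavior are illustrative, not the patent's CompressionGPU implementation:

```python
import zlib

class CompressionLib:
    """Sketch of the ComGPUInit / ComGPUKernel / ComGPUDestroy lifecycle."""

    def init(self, max_batch_bytes):
        # a real ComGPUInit would allocate device buffers of this capacity
        self.capacity = max_batch_bytes
        self.ready = True

    def kernel(self, blocks):
        # a real ComGPUKernel would copy the batch to the GPU, compress each
        # block in parallel, and copy the results back to host memory
        assert self.ready and sum(map(len, blocks)) <= self.capacity
        return [zlib.compress(b) for b in blocks]

    def destroy(self):
        # a real ComGPUDestroy would free the device buffers
        self.ready = False

lib = CompressionLib()
lib.init(1 << 20)                          # once, at system start-up
out = lib.kernel([b"x" * 100, b"y" * 100])  # per batch
lib.destroy()                              # once, at shutdown
```

The SHA1GPU and BloomfilterGPU libraries of claims 4 and 5 follow this same lifecycle with their own kernels.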
4. The system according to claim 1, characterized in that the method of using the GPU described in step 1.3 to accelerate the digest-calculation process is as follows: digest calculation is also a typical computation-intensive process and is likewise suited to GPU acceleration. The digest calculation is first packaged, using the CUDA programming model, into a digest-calculation library, SHA1GPU, which provides the following function interfaces: SHA1GPUInit, SHA1GPUKernel, and SHA1GPUDestroy. After the system's data-compression thread pre-processes the compressed data and returns it to the CPU, the CPU calls the SHA1GPU function interfaces to distribute the digest-calculation tasks to the GPU.
5. The system according to claim 1, characterized in that the method of using the GPU described in step 1.4 to accelerate the Bloomfilter calculation is as follows: in the system, the Bloomfilter calculation is the most computation-intensive process of the deduplication system and is therefore the most suitable for GPU acceleration. We first package the Bloomfilter calculation, using the CUDA programming model, into a library, BloomfilterGPU, which provides the following function interfaces: BFGPUInit, BFGPUKernel, and BFGPUDestroy. We then call the BloomfilterGPU function interfaces to distribute the digest results and the Bloomfilter calculation to the GPU for processing.
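The batch check that BFGPUKernel performs over digests can be sketched on the CPU as a Bloom filter: each digest sets k bit positions, and a digest whose positions are all already set is only a *possible* duplicate (to be verified against the on-disk metadata per step S703), while an unset position proves the block is new. This Python sketch derives the k positions from slices of the SHA-1 digest; the class and parameters are illustrative, not the patent's kernel:

```python
import hashlib

class BloomFilter:
    """Sketch of the batch duplicate pre-check done by BFGPUKernel."""

    def __init__(self, bits=1 << 20, k=4):
        self.bits, self.k = bits, k
        self.array = bytearray(bits // 8)

    def _positions(self, digest):
        # derive k bit positions from independent 4-byte slices of the digest
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits
                for i in range(self.k)]

    def maybe_contains(self, digest):
        # True  -> maybe a duplicate: compare against on-disk metadata
        # False -> definitely new: skip the disk comparison entirely
        return all(self.array[p >> 3] & (1 << (p & 7))
                   for p in self._positions(digest))

    def add(self, digest):
        for p in self._positions(digest):
            self.array[p >> 3] |= 1 << (p & 7)

    def check_batch(self, digests):
        """Return one verdict per digest, inserting each as it is checked."""
        verdicts = []
        for d in digests:
            verdicts.append(self.maybe_contains(d))
            self.add(d)
        return verdicts

bf = BloomFilter()
digs = [hashlib.sha1(s).digest() for s in (b"a", b"b", b"a")]
verdicts = bf.check_batch(digs)   # the repeated "a" is flagged
```

On the GPU, each digest's k position lookups are independent, which is what makes the check map well onto the single-program-multiple-data model.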
6. The system according to claim 1, characterized in that the method of using the GPU described in step 1.5 to perform the preliminary duplicate check on all the data is as follows: we first use the CUDA programming model to build a preliminary duplicate-check library, DuplicateGPU, which provides the following function interfaces: DuplicateGPUInit, DuplicateGPUKernel, and DuplicateGPUDestroy. DuplicateGPU wraps the compression, digest, and Bloomfilter calculations in a unified interface: for the input data, the GPU performs compression, digest calculation, and Bloomfilter calculation in succession and then returns all the results to the CPU at once.
Using the DuplicateGPU library, the data input and the result output of the three calculations are handled together, reducing both the number of data transfers and the amount of data transferred.
7. The system according to claim 1, characterized in that the multi-threaded pipeline mechanism described in step 2.1 is as follows:
the system creates the following independent threads: a data-receiving thread, a data-compression thread, a digest-calculation thread, a Bloomfilter processing thread, and a data-encryption-and-transmission thread. The data-receiving thread is responsible for receiving data, generating the metadata corresponding to the data, appending the metadata to a metadata linked list for later processing, and placing the data into a global data buffer in the order in which it arrives at the computing center.
The system pre-sets a buffer-size value, BUFSIZE. When the amount of received data reaches BUFSIZE, the receiving thread passes the data to the data-compression thread. The data-compression thread compresses the batch of data and then passes the compressed data to the digest-calculation thread. The digest-calculation thread computes the digests of the compressed batch and then passes the digests to the Bloomfilter thread. The Bloomfilter thread performs the Bloomfilter calculation on the batch of digests and passes the preliminary duplicate-check result to the encryption-and-transmission thread. The encryption-and-transmission thread performs the further duplicate verification, encrypts the non-duplicated data blocks, writes the metadata to the local disk, and writes the metadata and the encrypted backup data to the remote backup server through the nbd module.
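The BUFSIZE rule in this claim is a simple accumulate-and-flush policy: the receiving thread buffers incoming blocks in arrival order and hands a full batch downstream whenever the buffer reaches BUFSIZE. A minimal Python sketch of that rule (the value of BUFSIZE and all names are illustrative):

```python
BUFSIZE = 4  # illustrative batch size; the patent leaves the value configurable

class Receiver:
    """Sketch of the receiving thread's batching rule from claim 7."""

    def __init__(self, downstream):
        self.buffer = []
        self.downstream = downstream   # called once per full batch

    def receive(self, block):
        self.buffer.append(block)      # arrival order is preserved
        if len(self.buffer) >= BUFSIZE:
            self.downstream(self.buffer)   # hand the batch to compression
            self.buffer = []

batches = []
rx = Receiver(batches.append)
for i in range(10):                    # 10 arrivals -> 2 full batches + 2 pending
    rx.receive(f"block-{i}".encode())
```

Batching amortizes the per-hand-off cost (queue operations, and on the GPU path the host-to-device copies) over BUFSIZE blocks.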
8. The system according to claim 1, characterized in that the compression-first pipeline described in step 2.2 is suitable for deduplication systems in which the compression time is less than both the digest-calculation time and the data-encryption time. The method is as follows: the execution order of the computation threads is adjusted so that the compression thread becomes the first computation step of the pipeline and passes the compressed data to the digest-calculation thread. This pipeline order reduces the amount of data to be digested and therefore the digest-calculation time.
9. The system according to claim 1, characterized in that the compression-after-deduplication pipeline described in step 2.3 is suitable when the duplication rate of the data is greater than 50%, the amount of data that needs to be backed up is less than 25% of the total, and the compression time exceeds the sum of the digest-calculation time and the encryption time; compressing only the non-duplicated data then reduces the compression time and improves the throughput of the whole system. The method is as follows: the compression thread is removed and the compression process is merged into the encryption-and-transmission thread, which compresses the non-duplicated data after the further duplicate verification of the data is finished, and then performs the data encryption and data transmission.
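The applicability conditions of this claim can be checked with a small cost model. The numbers below are assumed for illustration only (they are not measurements from the patent): compressing the full stream costs 60 s, digesting and encrypting cost 20 s and 15 s, and 80% of the blocks are duplicates, so only 20% of the data survives to be compressed:

```python
# Illustrative timings in seconds for one backup run (assumed, not measured)
t_compress_all = 60.0   # compressing the whole incoming stream
t_digest = 20.0         # digest calculation
t_encrypt = 15.0        # encryption
dup_rate = 0.80         # 80% of blocks are duplicates

backup_fraction = 1.0 - dup_rate     # fraction that must actually be backed up

# Claim 9's three applicability conditions
condition = (dup_rate > 0.5
             and backup_fraction < 0.25
             and t_compress_all > t_digest + t_encrypt)

# When only non-duplicate blocks are compressed, the compression work
# shrinks in proportion to the surviving fraction of data.
t_compress_after = t_compress_all * backup_fraction
saving = t_compress_all - t_compress_after
```

Under these assumptions the conditions hold and moving compression after deduplication cuts the compression work from 60 s to about 12 s.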
10. The system according to claim 1, characterized in that, in the pipeline described in step 2.4, the GPU performs the main calculations. According to the characteristics of the CUDA programming model, the data to be processed must be transferred to the graphics card before the powerful parallel computing ability of the GPU can be used to process it. The method is as follows: the data-compression thread, the digest-calculation thread, and the Bloomfilter thread are merged into a single computation thread. This pipeline order reduces both the number of data interactions between the CPU and the GPU and the amount of data transferred, improving the performance of the system.
CN2011100247439A 2011-01-24 2011-01-24 Low-power consumption high-performance repeating data deleting system Pending CN102156703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100247439A CN102156703A (en) 2011-01-24 2011-01-24 Low-power consumption high-performance repeating data deleting system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011100247439A CN102156703A (en) 2011-01-24 2011-01-24 Low-power consumption high-performance repeating data deleting system

Publications (1)

Publication Number Publication Date
CN102156703A true CN102156703A (en) 2011-08-17

Family

ID=44438202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100247439A Pending CN102156703A (en) 2011-01-24 2011-01-24 Low-power consumption high-performance repeating data deleting system

Country Status (1)

Country Link
CN (1) CN102156703A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034564A (en) * 2012-12-05 2013-04-10 华为技术有限公司 Data disaster tolerance demonstration and practicing method and data disaster tolerance demonstration and practicing device and system
CN103488734A (en) * 2013-09-17 2014-01-01 华为技术有限公司 Data processing method and deduplication engine
CN103544729A (en) * 2013-09-24 2014-01-29 Tcl集团股份有限公司 Animation data processing method and system
CN103593264A (en) * 2013-11-28 2014-02-19 中国南方电网有限责任公司超高压输电公司南宁局 System and method for remote wide area network disaster recovery backup
CN105389387A (en) * 2015-12-11 2016-03-09 上海爱数信息技术股份有限公司 Compression based deduplication performance and deduplication rate improving method and system
WO2016041127A1 (en) * 2014-09-15 2016-03-24 华为技术有限公司 Data duplication method and storage array
CN105681273A (en) * 2015-12-17 2016-06-15 西安电子科技大学 Client data deduplication method
CN104199637B (en) * 2014-07-16 2017-02-08 珠海金山网络游戏科技有限公司 Method for comparing packaged files and device and system thereof
CN106934293A (en) * 2015-12-29 2017-07-07 航天信息股份有限公司 The collision calculation device and collision calculation method of digital digest
CN109040653A (en) * 2018-06-28 2018-12-18 苏州科达科技股份有限公司 Data encrypting and deciphering expense determines method, apparatus and electronic equipment
WO2021033072A1 (en) * 2019-08-19 2021-02-25 International Business Machines Corporation Opaque encryption for data deduplication
CN113242593A (en) * 2015-08-26 2021-08-10 手持产品公司 Queue power management through information storage sharing
WO2022143405A1 (en) * 2020-12-30 2022-07-07 欧普照明股份有限公司 Energy consumption data processing method, cloud server, and energy consumption data processing system
CN114840515A (en) * 2022-06-30 2022-08-02 中科声龙科技发展(北京)有限公司 Method, device and chip for realizing batch data duplicate checking

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609466A (en) * 2009-07-01 2009-12-23 中兴通讯股份有限公司 Mass data is looked into heavy method and system
CN101751423A (en) * 2008-12-08 2010-06-23 北大方正集团有限公司 Article duplicate checking method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751423A (en) * 2008-12-08 2010-06-23 北大方正集团有限公司 Article duplicate checking method and system
CN101609466A (en) * 2009-07-01 2009-12-23 中兴通讯股份有限公司 Mass data is looked into heavy method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sciencepaper Online (China), 2010-04-17, Ma Liang, Zhen Caijun, Zhao Bin, Ma Jingwei, Wang Gang, Liu Xiaoguang, "Fast data-deduplication system using a low-power coprocessor", pp. 2-13 of the paper, relevant to claims 1-10, *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034564B (en) * 2012-12-05 2016-06-15 华为技术有限公司 Data disaster tolerance drilling method, data disaster tolerance practice device and system
CN103034564A (en) * 2012-12-05 2013-04-10 华为技术有限公司 Data disaster tolerance demonstration and practicing method and data disaster tolerance demonstration and practicing device and system
CN103488734A (en) * 2013-09-17 2014-01-01 华为技术有限公司 Data processing method and deduplication engine
CN103544729A (en) * 2013-09-24 2014-01-29 Tcl集团股份有限公司 Animation data processing method and system
CN103593264A (en) * 2013-11-28 2014-02-19 中国南方电网有限责任公司超高压输电公司南宁局 System and method for remote wide area network disaster recovery backup
CN103593264B (en) * 2013-11-28 2017-07-07 中国南方电网有限责任公司超高压输电公司南宁局 Remote Wide Area Network disaster tolerant backup system and method
CN104199637B (en) * 2014-07-16 2017-02-08 珠海金山网络游戏科技有限公司 Method for comparing packaged files and device and system thereof
CN105612489B (en) * 2014-09-15 2017-08-29 华为技术有限公司 Data de-duplication method and storage array
CN105612489A (en) * 2014-09-15 2016-05-25 华为技术有限公司 Data duplication method and storage array
WO2016041127A1 (en) * 2014-09-15 2016-03-24 华为技术有限公司 Data duplication method and storage array
CN113242593A (en) * 2015-08-26 2021-08-10 手持产品公司 Queue power management through information storage sharing
CN105389387A (en) * 2015-12-11 2016-03-09 上海爱数信息技术股份有限公司 Compression based deduplication performance and deduplication rate improving method and system
CN105389387B (en) * 2015-12-11 2018-12-14 上海爱数信息技术股份有限公司 A kind of data de-duplication performance based on compression and the method and system for deleting rate promotion again
CN105681273A (en) * 2015-12-17 2016-06-15 西安电子科技大学 Client data deduplication method
CN105681273B (en) * 2015-12-17 2018-11-20 西安电子科技大学 Client-side deduplication method
CN106934293B (en) * 2015-12-29 2020-04-24 航天信息股份有限公司 Collision calculation device and method for digital abstract
CN106934293A (en) * 2015-12-29 2017-07-07 航天信息股份有限公司 The collision calculation device and collision calculation method of digital digest
CN109040653A (en) * 2018-06-28 2018-12-18 苏州科达科技股份有限公司 Data encrypting and deciphering expense determines method, apparatus and electronic equipment
CN109040653B (en) * 2018-06-28 2020-09-29 苏州科达科技股份有限公司 Data encryption and decryption overhead determining method and device and electronic equipment
WO2021033072A1 (en) * 2019-08-19 2021-02-25 International Business Machines Corporation Opaque encryption for data deduplication
GB2602216A (en) * 2019-08-19 2022-06-22 Ibm Opaque encryption for data deduplication
GB2602216B (en) * 2019-08-19 2022-11-02 Ibm Opaque encryption for data deduplication
US11836267B2 (en) 2019-08-19 2023-12-05 International Business Machines Corporation Opaque encryption for data deduplication
WO2022143405A1 (en) * 2020-12-30 2022-07-07 欧普照明股份有限公司 Energy consumption data processing method, cloud server, and energy consumption data processing system
CN114840515A (en) * 2022-06-30 2022-08-02 中科声龙科技发展(北京)有限公司 Method, device and chip for realizing batch data duplicate checking
CN114840515B (en) * 2022-06-30 2022-09-02 中科声龙科技发展(北京)有限公司 Method, device and chip for realizing batch data duplicate checking

Similar Documents

Publication Publication Date Title
CN102156703A (en) Low-power consumption high-performance repeating data deleting system
Jiang et al. Scaling up MapReduce-based big data processing on multi-GPU systems
Park et al. Secure hadoop with encrypted HDFS
CN105051695B (en) It is immutable to share zero replicate data and spread defeated
Wu et al. ParaStream: A parallel streaming Delaunay triangulation algorithm for LiDAR points on multicore architectures
Negrevergne et al. Discovering closed frequent itemsets on multicore: Parallelizing computations and optimizing memory accesses
US20120072117A1 (en) System and method for generating images of subsurface structures
Docan et al. Moving the code to the data-dynamic code deployment using activespaces
Gharaibeh et al. A GPU accelerated storage system
CN105103136B (en) Shared and managed memory is unified to be accessed
CN107924327A (en) System and method for multiple threads
Sharma et al. A technical review for efficient virtual machine migration
Docan et al. Activespaces: Exploring dynamic code deployment for extreme scale data processing
CN101341471B (en) Apparatus and method for dynamic cache management
Xia et al. Redundancy-free high-performance dynamic GNN training with hierarchical pipeline parallelism
Tian et al. A heterogeneous CPU-GPU implementation for discrete elements simulation with multiple GPUs
Vo et al. HyperFlow: A Heterogeneous Dataflow Architecture.
Liu et al. Large scale caching and streaming of training data for online deep learning
Liao et al. A new multi-core pipelined architecture for executing sequential programs for parallel geospatial computing
Moreland et al. Visualization for exascale: Portable performance is critical
Pioli et al. Research characterization on I/O improvements of storage environments
Biswas et al. An Efficient Reduced-Memory GPU-based Dynamic Programming Strategy for Bounded Knapsack Problems
Kal et al. AESPA: Asynchronous Execution Scheme to Exploit Bank-Level Parallelism of Processing-in-Memory
Yogatama et al. Accelerating User-Defined Aggregate Functions (UDAF) with Block-wide Execution and JIT Compilation on GPUs
Guzman et al. Towards the inclusion of FPGAs on commodity heterogeneous systems

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110817