CN107608769A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN107608769A
CN107608769A (application CN201710824734.5A)
Authority
CN
China
Prior art keywords
data block
gpu
target data
data processing
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710824734.5A
Other languages
Chinese (zh)
Inventor
郗睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710824734.5A priority Critical patent/CN107608769A/en
Publication of CN107608769A publication Critical patent/CN107608769A/en
Pending legal-status Critical Current

Abstract

The present invention provides a data processing method and device comprising the following steps: a graphics processing unit (GPU) divides a pending data block into multiple target data blocks; the GPU processes the multiple target data blocks in parallel to obtain an intermediate hash value corresponding to each target data block; and the GPU performs a hash operation on the intermediate hash values to obtain a target result. In this scheme, the task of performing MD5 hash operations on multiple target data blocks is executed by multiple threads in parallel, and the parallel processing improves data-processing efficiency.

Description

Data processing method and device
Technical field
The invention belongs to the field of computer technology, and in particular relates to a data processing method and device.
Background technology
MD5 (Message-Digest Algorithm 5) is used to ensure the integrity and consistency of transmitted information and is one of the most widely used hash (digest) algorithms in computing; mainstream programming languages generally provide MD5 implementations. It is an irreversible transformation: it maps a message of arbitrary length to a 128-bit hash value, but the original message cannot be recovered from that hash value. The MD5 algorithm processes the input message in 512-bit groups; during computation each 512-bit group is divided into sixteen 32-bit sub-groups, and after a series of processing steps the algorithm outputs four 32-bit groups, which are concatenated to form the 128-bit hash value. Suppose a message of arbitrary length is taken as input and its hash value is to be computed; the processing steps are described in detail with reference to Fig. 1:
Step 1: Pad the original message so that, after 64 more bits are added in Step 2, the bit length of the data is exactly divisible by 512. The padding consists of a single 1 bit followed by as many 0 bits as needed;
Step 2: Append the length information of the original message: the length of the original message is represented as a 64-bit binary number, which is appended after the data padded in Step 1;
Step 3: Group the message and initialize the chaining variables: divide the pending message into L data blocks of 512 bits each, as shown in Fig. 1;
Step 4: Perform the hash operation on each 512-bit group. This step is the core of the MD5 algorithm and comprises four rounds of computation; it is denoted HMD5 in Fig. 1, and its logic is shown in Fig. 2. The four rounds have a similar structure, but each round uses a different logical function;
Step 5: Output the hash value. After all L 512-bit data blocks have been processed, the four 32-bit chaining variables obtained from the last data block are concatenated to form the 128-bit message digest of the whole message, also called the hash value.
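Steps 1 and 2 can be sketched in Python. This is a minimal illustration of the padding rule, assuming the standard little-endian 64-bit length encoding that RFC 1321 specifies; it is not part of the patented scheme itself:

```python
import struct

def md5_preprocess(message: bytes) -> bytes:
    """Steps 1-2: append a 1 bit (0x80) followed by 0 bits until the
    length is 64 bits short of a multiple of 512, then append the
    original bit length as a 64-bit little-endian integer."""
    bit_len = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack("<Q", bit_len)
    return padded

# every preprocessed message divides evenly into 512-bit (64-byte) blocks
for n in (0, 11, 55, 56, 64, 1000):
    assert len(md5_preprocess(b"x" * n)) % 64 == 0
```

The padded stream is what Step 3 then cuts into L 512-bit data blocks.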
As one of the most commonly used hash algorithms, MD5 is widely applied in many fields, including encryption and decryption, digital signatures, and data-integrity assurance. In data-integrity applications the data blocks are typically large, ranging from several kilobytes to several megabytes; because the computation volume in a data-storage system is very large, the throughput requirements on MD5 hash operations can be very high. The hash operation in this kind of application is referred to as a direct MD5 hash operation.
As the above analysis of the MD5 principle shows, it is a sequential method. Under sequential processing, the pending data can be divided into multiple data blocks; in content-addressed data-storage systems, a large file is first divided into multiple data blocks. There are two partitioning methods: fixed-size and variable-size. Fixed-size partitioning divides a file into blocks of equal size, whereas variable-size partitioning divides a file according to the content of the data file: in the LBFS system, boundaries are judged from the hash value of a sliding window of 48 contiguous bytes in the file, and if the last 20 bits of that value are all zero, the window of those 48 bytes is taken as a boundary.
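The variable-size (content-defined) partitioning just described can be sketched as follows. This is purely illustrative: LBFS itself uses a Rabin fingerprint over the window, whereas this sketch substitutes an MD5 of the window, and the minimum block size is an assumed parameter:

```python
import hashlib

WINDOW = 48
MASK = (1 << 20) - 1  # boundary where the low 20 bits of the window hash are zero

def variable_size_chunks(data: bytes, min_size: int = 256) -> list:
    """Slide a 48-byte window over the file; end a block wherever the
    window's hash value has its low 20 bits all zero (cf. LBFS)."""
    chunks, start = [], 0
    for i in range(WINDOW, len(data) + 1):
        if i - start < min_size:
            continue  # avoid pathologically small blocks
        h = int.from_bytes(hashlib.md5(data[i - WINDOW:i]).digest()[:4], "little")
        if h & MASK == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])  # whatever remains is the final block
    return chunks
```

With a 20-bit mask a boundary fires on average once per ~1 MiB, so small inputs usually come back as a single block; the chunks always concatenate back to the original data.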
However, in the scheme above, processing a data block produces a fixed-length output that serves as the input to the processing of the next data block; this serial processing substantially reduces data-processing efficiency.
Therefore, there is an urgent need for a data processing scheme that solves the above technical problem.
Summary of the invention
The present invention provides a data processing method and device to solve the above problems.
An embodiment of the present invention provides a data processing method comprising the following steps: a graphics processing unit (GPU) divides a pending data block into multiple target data blocks;
the GPU processes the multiple target data blocks in parallel to obtain an intermediate hash value corresponding to each target data block;
the GPU performs a hash operation on the intermediate hash values to obtain a target result.
An embodiment of the present invention also provides a data processing device comprising a processor adapted to implement instructions and a storage device adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor and executed as follows:
a graphics processing unit (GPU) divides a pending data block into multiple target data blocks;
the GPU processes the multiple target data blocks in parallel to obtain an intermediate hash value corresponding to each target data block;
the GPU performs a hash operation on the intermediate hash values to obtain a target result.
In the technical scheme provided by the embodiments of the present invention, the GPU divides a pending data block into multiple target data blocks, processes them in parallel to obtain an intermediate hash value corresponding to each target data block, and performs a hash operation on the intermediate hash values to obtain the target result.
In this scheme, the task of performing MD5 hash operations on multiple target data blocks is executed by multiple threads in parallel, and the parallel processing improves data-processing efficiency.
Brief description of the drawings
The accompanying drawings described herein provide a further understanding of the present invention and form a part of this application; the schematic embodiments and their description explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of message processing with the MD5 algorithm in the prior art;
Fig. 2 is a logic diagram of the hash operation on a group in the prior art;
Fig. 3 is a schematic diagram of the GPU-accelerated parallel computation architecture for direct MD5 hash operations according to an embodiment of the present invention;
Fig. 4 is an execution flow chart of the MD5_GPU_Direct function according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the thread execution paths of the MD5_Direct kernel according to an embodiment of the present invention;
Fig. 6 is an execution flow chart of a thread that caches grouped data in shared memory according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the global-memory access optimization according to an embodiment of the present invention;
Fig. 8 is a flow chart of the data processing method according to an embodiment of the present invention;
Fig. 9 is a structural diagram of the data processing device according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with embodiments. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another.
The realization principle of the present invention is as follows: the pending data block is divided into multiple target data blocks of identical size; each target data block is then processed in parallel with the standard MD5 algorithm; after the computation completes, the hash value corresponding to each target data block is saved in order; finally, all the output results are processed once more with the standard MD5 algorithm to obtain the final target result.
Fig. 3 is a schematic diagram of the GPU-accelerated parallel computation architecture for direct MD5 hash operations according to an embodiment of the present invention. As shown in Fig. 3, H denotes the standard MD5 hash operation; a large number of threads can thus perform MD5 hash operations on the target data blocks in parallel.
However, when the data is partitioned finely enough that the thread count is very large, the threads must be divided into multiple thread blocks. Because threads in different thread blocks cannot synchronize with one another, the final hash operation over the intermediate hash values obtained from the target data blocks cannot be completed in the same kernel as the preceding steps. Two kernels are therefore designed. The task of the first kernel is to use a large number of concurrently executing threads to perform hash operations on the multiple target data blocks into which the pending data block has been divided, and to store the resulting intermediate hash values in device DRAM in order. After the first kernel finishes, the host launches the second kernel, which performs a hash operation on the intermediate hash values computed by the first kernel, according to the address and data size provided by the host. This step needs to be executed by only a single thread; because the intermediate hash values are far smaller than the original data, the computational cost of this step is negligible.
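The two-kernel scheme can be modeled on the CPU with Python's hashlib. This is a sketch under assumptions (64 KiB target blocks, a thread pool standing in for the first kernel's thread grid), and note that the result is a hash of the intermediate hashes, not the plain MD5 of the whole input:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # illustrative target-data-block size

def md5_direct(data: bytes) -> bytes:
    """First kernel: hash each target data block in parallel and keep the
    intermediate digests in order.  Second kernel: one final MD5 over the
    concatenated intermediate digests (a single thread suffices)."""
    blocks = [data[i:i + BLOCK_SIZE]
              for i in range(0, len(data), BLOCK_SIZE)] or [b""]
    with ThreadPoolExecutor() as pool:
        intermediate = list(pool.map(lambda b: hashlib.md5(b).digest(), blocks))
    return hashlib.md5(b"".join(intermediate)).digest()
```

Because each intermediate digest is only 16 bytes, the final sequential pass touches a tiny fraction of the original data volume, which is why its cost is negligible.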
So that upper-layer applications can conveniently use the direct MD5 hash function to hash data on the GPU, its API is designed as follows:
bool MD5_GPU_Direct(unsigned char *InputData, int InputDataSize, unsigned char *HashOutput, int HashOutputSize)
The input and output of the function are passed through the pointers (addresses) in the parameters; the return value indicates the running status of the function: 0 is returned if something goes wrong during execution, and 1 is returned on success.
The parameters of the function are as follows: InputData is the address of the pending data; InputDataSize is the size of the pending data; HashOutput is the address at which the hash value is to be output; and HashOutputSize is the number of bytes of the hash value to output. This last parameter is provided because some application scenarios do not need all the digits of the hash value and use only some of its bytes; with this parameter, the output overhead can be reduced while demand is still met.
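As a hedged illustration of this API contract (not of the GPU implementation itself), a host-side analog in Python might look like the following; hashlib stands in for the GPU path, and only the HashOutputSize truncation behavior described above is modeled:

```python
import hashlib

def md5_gpu_direct(input_data: bytes, hash_output_size: int = 16):
    """Analog of MD5_GPU_Direct: returns (status, digest_bytes), where
    status mirrors the 1-on-success / 0-on-error convention and
    hash_output_size selects how many leading bytes of the 16-byte
    hash value are returned."""
    if not 0 < hash_output_size <= 16:
        return 0, b""  # invalid request: mirror the 0-on-error return
    return 1, hashlib.md5(input_data).digest()[:hash_output_size]
```

A caller that needs only a 4-byte fingerprint would pass `hash_output_size=4` and receive just the first four digest bytes, saving output transfer.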
MD5_GPU_Direct is designed as a function callable by other functions on the host. Its tasks include preparation such as preprocessing the data on the host; transferring the data to the GPU; calling the MD5_Direct kernel to compute over the data in parallel on the GPU, with the results stored in GPU memory; after that kernel completes, launching the MD5_Std kernel to perform one last MD5 hash operation over the preceding results and store the final hash value in GPU memory; and finally sending the final result back to the host and returning it to the upper-layer application. The specific flow, shown in Fig. 4, comprises the following steps:
Step 401: Compute the algorithm execution context;
Step 402: Allocate GPU memory;
Step 403: Allocate host memory;
Step 404: Establish the execution configuration parameters;
Step 405: The host transfers the pending data block to the GPU;
Step 406: Launch the MD5_Direct kernel;
Step 407: Monitor the kernel execution status and judge whether the kernel has failed; if not, perform step 408; if so, perform step 413;
Step 408: Wait for all threads to finish executing, obtaining the intermediate hash value corresponding to each target data block;
Step 409: Launch the MD5_Std kernel, which performs a hash operation on the intermediate hash values to obtain the target result;
Step 410: The GPU transfers the target result to the host;
Step 411: Release the allocated memory resources;
Step 412: Return the result;
Step 413: Return and report an error.
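The flow of steps 401-413 can be summarized as a host-side skeleton; all GPU work is replaced by in-process stand-ins here, and the block size and helper structure are assumptions of this sketch, not the patent's CUDA code:

```python
import hashlib

def run_md5_gpu_direct(data: bytes, block_size: int = 64 * 1024) -> bytes:
    """Skeleton of Fig. 4: partition, run the parallel pass, check for
    errors, run the final pass, and release resources on every path."""
    buffers = []                                     # steps 402-404: allocations
    try:
        blocks = [data[i:i + block_size]             # step 405: partition/transfer
                  for i in range(0, len(data), block_size)] or [b""]
        buffers.extend(blocks)
        intermediate = [hashlib.md5(b).digest()      # steps 406-408: MD5_Direct pass
                        for b in blocks]
        return hashlib.md5(b"".join(intermediate)).digest()  # step 409: MD5_Std pass
    except Exception:
        return b""                                   # step 413: report the error
    finally:
        buffers.clear()                              # step 411: release resources
```

The `finally` clause captures the point of step 411: allocated resources are released whether the run succeeds (steps 410-412) or fails (step 413).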
The MD5_Direct kernel is the part executed on the GPU, and kernel design differs somewhat from programming on the CPU. A kernel is executed by an enormous number of threads, so designing the kernel amounts to designing the work of each thread. If the execution paths of the threads differ, the threads may execute sequentially, wasting the GPU's parallel computing capability.
Therefore, the threads should preferably be designed so that every thread follows the same execution path. The design of the thread execution paths in the kernel is shown in Fig. 5.
Fig. 5 is a schematic diagram of the thread execution paths of the MD5_Direct kernel according to an embodiment of the present invention, comprising the following steps:
Step 501: Obtain the start address of the target data block corresponding to each thread;
Step 502: Save the GPU memory address for the result of the target data block;
Step 503: Judge whether the current thread is the last thread; if so, perform step 504;
if not, perform step 505;
Step 504: Pad the partitioned data;
Step 505: Execute the standard MD5 algorithm to perform the hash operation;
Step 506: Save the result to the GPU memory address obtained in step 502.
The MD5 algorithm on the GPU has been designed and implemented in detail above. To obtain higher performance and make full use of GPU resources, the design must be further optimized for the GPU's architecture and resource characteristics. GPU performance optimization mainly comprises three basic strategies: maximally parallel execution, use of efficient instructions, and memory-usage optimization to reach maximum memory bandwidth.
For the GPU's architecture, resource characteristics, and optimization strategies, the design of MD5 execution on the GPU has been further optimized as follows.
1) Shared-memory management: if the input data is cached in shared memory while MD5 hash operations are performed on it, the number of accesses to global memory can be reduced, thereby reducing the time overhead of accessing global memory.
However, shared memory is very small: each multiprocessor has only 16 KB of shared memory, so the input data cannot all be cached. Because the MD5 algorithm processes data in 512-bit groups, a 512-bit space in shared memory is allocated to each thread; when a thread computes over a group, it first copies the data of that group into its space in shared memory and then processes the data there. The execution flow of a thread that caches grouped data in shared memory is shown in Fig. 6.
Fig. 6 is an execution flow chart of a thread that caches grouped data in shared memory according to an embodiment of the present invention, comprising the following steps:
Step 601: Compute the group count of the data from the context;
Step 602: Judge whether the current group is the last group; if so, perform step 606; if not, perform step 603;
Step 603: Read 64 bytes of data contiguously from the group address into this thread's space in shared memory;
Step 604: Perform the four-round, 64-step computation on the group;
Step 605: Update the chaining variables and perform step 602;
Step 606: Compute the byte count of the last group from the context;
Step 607: Copy the data of the last group into this thread's space in shared memory;
Step 608: Save the padding and length-information data into this thread's space in shared memory;
Step 609: Perform the four-round, 64-step computation on the last group;
Step 610: Update the chaining variables and store them in global memory.
2) Global-memory access: in this design, the size of the data processed by each thread is set by the upper-layer application, so the condition that thread accesses coalesce into a single memory access cannot be guaranteed; instead, reads from global memory can be optimized by exploiting the GPU's ability to read 128 bits from global memory with a single instruction.
In the shared-memory design, each group for the MD5 computation must be copied from global memory into shared memory, and every group except the last is 64 bytes. Because shared memory is partitioned in units of 4 bytes, global memory would otherwise be read 32 bits at a time; reading 128 bits of data with a single instruction reduces the number of accesses to global memory. Since the GPU has abundant register resources, four 32-bit registers can be combined into a 128-bit structure. When data must be copied from global memory to shared memory, the number of 128-bit accesses required is computed from the size of the current copy (at most 64 bytes); each access to global memory then reads 128 bits into the structure formed by the registers, and the valid data among those 128 bits is copied into the corresponding shared memory in 32-bit units over several steps. The implementation is shown in Fig. 7.
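The register trick can be mimicked in Python with struct: each iteration consumes 16 bytes (128 bits) at once and then scatters four 32-bit words, the way the four-register structure is drained into shared memory. The function name and the zero-padding of short tails are assumptions of this sketch:

```python
import struct

def copy_as_uint4(src: bytes) -> list:
    """Copy a group of at most 64 bytes in 128-bit units: each iteration
    models one global-memory access into four 32-bit registers, whose
    words are then written out one 32-bit unit at a time."""
    padded = src + b"\x00" * (-len(src) % 16)  # round up to a 128-bit multiple
    words = []
    for off in range(0, len(padded), 16):
        regs = struct.unpack("<4I", padded[off:off + 16])  # one 128-bit read
        words.extend(regs)                                 # four 32-bit writes
    return words

# a 64-byte group needs 4 accesses of 128 bits instead of 16 of 32 bits
assert len(copy_as_uint4(b"\x00" * 64)) == 16
```

The payoff is the access count: a 64-byte group costs four wide reads rather than sixteen narrow ones, quartering the traffic to global memory for the same data.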
3) Data transfer between host and device: data is transferred from host memory to device memory by the GPU's DMA engine, which requires that the data to be transferred first be copied from the host-allocated memory location into a page-locked (pinned) memory buffer owned by the GPU driver, from which the data is then sent to the GPU.
If the application on the host allocates page-locked memory for the input data directly, this copy between memory regions can be avoided. Allocating page-locked memory does add some overhead, because a region of contiguous memory pages must first be found before the pages can be locked: the larger the page-locked allocation, the greater the cost, and an allocation that is too large may even affect the normal operation of the operating system. In practice, however, this overhead has little impact, because the page-locked memory can be reused once allocated; the only extra cost is added initialization time, which is amortized and hidden over the long-term reuse of the page-locked memory.
The embodiments of the present invention design an MD5 function interface (API) and a parallel data-processing structure on the GPU, provide a detailed design of the MD5 implementation on the GPU, and optimize the design in several respects: shared-memory management, global-memory access, and data transfer between host and device.
MD5 hash operations over a large set of data blocks are thereby executed in parallel, making full use of the GPU's computing power; large-scale threads perform the MD5 hashing of the data in parallel, which substantially improves data-processing efficiency.
Fig. 8 is a flow chart of the data processing method of an embodiment of the present invention, comprising the following steps:
Step 801: A graphics processing unit (GPU) divides a pending data block into multiple target data blocks;
Further, before the GPU divides the pending data block into multiple target data blocks, the method also includes:
the host transferring the pending data block to the GPU.
Further, the host transfers the pending data block to the GPU as follows:
the pending data block is transferred to the GPU through page-locked memory allocated for the pending data block on the host.
Preferably, the multiple target data blocks are of identical size.
Step 802: The GPU processes the multiple target data blocks in parallel to obtain the intermediate hash value corresponding to each target data block;
Further, the GPU saves the intermediate hash values corresponding to the target data blocks, in order, to dynamic random-access memory (DRAM).
Further, the multiple target data blocks are processed in parallel by multiple threads.
Further, before the multiple target data blocks are processed in parallel by multiple threads, the method also includes:
obtaining the start address of the target data block corresponding to each thread, and saving the GPU memory address for the result of that target data block.
Further, before the multiple target data blocks are processed in parallel by multiple threads, the method also includes:
allocating, in the shared memory of the GPU, a space of a preset number of bits for each thread, wherein the space of the preset number of bits is used to store the target data block corresponding to the thread.
Further, each thread obtains its corresponding target data block from the shared memory and processes it.
Further, before each thread obtains its corresponding target data block from the shared memory and processes it, the method also includes:
reading the target data block from the global memory of the GPU through a preset number of registers in the GPU;
the shared memory reading the target data block from the preset number of registers.
Step 803: The GPU performs a hash operation on the intermediate hash values to obtain the target result.
Further, the GPU saves the target result and sends the target result to the host.
Fig. 9 is a structural diagram of the data processing device of an embodiment of the present invention. The device comprises a processor adapted to implement instructions and a storage device adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor and executed as follows:
a graphics processing unit (GPU) divides a pending data block into multiple target data blocks;
the GPU processes the multiple target data blocks in parallel to obtain an intermediate hash value corresponding to each target data block;
the GPU performs a hash operation on the intermediate hash values to obtain a target result.
Further, before the GPU divides the pending data block into multiple target data blocks, the host transfers the pending data block to the GPU.
Further, the host transfers the pending data block to the GPU through page-locked memory allocated for the pending data block on the host.
Preferably, the multiple target data blocks are of identical size.
Further, the GPU saves the intermediate hash values corresponding to the target data blocks, in order, to dynamic random-access memory (DRAM).
Further, the GPU saves the target result and sends the target result to the host.
Further, the multiple target data blocks are processed in parallel by multiple threads.
Further, before the multiple target data blocks are processed in parallel by multiple threads, the start address of the target data block corresponding to each thread is obtained, and the GPU memory address for the result of that target data block is saved.
Further, before the multiple target data blocks are processed in parallel by multiple threads, a space of a preset number of bits is allocated in the GPU's shared memory for each thread, the space being used to store the target data block corresponding to that thread.
Further, each thread obtains its corresponding target data block from the shared memory and processes it.
Further, before each thread obtains its corresponding target data block from the shared memory and processes it, the target data block is read from the global memory of the GPU through a preset number of registers in the GPU, and the shared memory reads the target data block from the preset number of registers.
In the technical scheme provided by the embodiments of the present invention, the GPU divides a pending data block into multiple target data blocks, processes them in parallel to obtain an intermediate hash value corresponding to each target data block, and performs a hash operation on the intermediate hash values to obtain the target result.
In this scheme, the task of performing MD5 hash operations on multiple target data blocks is executed by multiple threads in parallel, and the parallel processing improves data-processing efficiency.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.

Claims (22)

1. A data processing method, characterized by comprising the following steps:
a graphics processing unit (GPU) divides a pending data block into multiple target data blocks;
the GPU processes the multiple target data blocks in parallel to obtain an intermediate hash value corresponding to each target data block;
the GPU performs a hash operation on the intermediate hash values to obtain a target result.
2. The data processing method according to claim 1, characterized in that before the graphics processing unit (GPU) divides the pending data block into multiple target data blocks, the method further comprises:
a host transferring the pending data block to the GPU.
3. The data processing method according to claim 2, characterized in that the host transfers the pending data block to the GPU as follows:
the pending data block is transferred to the GPU through page-locked memory allocated for the pending data block on the host.
4. The data processing method according to claim 1, characterized in that the multiple target data blocks are of identical size.
5. The data processing method according to claim 1, characterized in that the GPU saves the intermediate hash value corresponding to each target data block, in order, to dynamic random-access memory (DRAM).
6. The data processing method according to claim 1, characterized in that the GPU saves the target result and sends the target result to a host.
7. The data processing method according to claim 1, characterized in that the multiple target data blocks are processed in parallel by multiple threads.
8. The data processing method according to claim 7, characterized in that before the multiple target data blocks are processed in parallel by the multiple threads, the method further comprises:
obtaining the start address of the target data block corresponding to each thread, and saving the GPU memory address for the result of that target data block.
9. The data processing method according to claim 7, characterized in that before the multiple target data blocks are processed in parallel by the multiple threads, the method further comprises:
allocating, in shared memory of the GPU, a space of a preset number of bits for each thread, wherein the space of the preset number of bits is used to store the target data block corresponding to the thread.
10. The data processing method according to claim 9, characterized in that each thread obtains its corresponding target data block from the shared memory and processes it.
11. The data processing method according to claim 9, characterized in that before each thread obtains its corresponding target data block from the shared memory and processes it, the method further comprises:
reading the target data block from global memory of the GPU through a preset number of registers in the GPU;
the shared memory reading the target data block from the preset number of registers.
12. a kind of data processing equipment, it is characterised in that including processor, be adapted for carrying out each instruction;Storage device, suitable for depositing A plurality of instruction is stored up, the instruction is suitable to be loaded and performed by the processor;
Pending data block is divided into multiple target data blocks by graphics processor GPU;
The GPU carries out parallel processing to the multiple target data block, obtains Hash among corresponding to each target data block Value;
The GPU carries out Hash operation to the middle cryptographic Hash, obtains objective result.
13. data processing equipment according to claim 12, it is characterised in that the graphics processor GPU will be pending Data block is divided into before multiple target data blocks, in addition to:
Main frame transmits the pending data block to the GPU.
14. data processing equipment according to claim 13, it is characterised in that the main frame is by the pending data block Transmit to the process of the GPU and be:
By the page locking page in memory distributed for the pending data block on main frame, the pending data block is transmitted to institute State GPU.
15. The data processing device according to claim 12, wherein the plurality of target data blocks are equal in size.
16. The data processing device according to claim 12, wherein the GPU saves the intermediate hash value corresponding to each target data block, in order, to a dynamic random access memory (DRAM).
17. The data processing device according to claim 12, wherein the GPU saves the target result and sends the target result to the host.
18. The data processing device according to claim 12, wherein the plurality of target data blocks are processed in parallel by multiple threads.
19. The data processing device according to claim 18, wherein before the plurality of target data blocks are processed in parallel by the multiple threads, the following is further performed:
obtaining the initial address of the target data block corresponding to each thread, and the GPU memory address to which the processing result of that target data block is to be saved.
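With equal-sized target data blocks (claim 15), both addresses recited in claim 19 are simple affine functions of the thread index: the input offset scales with the block size and the output offset with the digest size (16 bytes for MD5). A sketch under those assumptions, with hypothetical names:

```python
def thread_layout(thread_id: int, block_size: int, digest_size: int = 16):
    """Return (input start offset, result offset) for one thread.

    Offsets are relative to the base of the input buffer and of the
    intermediate-hash result buffer, respectively; 16 bytes is the
    MD5 digest size.
    """
    in_start = thread_id * block_size    # initial address of the target data block
    out_addr = thread_id * digest_size   # where this block's intermediate hash is saved
    return in_start, out_addr
```

Because each thread's addresses depend only on its own index, no inter-thread coordination is needed before the parallel hashing begins.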
20. The data processing device according to claim 18, wherein before the plurality of target data blocks are processed in parallel by the multiple threads, the following is further performed:
allocating, in a shared memory of the GPU, a space of a predetermined number of bits for each thread, wherein the space of the predetermined number of bits is used to store the target data block corresponding to that thread.
21. The data processing device according to claim 20, wherein each thread obtains its corresponding target data block from the shared memory and processes it.
22. The data processing device according to claim 20, wherein before each thread obtains its corresponding target data block from the shared memory and processes it, the following is further performed:
reading the target data block from the global memory of the GPU through a preset number of registers in the GPU; and
reading, by the shared memory, the target data block from the preset number of registers.
CN201710824734.5A 2017-09-13 2017-09-13 A kind of data processing method and device Pending CN107608769A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710824734.5A CN107608769A (en) 2017-09-13 2017-09-13 A kind of data processing method and device


Publications (1)

Publication Number Publication Date
CN107608769A true CN107608769A (en) 2018-01-19

Family

ID=61063731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710824734.5A Pending CN107608769A (en) 2017-09-13 2017-09-13 A kind of data processing method and device

Country Status (1)

Country Link
CN (1) CN107608769A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106134476B (en) * 2008-11-24 2012-09-05 西安电子科技大学 Spaceborne real-time parallel data treatment system
US20140333635A1 (en) * 2013-05-10 2014-11-13 Nvidia Corporation Hierarchical hash tables for simt processing and a method of establishing hierarchical hash tables
US20150089151A1 (en) * 2013-09-25 2015-03-26 Nvidia Corporation Surface resource view hash for coherent cache operations in texture processing hardware
CN106991011A (en) * 2017-03-30 2017-07-28 武汉大学 It is a kind of for big data task handle it is parallel and cooperate with the method optimized based on CPU multithreadings and many granularities of GPU


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
钱睿硕: "Research on GPU-Accelerated MD5 Hash Function Encryption Algorithms", China Master's Theses Full-text Database, Information Science and Technology series *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344153B (en) * 2018-08-22 2023-12-05 中国平安人寿保险股份有限公司 Service data processing method and terminal equipment
CN109344153A (en) * 2018-08-22 2019-02-15 中国平安人寿保险股份有限公司 The processing method and terminal device of business datum
CN109857545B (en) * 2018-12-29 2021-09-14 华为技术有限公司 Data transmission method and device
CN109857545A (en) * 2018-12-29 2019-06-07 华为技术有限公司 A kind of data transmission method and device
CN110503434A (en) * 2019-07-15 2019-11-26 平安普惠企业管理有限公司 Data verification method, device, equipment and storage medium based on hash algorithm
CN110503434B (en) * 2019-07-15 2023-04-07 平安普惠企业管理有限公司 Data verification method, device, equipment and storage medium based on Hash algorithm
CN110361699A (en) * 2019-07-23 2019-10-22 北京工业大学 A method of the ice radar data suitable for South Pole aviation measurement scene is handled
CN110659290A (en) * 2019-09-20 2020-01-07 北京中科寒武纪科技有限公司 Data processing method and device and related product
CN112463388A (en) * 2020-12-09 2021-03-09 广州科莱瑞迪医疗器材股份有限公司 SGRT data processing method and device based on multithreading
CN112463388B (en) * 2020-12-09 2023-03-10 广州科莱瑞迪医疗器材股份有限公司 SGRT data processing method and device based on multithreading
CN114679254A (en) * 2022-05-30 2022-06-28 深圳联友科技有限公司 Plaintext processing method and device and terminal equipment
CN116702225A (en) * 2023-06-08 2023-09-05 重庆傲雄在线信息技术有限公司 Method, system, equipment and medium for fast verifying electronic archive file based on hash parallel computing
CN116702225B (en) * 2023-06-08 2024-07-02 重庆亲笔签数字科技有限公司 Method, system, equipment and medium for fast verifying electronic archive file based on hash parallel computing

Similar Documents

Publication Publication Date Title
CN107608769A (en) A data processing method and device
Shanbhag et al. Efficient top-k query processing on massively parallel hardware
EP3667496A1 (en) Distributed computing system, data transmission method and device in distributed computing system
EP3493084B1 (en) Method for processing data in bloom filter and bloom filter
US11314724B2 (en) Data deduplication acceleration
US20150127880A1 (en) Efficient implementations for mapreduce systems
CN109690475A (en) Hardware accelerator and method for transfer operation
CN101036117A (en) Direct access to low-latency memory
CN106231346B (en) Distributed encryption method for offline video
Chan et al. Cache-oblivious and data-oblivious sorting and applications
CN112579595A (en) Data processing method and device, electronic equipment and readable storage medium
CN111164580B (en) Reconfigurable cache architecture and method for cache coherency
Kim et al. Accelerating GNN training with locality-aware partial execution
US20180315161A1 (en) Frequent pattern mining method and apparatus
Dachman-Soled et al. Oblivious network RAM and leveraging parallelism to achieve obliviousness
CN110222410B (en) Electromagnetic environment simulation method based on Hadoop MapReduce
KR20210024751A (en) Graph processing system including different kind memory device and an operation method thereof
Sakakibara et al. Accelerating blockchain transfer system using FPGA-based NIC
US20210255866A1 (en) Acceleration unit, system-on-chip, server, data center, and related method
Hutchinson et al. Duality between prefetching and queued writing with parallel disks
Nodine et al. Paradigms for optimal sorting with multiple disks
CN115809243A (en) B-tree-based overlapping community discovery method, device, equipment and storage medium
CN113051024B (en) Virtual machine live migration method and device, electronic equipment and storage medium
Dong et al. G-SM3: high-performance implementation of gpu-based SM3 hash function
US20240193178A1 (en) Data transform acceleration using metadata stored in accelerator memory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180119