CN107707330A - SC-LDPC code decoding acceleration system based on GPU
- Publication number
- CN107707330A CN107707330A CN201710770624.5A CN201710770624A CN107707330A CN 107707330 A CN107707330 A CN 107707330A CN 201710770624 A CN201710770624 A CN 201710770624A CN 107707330 A CN107707330 A CN 107707330A
- Authority
- CN
- China
- Prior art keywords
- cuda
- ldpc codes
- device side
- decoding
- gpu
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/004—Arrangements for detecting or preventing errors in the information received by using forward error control
- H04L1/0056—Systems characterized by the type of code used
- H04L1/0059—Convolutional codes
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/03—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
- H03M13/05—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
- H03M13/11—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
- H03M13/1102—Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
- H03M13/1105—Decoding
- H03M13/1131—Scheduling of bit node or check node processing
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/03—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
- H03M13/05—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
- H03M13/11—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
- H03M13/1102—Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
- H03M13/1148—Structural properties of the code parity-check or generator matrix
- H03M13/1154—Low-density parity-check convolutional codes [LDPC-CC]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/004—Arrangements for detecting or preventing errors in the information received by using forward error control
- H04L1/0045—Arrangements at the receiver end
- H04L1/0047—Decoding adapted to other signal detection operation
- H04L1/005—Iterative decoding, including iteration between signal detection and decoding operation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mathematical Physics (AREA)
- Error Detection And Correction (AREA)
Abstract
The invention discloses a GPU-based SC-LDPC code decoding acceleration system comprising the programming platform CUDA, a hybrid decoder, a host side and a device side. The programming platform CUDA is the parallel programming platform provided by NVIDIA; it is used to write the source code of the SC-LDPC codes and to transfer the newly written source code to the hybrid decoder. The hybrid decoder is internally provided with multiple processors, each possessing J+1 shift registers for storing intermediate variable information; it decodes with an iterative decoding algorithm and converts the received SC-LDPC source code into instructions executable by the host side and the device side. Using the programming platform CUDA, the present invention can write the CPU code and the GPU code together, which is convenient and efficient; moreover, the multithreading advantage of the device side is fully exploited to rewrite the SC-LDPC decoder, reducing the system simulation time of decoding and increasing the decoding speed.
Description
Technical field
The present invention relates to the technical field of SC-LDPC code decoding acceleration, and in particular to a GPU-based SC-LDPC code decoding acceleration system.
Background technology
When a signal is transmitted over a noisy channel, interference and errors are inevitable. To improve the reliability of communication, error control must be achieved through channel coding. Spatial coupling makes many classes of codes exhibit excellent performance over a variety of channels and has therefore been widely studied; the concept of spatial coupling was introduced together with LDPC convolutional codes. SC-LDPC codes are the most common convolutional LDPC codes: in the equivalent representation of the parity-check matrix H, the H matrix is obtained by coupling LDPC block codes through a process of replication and edge reconnection. The decoder structure is simple and parallel, and the error-correcting performance is outstanding for the same hardware implementation complexity. The codes used in the present invention are exactly such convolutional LDPC codes; their parity-check matrix is periodic, which greatly simplifies the encoding and decoding flow, and the decoder has a pipeline structure with inherent parallelism, making hardware implementation easier. Although channel decoding algorithms are varied, most still run on traditional CPU platforms, where the system simulation time of the decoding module is excessive and the decoding speed is slow.
Summary of the invention
In view of the technical problems in the background art, the present invention proposes a GPU-based SC-LDPC code decoding acceleration system.
The GPU-based SC-LDPC code decoding acceleration system proposed by the present invention comprises the programming platform CUDA, a hybrid decoder, a host side and a device side. The programming platform CUDA is the parallel programming platform provided by NVIDIA; it is used to write the source code of the SC-LDPC codes and to transfer the newly written source code to the hybrid decoder. The hybrid decoder is internally provided with multiple processors, each possessing J+1 shift registers for storing intermediate variable information. The hybrid decoder decodes with an iterative decoding algorithm, converts the received SC-LDPC source code into instructions executable by the host side and the device side, and transfers the host-executable instructions to the host side. The host side is provided with the original CPU program, which comprises the void LDPCCCDecode::LdpcccDecodeText() function, the PipelineDecodeInitial() function and the PipelineDecodeInitial_Phase2_CUDA() kernel function. The LdpcccDecodeText() function has the signature void LDPCCCDecode::LdpcccDecodeText(double* Lchan, int time, int* CodeIn, int* errNo), and its program is as follows:
PipelineDecodeInitial(ldpccc_chan, time); …
In this code, the PipelineDecodeInitial() function can be separated out and placed before the loop; the optimized code is as follows:
The PipelineDecodeInitial() function mainly allocates memory for H and V, where H stores the check matrix and V stores the channel information. The hybrid decoder transfers the device-executable instructions to the device side, which is provided with the PipelineDecodeInitial_Phase2_CUDA() kernel function and the Decode_CUDA.cu file; the code of the kernel function is stored in the Decode_CUDA.cu file.
Preferably, the hybrid decoder is a decoder for convolutional LDPC codes and has a pipeline decoder structure.
Preferably, the host side is the plain decoding module (CPU); data on the host side can be copied to the device side, and data on the device side can also be copied to the host side.
Preferably, the host side comprises an Intel i5-4590 quad-core processor, 8 GB of memory and a 3.3 GHz clock frequency, and the host side is programmed with the Visual Studio 2010 platform.
Preferably, the device side is the decoding module (GPU) capable of parallel operation and of handling high complexity; the device side comprises a GTX 970 graphics card with 1664 CUDA cores, 4 GB of video memory and a 1.02 GHz clock frequency, and the device side is programmed with the CUDA 7.5 platform released by NVIDIA.
Preferably, the device side is further provided with a thread-block and thread-count configuration optimization module, which uses the GTX 970 processor having 13 SMs with 128 CUDA cores each. The thread-block and thread-count configuration optimization module is configured as follows, setting (1*BLKSZ+255)/256 blocks with 64 threads in each block:
dim3 gridDim_Horizontal(1, /*(CHKSZ+127)/128*/ 12);
PipelineDecodeProcessorHorizontal_CUDA<<<gridDim_Horizontal, 128>>>(dev_H, time, ID, d_Ltab);
dim3 gridDim_Vertical(1, /*(BLKSZ+256)/256*/ 14);
PipelineDecodeProcessorVertical_CUDA<<<gridDim_Vertical, 192>>>(dev_H, dev_V, time, ID);
PipelineDecodeFinalProcessorDecision_CUDA<<</*(BLKSZ+511)/512*/ 14, 192>>>(dev_H, dev_V, dev_Beta, dev_dout, dev_ldpcccCode, time, dev_errNo);
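The commented-out expressions in the launch configuration above, such as (CHKSZ+127)/128 and (BLKSZ+511)/512, are the usual integer ceiling division: the number of fixed-size thread blocks needed to cover all elements. A minimal sketch of that arithmetic (the example values are illustrative, not the patent's actual CHKSZ/BLKSZ):

```cpp
#include <cassert>

// Integer ceiling division: the number of thread blocks of size `block`
// needed to cover `n` elements, as in the expression (CHKSZ+127)/128.
int blocks_needed(int n, int block) {
    return (n + block - 1) / block;
}
```

Adding block-1 before the integer division rounds up, so the last, possibly partial, group of elements still gets a thread block.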
Preferably, the PipelineDecodeInitial_Phase2_CUDA() kernel function comprises a while() loop, in which Al and As are stored in registers; the loop is while (tid < BLKSZ), and the assignment is performed in parallel by the device-side threads so as to hide the latency of memory reads as far as possible.
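The while (tid < BLKSZ) pattern described above distributes the BLKSZ assignments over many device threads. A serial CPU model of that index mapping (the strided scheme and names are assumptions for illustration; on the GPU the per-thread loops run concurrently, which is what hides the memory-read latency):

```cpp
#include <cassert>
#include <vector>

// CPU model of a CUDA-style loop `while (tid < BLKSZ)` in which each of
// `num_threads` threads handles indices tid, tid+stride, tid+2*stride, ...
// Here the "threads" run one after another; on the GPU they run in parallel.
std::vector<double> parallel_assign(int BLKSZ, int num_threads,
                                    const std::vector<double>& src) {
    std::vector<double> dst(BLKSZ, 0.0);
    for (int t = 0; t < num_threads; ++t) {        // one pass per "thread"
        for (int tid = t; tid < BLKSZ; tid += num_threads) {
            dst[tid] = src[tid];                   // per-thread assignment
        }
    }
    return dst;
}
```

Every index below BLKSZ is written exactly once, regardless of how many threads share the work.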
Beneficial effects of the present invention:
1. The codes used by the present invention are SC-LDPC codes with better error-correcting performance, and the number of data copies between the host side and the device side is small, which greatly reduces the time consumed during SC-LDPC decoding; the decoding speed of the SC-LDPC codes is therefore greatly improved.
2. Compiling with the CUDA platform makes programming simple and efficient. When writing CUDA code, the device side can be regarded as a computing device capable of executing and controlling thousands of threads, i.e. as a coprocessor of the host side; in addition, the copy functions provided by the CUDA library complete the transfers back and forth between the host side and the device side, and the decoding speed of the SC-LDPC codes is greatly improved.
3. The device side is provided with a while() loop, and the decoding of the SC-LDPC codes is optimized in the horizontal step, the vertical step and the decision step, so that the SC-LDPC codes are decoded quickly.
Using the programming platform CUDA, the present invention can write the CPU code and the GPU code together, which is convenient and efficient; moreover, the multithreading advantage of the device side is fully exploited to rewrite the SC-LDPC decoder, reducing the system simulation time of decoding and increasing the decoding speed.
Brief description of the drawings
Fig. 1 is a working-principle diagram of the GPU-based SC-LDPC code decoding acceleration system proposed by the present invention;
Fig. 2 is the Tanner graph of the convolutional LDPC code of the hybrid decoder of the GPU-based SC-LDPC code decoding acceleration system proposed by the present invention;
Fig. 3 is the decoding block diagram of the hybrid decoder of the GPU-based SC-LDPC code decoding acceleration system proposed by the present invention.
Embodiment
The present invention is further explained below with reference to specific embodiments.
Embodiment
With reference to Figs. 1-3, the present embodiment proposes a GPU-based SC-LDPC code decoding acceleration system comprising the programming platform CUDA, a hybrid decoder, a host side and a device side. The programming platform CUDA is the parallel programming platform provided by NVIDIA; it is used to write the source code of the SC-LDPC codes and to transfer the newly written source code to the hybrid decoder. The hybrid decoder is internally provided with multiple processors, each possessing J+1 shift registers for storing intermediate variable information. The hybrid decoder decodes with an iterative decoding algorithm comprising five steps: initialization, shift-register shifting, vertical update, horizontal update and decision. The hybrid decoder converts the received SC-LDPC source code into instructions executable by the host side and the device side, and transfers the host-executable instructions to the host side, which is provided with the original CPU program; this program comprises the void LDPCCCDecode::LdpcccDecodeText() function, the PipelineDecodeInitial() function and the PipelineDecodeInitial_Phase2_CUDA() kernel function. The LdpcccDecodeText() function has the signature void LDPCCCDecode::LdpcccDecodeText(double* Lchan, int time, int* CodeIn, int* errNo), and its program is as follows:
PipelineDecodeInitial(ldpccc_chan, time); …
In this code, the PipelineDecodeInitial() function can be separated out and placed before the loop; the optimized code is as follows:
The PipelineDecodeInitial() function mainly allocates memory for H and V, where H stores the check matrix and V stores the channel information. PipelineDecodeInitial() is located in the LDPCCCDecode::LdpcccDecodeText() function, and its work can be divided into two phases: the first phase reads the H_[][].txt files and loads the data into H; the second phase assigns values to the A and B variables in H. In the new code Decode.cu, the txt-reading part of PipelineDecodeInitial() corresponds to the function void LDPCCCDecode::PipelineDecodeInitial_Phase1(int time), which saves the data it reads into memory; the dynamically allocated memory address of each read is kept in the sg_tmp pointer array, and a for() loop loads all the txt data files into memory. The for() loop is called in the SimulationProcess() function, and its program is as follows:
for (int i = 0; i < T; i++) {
    ldpc.PipelineDecodeInitial_Phase1(i);
}
The hybrid decoder transfers the device-executable instructions to the device side, which is provided with the PipelineDecodeInitial_Phase2_CUDA() kernel function and the Decode_CUDA.cu file; the code of the kernel function is stored in the Decode_CUDA.cu file.
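The horizontal step executed by a kernel such as PipelineDecodeProcessorHorizontal_CUDA() corresponds, in standard LDPC decoding terms, to the check-node update; the patent does not give its exact formula. A common min-sum variant, shown here purely as an illustrative assumption, computes each outgoing message from the sign product and minimum magnitude of the other incoming LLRs:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Min-sum check-node (horizontal) update for ONE check node: for each
// connected variable j, the outgoing message is the product of the signs
// and the minimum magnitude of all OTHER incoming LLRs. This is a generic
// LDPC rule used for illustration; the patent does not spell out its own.
std::vector<double> check_node_update(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    for (size_t j = 0; j < in.size(); ++j) {
        double sign = 1.0, min_mag = 1e300;
        for (size_t k = 0; k < in.size(); ++k) {
            if (k == j) continue;                  // exclude the target edge
            sign *= (in[k] < 0.0) ? -1.0 : 1.0;
            min_mag = std::min(min_mag, std::fabs(in[k]));
        }
        out[j] = sign * min_mag;
    }
    return out;
}
```

On the GPU, one thread (or a small group of threads) would typically handle one check node, which is why this step parallelizes well.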
In this embodiment, the hybrid decoder is a decoder for convolutional LDPC codes and has a pipeline decoder structure. The host side is the plain decoding module (CPU). Data on the host side can be copied to the device side: the host side opens up video-memory space on the device side to store the output data, the kernel functions on the device side then run, and after the kernel functions read the input data from video memory and process it, the processed output data is copied back to host memory space; the data on the device side can likewise be copied to the host side. The host side copies the data to be processed to the device side, and the device side releases the video-memory space afterwards. The host side comprises an Intel i5-4590 quad-core processor, 8 GB of memory and a 3.3 GHz clock frequency, and is programmed with the Visual Studio 2010 platform. The device side is the decoding module (GPU) capable of parallel operation and of handling high complexity; it comprises a GTX 970 graphics card with 1664 CUDA cores, 4 GB of video memory and a 1.02 GHz clock frequency, and is programmed with the CUDA 7.5 platform released by NVIDIA. The device side is further provided with a thread-block and thread-count configuration optimization module, which uses the GTX 970 processor having 13 SMs with 128 CUDA cores each. The thread-block and thread-count configuration optimization module is configured as follows, setting (1*BLKSZ+255)/256 blocks with 64 threads in each block:
dim3 gridDim_Horizontal(1, /*(CHKSZ+127)/128*/ 12);
PipelineDecodeProcessorHorizontal_CUDA<<<gridDim_Horizontal, 128>>>(dev_H, time, ID, d_Ltab);
dim3 gridDim_Vertical(1, /*(BLKSZ+256)/256*/ 14);
PipelineDecodeProcessorVertical_CUDA<<<gridDim_Vertical, 192>>>(dev_H, dev_V, time, ID);
PipelineDecodeFinalProcessorDecision_CUDA<<</*(BLKSZ+511)/512*/ 14, 192>>>(dev_H, dev_V, dev_Beta, dev_dout, dev_ldpcccCode, time, dev_errNo);
The PipelineDecodeInitial_Phase2_CUDA() kernel function comprises a while() loop, in which Al and As are stored in registers; the loop is while (tid < BLKSZ), and the assignment is performed in parallel by the device-side threads, hiding the latency of memory reads as far as possible. In addition, the while() loop optimizes the decoding speed of the SC-LDPC codes in the horizontal step, the vertical step and the decision step. Using the programming platform CUDA, the present invention can write the CPU code and the GPU code together, which is convenient and efficient; moreover, the multithreading advantage of the device side is fully exploited to rewrite the SC-LDPC decoder, reducing the system simulation time of decoding and increasing the decoding speed.
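The vertical step and the decision handled by PipelineDecodeProcessorVertical_CUDA() and PipelineDecodeFinalProcessorDecision_CUDA() correspond, in generic belief-propagation terms, to summing the channel LLR with the incoming check-node messages and taking a hard decision on the sign. The patent gives no formulas, so the following is only a sketch under that standard interpretation:

```cpp
#include <cassert>
#include <vector>

// Generic variable-node (vertical) update plus hard decision: the posterior
// LLR is the channel LLR plus all incoming check-node messages; the decoded
// bit is 1 when the posterior is negative (LLR convention: positive => 0).
// Illustrative only; the patent does not give its exact rules.
int decide_bit(double channel_llr, const std::vector<double>& check_msgs) {
    double posterior = channel_llr;
    for (double m : check_msgs) posterior += m;   // vertical (sum) update
    return (posterior < 0.0) ? 1 : 0;             // decision step
}
```

Since each variable node's sum is independent of the others, this step maps naturally onto one GPU thread per variable node.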
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art who, within the technical scope disclosed by the invention, makes equivalent substitutions or changes according to the technical scheme of the invention and its inventive concept shall be covered by the protection scope of the present invention.
Claims (7)
1. A GPU-based SC-LDPC code decoding acceleration system, comprising the programming platform CUDA, a hybrid decoder, a host side and a device side, characterised in that the programming platform CUDA is used to write the source code of the SC-LDPC codes and to transfer the newly written source code to the hybrid decoder; the hybrid decoder is internally provided with multiple processors, each possessing J+1 shift registers for storing intermediate variable information; the hybrid decoder decodes with an iterative decoding algorithm, converts the received SC-LDPC source code into instructions executable by the host side and the device side, and transfers the host-executable instructions to the host side; the host side is provided with the original CPU program, which comprises the void LDPCCCDecode::LdpcccDecodeText() function, the PipelineDecodeInitial() function and the PipelineDecodeInitial_Phase2_CUDA() kernel function; the hybrid decoder transfers the device-executable instructions to the device side, which is provided with the PipelineDecodeInitial_Phase2_CUDA() kernel function and the Decode_CUDA.cu file, the code of the kernel function being stored in the Decode_CUDA.cu file.
2. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterised in that the hybrid decoder is a decoder for convolutional LDPC codes and has a pipeline decoder structure.
3. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterised in that the host side is the plain decoding module (CPU), data on the host side can be copied to the device side, and data on the device side can also be copied to the host side.
4. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterised in that the host side comprises an Intel i5-4590 quad-core processor, 8 GB of memory and a 3.3 GHz clock frequency, and the host side is programmed with the Visual Studio 2010 platform.
5. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterised in that the device side is the decoding module (GPU) capable of parallel operation; the device side comprises a GTX 970 graphics card with 1664 CUDA cores, 4 GB of video memory and a 1.02 GHz clock frequency, and the device side is programmed with the CUDA 7.5 platform released by NVIDIA.
6. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterised in that the device side is further provided with a thread-block and thread-count configuration optimization module, which uses the GTX 970 processor having 13 SMs with 128 CUDA cores each.
7. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterised in that the PipelineDecodeInitial_Phase2_CUDA() kernel function comprises a while() loop in which Al and As are stored in registers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710770624.5A CN107707330A (en) | 2017-08-31 | 2017-08-31 | SC LDPC codes decoding acceleration system based on GPU |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107707330A true CN107707330A (en) | 2018-02-16 |
Family
ID=61170039
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710770624.5A Pending CN107707330A (en) | 2017-08-31 | 2017-08-31 | SC LDPC codes decoding acceleration system based on GPU |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107707330A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108462495A (en) * | 2018-04-03 | 2018-08-28 | 北京航空航天大学 | A kind of multielement LDPC code high-speed parallel decoder and its interpretation method based on GPU |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102932003A (en) * | 2012-09-07 | 2013-02-13 | 上海交通大学 | Accelerated QC-LDPC (Quasi-Cyclic Low-Density Parity-Check Code) decoding method based on GPU (Graphics Processing Unit) framework |
US8930789B1 (en) * | 2013-01-23 | 2015-01-06 | Viasat, Inc. | High-speed LDPC decoder |
CN106788467A (en) * | 2016-11-28 | 2017-05-31 | 华中科技大学 | A kind of Raptor Code coding methods based on CUDA, interpretation method and system |
CN106992856A (en) * | 2017-03-29 | 2017-07-28 | 山西大学 | The data coordinating method of extensive continuous variable quantum key distribution based on GPU |
- 2017-08-31: CN application CN201710770624.5A filed; published as CN107707330A (status: Pending)
Non-Patent Citations (2)
Title |
---|
Chi H. Chan et al., "Parallel decoding of LDPC convolutional codes using OpenMP and GPU", 2012 IEEE Symposium on Computers and Communications (ISCC) * |
Lu Zouchen, "Research on parallel decoding algorithms of LDPC codes and their CUDA-based implementation", China Doctoral Dissertations Full-text Database * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220012598A1 (en) | Methods and apparatus for matrix and vector storage and operations | |
US8554820B2 (en) | Optimized corner turns for local storage and bandwidth reduction | |
US11080227B2 (en) | Compiler flow logic for reconfigurable architectures | |
KR20220129107A (en) | Matrix multiplier | |
US20140181477A1 (en) | Compressing Execution Cycles For Divergent Execution In A Single Instruction Multiple Data (SIMD) Processor | |
CN101061460B (en) | Micro processor device and method for shuffle operations | |
CN102053816B (en) | Data shuffling unit with switch matrix memory and shuffling method thereof | |
CN105612509A (en) | Methods, apparatus, instructions and logic to provide vector sub-byte decompression functionality | |
CN105049061A (en) | Advanced calculation-based high-dimensional polarization code decoder and polarization code decoding method | |
CN106911336B (en) | High-speed parallel low-density parity check decoder with multi-core scheduling and decoding method thereof | |
Kang et al. | Parallel LDPC decoder implementation on GPU based on unbalanced memory coalescing | |
CN116775518A (en) | Method and apparatus for efficient access to multidimensional data structures and/or other large data blocks | |
US11580397B2 (en) | Tensor dropout using a mask having a different ordering than the tensor | |
Balevic | Parallel variable-length encoding on GPGPUs | |
CN107707330A (en) | SC LDPC codes decoding acceleration system based on GPU | |
US20130262787A1 (en) | Scalable memory architecture for turbo encoding | |
US11500962B1 (en) | Emulating fine-grained sparsity in a systolic array | |
CN102201817B (en) | Low-power-consumption LDPC decoder based on optimization of memory folding architecture | |
Huang et al. | RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration | |
CN102594369A (en) | Quasi-cyclic low-density parity check code decoder based on FPGA (field-programmable gate array) and decoding method | |
US11328209B1 (en) | Dual cycle tensor dropout in a neural network | |
CN111966405B (en) | Polar code high-speed parallel decoding method based on GPU | |
US11803736B1 (en) | Fine-grained sparsity computations in systolic array | |
CN111368250B (en) | Data processing system, method and equipment based on Fourier transformation/inverse transformation | |
US20040111589A1 (en) | Asynchronous multiple-order issue system architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | |
Application publication date: 20180216 |