CN107707330A - SC LDPC codes decoding acceleration system based on GPU - Google Patents

SC LDPC codes decoding acceleration system based on GPU

Info

Publication number
CN107707330A
Authority
CN
China
Prior art keywords
cuda
ldpc codes
device side
decoding
gpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710770624.5A
Other languages
Chinese (zh)
Inventor
吴晓楠 (Wu Xiaonan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201710770624.5A
Publication of CN107707330A
Legal status: Pending (current)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056 Systems characterized by the type of code used
    • H04L1/0059 Convolutional codes
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
    • H03M13/1102 Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
    • H03M13/1105 Decoding
    • H03M13/1131 Scheduling of bit node or check node processing
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
    • H03M13/1102 Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
    • H03M13/1148 Structural properties of the code parity-check or generator matrix
    • H03M13/1154 Low-density parity-check convolutional codes [LDPC-CC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0045 Arrangements at the receiver end
    • H04L1/0047 Decoding adapted to other signal detection operation
    • H04L1/005 Iterative decoding, including iteration between signal detection and decoding operation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Error Detection And Correction (AREA)

Abstract

The invention discloses a GPU-based SC-LDPC code decoding acceleration system comprising the programming platform CUDA, a hybrid decoder, a host side and a device side. The programming platform CUDA is the parallel programming platform CUDA provided by NVIDIA; it is used to write the source code of the SC-LDPC codes and to transfer the newly written SC-LDPC source code to the hybrid decoder. The hybrid decoder contains multiple processors, each of which has J+1 shift registers for storing intermediate variable information; it decodes with an iterative decoding algorithm and converts the received SC-LDPC source code into instructions that the host side and the device side can execute. With the programming platform CUDA, CPU code and GPU code can be written together, which is convenient and efficient to implement; in addition, the multithreading advantage of the device side is fully exploited by rewriting the SC-LDPC code, which reduces the simulation time of decoding and increases the decoding speed.

Description

SC-LDPC codes decoding acceleration system based on GPU
Technical field
The present invention relates to the technical field of SC-LDPC code decoding acceleration, and more particularly to a GPU-based SC-LDPC code decoding acceleration system.
Background technology
When a signal is transmitted over a noisy channel, interference and errors are inevitable. To improve the reliability of communication, error control must be achieved through channel coding. Spatial coupling makes many classes of codes exhibit excellent performance over a variety of channels and has therefore been widely studied; the concept of spatial coupling was introduced together with LDPC convolutional codes. The SC-LDPC code is the most common convolutional LDPC code: under the equivalent representation of the parity-check matrix H, its H matrix is equivalent to an LDPC block code coupled through a process of replication and re-connection of edges. Its decoder structure is simple, offers parallelism, and achieves more prominent error-correcting performance at the same hardware implementation complexity. The code used in the present invention is exactly such a convolutional LDPC code; its parity-check matrix is periodic, which greatly simplifies the encoding and decoding flow, and its decoder is a pipeline decoder structure with parallelism that is easy to implement in hardware. Although channel decoding algorithms are varied, most of them still run on traditional CPU platforms at present; the simulation time consumed by the decoding module is excessive and the decoding speed is slow.
Summary of the invention
In view of the technical problems existing in the background art, the present invention proposes a GPU-based SC-LDPC code decoding acceleration system.
The GPU-based SC-LDPC code decoding acceleration system proposed by the present invention includes the programming platform CUDA, a hybrid decoder, a host side and a device side. The programming platform CUDA is the parallel programming platform CUDA provided by NVIDIA; it is used to write the source code of the SC-LDPC codes and to transfer the newly written SC-LDPC source code to the hybrid decoder. The hybrid decoder contains multiple processors, each of which has J+1 shift registers for storing intermediate variable information. The hybrid decoder decodes with an iterative decoding algorithm, converts the received SC-LDPC source code into instructions executable by the host side and the device side, and transfers the instructions executable by the host side to the host side. The host side is provided with the original CPU program, which includes the void LDPCCCDecode::LdpcccDecodeText() function, the PipelineDecodeInitial() function and the PipelineDecodeInitial_Phase2_CUDA() kernel function. The void LDPCCCDecode::LdpcccDecodeText() function is void LDPCCCDecode::LdpcccDecodeText(double* Lchan, int time, int* CodeIn, int* errNo), whose program is as follows:
PipelineDecodeInitial(ldpccc_chan, time) ...; in this section of code the PipelineDecodeInitial() function can be separated out and placed before the loop, and the optimized code is as follows:
The PipelineDecodeInitial() function mainly allocates memory space for H and V, where H stores the check matrix and V stores the channel information. The hybrid decoder transfers the instructions executable by the device side to the device side, and the device side is provided with the PipelineDecodeInitial_Phase2_CUDA() kernel function and the Decode_CUDA.cu file; the code of the PipelineDecodeInitial_Phase2_CUDA() kernel function is stored in the Decode_CUDA.cu file.
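The optimized listing itself is not reproduced above. Purely as an illustration of the idea just described, hoisting the one-time initialization (which allocates H and V) out of the per-frame loop, the following hypothetical sketch may help; it is not the patent's actual code, and all sizes, the frame count and the helper names are placeholders:

#include <stdlib.h>

#define CHKSZ_SKETCH 1024      /* placeholder check-node count */
#define BLKSZ_SKETCH 2048      /* placeholder variable-node count */
#define NUM_FRAMES   100       /* placeholder number of simulated frames */

static double *H = NULL;       /* check-matrix / message storage */
static double *V = NULL;       /* channel-information storage */

/* hypothetical stand-in for PipelineDecodeInitial(): allocate H and V once */
static void PipelineDecodeInitial_sketch(void) {
    H = (double *)malloc((size_t)CHKSZ_SKETCH * BLKSZ_SKETCH * sizeof(double));
    V = (double *)malloc((size_t)BLKSZ_SKETCH * sizeof(double));
}

/* hypothetical per-frame decoding work */
static void DecodeOneFrame_sketch(int frame) { (void)frame; }

int main(void) {
    PipelineDecodeInitial_sketch();                 /* hoisted: runs once, before the loop */
    for (int frame = 0; frame < NUM_FRAMES; frame++)
        DecodeOneFrame_sketch(frame);               /* the loop no longer re-allocates H and V */
    free(H);
    free(V);
    return 0;
}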
Preferably, the hybrid decoder is a decoder for convolutional LDPC codes and has a pipeline decoder structure.
Preferably, the host side is a simple decoding module (CPU); the data of the host side can be copied to the device side, and the data of the device side can also be copied to the host side.
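As a minimal sketch of the two copy directions just described (illustrative only; the buffer name dev_buf and the element count N are placeholders, not the patent's variables), the CUDA runtime copy routines can be used as follows:

#include <cuda_runtime.h>

/* Illustrative round trip: host -> device, (kernel work), device -> host. */
int copy_round_trip(const double *host_in, double *host_out, size_t N) {
    double *dev_buf = NULL;
    cudaMalloc((void **)&dev_buf, N * sizeof(double));
    cudaMemcpy(dev_buf, host_in, N * sizeof(double), cudaMemcpyHostToDevice);   /* host side -> device side */
    /* ... device-side kernels would process dev_buf here ... */
    cudaMemcpy(host_out, dev_buf, N * sizeof(double), cudaMemcpyDeviceToHost);  /* device side -> host side */
    cudaFree(dev_buf);
    return 0;
}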
Preferably, the host side comprises an Intel i5-4590 quad-core processor with 8 GB of memory and a 3.3 GHz base frequency, and the host side is programmed on the Visual Studio 2010 platform.
Preferably, the device side is a decoding module (GPU) capable of concurrent operation and handling high complexity; the device side comprises a GTX 970 graphics card with 1664 cores, 4 GB of video memory and a 1.02 GHz base clock, and the device side is programmed on the CUDA 7.5 platform released by NVIDIA.
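The device-side hardware parameters mentioned here (SM count, memory size, clock rate) can be checked at run time with the CUDA runtime device query; the following small sketch is illustrative only and is not part of the patent's code:

#include <cstdio>
#include <cuda_runtime.h>

/* Query the properties of device 0 that the thread-block configuration relies on. */
int print_device_info(void) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    std::printf("name: %s\n", prop.name);
    std::printf("multiprocessors (SMs): %d\n", prop.multiProcessorCount);   /* 13 on a GTX 970 */
    std::printf("global memory: %zu bytes\n", prop.totalGlobalMem);
    std::printf("clock rate: %d kHz\n", prop.clockRate);
    return 0;
}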
Preferably, the device side is further provided with a thread-block and thread-count configuration optimization module. This module runs on the GTX 970 processor, which has 13 SMs, each SM containing 128 CUDA cores. The thread-block and thread-count configuration is as follows, setting (1*BLKSZ+255)/256 blocks with 64 threads in each block:
dim3 gridDim_Horizontal(1, /*(CHKSZ+127)/128*/ 12);
PipelineDecodeProcessorHorizontal_CUDA<<<gridDim_Horizontal, 128>>>(dev_H, time, ID, d_Ltab);
dim3 gridDim_Vertical(1, /*(BLKSZ+256)/256*/ 14);
PipelineDecodeProcessorVertical_CUDA<<<gridDim_Vertical, 192>>>(dev_H, dev_V, time, ID);
PipelineDecodeFinalProcessorDecision_CUDA<<</*(BLKSZ+511)/512*/ 14, 192>>>(dev_H, dev_V, dev_Beta, dev_dout, dev_ldpcccCode, time, dev_errNo);
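The commented expressions in the launch configuration above indicate how the hard-coded grid sizes 12 and 14 relate to CHKSZ and BLKSZ. The following sketch, an assumption rather than part of the patent's code, derives them at run time; note that a conventional ceiling division would use (BLKSZ + 255) / 256 rather than the (BLKSZ + 256) / 256 written in the comment:

#include <cuda_runtime.h>

/* Illustrative only: compute the grid sizes from CHKSZ and BLKSZ instead of hard-coding 12 and 14. */
void configure_grids(int CHKSZ, int BLKSZ) {
    dim3 gridDim_Horizontal(1, (CHKSZ + 127) / 128);   /* horizontal (check-node) kernel, 128 threads per block */
    dim3 gridDim_Vertical(1, (BLKSZ + 255) / 256);     /* vertical (variable-node) kernel, ceiling division */
    int  gridDim_Decision = (BLKSZ + 511) / 512;       /* final decision kernel */
    (void)gridDim_Horizontal; (void)gridDim_Vertical; (void)gridDim_Decision;
    /* The decoder kernels would be launched here with these dimensions, e.g.
       PipelineDecodeProcessorHorizontal_CUDA<<<gridDim_Horizontal, 128>>>(dev_H, time, ID, d_Ltab); */
}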
Preferably, the PipelineDecodeInitial_Phase2_CUDA() kernel function includes a while() loop; the variables Al and As inside the while() loop are stored in registers; the loop condition is while (tid < BLKSZ); and the while() loop performs the assignments in parallel with many device-side threads, hiding the latency of memory reads as far as possible.
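A hedged skeleton of this pattern is shown below. It is not the patent's actual Phase2 kernel; the per-element work is omitted and the write-back is a placeholder. Each thread keeps Al and As in registers and walks the BLKSZ elements with a while (tid < BLKSZ) loop, so many threads issue memory reads in parallel and the read latency is hidden:

__global__ void PipelineDecodeInitial_Phase2_sketch(const double *dev_V, double *dev_H, int BLKSZ)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;    /* one element per thread to start with */
    while (tid < BLKSZ) {
        double Al = dev_V[tid];                         /* register-resident working value */
        double As = 0.0;                                /* register-resident accumulator */
        /* ... per-element initialization of dev_H using Al and As would go here ... */
        dev_H[tid] = Al + As;                           /* placeholder write-back */
        tid += blockDim.x * gridDim.x;                  /* grid-stride step: threads advance in parallel */
    }
}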
Beneficial effects of the present invention:
1. The code used by the present invention is the SC-LDPC code, which has better error-correcting performance, and the number of data copies between the host side and the device side is small, which greatly reduces the time consumed in decoding SC-LDPC codes and thus greatly improves the decoding speed of SC-LDPC codes;
2. The present invention uses the CUDA platform for compilation, which makes programming simple and efficient. When writing CUDA code, the device side can be regarded as a computing device able to execute and control thousands of threads, acting as a coprocessor of the host side; in addition, the copy functions provided by the CUDA library are used to transfer data back and forth between the host side and the device side, so the decoding speed of SC-LDPC codes is greatly improved;
3. The device side is provided with the while() loop, and the decoding of SC-LDPC codes is optimized in the horizontal step, the vertical step, the decision step and other aspects, so that the decoding speed of SC-LDPC codes is high.
With the programming platform CUDA, the present invention can write CPU code and GPU code together, which is convenient and efficient to implement; in addition, the multithreading advantage of the device side is fully exploited by rewriting the SC-LDPC code, which reduces the simulation time of decoding and increases the decoding speed.
Brief description of the drawings
Fig. 1 is a working-principle diagram of the GPU-based SC-LDPC code decoding acceleration system proposed by the present invention;
Fig. 2 is the Tanner graph of the convolutional LDPC code for the hybrid decoder of the GPU-based SC-LDPC code decoding acceleration system proposed by the present invention;
Fig. 3 is a decoding block diagram of the hybrid decoder of the GPU-based SC-LDPC code decoding acceleration system proposed by the present invention.
Detailed description of the embodiments
The present invention is further explained below with reference to a specific embodiment.
Embodiment
With reference to Figs. 1-3, this embodiment proposes a GPU-based SC-LDPC code decoding acceleration system including the programming platform CUDA, a hybrid decoder, a host side and a device side. The programming platform CUDA is the parallel programming platform CUDA provided by NVIDIA; it is used to write the source code of the SC-LDPC codes and to transfer the newly written SC-LDPC source code to the hybrid decoder. The hybrid decoder contains multiple processors, each of which has J+1 shift registers for storing intermediate variable information. The hybrid decoder decodes with an iterative decoding algorithm that comprises five steps: initialization, shift-register shifting, vertical update, horizontal update and decision (a schedule of these five steps is sketched at the end of this embodiment). The hybrid decoder converts the received SC-LDPC source code into instructions executable by the host side and the device side, and transfers the instructions executable by the host side to the host side. The host side is provided with the original CPU program, which includes the void LDPCCCDecode::LdpcccDecodeText() function, the PipelineDecodeInitial() function and the PipelineDecodeInitial_Phase2_CUDA() kernel function. The void LDPCCCDecode::LdpcccDecodeText() function is void LDPCCCDecode::LdpcccDecodeText(double* Lchan, int time, int* CodeIn, int* errNo), whose program is as follows:
PipelineDecodeInitial(ldpccc_chan, time) ...; in this section of code the PipelineDecodeInitial() function can be separated out and placed before the loop, and the optimized code is as follows:
The PipelineDecodeInitial() function mainly allocates memory space for H and V, where H stores the check matrix and V stores the channel information. The PipelineDecodeInitial() function is located in the LDPCCCDecode::ldpcccDecodeTest() function, and its work can be divided into two stages: the first stage reads the H_[][].txt files and loads the data into H; the second stage assigns values to the A and B variables in H. In the new code Decode.cu, the txt-data reading part of PipelineDecodeInitial() corresponds to void LDPCCCDecode::PipelineDecodeInitial_Phase1(int time). This function saves the data it reads into host memory, and the memory address dynamically allocated on each call is preserved in the sg_tmp pointer array; all of the txt data files can be loaded into memory through a for() loop, which is called in the SimulationProcess() function. The program of the for() loop is as follows:
for (int i = 0; i < T; i++) {
    ldpc.PipelineDecodeInitial_Phase1(i);
}
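Each call to PipelineDecodeInitial_Phase1() loads one txt file, as described above. Purely as a hypothetical sketch of what such a call might do (the file-name pattern, buffer size and parsing are assumptions, not taken from the patent), the loading step could look like this:

#include <cstdio>
#include <cstdlib>

#define T_SKETCH     64                      /* placeholder: number of time steps / files */
#define ELEMS_SKETCH 1024                    /* placeholder: values read per file */

static double *sg_tmp[T_SKETCH];             /* holds the dynamically allocated addresses */

void PipelineDecodeInitial_Phase1_sketch(int i)
{
    char name[64];
    std::snprintf(name, sizeof(name), "H_%d.txt", i);                  /* hypothetical file-name pattern */
    std::FILE *fp = std::fopen(name, "r");
    if (fp == NULL) return;
    sg_tmp[i] = (double *)std::malloc(ELEMS_SKETCH * sizeof(double));  /* address kept in sg_tmp */
    for (int k = 0; k < ELEMS_SKETCH && std::fscanf(fp, "%lf", &sg_tmp[i][k]) == 1; k++)
        ;                                                              /* load the txt data into memory */
    std::fclose(fp);
}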
The hybrid decoder transfers the instructions executable by the device side to the device side, and the device side is provided with the PipelineDecodeInitial_Phase2_CUDA() kernel function and the Decode_CUDA.cu file; the code of the PipelineDecodeInitial_Phase2_CUDA() kernel function is stored in the Decode_CUDA.cu file.
In the present embodiment, the hybrid decoder is a decoder for convolutional LDPC codes and has a pipeline decoder structure. The host side is a simple decoding module (CPU); the data of the host side can be copied to the device side. The host side opens up video-memory space on the device side to store the output data; the kernel functions on the device side then run, reading the input data from the video-memory space, processing it and copying the processed output data back to the video-memory space. The data of the device side can also be copied to the host side: the host side copies the data to be processed to the device side, and afterwards the device side releases the video-memory space. The host side comprises an Intel i5-4590 quad-core processor with 8 GB of memory and a 3.3 GHz base frequency, and the host side is programmed on the Visual Studio 2010 platform. The device side is a decoding module (GPU) capable of concurrent operation and handling high complexity; it comprises a GTX 970 graphics card with 1664 cores, 4 GB of video memory and a 1.02 GHz base clock, and the device side is programmed on the CUDA 7.5 platform released by NVIDIA. The device side is further provided with a thread-block and thread-count configuration optimization module, which runs on the GTX 970 processor; the GTX 970 processor has 13 SMs, each SM containing 128 CUDA cores. The thread-block and thread-count configuration is as follows, setting (1*BLKSZ+255)/256 blocks with 64 threads in each block:
dim3 gridDim_Horizontal(1, /*(CHKSZ+127)/128*/ 12);
PipelineDecodeProcessorHorizontal_CUDA<<<gridDim_Horizontal, 128>>>(dev_H, time, ID, d_Ltab);
dim3 gridDim_Vertical(1, /*(BLKSZ+256)/256*/ 14);
PipelineDecodeProcessorVertical_CUDA<<<gridDim_Vertical, 192>>>(dev_H, dev_V, time, ID);
PipelineDecodeFinalProcessorDecision_CUDA<<</*(BLKSZ+511)/512*/ 14, 192>>>(dev_H, dev_V, dev_Beta, dev_dout, dev_ldpcccCode, time, dev_errNo);
The PipelineDecodeInitial_Phase2_CUDA() kernel function includes a while() loop; the variables Al and As inside the while() loop are stored in registers; the loop condition is while (tid < BLKSZ); and the while() loop performs the assignments in parallel with many device-side threads, hiding the latency of memory reads as far as possible. In addition, the while() loop optimizes the decoding speed of the SC-LDPC codes in the horizontal step, the vertical step and the decision step. With the programming platform CUDA, the present invention can write CPU code and GPU code together, which is convenient and efficient to implement; in addition, the multithreading advantage of the device side is fully exploited by rewriting the SC-LDPC code, which reduces the simulation time of decoding and increases the decoding speed.
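As a final illustration, the five decoding steps named in this embodiment (initialization, shift-register shifting, vertical update, horizontal update, decision) can be arranged as the following schedule sketch; all step bodies are placeholder stubs, and only the ordering is taken from the description above:

/* Placeholder stubs: only the names and order follow the description. */
static void initialize_decoder(void)  { /* set messages from the channel information */ }
static void shift_registers(void)     { /* advance the J+1 shift registers of each processor */ }
static void vertical_update(void)     { /* variable-node (vertical) update */ }
static void horizontal_update(void)   { /* check-node (horizontal) update */ }
static void make_decision(int t)      { (void)t; /* hard decision for the oldest block */ }

void pipeline_decode_sketch(int num_time_steps)
{
    initialize_decoder();                         /* step 1: initialization */
    for (int t = 0; t < num_time_steps; t++) {
        shift_registers();                        /* step 2: shift-register shifting */
        vertical_update();                        /* step 3: vertical update */
        horizontal_update();                      /* step 4: horizontal update */
        make_decision(t);                         /* step 5: decision */
    }
}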
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or change that a person skilled in the art makes, within the technical scope disclosed by the present invention, according to the technical solution of the present invention and its inventive concept, shall be included within the protection scope of the present invention.

Claims (7)

1. A GPU-based SC-LDPC code decoding acceleration system, comprising the programming platform CUDA, a hybrid decoder, a host side and a device side, characterized in that the programming platform CUDA is used to write the source code of the SC-LDPC codes and to transfer the newly written SC-LDPC source code to the hybrid decoder; the hybrid decoder contains multiple processors, each of which has J+1 shift registers for storing intermediate variable information; the hybrid decoder decodes with an iterative decoding algorithm; the hybrid decoder converts the received SC-LDPC source code into instructions executable by the host side and the device side; the hybrid decoder transfers the instructions executable by the host side to the host side, and the host side is provided with the original CPU program, which includes the void LDPCCCDecode::LdpcccDecodeText() function, the PipelineDecodeInitial() function and the PipelineDecodeInitial_Phase2_CUDA() kernel function; the hybrid decoder transfers the instructions executable by the device side to the device side, and the device side is provided with the PipelineDecodeInitial_Phase2_CUDA() kernel function and the Decode_CUDA.cu file, the code of the PipelineDecodeInitial_Phase2_CUDA() kernel function being stored in the Decode_CUDA.cu file.
2. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterized in that the hybrid decoder is a decoder for convolutional LDPC codes and has a pipeline decoder structure.
3. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterized in that the host side is a simple decoding module (CPU), the data of the host side can be copied to the device side, and the data of the device side can also be copied to the host side.
4. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterized in that the host side comprises an Intel i5-4590 quad-core processor with 8 GB of memory and a 3.3 GHz base frequency, and the host side is programmed on the Visual Studio 2010 platform.
5. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterized in that the device side is a decoding module (GPU) capable of concurrent operation, the device side comprises a GTX 970 graphics card with 1664 cores, 4 GB of video memory and a 1.02 GHz base clock, and the device side is programmed on the CUDA 7.5 platform released by NVIDIA.
6. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterized in that the device side is further provided with a thread-block and thread-count configuration optimization module, the thread-block and thread-count configuration optimization module runs on the GTX 970 processor, and the GTX 970 processor has 13 SMs, each SM containing 128 CUDA cores.
7. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterized in that the PipelineDecodeInitial_Phase2_CUDA() kernel function includes a while() loop, and the variables Al and As inside the while() loop are stored in registers.
CN201710770624.5A 2017-08-31 2017-08-31 SC LDPC codes decoding acceleration system based on GPU Pending CN107707330A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710770624.5A CN107707330A (en) 2017-08-31 2017-08-31 SC LDPC codes decoding acceleration system based on GPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710770624.5A CN107707330A (en) 2017-08-31 2017-08-31 SC LDPC codes decoding acceleration system based on GPU

Publications (1)

Publication Number Publication Date
CN107707330A true CN107707330A (en) 2018-02-16

Family

ID=61170039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710770624.5A Pending CN107707330A (en) 2017-08-31 2017-08-31 SC LDPC codes decoding acceleration system based on GPU

Country Status (1)

Country Link
CN (1) CN107707330A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108462495A (en) * 2018-04-03 2018-08-28 北京航空航天大学 A kind of multielement LDPC code high-speed parallel decoder and its interpretation method based on GPU

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102932003A (en) * 2012-09-07 2013-02-13 上海交通大学 Accelerated QC-LDPC (Quasi-Cyclic Low-Density Parity-Check Code) decoding method based on GPU (Graphics Processing Unit) framework
US8930789B1 (en) * 2013-01-23 2015-01-06 Viasat, Inc. High-speed LDPC decoder
CN106788467A (en) * 2016-11-28 2017-05-31 华中科技大学 A kind of Raptor Code coding methods based on CUDA, interpretation method and system
CN106992856A (en) * 2017-03-29 2017-07-28 山西大学 The data coordinating method of extensive continuous variable quantum key distribution based on GPU

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102932003A (en) * 2012-09-07 2013-02-13 上海交通大学 Accelerated QC-LDPC (Quasi-Cyclic Low-Density Parity-Check Code) decoding method based on GPU (Graphics Processing Unit) framework
US8930789B1 (en) * 2013-01-23 2015-01-06 Viasat, Inc. High-speed LDPC decoder
CN106788467A (en) * 2016-11-28 2017-05-31 华中科技大学 A kind of Raptor Code coding methods based on CUDA, interpretation method and system
CN106992856A (en) * 2017-03-29 2017-07-28 山西大学 The data coordinating method of extensive continuous variable quantum key distribution based on GPU

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHI H. CHAN et al.: "Parallel decoding of LDPC convolutional codes using OpenMP and GPU", 2012 IEEE Symposium on Computers and Communications (ISCC) *
鲁邹晨 (Lu Zouchen): "Research on parallel decoding algorithms for LDPC codes and their CUDA-based implementation", China Excellent Doctoral Dissertations Full-text Database *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108462495A (en) * 2018-04-03 2018-08-28 北京航空航天大学 A kind of multielement LDPC code high-speed parallel decoder and its interpretation method based on GPU

Similar Documents

Publication Publication Date Title
US20220012598A1 (en) Methods and apparatus for matrix and vector storage and operations
US8554820B2 (en) Optimized corner turns for local storage and bandwidth reduction
US11080227B2 (en) Compiler flow logic for reconfigurable architectures
KR20220129107A (en) Matrix multiplier
US20140181477A1 (en) Compressing Execution Cycles For Divergent Execution In A Single Instruction Multiple Data (SIMD) Processor
CN101061460B (en) Micro processor device and method for shuffle operations
CN102053816B (en) Data shuffling unit with switch matrix memory and shuffling method thereof
CN105612509A (en) Methods, apparatus, instructions and logic to provide vector sub-byte decompression functionality
CN105049061A (en) Advanced calculation-based high-dimensional polarization code decoder and polarization code decoding method
CN106911336B (en) High-speed parallel low-density parity check decoder with multi-core scheduling and decoding method thereof
Kang et al. Parallel LDPC decoder implementation on GPU based on unbalanced memory coalescing
CN116775518A (en) Method and apparatus for efficient access to multidimensional data structures and/or other large data blocks
US11580397B2 (en) Tensor dropout using a mask having a different ordering than the tensor
Balevic Parallel variable-length encoding on GPGPUs
CN107707330A (en) SC LDPC codes decoding acceleration system based on GPU
US20130262787A1 (en) Scalable memory architecture for turbo encoding
US11500962B1 (en) Emulating fine-grained sparsity in a systolic array
CN102201817B (en) Low-power-consumption LDPC decoder based on optimization of memory folding architecture
Huang et al. RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration
CN102594369A (en) Quasi-cyclic low-density parity check code decoder based on FPGA (field-programmable gate array) and decoding method
US11328209B1 (en) Dual cycle tensor dropout in a neural network
CN111966405B (en) Polar code high-speed parallel decoding method based on GPU
US11803736B1 (en) Fine-grained sparsity computations in systolic array
CN111368250B (en) Data processing system, method and equipment based on Fourier transformation/inverse transformation
US20040111589A1 (en) Asynchronous multiple-order issue system architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180216