CN107707330A - SC-LDPC code decoding acceleration system based on GPU
- Publication number
- CN107707330A CN107707330A CN201710770624.5A CN201710770624A CN107707330A CN 107707330 A CN107707330 A CN 107707330A CN 201710770624 A CN201710770624 A CN 201710770624A CN 107707330 A CN107707330 A CN 107707330A
- Authority
- CN
- China
- Prior art keywords
- cuda
- ldpc codes
- device side
- decoding
- gpu
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/004—Arrangements for detecting or preventing errors in the information received by using forward error control
- H04L1/0056—Systems characterized by the type of code used
- H04L1/0059—Convolutional codes
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/03—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
- H03M13/05—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
- H03M13/11—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
- H03M13/1102—Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
- H03M13/1105—Decoding
- H03M13/1131—Scheduling of bit node or check node processing
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/03—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
- H03M13/05—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
- H03M13/11—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
- H03M13/1102—Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
- H03M13/1148—Structural properties of the code parity-check or generator matrix
- H03M13/1154—Low-density parity-check convolutional codes [LDPC-CC]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/004—Arrangements for detecting or preventing errors in the information received by using forward error control
- H04L1/0045—Arrangements at the receiver end
- H04L1/0047—Decoding adapted to other signal detection operation
- H04L1/005—Iterative decoding, including iteration between signal detection and decoding operation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mathematical Physics (AREA)
- Error Detection And Correction (AREA)
Abstract
The invention discloses a GPU-based SC-LDPC code decoding acceleration system comprising the programming platform CUDA, a hybrid decoder, a host side and a device side. The programming platform CUDA is the parallel programming platform provided by NVIDIA; it is used to write the source code of the SC-LDPC codes and to transfer the newly written source code to the hybrid decoder. The hybrid decoder is internally provided with multiple processors, each possessing J+1 shift registers for storing intermediate variable information; it decodes with an iterative decoding algorithm and converts the received SC-LDPC source code into instructions executable by the host side and the device side. Using the programming platform CUDA, the present invention can write the CPU code and the GPU code together, which is convenient and efficient; moreover, the multithreading advantage of the device side is fully exploited to rewrite the SC-LDPC decoder, reducing the system simulation time of decoding and increasing the decoding speed.
Description
Technical field
The present invention relates to the technical field of SC-LDPC code decoding acceleration, and in particular to a GPU-based SC-LDPC code decoding acceleration system.
Background technology
When a signal is transmitted over a noisy channel, interference and errors are inevitable. To improve the reliability of communication, error control must be achieved through channel coding. Spatial coupling makes many classes of codes exhibit excellent performance over a variety of channels and has therefore been widely studied; the concept of spatial coupling was introduced together with LDPC convolutional codes. SC-LDPC codes are the most common convolutional LDPC codes: in the equivalent representation of the parity-check matrix H, the H matrix is obtained by coupling LDPC block codes through a process of replication and edge reconnection. The decoder structure is simple and parallel, and the error-correcting performance is outstanding for the same hardware implementation complexity. The codes used in the present invention are exactly such convolutional LDPC codes; their parity-check matrix is periodic, which greatly simplifies the encoding and decoding flow, and the decoder has a pipeline structure with inherent parallelism, making hardware implementation easier. Although channel decoding algorithms are varied, most still run on traditional CPU platforms, where the system simulation time of the decoding module is excessive and the decoding speed is slow.
Summary of the invention
In view of the technical problems in the background art, the present invention proposes a GPU-based SC-LDPC code decoding acceleration system.
The GPU-based SC-LDPC code decoding acceleration system proposed by the present invention comprises the programming platform CUDA, a hybrid decoder, a host side and a device side. The programming platform CUDA is the parallel programming platform provided by NVIDIA; it is used to write the source code of the SC-LDPC codes and to transfer the newly written source code to the hybrid decoder. The hybrid decoder is internally provided with multiple processors, each possessing J+1 shift registers for storing intermediate variable information. The hybrid decoder decodes with an iterative decoding algorithm, converts the received SC-LDPC source code into instructions executable by the host side and the device side, and transfers the host-executable instructions to the host side. The host side is provided with the original CPU program, which comprises the void LDPCCCDecode::LdpcccDecodeText() function, the PipelineDecodeInitial() function and the PipelineDecodeInitial_Phase2_CUDA() kernel function. The LdpcccDecodeText() function has the signature void LDPCCCDecode::LdpcccDecodeText(double* Lchan, int time, int* CodeIn, int* errNo), and its program is as follows:
PipelineDecodeInitial(ldpccc_chan, time); …
In this code, the PipelineDecodeInitial() function can be separated out and placed before the loop; the optimized code is as follows:
The PipelineDecodeInitial() function mainly allocates memory for H and V, where H stores the check matrix and V stores the channel information. The hybrid decoder transfers the device-executable instructions to the device side, which is provided with the PipelineDecodeInitial_Phase2_CUDA() kernel function and the Decode_CUDA.cu file; the code of the kernel function is stored in the Decode_CUDA.cu file.
Preferably, the hybrid decoder is a decoder for convolutional LDPC codes and has a pipeline decoder structure.
Preferably, the host side is the plain decoding module (CPU); data on the host side can be copied to the device side, and data on the device side can also be copied to the host side.
Preferably, the host side comprises an Intel i5-4590 quad-core processor, 8 GB of memory and a 3.3 GHz clock frequency, and the host side is programmed with the Visual Studio 2010 platform.
Preferably, the device side is the decoding module (GPU) capable of parallel operation and of handling high complexity; the device side comprises a GTX 970 graphics card with 1664 CUDA cores, 4 GB of video memory and a 1.02 GHz clock frequency, and the device side is programmed with the CUDA 7.5 platform released by NVIDIA.
Preferably, the device side is further provided with a thread-block and thread-count configuration optimization module, which uses the GTX 970 processor having 13 SMs with 128 CUDA cores each. The thread-block and thread-count configuration optimization module is configured as follows, setting (1*BLKSZ+255)/256 blocks with 64 threads in each block:
dim3 gridDim_Horizontal(1, /*(CHKSZ+127)/128*/ 12);
PipelineDecodeProcessorHorizontal_CUDA<<<gridDim_Horizontal, 128>>>(dev_H, time, ID, d_Ltab);
dim3 gridDim_Vertical(1, /*(BLKSZ+256)/256*/ 14);
PipelineDecodeProcessorVertical_CUDA<<<gridDim_Vertical, 192>>>(dev_H, dev_V, time, ID);
PipelineDecodeFinalProcessorDecision_CUDA<<</*(BLKSZ+511)/512*/ 14, 192>>>(dev_H, dev_V, dev_Beta, dev_dout, dev_ldpcccCode, time, dev_errNo);
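The commented-out expressions in the launch configuration above, such as (CHKSZ+127)/128 and (BLKSZ+511)/512, are the usual integer ceiling division: the number of fixed-size thread blocks needed to cover all elements. A minimal sketch of that arithmetic (the example values are illustrative, not the patent's actual CHKSZ/BLKSZ):

```cpp
#include <cassert>

// Integer ceiling division: the number of thread blocks of size `block`
// needed to cover `n` elements, as in the expression (CHKSZ+127)/128.
int blocks_needed(int n, int block) {
    return (n + block - 1) / block;
}
```

Adding block-1 before the integer division rounds up, so the last, possibly partial, group of elements still gets a thread block.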
Preferably, the PipelineDecodeInitial_Phase2_CUDA() kernel function comprises a while() loop, in which Al and As are stored in registers; the loop is while (tid < BLKSZ), and the assignment is performed in parallel by the device-side threads so as to hide the latency of memory reads as far as possible.
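The while (tid < BLKSZ) pattern described above distributes the BLKSZ assignments over many device threads. A serial CPU model of that index mapping (the strided scheme and names are assumptions for illustration; on the GPU the per-thread loops run concurrently, which is what hides the memory-read latency):

```cpp
#include <cassert>
#include <vector>

// CPU model of a CUDA-style loop `while (tid < BLKSZ)` in which each of
// `num_threads` threads handles indices tid, tid+stride, tid+2*stride, ...
// Here the "threads" run one after another; on the GPU they run in parallel.
std::vector<double> parallel_assign(int BLKSZ, int num_threads,
                                    const std::vector<double>& src) {
    std::vector<double> dst(BLKSZ, 0.0);
    for (int t = 0; t < num_threads; ++t) {        // one pass per "thread"
        for (int tid = t; tid < BLKSZ; tid += num_threads) {
            dst[tid] = src[tid];                   // per-thread assignment
        }
    }
    return dst;
}
```

Every index below BLKSZ is written exactly once, regardless of how many threads share the work.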
Beneficial effects of the present invention:
1. The codes used by the present invention are SC-LDPC codes with better error-correcting performance, and the number of data copies between the host side and the device side is small, which greatly reduces the time consumed during SC-LDPC decoding; the decoding speed of the SC-LDPC codes is therefore greatly improved.
2. Compiling with the CUDA platform makes programming simple and efficient. When writing CUDA code, the device side can be regarded as a computing device capable of executing and controlling thousands of threads, i.e. as a coprocessor of the host side; in addition, the copy functions provided by the CUDA library complete the transfers back and forth between the host side and the device side, and the decoding speed of the SC-LDPC codes is greatly improved.
3. The device side is provided with a while() loop, and the decoding of the SC-LDPC codes is optimized in the horizontal step, the vertical step and the decision step, so that the SC-LDPC codes are decoded quickly.
Using the programming platform CUDA, the present invention can write the CPU code and the GPU code together, which is convenient and efficient; moreover, the multithreading advantage of the device side is fully exploited to rewrite the SC-LDPC decoder, reducing the system simulation time of decoding and increasing the decoding speed.
Brief description of the drawings
Fig. 1 is a working-principle diagram of the GPU-based SC-LDPC code decoding acceleration system proposed by the present invention;
Fig. 2 is the Tanner graph of the convolutional LDPC code of the hybrid decoder of the GPU-based SC-LDPC code decoding acceleration system proposed by the present invention;
Fig. 3 is the decoding block diagram of the hybrid decoder of the GPU-based SC-LDPC code decoding acceleration system proposed by the present invention.
Embodiment
The present invention is further explained below with reference to specific embodiments.
Embodiment
With reference to Figs. 1-3, the present embodiment proposes a GPU-based SC-LDPC code decoding acceleration system comprising the programming platform CUDA, a hybrid decoder, a host side and a device side. The programming platform CUDA is the parallel programming platform provided by NVIDIA; it is used to write the source code of the SC-LDPC codes and to transfer the newly written source code to the hybrid decoder. The hybrid decoder is internally provided with multiple processors, each possessing J+1 shift registers for storing intermediate variable information. The hybrid decoder decodes with an iterative decoding algorithm comprising five steps: initialization, shift-register shifting, vertical update, horizontal update and decision. The hybrid decoder converts the received SC-LDPC source code into instructions executable by the host side and the device side, and transfers the host-executable instructions to the host side, which is provided with the original CPU program; this program comprises the void LDPCCCDecode::LdpcccDecodeText() function, the PipelineDecodeInitial() function and the PipelineDecodeInitial_Phase2_CUDA() kernel function. The LdpcccDecodeText() function has the signature void LDPCCCDecode::LdpcccDecodeText(double* Lchan, int time, int* CodeIn, int* errNo), and its program is as follows:
PipelineDecodeInitial(ldpccc_chan, time); …
In this code, the PipelineDecodeInitial() function can be separated out and placed before the loop; the optimized code is as follows:
The PipelineDecodeInitial() function mainly allocates memory for H and V, where H stores the check matrix and V stores the channel information. PipelineDecodeInitial() is located in the LDPCCCDecode::LdpcccDecodeText() function, and its work can be divided into two phases: the first phase reads the H_[][].txt files and loads the data into H; the second phase assigns values to the A and B variables in H. In the new code Decode.cu, the txt-reading part of PipelineDecodeInitial() corresponds to the function void LDPCCCDecode::PipelineDecodeInitial_Phase1(int time), which saves the data it reads into memory; the dynamically allocated memory address of each read is kept in the sg_tmp pointer array, and a for() loop loads all the txt data files into memory. The for() loop is called in the SimulationProcess() function, and its program is as follows:
for (int i = 0; i < T; i++) {
    ldpc.PipelineDecodeInitial_Phase1(i);
}
The hybrid decoder transfers the device-executable instructions to the device side, which is provided with the PipelineDecodeInitial_Phase2_CUDA() kernel function and the Decode_CUDA.cu file; the code of the kernel function is stored in the Decode_CUDA.cu file.
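The horizontal step executed by a kernel such as PipelineDecodeProcessorHorizontal_CUDA() corresponds, in standard LDPC decoding terms, to the check-node update; the patent does not give its exact formula. A common min-sum variant, shown here purely as an illustrative assumption, computes each outgoing message from the sign product and minimum magnitude of the other incoming LLRs:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Min-sum check-node (horizontal) update for ONE check node: for each
// connected variable j, the outgoing message is the product of the signs
// and the minimum magnitude of all OTHER incoming LLRs. This is a generic
// LDPC rule used for illustration; the patent does not spell out its own.
std::vector<double> check_node_update(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    for (size_t j = 0; j < in.size(); ++j) {
        double sign = 1.0, min_mag = 1e300;
        for (size_t k = 0; k < in.size(); ++k) {
            if (k == j) continue;                  // exclude the target edge
            sign *= (in[k] < 0.0) ? -1.0 : 1.0;
            min_mag = std::min(min_mag, std::fabs(in[k]));
        }
        out[j] = sign * min_mag;
    }
    return out;
}
```

On the GPU, one thread (or a small group of threads) would typically handle one check node, which is why this step parallelizes well.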
In this embodiment, the hybrid decoder is a decoder for convolutional LDPC codes and has a pipeline decoder structure. The host side is the plain decoding module (CPU). Data on the host side can be copied to the device side: the host side opens up video-memory space on the device side to store the output data, the kernel functions on the device side then run, and after the kernel functions read the input data from video memory and process it, the processed output data is copied back to host memory space; the data on the device side can likewise be copied to the host side. The host side copies the data to be processed to the device side, and the device side releases the video-memory space afterwards. The host side comprises an Intel i5-4590 quad-core processor, 8 GB of memory and a 3.3 GHz clock frequency, and is programmed with the Visual Studio 2010 platform. The device side is the decoding module (GPU) capable of parallel operation and of handling high complexity; it comprises a GTX 970 graphics card with 1664 CUDA cores, 4 GB of video memory and a 1.02 GHz clock frequency, and is programmed with the CUDA 7.5 platform released by NVIDIA. The device side is further provided with a thread-block and thread-count configuration optimization module, which uses the GTX 970 processor having 13 SMs with 128 CUDA cores each. The thread-block and thread-count configuration optimization module is configured as follows, setting (1*BLKSZ+255)/256 blocks with 64 threads in each block:
dim3 gridDim_Horizontal(1, /*(CHKSZ+127)/128*/ 12);
PipelineDecodeProcessorHorizontal_CUDA<<<gridDim_Horizontal, 128>>>(dev_H, time, ID, d_Ltab);
dim3 gridDim_Vertical(1, /*(BLKSZ+256)/256*/ 14);
PipelineDecodeProcessorVertical_CUDA<<<gridDim_Vertical, 192>>>(dev_H, dev_V, time, ID);
PipelineDecodeFinalProcessorDecision_CUDA<<</*(BLKSZ+511)/512*/ 14, 192>>>(dev_H, dev_V, dev_Beta, dev_dout, dev_ldpcccCode, time, dev_errNo);
The PipelineDecodeInitial_Phase2_CUDA() kernel function comprises a while() loop, in which Al and As are stored in registers; the loop is while (tid < BLKSZ), and the assignment is performed in parallel by the device-side threads, hiding the latency of memory reads as far as possible. In addition, the while() loop optimizes the decoding speed of the SC-LDPC codes in the horizontal step, the vertical step and the decision step. Using the programming platform CUDA, the present invention can write the CPU code and the GPU code together, which is convenient and efficient; moreover, the multithreading advantage of the device side is fully exploited to rewrite the SC-LDPC decoder, reducing the system simulation time of decoding and increasing the decoding speed.
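The vertical step and the decision handled by PipelineDecodeProcessorVertical_CUDA() and PipelineDecodeFinalProcessorDecision_CUDA() correspond, in generic belief-propagation terms, to summing the channel LLR with the incoming check-node messages and taking a hard decision on the sign. The patent gives no formulas, so the following is only a sketch under that standard interpretation:

```cpp
#include <cassert>
#include <vector>

// Generic variable-node (vertical) update plus hard decision: the posterior
// LLR is the channel LLR plus all incoming check-node messages; the decoded
// bit is 1 when the posterior is negative (LLR convention: positive => 0).
// Illustrative only; the patent does not give its exact rules.
int decide_bit(double channel_llr, const std::vector<double>& check_msgs) {
    double posterior = channel_llr;
    for (double m : check_msgs) posterior += m;   // vertical (sum) update
    return (posterior < 0.0) ? 1 : 0;             // decision step
}
```

Since each variable node's sum is independent of the others, this step maps naturally onto one GPU thread per variable node.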
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art who, within the technical scope disclosed by the invention, makes equivalent substitutions or changes according to the technical scheme of the invention and its inventive concept shall be covered by the protection scope of the present invention.
Claims (7)
1. A GPU-based SC-LDPC code decoding acceleration system, comprising the programming platform CUDA, a hybrid decoder, a host side and a device side, characterised in that the programming platform CUDA is used to write the source code of the SC-LDPC codes and to transfer the newly written source code to the hybrid decoder; the hybrid decoder is internally provided with multiple processors, each possessing J+1 shift registers for storing intermediate variable information; the hybrid decoder decodes with an iterative decoding algorithm, converts the received SC-LDPC source code into instructions executable by the host side and the device side, and transfers the host-executable instructions to the host side; the host side is provided with the original CPU program, which comprises the void LDPCCCDecode::LdpcccDecodeText() function, the PipelineDecodeInitial() function and the PipelineDecodeInitial_Phase2_CUDA() kernel function; the hybrid decoder transfers the device-executable instructions to the device side, which is provided with the PipelineDecodeInitial_Phase2_CUDA() kernel function and the Decode_CUDA.cu file, the code of the kernel function being stored in the Decode_CUDA.cu file.
2. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterised in that the hybrid decoder is a decoder for convolutional LDPC codes and has a pipeline decoder structure.
3. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterised in that the host side is the plain decoding module (CPU), data on the host side can be copied to the device side, and data on the device side can also be copied to the host side.
4. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterised in that the host side comprises an Intel i5-4590 quad-core processor, 8 GB of memory and a 3.3 GHz clock frequency, and the host side is programmed with the Visual Studio 2010 platform.
5. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterised in that the device side is the decoding module (GPU) capable of parallel operation; the device side comprises a GTX 970 graphics card with 1664 CUDA cores, 4 GB of video memory and a 1.02 GHz clock frequency, and the device side is programmed with the CUDA 7.5 platform released by NVIDIA.
6. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterised in that the device side is further provided with a thread-block and thread-count configuration optimization module, which uses the GTX 970 processor having 13 SMs with 128 CUDA cores each.
7. The GPU-based SC-LDPC code decoding acceleration system according to claim 1, characterised in that the PipelineDecodeInitial_Phase2_CUDA() kernel function comprises a while() loop in which Al and As are stored in registers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710770624.5A CN107707330A (en) | 2017-08-31 | 2017-08-31 | SC LDPC codes decoding acceleration system based on GPU |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107707330A true CN107707330A (en) | 2018-02-16 |
Family
ID=61170039
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710770624.5A Pending CN107707330A (en) | 2017-08-31 | 2017-08-31 | SC LDPC codes decoding acceleration system based on GPU |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107707330A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108462495A (en) * | 2018-04-03 | 2018-08-28 | 北京航空航天大学 | A kind of multielement LDPC code high-speed parallel decoder and its interpretation method based on GPU |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102932003A (en) * | 2012-09-07 | 2013-02-13 | 上海交通大学 | Accelerated QC-LDPC (Quasi-Cyclic Low-Density Parity-Check Code) decoding method based on GPU (Graphics Processing Unit) framework |
US8930789B1 (en) * | 2013-01-23 | 2015-01-06 | Viasat, Inc. | High-speed LDPC decoder |
CN106788467A (en) * | 2016-11-28 | 2017-05-31 | 华中科技大学 | A kind of Raptor Code coding methods based on CUDA, interpretation method and system |
CN106992856A (en) * | 2017-03-29 | 2017-07-28 | 山西大学 | The data coordinating method of extensive continuous variable quantum key distribution based on GPU |
- 2017-08-31: CN application CN201710770624.5A filed; published as CN107707330A (status: Pending)
Non-Patent Citations (2)
Title |
---|
Chi H. Chan et al., "Parallel decoding of LDPC convolutional codes using OpenMP and GPU", 2012 IEEE Symposium on Computers and Communications (ISCC) * |
Lu Zouchen, "Research on parallel decoding algorithms of LDPC codes and their CUDA-based implementation", China Doctoral Dissertations Full-text Database * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220012598A1 (en) | Methods and apparatus for matrix and vector storage and operations | |
US8554820B2 (en) | Optimized corner turns for local storage and bandwidth reduction | |
US11080227B2 (en) | Compiler flow logic for reconfigurable architectures | |
KR20220129107A (en) | Matrix multiplier | |
US20140181477A1 (en) | Compressing Execution Cycles For Divergent Execution In A Single Instruction Multiple Data (SIMD) Processor | |
CN101061460B (en) | Micro processor device and method for shuffle operations | |
CN102053816B (en) | Data shuffling unit with switch matrix memory and shuffling method thereof | |
CN105612509A (en) | Methods, apparatus, instructions and logic to provide vector sub-byte decompression functionality | |
CN105049061A (en) | Advanced calculation-based high-dimensional polarization code decoder and polarization code decoding method | |
CN106911336B (en) | High-speed parallel low-density parity check decoder with multi-core scheduling and decoding method thereof | |
Kang et al. | Parallel LDPC decoder implementation on GPU based on unbalanced memory coalescing | |
CN116775518A (en) | Method and apparatus for efficient access to multidimensional data structures and/or other large data blocks | |
US11580397B2 (en) | Tensor dropout using a mask having a different ordering than the tensor | |
Balevic | Parallel variable-length encoding on GPGPUs | |
CN107707330A (en) | SC LDPC codes decoding acceleration system based on GPU | |
US20130262787A1 (en) | Scalable memory architecture for turbo encoding | |
US11500962B1 (en) | Emulating fine-grained sparsity in a systolic array | |
CN102201817B (en) | Low-power-consumption LDPC decoder based on optimization of memory folding architecture | |
Huang et al. | RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration | |
CN102594369A (en) | Quasi-cyclic low-density parity check code decoder based on FPGA (field-programmable gate array) and decoding method | |
US11328209B1 (en) | Dual cycle tensor dropout in a neural network | |
CN111966405B (en) | Polar code high-speed parallel decoding method based on GPU | |
US11803736B1 (en) | Fine-grained sparsity computations in systolic array | |
CN111368250B (en) | Data processing system, method and equipment based on Fourier transformation/inverse transformation | |
US20040111589A1 (en) | Asynchronous multiple-order issue system architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | |
Application publication date: 20180216 |