CN112398487A - Implementation method and system for reducing complexity of Turbo parallel decoding - Google Patents

Implementation method and system for reducing complexity of Turbo parallel decoding

Info

Publication number
CN112398487A
Authority
CN
China
Prior art keywords
state
value
metric
interleaving
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011474548.1A
Other languages
Chinese (zh)
Inventor
邓红梅
郭军平
赵峰
施渊籍
周一青
石晶林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202011474548.1A priority Critical patent/CN112398487A/en
Publication of CN112398487A publication Critical patent/CN112398487A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/29Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes combining two or more codes or code structures, e.g. product codes, generalised product codes, concatenated codes, inner and outer codes
    • H03M13/2957Turbo codes and decoding

Abstract

The invention relates to an implementation method and system for reducing the complexity of Turbo parallel decoding. With a dedicated calculation in an FPGA (field-programmable gate array), the inner interleaving coefficients f1 and f2 can be indexed from the frame length information alone, so the interleaving addresses are computed in real time by a simplified calculation and need not be stored. Verified by extensive simulation, the method improves on the existing algorithm: because the branch state metric involves no feedback, its maximum value is computed directly, saving two branch-metric calculations per iteration compared with existing schemes and effectively reducing computational complexity. The method is compatible with all 188 frame lengths of the LTE protocol and suits various rates. Addressing the high complexity and large delay of current decoding implementations, the invention effectively reduces both the computational complexity and the delay while also achieving a low bit error rate.

Description

Implementation method and system for reducing complexity of Turbo parallel decoding
Technical Field
The invention relates to an implementation method and system for reducing the complexity of Turbo parallel decoding, in the technical field of communication coding and decoding.
Background
The Turbo code, proposed by Berrou et al. in 1993, achieves decoding performance close to the Shannon theoretical limit thanks to its effective use of the random coding conditions of Shannon's channel coding theory. With scholars' continued research in recent years, Turbo codes have entered practical use as one of the mainstream schemes for LTE channel coding in the 3GPP standard.
Although the Turbo code algorithm has matured through continued research, many problems remain in its application in practical systems, especially in hardware implementation: the Turbo decoding algorithm has high complexity, contains internal feedback, and its multiple iterations incur large delay.
To reduce delay, Turbo decoding must first solve the problem of parallel interleaving/deinterleaving addresses. Two schemes are generally used to generate interleaving addresses. The first generates a specific interleaving table in advance for each frame length and stores it; supporting different frame lengths then requires storing different tables, occupying more resources. The second computes addresses in real time, which requires interleaving coefficients obtained by computer search. Existing parallel interleaving schemes generally interleave the grouped sub-blocks separately and use different addresses for interleaving and deinterleaving, which increases computational complexity or occupies more storage resources.
According to the MAX-LOG-MAP algorithm, the core of decoding is calculating the branch state metric, forward state metric, and backward state metric. The forward and backward state metric calculations contain feedback and require multiple iterations; the algorithm for finding their maxima has high complexity; and decoding delay is proportional to computational complexity and the number of iterations.
The problem to be solved in the prior art is to effectively reduce the complexity of a decoding algorithm, improve the decoding throughput rate, reduce the decoding delay and reduce the hardware resource overhead.
Disclosure of Invention
Purpose of the invention: one object is to provide an implementation method for reducing the complexity of Turbo parallel decoding that solves the above problems in the prior art; a further object is to propose a system implementing the method.
The technical scheme is as follows: an implementation method for reducing the complexity of Turbo parallel decoding comprises the following steps.
Step 1: the input ping-pong buffer unit caches the input soft information sequence in ping-pong RAM according to the state of the component decoder.
Step 2: the interleaving/deinterleaving address index unit stores the inner interleaving coefficients in sequence, indexes the RAM address from the frame length information to obtain in real time the interleaving coefficients for different frame lengths, generates and stores fixed parallel interleaving index addresses by a dedicated interleaving-address calculation, and uses the same index address for both interleaving and deinterleaving.
Further, the parallel interleaving RAM index address is obtained according to the following formula:
[parallel interleaving RAM index address formula image omitted in source]
wherein k represents parallelism, L represents subblock length, i satisfies 0 ≤ i ≤ k-1, j satisfies 0 ≤ j ≤ L-1, and x satisfies 0 ≤ x ≤ N-1.
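The index-address formula itself is only an image in the source, so the sketch below is a hedged illustration of a common mapping for a QPP interleaver f(x): a frame of length N = k·L is split into k sub-blocks of length L, and pre-interleaving position x = i·L + j is routed to RAM f(x) // L at offset f(x) mod L. The coefficient pair (f1, f2) = (3, 10) is the LTE value for N = 40; the function names are illustrative, not the patent's.

```python
def qpp(x, f1, f2, N):
    """LTE quadratic permutation polynomial (QPP) interleaver address."""
    return (f1 * x + f2 * x * x) % N

def parallel_index(i, j, f1, f2, N, L):
    """Sub-block i, offset j -> (target RAM index, target offset)."""
    fx = qpp(i * L + j, f1, f2, N)
    return fx // L, fx % L

# LTE coefficients for N = 40 are (f1, f2) = (3, 10); k = 4 windows of L = 10.
N, L, k, f1, f2 = 40, 10, 4, 3, 10
mapping = [parallel_index(i, j, f1, f2, N, L) for i in range(k) for j in range(L)]
assert len(set(mapping)) == N       # every (RAM, offset) slot hit exactly once
# At each step j the k windows address k distinct RAMs (the QPP is
# contention-free), which is what lets one fixed index table serve both
# interleaving and deinterleaving.
assert all(len({parallel_index(i, j, f1, f2, N, L)[0] for i in range(k)}) == k
           for j in range(L))
```

Because f(x) is a permutation of 0..N-1, the (RAM, offset) pairs tile the whole frame without conflict.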
Step 3: the first component decoder unit sets the input information frame length to N and divides it into P blocks, each of data length L = N/P; the implementation uses P-way parallelism with P sub-component decoders, i.e., the frame is divided into P windows, each processing L data, and decoding uses the MAX-LOG-MAP algorithm.
Further, when the completion of ping or pong RAM buffering is detected, the decoding iteration starts, and an indication signal occupy1 for the currently occupied ping or pong RAM is fed back, indicating that data is read from that RAM.
Further, let the current decoding time be k; the state at time k is s_k, the state at time k-1 is s_{k-1}, and the state at time k+1 is s_{k+1}. When a decoding iteration starts, the first window forward-calculates the branch state metric from the first block of data, as follows:

[branch state metric formula image omitted in source]

where, owing to the uncertainty of the received information bits and check bits, the branch state metric takes four possible state values in total; it is written γ_k(s_{k-1}, s_k) and hereinafter simply called gamma. The remaining symbols (images omitted in source) denote, respectively, the system information and check information before encoding and the received decoded information bits and check bits.
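Since the patent's branch metric formula is only an image in the source, here is a hedged sketch of the standard MAX-LOG-MAP branch metric under an assumed BPSK mapping: for received soft systematic/parity values ys and yp, the four candidate label pairs (u, p) give the four gamma values per trellis step, and their maximum involves no feedback, so it can be computed and stored up front as step 3 describes.

```python
def branch_metrics(ys, yp):
    """Return {(u, p): gamma} for the four candidate label pairs (a sketch,
    not the patent's exact formula, which is an image in the source)."""
    bipolar = lambda b: 2 * b - 1          # 0 -> -1, 1 -> +1
    return {(u, p): bipolar(u) * ys + bipolar(p) * yp
            for u in (0, 1) for p in (0, 1)}

g = branch_metrics(ys=0.8, yp=-0.3)
g_max = max(g.values())   # the per-step maximum stored alongside gamma
```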
Further, the branch metric gamma of the current state and its maximum value are calculated, and the results are stored in sequence. Simultaneously, the second window initializes the backward state metric, pre-calculates it from the second block of data, and stores the boundary value as the initial value for the next backward state metric calculation.
Step 4: calculate the forward state metric. Its value at the current time is determined by the value at the previous time, the branch transition metric at the current time, and that metric's maximum; when the first gamma maximum has been calculated, the first window begins forward calculation of the forward state metric, as follows:

α_k(s_k) = α_{k-1}(s_{k-1}) + γ(s_{k-1}, s_k) + max(γ(s_{k-1}, s_k))

where α_k(s_k) denotes the forward metric of state s_k at time k, hereinafter simply alpha.
Further, alpha is first initialized with a uniform distribution; the alpha of the previous state, the current branch state metric, and the maximum of gamma are then summed to obtain the forward state metric alpha.
Step 5: calculate the backward state metric, beta for short. Its value at the current time is determined by the value at the next time, the branch transition metric at the current time, and that metric's maximum, as follows:

β_k(s_k) = β_{k+1}(s_{k+1}) + γ(s_{k+1}, s_k) + max(γ(s_{k+1}, s_k))
furthermore, beta adopts a second window to pre-calculate a boundary value stored by beta of the second block data as an initial value, and the backward state metric beta value is obtained by adding the backward state (initial value) of beta, the branch state metric and the maximum value of gamma.
Step 6: calculate the extrinsic information Le simultaneously with beta: the alpha and gamma values are read from the cache in sequence, and an add-compare operation over alpha, gamma, and the real-time beta yields the Le value.
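To make steps 3 to 6 concrete, here is a minimal MAX-LOG-MAP sketch over a hypothetical 2-state recursive code (the patent's LTE constituent code has 8 states, and the patent additionally folds the per-step gamma maximum into the alpha/beta recursions; this sketch shows only the standard recursions those steps build on). The extrinsic information Le would then be obtained by subtracting the systematic and prior contributions from the LLR.

```python
# Transitions (prev_state, next_state, info_bit, parity_bit) of a toy 2-state code.
TRANS = [(0, 0, 0, 0), (0, 1, 1, 1), (1, 1, 0, 0), (1, 0, 1, 1)]
NEG = float("-inf")

def max_log_map(ys, yp, la):
    """Per-bit LLRs from systematic/parity soft values and prior information."""
    n = len(ys)
    bip = lambda b: 2 * b - 1                      # 0 -> -1, 1 -> +1
    # Branch metrics gamma_k(s', s) for every trellis step (step 3).
    g = [{(s0, s1): bip(u) * (ys[k] + la[k]) + bip(p) * yp[k]
          for (s0, s1, u, p) in TRANS} for k in range(n)]
    # Forward recursion (step 4), assuming the encoder starts in state 0.
    alpha = [[0.0, NEG]] + [[NEG, NEG] for _ in range(n)]
    for k in range(n):
        for s0, s1, u, p in TRANS:
            alpha[k + 1][s1] = max(alpha[k + 1][s1], alpha[k][s0] + g[k][(s0, s1)])
    # Backward recursion (step 5), with an uninformative termination.
    beta = [[NEG, NEG] for _ in range(n)] + [[0.0, 0.0]]
    for k in range(n - 1, -1, -1):
        for s0, s1, u, p in TRANS:
            beta[k][s0] = max(beta[k][s0], beta[k + 1][s1] + g[k][(s0, s1)])
    # LLR by add-compare over transitions with input 1 vs input 0 (step 6).
    llr = []
    for k in range(n):
        best = {0: NEG, 1: NEG}
        for s0, s1, u, p in TRANS:
            best[u] = max(best[u], alpha[k][s0] + g[k][(s0, s1)] + beta[k + 1][s1])
        llr.append(best[1] - best[0])
    return llr

# Noise-free check: bits 1,0,1 encode to parities 1,0,1 under the toy
# transitions above; the LLR signs recover the information bits.
llr = max_log_map(ys=[1, -1, 1], yp=[1, -1, 1], la=[0, 0, 0])
assert [1 if v > 0 else 0 for v in llr] == [1, 0, 1]
```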
The second component decoding works on the same principle as the first component decoding; difference 1 is that the two component decoders operate with a delay of one window length, so decoding delay is reduced by a factor of P compared with conventional decoding; difference 2 is that its input soft information sequence differs, so its output extrinsic information sequence differs.
Furthermore, the originally input system soft information sequence and the extrinsic information Le output for the first block of data by the first component decoder are indexed by parallel interleaving to obtain the system information and prior information input to the second component decoder. When the indication signal that extrinsic-information calculation for the first block of the first component decoding is complete is detected, the second component decoding feeds back the indication signal occupy2 to the input buffer module, indicating that the system check information (symbol image omitted in source) is read from the input buffer RAM.
Further, when the second component decoding unit completes the calculation of the extrinsic information Le2 and LLR for the first block of data, the interleaving/deinterleaving address index calculation unit deinterleaves them to serve as the prior information La1 input to the first component decoding.
Step 7: repeat steps 2 to 6 for the remaining P-1 windows; when the data of the P windows has iterated the set number of times, iteration stops, the second component decoder outputs the final likelihood-ratio information LLR in sequence, the LLR is deinterleaved by the interleaving address index unit, and after hard decision the result is output-buffered, awaiting reading by the next module.
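The hard decision at the end of step 7 maps each likelihood-ratio value to a bit by its sign; a minimal sketch, assuming the convention that a positive LLR decides bit 1:

```python
def hard_decision(llrs):
    """Map final LLR values to bits (positive -> 1, otherwise 0)."""
    return [1 if v > 0 else 0 for v in llrs]

assert hard_decision([4.2, -0.7, 0.1, -3.3]) == [1, 0, 1, 0]
```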
Based on the above implementation method, the invention provides a system for reducing the complexity of Turbo parallel decoding, comprising: a first module for the input ping-pong buffer unit, which caches the input soft information sequence in ping-pong RAM according to the component decoder state; a second module for the interleaving/deinterleaving address index unit;
a third module for maximizing the branch metric to reduce decoding complexity; a fourth module for calculating the forward state metric; a fifth module for calculating the backward state metric; a sixth module for calculating the extrinsic information Le; and a seventh module for repeating the iteration among the second through sixth modules, stopping when the data of the P windows has iterated a predetermined number of times.
Further, the second module is also configured to store the inner interleaving coefficients in sequence, index the RAM address from the frame length information, and obtain in real time the interleaving coefficients for different frame lengths:

[RAM address index formula image omitted in source]

where addr_i denotes the indexed RAM address, the index i satisfies 0 ≤ i ≤ 187, and N denotes the frame length information;
and to generate and store fixed parallel interleaving index addresses by computing the interleaving addresses, the same index address being used for both interleaving and deinterleaving:
f(x) = (f1·x + f2·x²) mod N, 0 ≤ x ≤ N-1
where f1 and f2 denote the inner interleaving coefficients, x the information address before interleaving, f(x) the information address after interleaving, mod the modulo (remainder) operation, and N the frame length, i.e., the interleaving depth;
the third module is further configured to set an input information frame length to be N, divide the frame length into P blocks, where each block of data length L is N/P, adopt P-way parallelism, and adopt P sub-component decoders, that is, divide the frame length into P windows to implement, where each window processes L data;
when detecting that the ping or pong RAM buffer finishes the indication signal, the decoding iteration starts, and an indication signal copy 1 of the current occupied ping or pong RAM is fed back to represent that the data is read from the ping or pong RAM;
setting the current decoding time as k, the state of k time is skThe state at the time k-1 is sk-1The state at the time k +1 is sk+1When decoding iteration starts, the first window adopts the first block data to forward calculate branch state measurement, and the calculation method is as follows:
Figure BDA0002834783900000042
where the branch state metric has four possible state values in total, using gamma, due to the uncertainty of the received information bits and check bitsk(sk-1,sk) Which means, in the following simply called branch metrics,
Figure BDA0002834783900000043
respectively representing system information and check information before encoding,
Figure BDA0002834783900000044
information bits and check bits representing a reception decoding;
calculating the branch measurement and the maximum value of the current state, and respectively and sequentially storing the calculation results; meanwhile, the second window initializes the backward state measurement, pre-calculates the backward state measurement by adopting second block data, and stores a boundary value as an initial value of backward state measurement calculation at the next moment;
the fourth module is further used for initializing the forward metric value by adopting uniform distribution, and performing summation operation by utilizing the forward metric value of the previous state, the current branch state metric and the maximum value of the branch metric to obtain the forward metric value of the forward state metric; the value of the forward state metric at the current moment is determined by the value of the previous moment, the branch transition metric value at the current moment and the maximum value, when the calculation of the maximum value of the first branch metric is completed, the first window starts to calculate the forward state metric in the forward direction, and the calculation method is shown as the following formula:
αk(sk)=αk-1(sk-1)+γ(sk-1,sk)+max(γ(sk-1,sk))
wherein alpha isk(sk) Denotes the time k skForward metric values for the states.
Further, the fifth module uses as initial value the boundary value of the backward metric pre-calculated by the second window from the second block of data, and sums the backward state (initial value), the branch state metric, and the branch metric maximum to obtain the backward state metric; the value of the backward state metric at the current time is determined by the value at the next time, the branch transition metric at the current time, and that metric's maximum:

β_k(s_k) = β_{k+1}(s_{k+1}) + γ(s_{k+1}, s_k) + max(γ(s_{k+1}, s_k))

where β_k(s_k) denotes the backward metric of state s_k at time k;
the calculation of the external information Le and the backward state metric value in the seventh module is carried out simultaneously, the forward state metric value and the state branch metric value are read from the cache in sequence, and the addition ratio operation is carried out through the forward state metric value, the state branch metric value and the backward state metric value calculated in real time to obtain the external information Le value;
the calculation for the remaining P-1 windows repeats steps 2 to 6; when the data of the P windows has iterated the set number of times, iteration stops, the second component decoder outputs the final likelihood-ratio information LLR in sequence, the LLR is deinterleaved by the interleaving address index unit, and after hard decision the result is output-buffered, awaiting reading by the next module;
Λ(s_k, 0) = α_{k-1}(s_{k-1}) + γ(s_{k-1}, s_k) + β_k(s_k)

Λ(s_k, 1) = α_{k-1}(s_{k-1}) + γ(s_{k-1}, s_k) + β_k(s_k)

[extrinsic information Le formula image omitted in source]

LLR = max*(Λ_0(i)) - max*(Λ_1(i))

where Λ(s_k, 0) denotes the LLR corresponding to s_k when the input is 0, Λ(s_k, 1) the LLR corresponding to s_k when the input is 1, and Le and LLR denote, respectively, the extrinsic information and likelihood-ratio information output by the first component decoder.
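The max* operator in the LLR formula above is the Jacobian logarithm; MAX-LOG-MAP, as used here, approximates it by a plain max, which is what removes the exponential and logarithm evaluations. A small sketch of the exact operator and the bounded error of the approximation:

```python
import math

def max_star(a, b):
    """Exact Jacobian logarithm: ln(e^a + e^b)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

# The MAX-LOG approximation max(a, b) under-estimates max* by at most ln 2,
# with equality when a == b.
assert math.isclose(max_star(0.0, 0.0), math.log(2.0))
for a, b in [(2.0, 0.0), (-1.5, 3.0), (5.0, 5.0)]:
    assert 0.0 < max_star(a, b) - max(a, b) <= math.log(2.0) + 1e-12
```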
Beneficial effects: the invention provides an implementation method and system for reducing the complexity of Turbo parallel decoding. With a dedicated calculation in an FPGA (field-programmable gate array), the inner interleaving coefficients f1 and f2 can be indexed from the frame length information alone, so the interleaving addresses are computed in real time by a simplified calculation and need not be stored. Verified by extensive simulation, the method improves on the existing algorithm: because the branch state metric involves no feedback, its maximum value is computed directly, saving two branch-metric calculations per iteration compared with existing schemes and effectively reducing computational complexity. It is compatible with all 188 frame lengths of the LTE protocol and suits various rates. Addressing the high complexity and large delay of current decoding implementations, the invention effectively reduces both the computational complexity and the delay while also achieving a low bit error rate.
Drawings
Fig. 1 is a schematic structural diagram of a method for effectively reducing the complexity of Turbo parallel decoding according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of a method for effectively reducing the complexity of Turbo parallel decoding according to an embodiment of the present application.
Fig. 3 is a flowchart of a component decoding sub-decoder according to an embodiment of the present application.
Fig. 4 is a timing diagram illustrating a method for effectively reducing the complexity of Turbo parallel decoding according to an embodiment of the present application.
Fig. 5 is a performance curve diagram of a method for effectively reducing the complexity of Turbo parallel decoding according to an embodiment of the present application.
Fig. 6 is an overall work flow diagram of the present invention.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.
In an embodiment, fig. 1 shows the low-complexity Turbo parallel decoding principle structure provided by this application. As shown in fig. 1, the method includes steps S101 to S105.
Step S101: the input ping-pong buffer unit caches the input soft decoding information in ping-pong RAM according to the component decoder state. For a frame length of 288 with 8-way parallelism, each group of soft information numbers 36; the input extrinsic decoding information, comprising system extrinsic information, check extrinsic information 1, and check extrinsic information 2, is ping-pong buffered in 2 groups of RAMs, each group of 8 RAMs with depth 36.
Step S102: the interleaving/deinterleaving address index calculation unit uses a RAM of depth 188 to store, in sequence, the 188 groups of interleaving coefficients of the 3GPP LTE protocol, each group holding 3 parameters: the frame length and the 2 interleaving coefficients f1 and f2.
Further, RAM addresses are indexed for different frame lengths using the following equation:

[RAM address index formula image omitted in source]

where addr_i denotes the indexed RAM address, the index i satisfies 0 ≤ i ≤ 187, and N represents the frame length information. The division in the formula can be realized by a right-shift operation without any complex calculation, so the inner interleaving coefficients f1 and f2 for the 188 different 3GPP LTE frame lengths are obtained in real time.
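The right-shift claim holds for non-negative integers whenever the divisor is a power of two: x // 2**s equals x >> s. A tiny illustration of this hardware-friendly equivalence (the indexing formula itself is an image in the source):

```python
# Division by a power of two is a logical right shift for non-negative ints.
for x in range(1024):
    assert x // 8 == x >> 3      # divide by 2**3
    assert x // 64 == x >> 6     # divide by 2**6
```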
Further, the interleaving addresses for different frame lengths are obtained from the interleaving coefficients f1 and f2:

f(x) = (f1·x + f2·x²) mod N, 0 ≤ x ≤ N-1

where f1 and f2 denote the inner interleaving coefficients, x the information address before interleaving, f(x) the information address after interleaving, mod the modulo (remainder) operation, and N the frame length, i.e., the interleaving depth.
Further, the above formula can be transformed into the recursive form:

f(x) = (f(x-1) + f1 + f2 + 2·f2·(x-1)) mod N, 1 ≤ x ≤ N-1, with f(0) = 0
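The recursion follows from the difference f(x) - f(x-1) = f1 + f2·(2x - 1) = f1 + f2 + 2·f2·(x - 1) (mod N), replacing the multiplications of the direct formula with additions. A quick check of the equivalence, using the LTE coefficient pair (f1, f2) = (3, 10) for frame length N = 40:

```python
def qpp_direct(f1, f2, N):
    """Direct QPP interleaver addresses f(x) = (f1*x + f2*x^2) mod N."""
    return [(f1 * x + f2 * x * x) % N for x in range(N)]

def qpp_recursive(f1, f2, N):
    """Recursive form with f(0) = 0; the increment f1 + f2 + 2*f2*(x-1)
    can itself be updated by adding 2*f2 each step in hardware."""
    f = [0] * N
    for x in range(1, N):
        f[x] = (f[x - 1] + f1 + f2 + 2 * f2 * (x - 1)) % N
    return f

f1, f2, N = 3, 10, 40
assert qpp_recursive(f1, f2, N) == qpp_direct(f1, f2, N)
assert sorted(qpp_direct(f1, f2, N)) == list(range(N))  # a valid permutation
```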
furthermore, the optimized calculation complexity is obviously reduced, and the hardware implementation is facilitated.
Further, the parallel interleaving RAM index address is obtained according to the following formula:

[parallel interleaving RAM index address formula image omitted in source]

where k denotes the parallelism, L the subblock length, and i, j, x satisfy 0 ≤ i ≤ k-1, 0 ≤ j ≤ L-1, and 0 ≤ x ≤ N-1.
Further, the system extrinsic information is interleaved in groups through the index addresses.
Step S103: the first component decoding unit uses 8-way parallelism with 8 sub-component decoders, i.e., it is realized as 8 windows, each processing one block of 36 data, and decoding uses the MAX-LOG-MAP algorithm.
Further, when the completion of ping or pong RAM buffering is detected, the decoding iteration starts, and the indication signal occupy1 for the currently occupied ping or pong RAM is fed back, indicating that data is read from that RAM.
Further, the branch metric gamma of the current state and its maximum value are calculated, and the results are stored in sequence. Simultaneously, the second window initializes the backward state metric, pre-calculates it from the second block of data, and stores the boundary value as the initial value for the next backward state metric calculation.
Further, alpha is initialized with a uniform distribution, and the alpha of the previous state, the current branch state metric, and the maximum of gamma are summed to obtain the forward state metric alpha.
Furthermore, beta uses as initial value the boundary value pre-calculated by the second window from the second block of data, and the backward state metric beta is obtained by summing the backward state (initial value) of beta, the branch state metric, and the maximum of gamma.
Further, the extrinsic information Le is calculated as shown below, simultaneously with beta: the alpha and gamma values are read from the cache in sequence, and an add-compare operation over alpha, gamma, and the real-time beta yields the Le value.
Λ(s_k, 0) = α_{k-1}(s_{k-1}) + γ(s_{k-1}, s_k) + β_k(s_k)

Λ(s_k, 1) = α_{k-1}(s_{k-1}) + γ(s_{k-1}, s_k) + β_k(s_k)

[extrinsic information Le formula image omitted in source]

LLR = max*(Λ_0(i)) - max*(Λ_1(i))

where Λ(s_k, 0) denotes the LLR corresponding to s_k when the input is 0, Λ(s_k, 1) the LLR corresponding to s_k when the input is 1, and Le and LLR denote, respectively, the extrinsic information and likelihood-ratio information output by the first component decoder.
Step S104: the second component decoding unit works on the same principle as the first component decoding; difference 1 is that the two component decoders operate with a delay of one window length, reducing decoding delay by a factor of P compared with conventional decoding; difference 2 is that its input soft information sequence differs, so its output extrinsic information sequence differs.
Further, the originally input system soft information sequence and the extrinsic information Le output for the first block of data by the first component decoder are interleaved using the parallel interleaving index addresses to obtain the system information (symbol image omitted in source) and the prior information La2 input to the second component decoder.
Furthermore, from the parallel interleaving index formula, the P pieces of system extrinsic information at RAM address 1 must be stored, via the index addresses, at address 19 of the P interleaving RAMs to obtain the interleaved data.
Furthermore, upon detecting the indication signal that extrinsic-information calculation for the first block of data of the first component decoding is complete, the second component decoding unit feeds back the indication signal occupy2 to the input buffer module, indicating that the system check information (symbol image omitted in source) is read from the input buffer RAM.
Further, when the second component decoding unit completes the calculation of the extrinsic information Le2 and LLR for the first block of data, the interleaving/deinterleaving address index calculation unit deinterleaves them to serve as the prior information La1 input to the first component decoding.
Step S105: repeat steps 2 to 8 for the remaining P-1 windows; when the data of the P windows has iterated the set number of times, iteration stops, the second component decoder outputs the final likelihood-ratio information LLR in sequence, the LLR is deinterleaved by the interleaving index unit, and after the hard decision module the result is output-buffered, awaiting reading by the next module.
In an embodiment, fig. 2 shows the low-complexity Turbo parallel decoding flow provided by this application. As shown in fig. 2, the method includes steps S106 to S110.
Step S106: ping-pong buffering of the input decoding information is performed according to the component decoding working state.
Step S107: when decoding starts, the first component decoder begins work and judges whether the calculation of the extrinsic information Le1 for the P-th block of data is complete; if so, the data enters the interleaving address index unit for parallel interleaving to obtain the prior information La2 for the second component decoding; if not, the first component decoding continues.
Step S108: read the system information bits (symbol image omitted in source) and the second check bits (symbol image omitted in source) from the input ping-pong buffer; the system information bits enter the interleaving address index unit for parallel interleaving, and the interleaved system information bits (symbol image omitted in source), the prior information La2, and the second check information (symbol image omitted in source) then enter the second component decoding together for calculation.
Step S109: during the second component decoding, whether the calculation of the extrinsic information Le2 and LLR for the P-th block of data is complete is judged; if so, parallel deinterleaving is performed in the interleaving address index unit to obtain the prior information La1 for the first component decoding; if not, the second component decoding continues.
Step S110: when the iterations reach the set number, iteration stops; the second component decoder outputs the final likelihood-ratio information LLR in window order to the interleaving address index unit for deinterleaving, and after the hard decision unit the result is output-buffered.
In an embodiment, fig. 3 shows the implementation flow of a component decoding sub-decoder provided by this application. As shown in fig. 3, the method includes steps S111 to S115.
Step S111: calculate the gamma values of the current 4 possible states and cache the results in order.
Step S112: calculate the maximum gamma value over the 4 possible states and cache the results in order.
Step S113: read the cached gamma values and gamma maxima in forward order, calculate alpha, and cache the results in order.
Step S114: read the cached gamma values and gamma maxima in reverse order and calculate beta.
Step S115: read the cached alpha and gamma values in forward order and, combined with the current beta value, calculate the extrinsic information Le and the likelihood ratio information LLR.
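The order of steps S111 to S115 can be sketched as one window pass of a MAX-LOG-MAP decoder. The 2-state trellis below is a toy stand-in for the patent's 8-state LTE trellis, and the recursions use the standard running-maximum selection rather than the patent's cached branch-metric maximum, so everything here is an illustrative assumption:

```python
import numpy as np

# Toy 2-state trellis: next_state/parity are illustrative, not the LTE trellis.
def next_state(s, u): return (s + u) % 2
def parity(s, u):     return s ^ u

def window_decode(ys, yp, la):
    """One window pass in the order of steps S111-S115: branch metrics (gamma)
    are computed and cached, alpha runs forward over the cache, beta runs
    backward, and alpha/gamma/beta combine into the LLR."""
    K, S, NEG = len(ys), 2, -1e30
    # S111/S112: gamma for each (time, state, input), cached in order
    gamma = np.zeros((K, S, 2))
    for k in range(K):
        for s in range(S):
            for u in (0, 1):
                gamma[k, s, u] = (2*u - 1) * (ys[k] + la[k]) + (2*parity(s, u) - 1) * yp[k]
    # S113: forward recursion reading the cache in forward order
    alpha = np.full((K + 1, S), NEG)
    alpha[0, 0] = 0.0  # trellis starts in state 0
    for k in range(K):
        for s in range(S):
            for u in (0, 1):
                ns = next_state(s, u)
                alpha[k + 1, ns] = max(alpha[k + 1, ns], alpha[k, s] + gamma[k, s, u])
    # S114: backward recursion reading the cache in reverse order
    beta = np.zeros((K + 1, S))  # open termination: all end states equally likely
    for k in range(K - 1, -1, -1):
        for s in range(S):
            beta[k, s] = max(gamma[k, s, u] + beta[k + 1, next_state(s, u)] for u in (0, 1))
    # S115: combine alpha, gamma and the current beta into the LLR
    llr = np.zeros(K)
    for k in range(K):
        best = {0: NEG, 1: NEG}
        for s in range(S):
            for u in (0, 1):
                m = alpha[k, s] + gamma[k, s, u] + beta[k + 1, next_state(s, u)]
                best[u] = max(best[u], m)
        llr[k] = best[1] - best[0]
    return llr

llr = window_decode(ys=[2.0, -1.5, 1.0], yp=[0.0, 0.0, 0.0], la=[0.0, 0.0, 0.0])
```

With all-zero parity and prior inputs the LLR reduces to twice the systematic soft value, which is a quick sanity check on the pass ordering.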
In an embodiment, the present application provides a low-complexity Turbo parallel decoding timing diagram. As shown in fig. 4, the abscissa represents decoding time in units of one sliding-window calculation time and the ordinate represents window length in units of one sliding-window length; blue represents the first component decoding timing and orange the second, both of which have a one-sliding-window delay. The method includes steps S116 to S118.
In step S116, before time T0, the first window W0 of the first component decoding forward-calculates gamma, the gamma maximum value, and alpha, while the second window W1 pre-calculates beta in the reverse direction; at time T0 the gamma, gamma maximum, and alpha values are stored in order.
Still in step S116, at time T1, the first window W0 of the first component decoding reversely calculates beta, the extrinsic information Le, and the likelihood ratio information LLR, the initial value of beta being the value pre-calculated in reverse at time T0 of the first component decoding; meanwhile the second window W1 forward-calculates gamma, the gamma maximum value, and alpha, and the third window W2 pre-calculates beta in reverse.
Further, at time T1, the first window W0 of the second component decoding starts to forward-calculate gamma, the gamma maximum value, and alpha; the extrinsic information stored by the first component decoding at time T0 is interleaved by the interleaving address index unit to obtain the prior information La2 required by the second component decoding for calculating gamma. The second window W1 pre-calculates beta in reverse, and at time T1 the gamma, gamma maximum, and alpha values are stored.
Step S117: at time T2, the second window W1 of the first component decoding reversely calculates beta, the extrinsic information Le, and the likelihood ratio information LLR, the initial value of beta being the value pre-calculated in reverse at time T1; the third window W2 forward-calculates gamma, the gamma maximum value, and alpha, and the fourth window W3 pre-calculates beta in reverse.
Further, at time T2, the first window W0 of the second component decoding reversely calculates beta, the extrinsic information Le, and the likelihood ratio information LLR, the initial value of beta being the value pre-calculated in reverse at time T1 of the second component decoding; the second window W1 forward-calculates gamma, the gamma maximum value, and alpha, and the third window W2 pre-calculates beta in reverse.
Step S118: the above steps are repeated until the calculation of all LLR likelihood ratio information is completed.
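The timing in steps S116 to S118 can be sketched as a slot schedule; the window labels and operation strings are assumptions for illustration, and the second component decoder runs the same schedule shifted by one slot:

```python
def schedule(num_windows):
    """Slot-by-slot operations of one component decoder, mirroring fig. 4:
    window w runs its forward pass at slot w, its backward pass plus Le/LLR
    at slot w + 1, while window w + 1 pre-computes beta one slot ahead."""
    slots = []
    for t in range(num_windows + 1):
        ops = []
        if t < num_windows:
            ops.append(("W%d" % t, "forward: gamma, max(gamma), alpha"))
        if t + 1 < num_windows:
            ops.append(("W%d" % (t + 1), "pre-compute beta (reverse)"))
        if t >= 1:
            ops.append(("W%d" % (t - 1), "reverse: beta, Le, LLR"))
        slots.append(ops)
    return slots

for t, ops in enumerate(schedule(3)):
    print("T%d:" % t, ops)
```

The one-slot offset between a window's forward and backward passes is the sliding-window delay visible in fig. 4.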
In one embodiment, a low-complexity Turbo parallel decoding performance graph is provided.
As shown in fig. 5, for BPSK modulation over an AWGN channel at code rate 1/3, the bit error rate curves after 6 iterations are simulated for the MAX-LOG-MAP algorithm and the improved MAX-LOG-MAP algorithm; the two curves almost completely coincide, showing that the improvement incurs no performance loss.
As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An implementation method for reducing the complexity of Turbo parallel decoding, characterized by comprising the following steps:
step 1, inputting a ping-pong cache unit, and caching an input soft information sequence into a ping-pong RAM according to the state of a component decoder;
step 2, generating interleaving/de-interleaving addresses through an interleaving/de-interleaving address index unit;
step 3, taking the maximum value of the branch metric to reduce the decoding complexity;
step 4, calculating forward state measurement;
step 5, calculating backward state measurement;
step 6, calculating extrinsic information Le;
and 7, repeating the steps 2-6, and stopping iteration when the data iteration of the P windows meets the preset times.
2. The method according to claim 1, wherein step 2 further comprises: sequentially storing the internal interleaving coefficients, indexing the RAM address according to the frame length information, and acquiring the interleaving coefficients corresponding to different frame lengths in real time:
[interleaving coefficient index equation not reproduced in the source]
where addr_i denotes the index address, the index i satisfies 0 ≤ i ≤ 187, and N denotes the frame length information;
and generating fixed interleaving parallel index addresses by computing the interleaving addresses and storing the fixed interleaving parallel index addresses, wherein the same index address is adopted for interleaving and deinterleaving:
f(x) = (f1·x + f2·x²) mod N, 0 ≤ x ≤ N−1
where f1 and f2 respectively denote the inner interleaving coefficients, x denotes the information address before interleaving, f(x) the information address after interleaving, mod the modulo operation, and N the frame length, that is, the interleaving depth.
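The quadratic permutation polynomial of claim 2 can be implemented directly. The example coefficients f1 = 3, f2 = 10 for N = 40 are taken from the 3GPP TS 36.212 turbo interleaver table, on the assumption that this is the 188-entry coefficient table the patent's index RAM holds:

```python
def qpp_interleave(f1, f2, N):
    """Interleaved addresses f(x) = (f1*x + f2*x^2) mod N for x = 0..N-1."""
    return [(f1 * x + f2 * x * x) % N for x in range(N)]

# f1 = 3, f2 = 10 are the coefficients listed for frame length N = 40
# in 3GPP TS 36.212 (assumed source of the patent's coefficient RAM).
addrs = qpp_interleave(3, 10, 40)
```

Because a valid QPP is a permutation of 0..N−1, de-interleaving can reuse the same index addresses, as claim 2 states, by scattering through `addrs` instead of gathering.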
3. The method according to claim 2, wherein step 2 further comprises:
obtaining parallel interleaving RAM index address:
[parallel interleaving RAM index address equation not reproduced in the source]
where k denotes the parallelism, L denotes the sub-block length, i satisfies 0 ≤ i ≤ k−1, j satisfies 0 ≤ j ≤ L−1, x satisfies 0 ≤ x ≤ N−1, and map_addr_{i,j} denotes the parallel interleaving RAM index address; the remaining symbols have the same meanings as above.
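The equation image for map_addr_{i,j} is not reproduced in the source. One common construction for contention-free parallel access (an assumption, not necessarily the patent's exact formula) sends global interleaved address f(i·L + j) to RAM bank ⌊f/L⌋ at offset f mod L:

```python
def parallel_map(f1, f2, N, k):
    """(bank, offset) read by each of the k parallel windows at each local step,
    assuming bank = f(x) // L and offset = f(x) % L with x = i*L + j."""
    L = N // k
    f = lambda x: (f1 * x + f2 * x * x) % N
    return [[(f(i * L + j) // L, f(i * L + j) % L) for j in range(L)]
            for i in range(k)]

m = parallel_map(3, 10, 40, 4)
# QPP contention-freeness: at every local step j the 4 windows hit
# 4 distinct banks, so all parallel reads can complete in one cycle.
for j in range(10):
    assert len({m[i][j][0] for i in range(4)}) == 4
```

The distinct-bank property is what lets the P sub-decoders read the interleaver RAMs simultaneously without stalls.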
4. The implementation method for reducing the complexity of Turbo parallel decoding according to claim 1, wherein step 3 further comprises:
step 3-1, setting the frame length of the input information as N and dividing it into P blocks, the length L of each block of data being N/P; P-way parallelism is adopted with P sub-component decoders, that is, the decoder is divided into P windows, each window processing L data;
step 3-2, when the indication signal that the ping or pong RAM buffering is finished is detected, a decoding iteration starts, and an indication signal occupy1 for the currently occupied ping or pong RAM is fed back, indicating that data is being read from that RAM;
step 3-3, setting the current decoding time as k, the state at time k as s_k, the state at time k−1 as s_{k−1}, and the state at time k+1 as s_{k+1}; when the decoding iteration starts, the first window forward-calculates the branch state metric using the first block of data, as follows:
[branch metric equation not reproduced in the source]
where, owing to the uncertainty of the received information bits and check bits, the branch state metric has four possible state values in total and is denoted γ_k(s_{k−1}, s_k), hereinafter simply called the branch metric; the remaining symbols in the equation respectively represent the systematic information and check information before encoding and the information bits and check bits of the received decoding input;
step 3-4, starting to calculate the branch metric and its maximum value for the current state and storing the calculation results in order; meanwhile, the second window initializes the backward state metric, pre-calculates it using the second block of data, and stores the boundary value as the initial value of the backward state metric calculation at the next time.
5. The method according to claim 1, wherein in step 4 the value of the forward state metric at the current time is determined by the value at the previous time, the branch transition metric value at the current time, and its maximum value; when the calculation of the first branch metric maximum is completed, the first window starts to calculate the forward state metric in the forward direction, as follows:
α_k(s_k) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + max(γ(s_{k−1}, s_k))
where α_k(s_k) denotes the forward metric of state s_k at time k, max(γ(s_{k−1}, s_k)) denotes the maximum value of the branch state metric, and α_{k−1}(s_{k−1}) denotes the forward metric of state s_{k−1} at time k−1; the remaining symbols have the same meanings as above;
the forward metric value is initialized with a uniform distribution, and the forward metric of the forward state metric is obtained by summing the forward metric value of the previous state, the current branch state metric, and the maximum value of the branch metric.
6. The method according to claim 1, wherein the backward state metric value at the current time in step 5 is determined by the value at the subsequent time, the branch transition metric value at the current time, and its maximum value, as shown in the following formula:
β_k(s_k) = β_{k+1}(s_{k+1}) + γ(s_{k+1}, s_k) + max(γ(s_{k+1}, s_k))
where β_k(s_k) denotes the backward metric of state s_k at time k, β_{k+1}(s_{k+1}) denotes the backward metric of state s_{k+1} at time k+1, γ(s_{k+1}, s_k) denotes the branch transition metric at time k+1, and max(γ(s_{k+1}, s_k)) denotes the maximum value of the branch transition metric at time k+1; the remaining symbols have the same meanings as above;
the second window pre-calculates the backward metric of the second block of data and stores the boundary value as the initial value, and the backward metric of the backward state metric is obtained by summing the backward state value, the branch state metric, and the maximum value of the state branch metric.
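Taken literally, the recursions of claims 5 and 6 turn each metric update into pure additions using the branch-metric maximum cached by step 3-4's max unit. A minimal numeric sketch (all values illustrative, not from the patent):

```python
def forward_update(alpha_prev, gamma, gamma_max):
    """alpha_k(s_k) = alpha_{k-1}(s_{k-1}) + gamma(s_{k-1}, s_k) + max(gamma),
    per claim 5: an addition chain using the cached branch-metric maximum."""
    return alpha_prev + gamma + gamma_max

def backward_update(beta_next, gamma, gamma_max):
    """beta_k(s_k) = beta_{k+1}(s_{k+1}) + gamma(s_{k+1}, s_k) + max(gamma),
    per claim 6, mirrored in the reverse direction."""
    return beta_next + gamma + gamma_max

# One update step each: branch metric -0.25, cached maximum 0.75.
a = forward_update(1.5, -0.25, 0.75)   # -> 2.0
b = backward_update(0.5, -0.25, 0.75)  # -> 1.0
```

Since the same cached maximum is added along every branch at a given time, it behaves as a common offset that leaves metric differences unchanged, which is consistent with the coinciding curves in fig. 5.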
7. The method for reducing the complexity of Turbo parallel decoding according to claim 1, wherein the calculation of the extrinsic information Le and the backward state metric value in step 7 is performed simultaneously: the forward state metric value and the state branch metric value are read from the cache in order, and an add-compare operation is performed on the forward state metric value, the state branch metric value, and the backward state metric value calculated in real time to obtain the extrinsic information Le;
steps 2-6 are repeated for the calculation of the remaining P−1 windows; iteration stops when the data iteration of the P windows reaches the preset number of times, the second component decoder outputs the final likelihood ratio information LLR in order, the LLR is de-interleaved through the interleaving address index unit and, after hard decision, undergoes output buffer processing while waiting for the next module to read the data;
Λ(s_k, 0) = α_{k−1}(s_{k−1}) + γ_0(s_{k−1}, s_k) + β_k(s_k)
Λ(s_k, 1) = α_{k−1}(s_{k−1}) + γ_1(s_{k−1}, s_k) + β_k(s_k)
[extrinsic information Le equation not reproduced in the source]
LLR = max*(Λ_0(i)) − max*(Λ_1(i))
where Λ(s_k, 0) denotes the LLR corresponding to s_k when the input is 0, Λ(s_k, 1) denotes the LLR corresponding to s_k when the input is 1, γ_0 and γ_1 denote the branch metrics for inputs 0 and 1 respectively, and Le and LLR respectively denote the extrinsic information and likelihood ratio information output by the first component decoding; the remaining symbols have the same meanings as above.
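The max* operator in claim 7's LLR expression can be the exact Jacobian logarithm or, in the MAX-LOG-MAP family this patent builds on, a plain maximum. The sketch below contrasts the two on illustrative numbers (the Λ values are made up):

```python
import math

def max_star(values):
    """Exact Jacobian logarithm: max*(a_1..a_n) = ln(e^a_1 + ... + e^a_n)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def max_star_approx(values):
    """MAX-LOG-MAP approximation max* ~ max: drops the exp/log correction,
    which is the source of the decoder's complexity reduction."""
    return max(values)

# Illustrative per-state path metrics for input 0 and input 1 at one step.
lam0 = [2.0, -1.0, 0.5, -3.0]
lam1 = [1.0, 3.5, -0.5, 0.0]
llr_exact = max_star(lam0) - max_star(lam1)
llr_approx = max_star_approx(lam0) - max_star_approx(lam1)   # -> -1.5
```

Each max* exceeds the plain max by at most ln(number of states), which bounds how far the approximate LLR can drift from the exact one.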
8. An implementation system for reducing the complexity of Turbo parallel decoding, characterized by comprising the following modules:
a first module for inputting ping-pong buffer unit, and performing ping-pong RAM buffer to the input soft information sequence according to the state of the component decoder;
a second module for interleaving/deinterleaving the address index unit;
a third module for taking the maximum value of the branch metric to reduce the decoding complexity;
a fourth module for calculating a forward state metric;
a fifth module for calculating a backward state metric;
a sixth module for calculating extrinsic information Le;
and the seventh module is used for repeating iteration in the second module, the third module, the fourth module, the fifth module and the sixth module, and stopping iteration when the data iteration of the P windows meets the preset times.
9. The method according to claim 8, wherein the second module is further configured to sequentially store the intra-interleaving coefficients, index the RAM address according to the frame length information, and obtain the interleaving coefficients corresponding to different frame lengths in real time:
[interleaving coefficient index equation not reproduced in the source]
where addr_i denotes the index address, the index i satisfies 0 ≤ i ≤ 187, and N denotes the frame length information;
and generating fixed interleaving parallel index addresses by computing the interleaving addresses and storing the fixed interleaving parallel index addresses, wherein the same index address is adopted for interleaving and deinterleaving:
f(x) = (f1·x + f2·x²) mod N, 0 ≤ x ≤ N−1
where f1 and f2 respectively denote the inner interleaving coefficients, x denotes the information address before interleaving, f(x) the information address after interleaving, mod the modulo operation, and N the frame length, namely the interleaving depth;
the third module is further configured to set the frame length of the input information as N and divide it into P blocks, the length L of each block of data being N/P; P-way parallelism is adopted with P sub-component decoders, that is, the decoder is divided into P windows, each window processing L data;
when the indication signal that the ping or pong RAM buffering is finished is detected, a decoding iteration starts, and an indication signal occupy1 for the currently occupied ping or pong RAM is fed back, indicating that data is being read from that RAM;
the current decoding time is set as k, the state at time k is s_k, the state at time k−1 is s_{k−1}, and the state at time k+1 is s_{k+1}; when the decoding iteration starts, the first window forward-calculates the branch state metric using the first block of data, as follows:
[branch metric equation not reproduced in the source]
where, owing to the uncertainty of the received information bits and check bits, the branch state metric has four possible state values in total and is denoted γ_k(s_{k−1}, s_k), hereinafter simply called the branch metric; the remaining symbols in the equation respectively represent the systematic information and check information before encoding and the information bits and check bits of the received decoding input;
the branch metric and its maximum value for the current state are calculated and the results stored in order; meanwhile, the second window initializes the backward state metric, pre-calculates it using the second block of data, and stores the boundary value as the initial value of the backward state metric calculation at the next time;
the fourth module is further configured to initialize the forward metric value with a uniform distribution and obtain the forward metric of the forward state metric by summing the forward metric value of the previous state, the current branch state metric, and the maximum value of the branch metric; the value of the forward state metric at the current time is determined by the value at the previous time, the branch transition metric value at the current time, and its maximum value; when the calculation of the first branch metric maximum is completed, the first window starts to calculate the forward state metric in the forward direction, as shown in the following formula:
α_k(s_k) = α_{k−1}(s_{k−1}) + γ(s_{k−1}, s_k) + max(γ(s_{k−1}, s_k))
where α_k(s_k) denotes the forward metric of state s_k at time k, max(γ(s_{k−1}, s_k)) denotes the maximum value of the branch state metric, and α_{k−1}(s_{k−1}) denotes the forward metric of state s_{k−1} at time k−1; the remaining symbols have the same meanings as above.
10. The implementation system for reducing the complexity of Turbo parallel decoding according to claim 9, wherein: the fifth module further adopts the second window to pre-calculate the backward metric of the second block of data and stores the boundary value as the initial value, and obtains the backward metric of the backward state metric by summing the backward state value, the branch state metric, and the maximum value of the state branch metric; the value of the backward state metric at the current time is determined by the value at the next time, the branch transition metric value at the current time, and its maximum value, as shown in the formula:
β_k(s_k) = β_{k+1}(s_{k+1}) + γ(s_{k+1}, s_k) + max(γ(s_{k+1}, s_k))
where β_k(s_k) denotes the backward metric of state s_k at time k, β_{k+1}(s_{k+1}) denotes the backward metric of state s_{k+1} at time k+1, γ(s_{k+1}, s_k) denotes the branch transition metric at time k+1, and max(γ(s_{k+1}, s_k)) denotes the maximum value of the branch transition metric at time k+1; the remaining symbols have the same meanings as above;
the calculation of the extrinsic information Le and the backward state metric value in the seventh module is performed simultaneously: the forward state metric value and the state branch metric value are read from the cache in order, and an add-compare operation is performed on the forward state metric value, the state branch metric value, and the backward state metric value calculated in real time to obtain the extrinsic information Le;
the second to sixth modules repeat the calculation for the remaining P−1 windows; iteration stops when the data iteration of the P windows reaches the preset number of times, the second component decoder outputs the final likelihood ratio information LLR in order, the LLR is de-interleaved through the interleaving address index unit and, after hard decision, undergoes output buffer processing while waiting for the next module to read the data;
Λ(s_k, 0) = α_{k−1}(s_{k−1}) + γ_0(s_{k−1}, s_k) + β_k(s_k)
Λ(s_k, 1) = α_{k−1}(s_{k−1}) + γ_1(s_{k−1}, s_k) + β_k(s_k)
[extrinsic information Le equation not reproduced in the source]
LLR = max*(Λ_0(i)) − max*(Λ_1(i))
where Λ(s_k, 0) denotes the LLR corresponding to s_k when the input is 0, Λ(s_k, 1) denotes the LLR corresponding to s_k when the input is 1, γ_0 and γ_1 denote the branch metrics for inputs 0 and 1 respectively, and Le and LLR respectively denote the extrinsic information and likelihood ratio information output by the first component decoding; the remaining symbols have the same meanings as above.
CN202011474548.1A 2020-12-14 2020-12-14 Implementation method and system for reducing complexity of Turbo parallel decoding Pending CN112398487A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011474548.1A CN112398487A (en) 2020-12-14 2020-12-14 Implementation method and system for reducing complexity of Turbo parallel decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011474548.1A CN112398487A (en) 2020-12-14 2020-12-14 Implementation method and system for reducing complexity of Turbo parallel decoding

Publications (1)

Publication Number Publication Date
CN112398487A true CN112398487A (en) 2021-02-23

Family

ID=74625240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011474548.1A Pending CN112398487A (en) 2020-12-14 2020-12-14 Implementation method and system for reducing complexity of Turbo parallel decoding

Country Status (1)

Country Link
CN (1) CN112398487A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040039769A1 (en) * 2002-08-20 2004-02-26 Nec Electronics Corporation Method for decoding error correcting code, its program and its device
CN101026439A (en) * 2007-02-07 2007-08-29 重庆重邮信科股份有限公司 Decoding method for increasing Turbo code decoding rate
CN103873073A (en) * 2014-03-20 2014-06-18 北京遥测技术研究所 Turbo code high-speed decoding method based on parallel and windowing structure
CN104092470A (en) * 2014-07-25 2014-10-08 中国人民解放军国防科学技术大学 Turbo code coding device and method
CN104168032A (en) * 2014-08-16 2014-11-26 复旦大学 High-performance 16-base Turbo decoder with four degrees of parallelism and compatibility with LTE and WiMAX
CN105634508A (en) * 2015-12-21 2016-06-01 西安空间无线电技术研究所 Realization method of low complexity performance limit approximate Turbo decoder

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
沈业兵; 罗常青; 安建平; 程波: "FPGA Implementation of Turbo Codes in Burst Communication", 电子技术应用 (Application of Electronic Technique), no. 10, 30 November 2006 (2006-11-30) *
赵旦峰; 罗清华; 焉晓贞: "FPGA Design and Implementation of an Improved Turbo Code Decoder", 电子技术应用 (Application of Electronic Technique), no. 03, 6 March 2008 (2008-03-06) *
陈俊霖; 朱光喜: "Research and FPGA Implementation of High-Speed Parallel Turbo Decoding for Next-Generation Mobile Communication Systems", 电子技术应用 (Application of Electronic Technique), no. 09, 30 October 2006 (2006-10-30) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113765622A (en) * 2021-08-26 2021-12-07 希诺麦田技术(深圳)有限公司 Branch measurement initialization method, device, equipment and storage medium
CN113765622B (en) * 2021-08-26 2024-01-23 希诺麦田技术(深圳)有限公司 Branch metric initializing method, device, equipment and storage medium
CN114598419A (en) * 2021-11-12 2022-06-07 北京智芯微电子科技有限公司 Interleaver, deinterleaver, and methods of performing the same
CN114598419B (en) * 2021-11-12 2023-08-01 北京智芯微电子科技有限公司 Interleaver, deinterleaver, and methods of performing the same
CN114553370A (en) * 2022-01-19 2022-05-27 北京理工大学 Decoding method, decoder, electronic device, and storage medium
CN114567411A (en) * 2022-01-19 2022-05-31 北京理工大学 Decoding method, decoding device, electronic equipment and storage medium
CN114567411B (en) * 2022-01-19 2024-03-15 北京理工大学 Decoding method, decoding device, electronic equipment and storage medium
CN114553370B (en) * 2022-01-19 2024-03-15 北京理工大学 Decoding method, decoder, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN112398487A (en) Implementation method and system for reducing complexity of Turbo parallel decoding
JP5479580B2 (en) Method and apparatus for parallel TURBO decoding in LTE
CN102638278B (en) Iterative decoder
US7530011B2 (en) Turbo decoding method and turbo decoding apparatus
US6799295B2 (en) High speed turbo codes decoder for 3G using pipelined SISO log-map decoders architecture
CN101388674B (en) Decoding method, decoder and Turbo code decoder
US7539256B2 (en) Decoding device and decoding method
US8112698B2 (en) High speed turbo codes decoder for 3G using pipelined SISO Log-MAP decoders architecture
CN1461528A (en) Iteration terminating using quality index criteria of Turbo codes
CN101026439A (en) Decoding method for increasing Turbo code decoding rate
WO2012034097A1 (en) Accessing memory during parallel turbo decoding
CN105634508A (en) Realization method of low complexity performance limit approximate Turbo decoder
RU2571597C2 (en) Turbocode decoding method and device
US20130007568A1 (en) Error correcting code decoding device, error correcting code decoding method and error correcting code decoding program
US6868518B2 (en) Look-up table addressing scheme
US7925964B2 (en) High-throughput memory-efficient BI-SOVA decoder architecture
CN103856218B (en) Decoding process method and decoder
US6886127B2 (en) Implementation of a turbo decoder
US20030023919A1 (en) Stop iteration criterion for turbo decoding
CN1133276C (en) Decoding method and decoder for high-speed parallel cascade codes
US7917834B2 (en) Apparatus and method for computing LLR
CN107565983B (en) Turbo code decoding method and device
CN214707679U (en) Convolutional Turbo code decoder based on MAP algorithm
KR100625242B1 (en) Apparatus and method for turbo decoder
CN113872615A (en) Variable-length Turbo code decoder device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 211100 floor 1-3, auxiliary building, building 6, artificial intelligence Industrial Park, Nanjing, Jiangsu Province

Applicant after: Zhongke Nanjing mobile communication and computing Innovation Research Institute

Address before: 211100 floor 1-3, auxiliary building, building 6, Nanjing artificial intelligence Industrial Park, Nanjing City, Jiangsu Province

Applicant before: INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES, NANJING INSTITUTE OF MOBILE COMMUNICATIONS AND COMPUTING INNOVATION

CB02 Change of applicant information