CN1148881C - Realising method for parallel cascade convolution code hardware decoder - Google Patents

Realising method for parallel cascade convolution code hardware decoder

Info

Publication number
CN1148881C
CN1148881C, CNB021004293A, CN02100429A
Authority
CN
China
Prior art keywords
convolution code
hardware decoder
cascade convolution
unit
parallel cascade
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB021004293A
Other languages
Chinese (zh)
Other versions
CN1378345A (en)
Inventor
国 卫
卫国
黄源良
赵春明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Research Institute of Telecommunications Transmission Ministry of Industry and Information Technology
Original Assignee
University of Science and Technology of China USTC
Research Institute of Telecommunications Transmission Ministry of Industry and Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC, Research Institute of Telecommunications Transmission Ministry of Industry and Information Technology
Priority to CNB021004293A priority Critical patent/CN1148881C/en
Publication of CN1378345A publication Critical patent/CN1378345A/en
Application granted granted Critical
Publication of CN1148881C publication Critical patent/CN1148881C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Error Detection And Correction (AREA)

Abstract

The present invention relates to an implementation method for a parallel cascade convolution code hardware decoder. To reduce the memory required by the forward state metric calculator (FSMC) and the reverse state metric calculator (RSMC), the decoder, when implemented on a field programmable gate array (FPGA), spends additional logic resources in exchange for a lower memory data throughput. The concrete method is to use two sets of RSMCs, which work simultaneously, begin iteration from different initial instants, and have the correct portion of their output data selected. The advantage of the method is that, during decoding, the data throughput of the memory units depends only on a value L rather than on the number N of data to be decoded, and taking L larger than 5 times the constraint length of the component convolutional encoder is sufficient. Since N is generally far larger than L, this requirement is easy to satisfy and the number of memories used can be greatly reduced. Because the calculations run synchronously, the number of FSM values to be stored decreases correspondingly, which further lowers the memory data throughput.

Description

Implementation method of a parallel cascade convolution code hardware decoder
(1) Technical field:
The invention belongs to the field of communication technology, and in particular relates to a simplified implementation of an error-correcting code decoder.
(2) Background technology:
The parallel cascade convolution code (Turbo code) is an error-control code proposed by Berrou et al. of France in 1993. In the additive white Gaussian noise channel its error-correcting performance approaches the Shannon limit and surpasses that of earlier error-control codes. The Turbo encoder connects two convolutional encoders in parallel, separated by an interleaving unit. Overall, this encoder structure improves the distance distribution of the coded sequence and strengthens its error-correcting ability, in particular its ability to correct burst errors on the channel.
Traditional decoders use multi-stage iterative decoding. The existing hardware-implemented iterative decoding algorithm (Berrou, C., Glavieux, A. and Thitimajshima, P., Near Shannon Limit Error-Correcting Coding and Decoding: Turbo Codes, Proc. of ICC '93, 1064-1070) needs memories to hold intermediate state values. When the data volume is large, the number of memories required grows sharply and the reads and writes become very frequent. If off-chip memory chips are adopted instead, the operating speed of the decoder is limited. Moreover, the decoding delay is proportional to the amount of data to be decoded and increases sharply as that amount grows.
(3) Summary of the invention:
The object of the present invention is to provide a device that can improve the efficiency of Turbo code hardware decoding.
The device of the present invention is a specific implementation that reduces the memory data throughput of the Turbo code decoding algorithm, and it concerns the way hardware logic resources are used.
The decoding algorithm used by the device of the present invention is the maximum a posteriori decoding criterion in the logarithmic domain (Log-MAP), which comprises the following four elementary units: the forward state metric unit (Forward State Metric Calculator, FSMC), the reverse state metric unit (Reverse State Metric Calculator, RSMC), the branch metric unit (Branch Metric Calculator, BMC) and the log-likelihood ratio calculation unit (Log Likelihood Ratio Calculator, LLRC). Calculating in the log domain turns multiplication and division operations into additions and subtractions, which reduces the computational complexity.
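To illustrate why working in the log domain removes multiplications, the following minimal sketch (in Python, illustrative only and not part of the patent; the name max_star is a conventional choice) shows the Jacobian-logarithm operator commonly used in Log-MAP decoders:

```python
import math

def max_star(a: float, b: float) -> float:
    """ln(e^a + e^b), computed as a maximum plus a small correction term."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

# In the log domain a product of probabilities becomes a simple sum:
p1, p2 = 0.3, 0.6
assert abs(math.log(p1 * p2) - (math.log(p1) + math.log(p2))) < 1e-12
# and a sum of probabilities becomes the max_star of their logarithms:
assert abs(math.log(p1 + p2) - max_star(math.log(p1), math.log(p2))) < 1e-12
```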
The intermediate values that the device of the present invention needs to store are the forward state metrics (FSM) and the branch metrics (BM); in this way the reverse state metrics (RSM) can be computed conveniently, and the likelihood ratios finally obtained are output synchronously.
To reduce the memory demanded by the FSMC and the RSMC, when the device of the present invention is implemented with a field programmable gate array (FPGA), additional logic resources are spent in exchange for a reduction in memory data throughput. The concrete method is to use two sets of RSMCs: the two RSMCs work simultaneously and begin iteration from different initial instants, and the correct portion of the data output is selected. The choice of the iteration starting instants does not depend on the number N of data to be decoded; instead a value L is chosen such that L << N. A state metric recursion can begin at any instant. If the chosen starting instant of the recursion is not the instant at which the last data to be decoded enter the decoder, then the metrics calculated over an initial stretch are certainly incorrect; however, after a period of state transitions (in general, several constraint lengths of the convolutional code), the metrics become just as correct as those obtained by recursing from the final state. This guarantees the correctness of the decoding algorithm.
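The following short sketch (Python, illustrative only; the indexing convention and the strict alternation of the two engines are assumptions, not quoted from the patent) shows how the two reverse recursions can be scheduled over windows of length L so that every trellis step receives one reliable backward metric after an L-step warm-up:

```python
def rsmc_schedule(n_data: int, L: int):
    """Yield (engine, recursion_start, reliable_window) for each window of L steps.
    Each backward recursion spans 2L steps: the first L are warm-up and discarded,
    the last L are kept as reliable reverse state metrics."""
    for w, lo in enumerate(range(0, n_data, L)):
        hi = min(lo + L, n_data)
        start = min(lo + 2 * L, n_data)        # begin 2L ahead of the window (or at the block end)
        engine = "I" if w % 2 == 0 else "II"   # the two RSMCs alternate between windows
        yield engine, start, (lo, hi)

for engine, start, (lo, hi) in rsmc_schedule(n_data=160, L=32):
    print(f"RSMC {engine}: recurse backward from step {start}, keep metrics for steps [{lo}, {hi})")
```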
In practical applications the present invention can adopt different L values according to the specific performance requirements. The larger L is, the more memory is used and the larger the throughput becomes; conversely, a smaller L reduces the throughput. It should be noted, however, that in applying the present invention the value of L must not be taken too small; otherwise, because the number of reverse recursion steps is too few, the result computed by the RSMC cannot approach the performance of the traditional algorithm. In general, L may be any integer from 1 to N, but it is usually taken as an integer power of 2, commonly 16, 32 or 64.
Beneficial effects of the present invention: the advantage of the present invention is that the data throughput of the memory units during decoding does not depend on the number N of data to be decoded but only on the value L, and taking L larger than 5 times the constraint length of the component convolutional encoder is sufficient. In general, N >> L, so this requirement is met and the number of memories used can be significantly reduced. Because of the synchronous calculation, the number of FSM values that must be stored is reduced proportionally, so the data throughput of the memories decreases. The decoding delay of the device of the present invention is low: as soon as the calculation and storage of L forward state metrics and reverse state metrics are completed, the calculation of the log-likelihood ratios can begin, which shortens the idle waiting time of the LLRC and makes the decoded data appear at the output earlier.
(4) Description of drawings:
Fig. 1 is the implementation block diagram of the traditional maximum a posteriori decoding algorithm in the log domain;
Fig. 2 is the implementation block diagram used by the present invention;
Fig. 3 is the decoding-unit timing chart of the traditional maximum a posteriori decoding algorithm in the log domain;
Fig. 4 is the decoding-unit timing chart of the present invention;
Fig. 5 is a bit error rate (BER) performance curve of the decoding unit of the present invention;
Fig. 6 is the implementation block diagram of a decoding unit with two additional reverse state metric units;
Fig. 7 is the implementation block diagram of a decoding unit using the non-log-domain MAP algorithm;
Fig. 8 is the implementation block diagram of a decoding unit using the max-log-MAP decoding algorithm;
Fig. 9 is a hardware structure for implementing the BMC;
Fig. 10 is a hardware structure for implementing the FSMC.
(5) Embodiments:
The embodiments of the invention are further described below in conjunction with the accompanying drawings.
Fig. 1 shows the implementation block diagram of the traditional maximum a posteriori decoding algorithm in the log domain. Here 1 is the branch metric unit BMC, 2 is the forward state metric unit FSMC, 3 is the reverse state metric unit RSMC, 4 is the log-likelihood ratio calculation unit LLRC, and 5 is the memory unit, whose size equals the length N of the data to be decoded.
For convenience of description the following notation is adopted: x_k denotes the systematic (original information) bits in the data to be decoded, y_k denotes the parity (check) bits in the data to be decoded, and L_k denotes the log-likelihood ratio output.
Fig. 2 shows the implementation block diagram used by the present invention. Here 3_I and 3_II are the two reverse state metric units RSMC, and 6 is also a memory unit, but unlike memory unit 5 of Fig. 1 its capacity equals the value L. That is to say, the capacity of memory unit 6 is in general much smaller than that of memory unit 5.
In the traditional algorithm, the calculation of L_k can only start after the FSMC and RSMC results for all N data to be decoded have been calculated and stored, so the decoding delay is large. The corresponding timing diagram is shown in Fig. 3.
In the present invention, one additional group of reverse state metric units 3 is added to the original architecture, and the two groups work simultaneously. The decoding timing is shown in Fig. 4: x_k and y_k are input to the unit, and the branch metric unit 1 calculates and stores the branch metric values; after a delay of 2L instants, the forward state metric unit 2 starts to calculate the forward state metrics from the initial state, and its results are correct throughout. There are now two reverse state metric calculation units 3: unit 3_I starts at instant 2L and reads the values BM(2L) ... BM(0) to calculate the reverse state metrics RSM_I, noting that RSM_I(2L) ... RSM_I(L) are unreliable and are drawn with dotted lines in the figure; unit 3_II starts at instant 3L and reads the values BM(3L) ... BM(L) to calculate the reverse state metrics RSM_II, where RSM_II(3L) ... RSM_II(2L) are unreliable and are likewise drawn with dotted lines. From instant 3L onward, the reliable one of RSM_I and RSM_II is selected as the reverse state metric RSM, the forward state metrics FSM calculated and stored before this instant are fetched, and the output log-likelihood ratio L_k is calculated. Thus, by adding one reverse state metric calculation unit, the present invention brings a significant reduction in memory usage.
Take the 384k downlink service in WCDMA as an example. In one decoding unit, storing the BM values requires 4L × 32 = 128L bits of memory and storing the FSM values requires 2L × 64 = 128L bits of memory; taking L = 32, a total of 8192 bits of memory is needed. It can be seen that this simplified algorithm has the following advantages. With 6-bit quantization of the input coded data, the simulation results shown in Fig. 5 indicate that the intermediate variables obtained by the present invention are no different from those of the ordinary Log-MAP algorithm and the decoding performance is not affected, while a large amount of memory is saved; the decoder can therefore be realized in a single FPGA, is not limited by the speed of accesses to external memory, can make full use of the hardware speed of the FPGA, and achieves a higher decoding rate. On the other hand, the likelihood ratios L_k can be output continuously with a delay of only 3L instants, which saves time and nearly doubles the speed.
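As a back-of-the-envelope cross-check of the figures above (a sketch only; the decomposition into 4 branch metrics of 8 bits and 8 states of 8 bits per trellis step is an assumption based on the bit widths stated later in this embodiment):

```python
L = 32
bm_bits_per_step  = 4 * 8    # assumed: 4 branch metrics per step, 8 bits each -> 32 bits
fsm_bits_per_step = 8 * 8    # assumed: 8 trellis states per step, 8 bits each -> 64 bits
bm_store  = 4 * L * bm_bits_per_step    # BM buffer depth of 4L steps -> 128L bits
fsm_store = 2 * L * fsm_bits_per_step   # FSM buffer depth of 2L steps -> 128L bits
print(bm_store, fsm_store, bm_store + fsm_store)   # 4096 4096 8192 bits in total
```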
To realize Turbo decoding in the WCDMA system, the present invention uses the modified Log-MAP algorithm described above for decoding the convolutional code inside the decoding unit, and reuses the decoding unit at high speed in time to complete four decoding iterations. Only an indication of the data block size N is needed; the interleaving pattern required for decoding is computed and generated in real time according to the Turbo code internal interleaving scheme specified by the protocol. The embodiment is implemented on one APEX20K400 chip from Altera; the coded input data are 6 bits wide, the BM, FSM, RSM and L_k values are all 8 bits wide, and 4795 logic elements and 167680 bits of memory are occupied. A 30 MHz on-chip clock is used, and the 8 states of the component convolutional code are processed in parallel, i.e. one clock cycle processes the 8 state values of one data item; the decoding unit is reused in time to complete the four decoding iterations, so the maximum data rate the decoder can reach is 30M / (4 × 2) = 3.75 Mbit/s, which meets the 2 Mbit/s data rate required by WCDMA.
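The rate arithmetic above can be summarized as follows (a sketch, assuming one trellis step per clock cycle because the 8 states are processed in parallel):

```python
clock_hz           = 30e6   # on-chip clock of this embodiment
iterations         = 4      # the decoding unit is reused for four decoding iterations
component_decoders = 2      # each iteration runs the two constituent decoders in turn
max_rate = clock_hz / (iterations * component_decoders)
print(max_rate)             # 3750000.0, i.e. 3.75 Mbit/s > the 2 Mbit/s WCDMA target
```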
For the 384k service in the WCDMA protocol the data block size is 4224. Over the AWGN channel, a Turbo decoding performance simulation of the hardware decoder was carried out for each given signal-to-noise ratio, with 4224 × 1000 = 4224000 data in total. The simulation results are shown in Fig. 5. It can be seen that as the number of decoding iterations increases, the decoding performance improves significantly. For the four-iteration decoding realized here, the decoded bit error rate at a signal-to-noise ratio of 1.8 dB is 7.5758e-006.
It should be noted that the present invention can be applied not only to the maximum a posteriori decoding algorithm in the log domain (Log-MAP), but also to other decoding algorithms that similarly use memories to store intermediate states. When another Turbo code decoding algorithm is used, as long as the algorithm likewise uses memories to store intermediate states, the device of the present invention remains applicable. The max-log-MAP algorithm, a variant widely used in engineering, is one example. Because the computational load of the log-domain MAP algorithm is still rather large, a further simplification can be made by adopting the mathematical approximation
ln(Σ_i e^(x_i)) ≈ max_i(x_i)
In this way the logarithm operation is converted into a maximization operation and the computational complexity is reduced; that is, each state metric becomes a maximization. However, like the log-domain MAP algorithm, this algorithm still needs to store the forward state metrics (FSM) and branch metrics (BM) obtained in the intermediate calculations. Fig. 6 gives the implementation block diagram of the max-log-MAP algorithm, in which the branch metric unit 7, the forward state metric unit 8, the reverse state metric calculation unit 9 and the likelihood ratio calculation unit 10 all use the max-log-MAP algorithm. Memory unit 6 is exactly the memory used in the present invention, and the overall structure does not change much.
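The approximation can be seen numerically in the following small sketch (Python, illustrative only):

```python
import math

def log_sum_exp(xs):
    """Exact ln(sum_i e^{x_i}), evaluated in a numerically stable way."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

xs = [2.0, 1.5, -0.3, 0.2]
print(log_sum_exp(xs))   # about 2.63 (exact Log-MAP value)
print(max(xs))           # 2.0 -- the max-log-MAP approximation keeps only the largest term
```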
The decoding device proposed by the present invention adds one reverse state metric unit and thereby significantly reduces the memory throughput. More reverse state metric units can be added to reduce the memory-unit usage further, but the number added must not be too large, otherwise the hardware logic becomes more complex and the timing control becomes cumbersome. In general, the result obtained with two reverse metric units is already satisfactory. Fig. 7 gives the implementation block diagram with two additional reverse state metric units; the implementation of each unit in the figure is the same as the corresponding part of Fig. 2.
The present invention is formulated in the log domain in order to reduce computational complexity and facilitate engineering realization. However, the structure of the present invention is fully applicable to decoding implementations in the non-log domain (such an algorithm may be called a non-log-domain MAP algorithm); it is merely that the computational complexity is higher in principle. Fig. 8 gives the block diagram for completing the decoding in the non-log domain.
The present invention is applicable to decoding procedures that reuse a decoding unit at high speed in time, and also to non-multiplexed decoding procedures.
A specific implementation of each metric unit applicable to the present invention is given below as a reference for completing the decoding unit. Fig. 9 gives an implementation of the branch metric unit (BMC), in which |x_k| and |y_k| are the absolute values of the inputs x_k and y_k, and BM_k^(i,j) (i, j ∈ {0,1}) are the calculated branch metrics (BM). Fig. 10 gives an implementation of the forward state metric unit (FSMC), in which the min unit performs a minimization and the E unit adds the corresponding BM to the FSM and then completes a logarithmic correction by table lookup. The FSM iteration process can be seen from the figure, namely how the values of the states at step k are obtained from the FSM of the states at step k-1. The implementation of the reverse state metric unit (RSMC) is very similar to that of the FSMC, except that the values of the states at step k-1 are deduced backwards from the RSM of the states at step k. The log-likelihood ratio calculation unit (LLRC) also uses E units and min units to decide and output the final likelihood ratio, so the implementation block diagrams of the RSMC and the LLRC are not drawn here.
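As an illustration of the min/E structure described above, the following sketch (Python, illustrative only; the transition table is a placeholder, not the constituent code of this embodiment) shows one forward-recursion step with negative-log-domain metrics, where the min unit selects the better incoming path and the E unit supplies the table-lookup correction:

```python
import math

def min_star(a: float, b: float) -> float:
    """-ln(e^-a + e^-b): a minimum plus a correction term (realized in hardware by a small table)."""
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))

def fsm_step(fsm_prev, bm, predecessors):
    """Compute FSM_k[s] for every state s from FSM_{k-1}[.].
    predecessors[s] = ((p0, b0), (p1, b1)): the two predecessor states of s and the
    indices of the branch metrics on the corresponding transitions (placeholder layout)."""
    return [min_star(fsm_prev[p0] + bm[b0], fsm_prev[p1] + bm[b1])
            for (p0, b0), (p1, b1) in predecessors]

# Tiny 4-state example with made-up tables, just to show the data flow:
prev  = [0.0, 3.1, 2.4, 5.0]
bm    = [0.2, 1.1, 0.7, 1.9]                      # BM_k^(0,0), BM_k^(0,1), BM_k^(1,0), BM_k^(1,1)
preds = [((0, 0), (1, 3)), ((2, 1), (3, 2)),
         ((0, 3), (1, 0)), ((2, 2), (3, 1))]
print(fsm_step(prev, bm, preds))
```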

Claims (7)

1. An implementation method of a parallel cascade convolution code hardware decoder, comprising the use of a forward state metric unit, a branch metric unit, a log-likelihood ratio calculation unit and at least two sets of reverse state metric units, characterized in that: the forward state metric unit and the said at least two sets of reverse state metric units are connected in parallel; the said at least two sets of reverse state metric units work simultaneously and begin iteration from different initial instants, and the correct portion of the data output is selected; the selection of the iteration starting instants does not depend on the number N of data to be decoded, and a value L is chosen, where L may be any integer from 1 to N.
2. The implementation method of a parallel cascade convolution code hardware decoder according to claim 1, characterized in that the value of the said L can be taken as an integer power of 2.
3. The implementation method of a parallel cascade convolution code hardware decoder according to claim 2, characterized in that the said value of L may be taken as 16, 32 or 64.
4. The implementation method of a parallel cascade convolution code hardware decoder according to claim 1, characterized in that it can be applied to the maximum a posteriori decoding algorithm in the log domain, and can also be applied to other decoding algorithms that use memories to store intermediate states.
5. The implementation method of a parallel cascade convolution code hardware decoder according to claim 1, characterized in that two or three reverse state metric units are added in exchange for a further reduction in memory usage.
6. The implementation method of a parallel cascade convolution code hardware decoder according to claim 1, characterized in that it is applicable not only to log-domain decoding algorithms but also to non-log-domain decoding algorithms.
7. The implementation method of a parallel cascade convolution code hardware decoder according to claim 1, characterized in that it is applicable to decoding procedures that reuse a decoding unit at high speed in time, and also to non-multiplexed decoding procedures.
CNB021004293A 2002-01-30 2002-01-30 Realising method for parallel cascade convolution code hardware decoder Expired - Fee Related CN1148881C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB021004293A CN1148881C (en) 2002-01-30 2002-01-30 Realising method for parallel cascade convolution code hardware decoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB021004293A CN1148881C (en) 2002-01-30 2002-01-30 Realising method for parallel cascade convolution code hardware decoder

Publications (2)

Publication Number Publication Date
CN1378345A CN1378345A (en) 2002-11-06
CN1148881C true CN1148881C (en) 2004-05-05

Family

ID=4739362

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB021004293A Expired - Fee Related CN1148881C (en) 2002-01-30 2002-01-30 Realising method for parallel cascade convolution code hardware decoder

Country Status (1)

Country Link
CN (1) CN1148881C (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102394663B (en) * 2011-10-11 2013-08-28 东南大学 Segment parallel coding method of feedforward convolutional code
CN108509382B (en) * 2018-03-27 2022-06-07 南开大学 Method for realizing quick convolution operation of super-long sequence based on FPGA

Also Published As

Publication number Publication date
CN1378345A (en) 2002-11-06

Similar Documents

Publication Publication Date Title
EP1030457B1 (en) Methods and system architectures for turbo decoding
US20040039769A1 (en) Method for decoding error correcting code, its program and its device
CN102412850B (en) Turbo code parallel interleaver and parallel interleaving method thereof
CN100546207C (en) A kind of dual-binary Turbo code encoding method based on the DVB-RCS standard
CN102340320B (en) Bidirectional and parallel decoding method of convolutional Turbo code
US7464316B2 (en) Modified branch metric calculator to reduce interleaver memory and improve performance in a fixed-point turbo decoder
CN101442321B (en) Parallel decoding of turbine code and data processing method and device
CN104092470A (en) Turbo code coding device and method
CN1254121C (en) Method for decoding Tebo code
CN1157883C (en) Maximal posterior probability algorithm of parallel slide windows and its high-speed decoder of Turbo code
CN1148881C (en) Realising method for parallel cascade convolution code hardware decoder
CN101938330A (en) Multi-code rate Turbo encoder and storage resource optimization method thereof
US20010054170A1 (en) Apparatus and method for performing parallel SISO decoding
CN101217336B (en) A TD-SCDMA/3G hard core turbo decoder
CN102594369B (en) Quasi-cyclic low-density parity check code decoder based on FPGA (field-programmable gate array) and decoding method
CN103595424A (en) Component decoding method, decoder, Turbo decoding method and Turbo decoding device
CN101882934A (en) Arithmetic circuit
CN1159933C (en) Universal convolution encoder and viterbi decoder
CN102594507B (en) High-speed parallel Turbo interpretation method in a kind of software radio system and system
EP1317071B1 (en) Normalisation in a turbo decoder using the two's complement format
CN102571107A (en) System and method for decoding high-speed parallel Turbo codes in LTE (Long Term Evolution) system
CN108449092B (en) Turbo code decoding method and device based on cyclic compression
CN1286533A (en) Decoding method and decoder for high-speed parallel cascade codes
CN2506034Y (en) Turbo decoder
CN103701475A (en) Decoding method for Turbo codes with word length of eight bits in mobile communication system

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee