CN104168032A - High-performance 16-base Turbo decoder with four degrees of parallelism and compatibility with LTE and WiMAX - Google Patents

Info

Publication number: CN104168032A
Authority: CN (China)
Application number: CN201410403907.2A
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 陈赟, 陈旭斌, 程劼, 曾晓洋
Current assignee: Fudan University (the listed assignees may be inaccurate)
Original assignee: Fudan University
Events: application filed by Fudan University; priority to CN201410403907.2A; publication of CN104168032A
Prior art keywords: information, decoder, memory cell, decoding, external information
Landscapes: Error Detection And Correction (AREA)

Abstract

The invention belongs to the field of wireless communication technology and discloses a high-performance radix-16 Turbo decoder with a parallelism of 4 that is compatible with the LTE and WiMAX standards. The decoder adopts a 4-way parallel radix-16 architecture. The chip is designed in the TSMC 65 nm 1P9M LP CMOS process, with a core area of 1.39 mm² and a maximum clock frequency of 600 MHz. Compared with other Turbo decoders, it offers the following advantages: a new high-radix decoding architecture is proposed that lowers the complexity of the state-metric recursion unit; for high-parallelism, high-radix decoders, a simplified interleaver is proposed, with which the interleavers of both the LTE and WiMAX standards can be built from simple barrel-shift networks and two-dimensional address generators; the memory architecture is optimized to avoid memory-access conflicts and to reduce memory size and count; and computational complexity is lowered through algorithm optimization and resource sharing.

Description

High-performance radix-16 Turbo decoder with a parallelism of 4, compatible with LTE and WiMAX
Technical field
The invention belongs to the field of wireless communication technology and specifically relates to a Turbo decoder with a 4-way parallel, radix-16 architecture that is compatible with the LTE and WiMAX standards.
Background art
Building on the theory of convolutional and concatenated codes, C. Berrou et al. of France proposed an entirely new coding and decoding scheme, the Turbo code, in 1993. It achieved error-correcting performance approaching the Shannon limit, a breakthrough of milestone significance in the development of information theory. A Turbo code is a parallel concatenated code made up of two recursive systematic convolutional (RSC) codes with short constraint lengths, connected through a pseudo-random interleaver. Turbo decoding is based on an improved maximum a posteriori probability algorithm (also called the BCJR algorithm) combined with iterative decoding, and achieves excellent coding gain. Non-binary Turbo codes were proposed in 1999. They process several information bits per clock cycle during encoding and decoding, which reduces the interleaver depth for the same code length and lowers decoding latency, while offering performance equal to or better than that of binary Turbo codes.
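To make the parallel-concatenated structure concrete, the following Python sketch encodes a block with two identical RSC encoders, the second fed through an interleaver. It assumes the 8-state constituent code used in 3GPP LTE (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3) and a QPP interleaver with K = 40, f1 = 3, f2 = 10; trellis termination and rate matching are omitted, so it is a simplified illustration rather than a standard-conformant encoder.

```python
def rsc_parity(bits):
    """Parity output of an 8-state RSC encoder started in the all-zero state.

    Assumed generators: feedback 1 + D^2 + D^3, feedforward 1 + D + D^3
    (the LTE constituent code). No trellis termination.
    """
    s1 = s2 = s3 = 0  # register contents at delays D, D^2, D^3
    parity = []
    for u in bits:
        a = u ^ s2 ^ s3             # recursive feedback bit
        parity.append(a ^ s1 ^ s3)  # feedforward taps 1, D, D^3
        s1, s2, s3 = a, s1, s2      # shift the register
    return parity


def qpp_interleaver(K, f1, f2):
    """Quadratic permutation polynomial: pi(x) = (f1*x + f2*x^2) mod K."""
    return [(f1 * x + f2 * x * x) % K for x in range(K)]


def turbo_encode(bits, perm):
    """Rate-1/3 parallel concatenation: systematic, parity 1, parity 2."""
    interleaved = [bits[p] for p in perm]  # c'[i] = c[pi(i)]
    return list(bits), rsc_parity(bits), rsc_parity(interleaved)


if __name__ == "__main__":
    K = 40
    perm = qpp_interleaver(K, 3, 10)  # assumed interleaver parameters
    msg = [1, 0, 1, 1] * 10
    systematic, p1, p2 = turbo_encode(msg, perm)
    print(len(systematic), len(p1), len(p2))  # 40 40 40
```

Because both constituent encoders start from the zero state, each is a linear map over GF(2); the decoder exploits exactly this trellis structure.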
Owing to a series of advantages, including excellent decoding performance, simple encoding, flexibly adjustable code length and code rate, and moderate hardware complexity, Turbo codes were quickly applied after their introduction in many fields such as deep-space communication, mobile communication, digital multimedia broadcasting (DMB), and power-line communication, and they have been especially successful in mobile communication systems. Table 1-1 lists some typical systems that use Turbo codes for channel coding.
The rapidly growing demand for high-speed, high-quality broadband communication has greatly promoted the development of wireless communication technology and the renewal of communication standards. Within a short decade, the third-generation (3G) mobile communication standards formulated by standards bodies such as 3GPP, 3GPP2, and IEEE have developed rapidly, and beyond-3G (B3G) standards continue to evolve. By introducing technologies such as OFDM, MIMO, high-order modulation, and advanced channel coding, new standards have steadily increased data throughput and spectral efficiency. For example, the HSPA+ standard will progressively move from a peak throughput of 14.4 Mbps to 672 Mbps. In addition, 3GPP and IEEE 802.16 have formulated the LTE-Advanced and WiMAX-Advanced standards, respectively, as candidate 4G technologies for the International Telecommunication Union (ITU); their throughput will meet or exceed the ITU's long-term 4G targets of 100 Mbps for mobile reception and 1 Gbps for fixed reception.
During the evolution of these standards, the Turbo coding scheme itself has changed little; the main change has been to the conventional interleaver, which has been redesigned to be contention-free so that parallel decoding is possible. Nevertheless, applying Turbo codes in next-generation high-speed mobile communication systems still poses many challenges. The first is the sharp increase in throughput, from the 2 Mbps initially required for 3G to the 1 Gbps required for future 4G. Advances in integrated-circuit fabrication alone are far from sufficient for such a leap; parallelism must be exploited at several levels, including the decoding algorithm, the hardware architecture, and the circuit implementation. The second is the stringent chip-area and power constraints of mobile terminals, which also call for improved and innovative implementations. Finally, because multiple standards coexist and evolve in parallel, baseband receiver designers must consider configurable multi-mode terminals; multi-mode, multi-standard Turbo decoder design has therefore become one of the future development directions.
Summary of the invention
The object of the present invention is to provide a Turbo decoder with a 4-way parallel, radix-16 architecture that supports both binary and duobinary Turbo codes and is compatible with the LTE and WiMAX standards, making it suitable for 4G mobile communication systems such as 3GPP LTE/LTE-Advanced and WiMAX/WiMAX-Advanced (IEEE 802.16e/m).
The Turbo decoder provided by the invention adopts a 4-way parallel, radix-16 architecture. The chip was designed in the TSMC 65 nm 1P9M LP CMOS process; its core area is 1.39 mm² and its maximum clock frequency is 600 MHz. At 5.5 iterations the throughput is 821 Mbps in LTE mode and 810.6 Mbps in WiMAX mode. Compared with other Turbo decoders, it offers the following improvements: (1) a new high-radix decoding architecture that compares, in advance, trellis branches sharing the same starting state and the same ending state, which both reduces the complexity of the state-metric recursion unit and shortens the critical path; (2) a simplified interleaver for high-parallelism, high-radix decoders, with which the interleavers of both the LTE and WiMAX standards can be built from simple barrel-shift networks and two-dimensional address generators; (3) an optimized memory architecture that avoids memory-access conflicts and reduces memory size and count; and (4) reduced computational complexity through algorithm optimization and resource sharing.
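A back-of-the-envelope check of these throughput figures is sketched below. It rests on assumptions that the text does not state explicitly: each of the P SISO sub-decoders retires log2(16) = 4 trellis bits per clock cycle, one iteration consists of two half-iterations run on the same hardware, and window and pipeline overhead are ignored. The function name is illustrative.

```python
import math


def ideal_throughput_mbps(f_clk_mhz, parallelism, radix, iterations):
    """Upper-bound decoder throughput in Mbps.

    Assumption: every SISO sub-decoder processes log2(radix) trellis bits
    per cycle, and one full iteration equals two half-iterations with no
    windowing or pipeline overhead.
    """
    bits_per_cycle = parallelism * int(math.log2(radix))
    return f_clk_mhz * bits_per_cycle / (2 * iterations)


if __name__ == "__main__":
    ideal = ideal_throughput_mbps(600, 4, 16, 5.5)
    print(f"ideal bound: {ideal:.1f} Mbps")  # ideal bound: 872.7 Mbps
    for mode, reported in (("LTE", 821.0), ("WiMAX", 810.6)):
        print(f"{mode}: {reported / ideal:.1%} of the bound")
```

Under these assumptions the reported figures sit at roughly 94% (LTE) and 93% (WiMAX) of the bound; the gap would be consistent with windowing and pipeline overhead, although the text does not break it down.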
As shown in Fig. 8, the system architecture of the Turbo decoder provided by the invention comprises an extrinsic-information memory, a systematic-information memory, a parity-1 memory, a parity-2 memory, a P-way permutation network, radix-16 soft-input soft-output (SISO) sub-decoders, a control module, an address generation module, and a hard-decision buffer. The P-way permutation network, the SISO sub-decoders, and the corresponding memories are all 4-way parallel, i.e. P = 4 in the figure.
The extrinsic-information memory stores the extrinsic information obtained in the previous half-iteration; this information is permuted by the permutation network and fed to the SISO sub-decoders as a priori information for the next half-iteration. The size of the extrinsic-information memory is N/P, where N is the code length and P is the degree of parallelism. The systematic-information memory stores the systematic part of the received codeword; the received systematic values pass through the permutation network into the SISO sub-decoders and serve as systematic information during iterative decoding. Its size is likewise N/P. The parity-1 and parity-2 memories store the two parity sequences of the received codeword, and a multiplexer feeds the appropriate parity sequence in each half-iteration. The address generation module generates the addresses used to access these memories according to the progress and requirements of iterative decoding, so that the corresponding extrinsic, systematic, and parity information is read. The SISO sub-decoder is the core of the whole decoder: it receives the extrinsic information, the systematic information, and the selected parity bits through the permutation network, obtains the initial values needed for decoding, and performs iterative decoding. When a half-iteration is complete, the resulting a posteriori information is written back through the permutation network into the extrinsic-information memory, where it serves as extrinsic information for the next half-iteration. The control module coordinates the address generation module, the permutation network, and the iterative decoding process. When iterative decoding is complete, the decoded codeword is obtained by hard decision, stored in the hard-decision buffer, and output as the decoding result.
In the present invention, the systematic and extrinsic information are partitioned into four quadrants to guarantee real-time data access in radix-16 decoding. In each clock cycle the 4 parallel sub-decoders read a total of 16 groups of intrinsic and extrinsic information, which pass through 4 groups of 4-way parallel permutation networks. After maximum a posteriori decoding, 16 values are written back to the extrinsic-information memory in each clock cycle. The address generation module provides 16 addresses in LTE mode and only 8 addresses in WiMAX mode, and its hardware is fully shared between the two modes. The permutation network is highly configurable: it fully supports all code lengths of both modes and three degrees of parallelism (1, 2, and 4), and it is built entirely from simple barrel-shift networks.
Several modules are described in more detail below:
1. Configurable radix-16 SISO sub-decoder
The SISO sub-decoder is the unit that computes the forward state metrics α, the backward state metrics β, and the branch metrics γ, as shown in Fig. 9. It contains dual-port RAM for storing the values to be processed. After the branch-metric unit computes the branch metrics, they are stored in registers for the subsequent forward and backward state-metric computations. The computed forward and backward state metrics are in turn held in registers and used to compute the forward and backward state metrics of the next recursion step, while the LLR unit uses the current forward and backward metrics to compute and output the corresponding LLR values.
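The following sketch checks, for one possible bank mapping, that the 16 reads issued per cycle never collide in interleaved order. The mapping is a hypothetical reading of the text, not a transcription of it: address a is assumed to live in sub-block memory a // M (M = N/P) and quadrant a mod 4, and in each cycle sub-decoder j is assumed to read the four consecutive positions j*M + 4t + q, q = 0..3. The interleaver is an LTE QPP with assumed parameters K = 6144, f1 = 263, f2 = 480.

```python
import random


def qpp_interleaver(K, f1, f2):
    """pi(x) = (f1*x + f2*x^2) mod K."""
    return [(f1 * x + f2 * x * x) % K for x in range(K)]


def bank_of(addr, M):
    """Hypothetical bank mapping: (sub-block memory, quadrant = two LSBs)."""
    return addr // M, addr % 4


def conflict_free(perm, P=4):
    """True if every radix-16 cycle touches 4*P distinct banks."""
    K = len(perm)
    M = K // P  # sub-block length per SISO sub-decoder
    assert K % P == 0 and M % 4 == 0, "sketch assumes 4 divides N/P"
    for t in range(M // 4):  # one cycle = 4 trellis steps per sub-decoder
        banks = {bank_of(perm[j * M + 4 * t + q], M)
                 for j in range(P) for q in range(4)}
        if len(banks) != 4 * P:  # two reads hit the same bank
            return False
    return True


if __name__ == "__main__":
    K = 6144
    print(conflict_free(list(range(K))))                # natural order: True
    print(conflict_free(qpp_interleaver(K, 263, 480)))  # interleaved: True
    shuffled = list(range(K))
    random.Random(0).shuffle(shuffled)
    print(conflict_free(shuffled))  # an arbitrary permutation usually collides
```

The QPP case passes because the quadrant of π(x) depends only on x mod 4, while the sub-block index of π(x) differs across the P sub-decoders for the same offset, which is the contention-free property that motivated the redesigned interleavers.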
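To make the roles of α, β, γ, and the LLR unit concrete, here is a minimal radix-2 max-log-MAP sketch for a single 8-state RSC code. It is not the radix-16 hardware of Fig. 9: it processes one trellis step at a time and keeps all metrics in Python lists rather than windowed RAM. It assumes the LTE constituent code, an unterminated trellis, LLR = log P(1)/P(0), and the one-sided branch metric γ = u·(Ls + La) + p·Lp; all names are illustrative.

```python
NEG_INF = float("-inf")


def rsc_trellis():
    """next_state[s][u] and parity[s][u] for the assumed 8-state RSC
    (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3); s = bits (s1 s2 s3)."""
    nxt = [[0, 0] for _ in range(8)]
    par = [[0, 0] for _ in range(8)]
    for s in range(8):
        s1, s2, s3 = (s >> 2) & 1, (s >> 1) & 1, s & 1
        for u in (0, 1):
            a = u ^ s2 ^ s3
            par[s][u] = a ^ s1 ^ s3
            nxt[s][u] = (a << 2) | (s1 << 1) | s2
    return nxt, par


def max_log_map(Ls, La, Lp):
    """Return (a posteriori LLRs, extrinsic LLRs) for one SISO pass."""
    nxt, par = rsc_trellis()
    K = len(Ls)

    def gamma(k, s, u):  # branch metric (assumed one-sided convention)
        return u * (Ls[k] + La[k]) + par[s][u] * Lp[k]

    alpha = [[NEG_INF] * 8 for _ in range(K + 1)]
    alpha[0][0] = 0.0  # encoder starts in state 0
    for k in range(K):  # forward recursion (add-compare-select)
        for s in range(8):
            for u in (0, 1):
                t = nxt[s][u]
                alpha[k + 1][t] = max(alpha[k + 1][t], alpha[k][s] + gamma(k, s, u))

    beta = [[NEG_INF] * 8 for _ in range(K)] + [[0.0] * 8]  # unterminated end
    for k in range(K - 1, -1, -1):  # backward recursion (add-compare-select)
        for s in range(8):
            for u in (0, 1):
                beta[k][s] = max(beta[k][s], gamma(k, s, u) + beta[k + 1][nxt[s][u]])

    llr, ext = [], []
    for k in range(K):  # LLR unit: best u=1 path minus best u=0 path
        best = [NEG_INF, NEG_INF]
        for s in range(8):
            for u in (0, 1):
                m = alpha[k][s] + gamma(k, s, u) + beta[k + 1][nxt[s][u]]
                best[u] = max(best[u], m)
        llr.append(best[1] - best[0])
        ext.append(llr[-1] - Ls[k] - La[k])  # remove systematic + a priori
    return llr, ext


def encode(bits):
    """Parity bits produced by walking the same trellis from state 0."""
    nxt, par = rsc_trellis()
    s, out = 0, []
    for u in bits:
        out.append(par[s][u])
        s = nxt[s][u]
    return out


if __name__ == "__main__":
    msg = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

    def to_llr(b):  # noiseless channel: strong, correct-sign LLRs
        return 4.0 if b else -4.0

    llr, _ = max_log_map([to_llr(b) for b in msg], [0.0] * len(msg),
                         [to_llr(p) for p in encode(msg)])
    print([int(x > 0) for x in llr] == msg)  # True
```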
The SISO sub-decoder designed in the present invention supports both binary and duobinary Turbo codes. It supports two windowing schemes, sliding window (SW) and parallel window (PW), and two ways of initializing the recursions, information passing and training computation. The window scheme and initialization mode are adjusted according to the code rate, which allows power consumption to be controlled dynamically.
This configuration provides several important mechanisms. First, data are fetched in both directions simultaneously, which is a prerequisite for reducing latency. Second, the LLR computation and backward training can share hardware. Because the LLR unit can operate in parallel with either the forward or the backward recursion unit, the key to concurrency is running the forward and backward recursions in parallel. In the conventional sliding-window schedule, note that the trellis is continuous across sub-block boundaries, so state metrics can also be passed between sub-blocks.
When the number of windows is small, the two schedules are in fact very similar, and PW is more efficient: the training computation needed in PW requires no additional hardware and can be carried out by the two existing recursion units. The cases of one window, two windows, and multiple windows therefore have to be treated separately. Training is enabled at high code rates and can be disabled or shortened at low code rates without noticeable performance loss; because the performance loss differs between code rates, this forms the basis for power optimization.
2. Address generation module: configurable ARP and QPP interleavers
Turbo codes in different communication standards usually use different interleavers, so a flexible interleaver design is a key issue in multi-standard decoders. The ARP interleaver has a polynomial form (see Fig. 5), while the QPP interleaver has a quadratic polynomial form. A parallel interleaver can exploit the relationship between parallel addresses and the recurrence between addresses in adjacent cycles to simplify the interleaving-address calculation. The ARP interleaver adopted in the WiMAX standard can also compute its interleaving addresses in recursive form.
Because the window length is large, obtaining the write addresses by storing the read addresses would be costly, so addresses are computed in real time with two forward address generators; the two backward address streams can be obtained through a LIFO buffer or produced by two reverse address generators.
3. Memories
Storage resources can be shared efficiently between the two modes. Note that duobinary Turbo codes are decoded on a symbol basis, so the extrinsic information passed between adjacent half-iterations is twice that of binary Turbo codes; using the definition of the log-likelihood ratio in the algorithm, this can be reduced to 1.5 times. For radix-16 decoding the four-quadrant partitioning scheme is applied. Each WiMAX symbol (2 information bits) requires 3 extrinsic values to be stored, so for the same code length WiMAX needs 50% more extrinsic-information storage. One option is to convert the extrinsic information from symbol level to bit level for storage and back to symbol level when reading, which allows the extrinsic memory to be shared completely between duobinary and conventional binary decoding. However, this method causes a performance loss of about 0.1 to 0.2 dB. At low parallelism the extrinsic memory accounts for a large share of the whole decoder, but in highly parallel decoding architectures this share keeps decreasing. The present invention therefore does not adopt this conversion and instead keeps the additional storage to obtain good decoding performance.
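The recursive form can be illustrated for the QPP case. Since π(x) = (f1·x + f2·x²) mod K, the increment g(x) = π(x+1) − π(x) = f1 + f2·(2x + 1) itself grows by the constant 2·f2 (mod K), so after seeding, each new address needs only two modular additions, each realizable as an add followed by a conditional subtraction. The sketch below uses illustrative function names and assumed LTE parameters, and seeds one generator per sub-block start as a parallel decoder would; the ARP recursion is not covered.

```python
def add_mod(a, b, K):
    """(a + b) mod K for 0 <= a, b < K: one add and a conditional subtract."""
    s = a + b
    return s - K if s >= K else s


def qpp_forward(K, f1, f2, start, count):
    """Addresses pi(start), pi(start+1), ... computed recursively."""
    pi = (f1 * start + f2 * start * start) % K  # seed, computed once
    g = (f1 + f2 * (2 * start + 1)) % K         # g(start) = pi(start+1) - pi(start)
    step = (2 * f2) % K                         # constant second difference
    out = []
    for _ in range(count):
        out.append(pi)
        pi = add_mod(pi, g, K)
        g = add_mod(g, step, K)
    return out


if __name__ == "__main__":
    K, f1, f2, P = 40, 3, 10, 4  # assumed LTE QPP parameters, 4-way parallel
    M = K // P
    direct = [(f1 * x + f2 * x * x) % K for x in range(K)]
    # one generator per sub-block; in hardware these would run concurrently
    parallel = [qpp_forward(K, f1, f2, j * M, M) for j in range(P)]
    print(sum(parallel, []) == direct)  # True
```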
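The 1.5-times figure can be read as storing, for each duobinary symbol, its four symbol metrics relative to one reference hypothesis. The sketch below assumes (our reading, not spelled out in the text) that the reference is the 00 hypothesis, so only three differences are stored; the function names are illustrative.

```python
def normalize_symbol(metrics):
    """Keep L(z) - L(00) for z in 01, 10, 11 (assumed normalization to '00').

    L(00) - L(00) = 0 needs no storage.
    """
    ref = metrics["00"]
    return {z: metrics[z] - ref for z in ("01", "10", "11")}


def restore_symbol(stored):
    """Rebuild a 4-entry metric set; the decoder only uses differences."""
    return {"00": 0.0, **stored}


def extrinsic_values(n_bits, mode):
    """Extrinsic values to store for a block of n_bits information bits."""
    if mode == "LTE":    # one bit-level LLR per bit
        return n_bits
    if mode == "WiMAX":  # n_bits/2 symbols, 3 normalized values each
        return 3 * (n_bits // 2)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    m = {"00": 2.5, "01": -1.0, "10": 4.0, "11": 0.5}
    r = restore_symbol(normalize_symbol(m))
    print(all(r[a] - r[b] == m[a] - m[b] for a in m for b in m))  # True
    print(extrinsic_values(4800, "WiMAX") / extrinsic_values(4800, "LTE"))  # 1.5
```

Storing all four metrics per symbol would give the factor of 2 mentioned above; dropping the reference entry loses no information, because only metric differences enter the decoding.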
Brief description of the drawings
Fig. 1 Path merging in the radix-16 trellis.
Fig. 2 Conventional high-radix state-metric recursion unit.
Fig. 3 High-radix state-metric recursion unit based on path merging.
Fig. 4 Improved radix-16 log-likelihood calculation unit.
Fig. 5 ARP address generator.
Fig. 6 Trellis of the duobinary Turbo code.
Fig. 7 State transitions in the trellises of binary and duobinary Turbo codes and their relationship to the encoded information.
Fig. 8 Architecture of the reconfigurable LTE/WiMAX decoder system.
Fig. 9 SISO unit.
Embodiment
The optimizations and improvements of the invention are described in further detail below with reference to the drawings.
1. Unified decoding algorithm in the radix-16 soft-input soft-output (SISO) sub-decoder:
Although the binary and duobinary trellises are different, reordering can give them identical connectivity. Specifically, exchanging WiMAX states 2 and 7 and exchanging states 3 and 6 yields a reordered trellis whose shape closely resembles that of the binary Turbo code. The state-variable inputs of each state-metric recursion unit are then the same in both modes, so no extra multiplexers are needed. Because pipeline registers are usually hard to insert inside a recursion unit, this reduction of the critical path is very valuable.
Reordering gives the two codes similar state transitions, but because their encoder structures differ, their outputs for the same input information bits still differ, so this difference must be accounted for in the branch-metric and log-likelihood calculations. Fig. 6 shows the state transitions and state outputs of the two codes. For the binary Turbo code, the radix-4 trellis is obtained by transformation, so its state transitions go from time k to time k+2. For the duobinary Turbo code, the encoder takes two bits at each time step, so the number of trellis stages is only half the code length, and its state transitions go from time k to time k+1.
Depending on the combination of information bits and parity bits output on each state transition, there are 16 branch metrics; Table 2 gives the branch-metric formulas after transformation.
Each branch metric can be split into two parts, one depending only on the information bits and the other only on the parity bits. The former, the sum of the a priori information and the intrinsic information of the information bits, is called the information-only metric (IOM); the latter, the sum of the intrinsic information of the parity bits, is called the parity-only metric (POM). With this split, the branch-metric computation can be fully shared between the two modes. After the transformation of the branch-metric formulas, the IOM takes four values in LTE mode, one of which is 0, and likewise four values in WiMAX mode, one of which is 0; the POM also takes four values in both modes, one of which is 0. Consequently, the branch-metric buffer only needs to store three information metrics and two parity intrinsic values (or three parity metrics).
After identifying the similarities between the branch-metric formulas of the two modes and transforming them, the amount of branch-metric computation is greatly reduced, and the required computation and storage resources decrease accordingly. With the original method each branch metric needs 5 adders, 80 adders in total; with the simplified method only 13 adders are needed in total, a reduction of 83.75%.
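The IOM/POM split can be sketched for one radix-4 step. Because Table 2 is not reproduced here, the sketch assumes the one-sided convention in which a branch collects the LLR of every bit it labels 1; LTE's 16 branch metrics are then sums of one of four IOM values and one of four POM values, each set containing 0. For WiMAX the IOM table would instead hold the three normalized symbol LLRs and 0. The names are illustrative.

```python
from itertools import product


def iom_lte(a1, a2):
    """IOM table indexed by (u1, u2); a_i = systematic + a priori LLR of bit i."""
    return {(0, 0): 0.0, (0, 1): a2, (1, 0): a1, (1, 1): a1 + a2}


def pom(p1, p2):
    """POM table indexed by the two parity bits on the branch."""
    return {(0, 0): 0.0, (0, 1): p2, (1, 0): p1, (1, 1): p1 + p2}


def branch_metrics(iom, pom_table):
    """All 16 radix-4 branch metrics as IOM + POM (one addition each)."""
    return {(u, p): iom[u] + pom_table[p]
            for u, p in product(iom, pom_table)}


if __name__ == "__main__":
    a1, a2, p1, p2 = 1.25, -0.5, 2.0, -3.0
    bm = branch_metrics(iom_lte(a1, a2), pom(p1, p2))
    print(len(bm))               # 16
    print(bm[((1, 0), (1, 1))])  # a1 + p1 + p2 = 0.25
```

The two small tables are shared by every branch, which is where the saving in adders comes from; the exact count of 13 adders depends on Table 2 and is not reproduced by this sketch.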
Table 2 Dual-mode radix-4 branch-metric formulas
To improve hardware sharing further, the radix-4 log-likelihood-ratio formula of the binary Turbo code is transformed so that symbol-level a posteriori probabilities are computed in the same way as for the duobinary Turbo code. The 32 possible state-transition branches in the trellis are divided into 4 classes according to the values of the information bits, with 8 branches in each class; the symbol-level a posteriori probability is then obtained by adding the forward state metric, the backward state metric, and the branch metric. Because all sums of branch metric and backward state metric are already computed during the backward state-metric recursion, these intermediate results can be cached and reused, which saves 32 adders; for a highly parallel decoder this saves a large amount of computation and power.
(3)
From the 4 symbol-level a posteriori probabilities, the extrinsic information of the two bits of the binary Turbo code, or the three symbol-level extrinsic values of the duobinary Turbo code, can be computed. However, the two extrinsic-information computations differ substantially and cannot be merged: LTE mode requires bit-level log-likelihood ratios and extrinsic information, whereas WiMAX mode requires symbol-level soft outputs. In LTE mode they are computed as follows:
(4)
(5)
Note that the two sums used here were already computed before the branch metrics, so they need not be recomputed.
In WiMAX mode they are computed as follows:
(6)
(7)
Note that the three sums used here were already computed before the branch metrics, so they need not be recomputed.
The hard-decision output of the binary Turbo code can be obtained from the bit-level log-likelihood ratios; for Turbo codes, the output decision is normally made as follows:
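The conversions above can be sketched with one common max-log formulation. This is an assumption, not a transcription of Equations (4) to (9): it takes LLR = log P(1)/P(0), symbol labels "uv" with u the earlier bit, and symbol metrics L["uv"] obtained as the maximum over each class of 8 branches. The hard decision reads the sign bit of a two's-complement fixed-point LLR, with a tie at 0 mapped to 1 under the assumed convention. All names are illustrative.

```python
def lte_bit_llrs(L):
    """Bit LLRs of the two bits (u, v) from symbol-level metrics L['uv']."""
    llr_u = max(L["10"], L["11"]) - max(L["00"], L["01"])
    llr_v = max(L["01"], L["11"]) - max(L["00"], L["10"])
    return llr_u, llr_v


def wimax_symbol_llrs(L):
    """Symbol-level soft outputs, normalized to the '00' hypothesis."""
    return {z: L[z] - L["00"] for z in ("01", "10", "11")}


def extrinsic(posterior, prior_plus_sys):
    """Extrinsic = a posteriori - (a priori + systematic); the subtrahend is
    the IOM component already computed for the branch metrics."""
    return posterior - prior_plus_sys


def hard_decision(llr_fixed, width):
    """Decide from the sign bit of a width-bit two's-complement LLR.

    Sign bit 0 (llr >= 0) -> bit 1 under the assumed LLR convention.
    """
    sign = ((llr_fixed & ((1 << width) - 1)) >> (width - 1)) & 1
    return sign ^ 1


if __name__ == "__main__":
    L = {"00": 1.0, "01": 3.0, "10": -2.0, "11": 0.5}
    print(lte_bit_llrs(L))       # (-2.5, 2.0)
    print(wimax_symbol_llrs(L))  # {'01': 2.0, '10': -3.0, '11': -0.5}
    print([hard_decision(x, 8) for x in (5, -3, 0)])  # [1, 0, 1]
```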
(8)
Since subtraction and comparison are equivalent here, the hard-decision information for both codes can be generated by the following formula, i.e. from the sign bit of the bit-level a posteriori log-likelihood ratio:
(9)
For a binary Turbo code decoded in radix-4, the a posteriori probability computation can also be converted into a symbol-level probability computation. Note that although Equation (3) shows that the two a posteriori probability formulas are identical, the state transitions of the two encoders differ, so the grouping of the 32 paths into 4 classes by information bits also differs; extra multiplexers must therefore be added to support both modes.
Bit-level a posteriori probabilities can also be computed for the duobinary Turbo code, but because its encoding and decoding are symbol-based, this causes noticeable performance loss. One method in the literature converts between symbol level and bit level: by storing bit-level extrinsic information, it reduces storage by about 20% in a serial decoder and uses the inverse transformation to limit the performance loss. However, performance simulation shows that this method still causes a BER/FER loss of about 0.1 to 0.2 dB. Moreover, in a highly parallel decoder the extrinsic memory occupies only a small share of the total area (less than 2% in the 4-way parallel radix-16 decoder implemented in the invention). The invention therefore stores extrinsic information as symbol-level log-likelihood ratios to reduce performance loss.
2. A new high-radix decoding architecture that compares, in advance, trellis branches sharing the same starting state and the same ending state, which reduces the complexity of the state-metric recursion unit and shortens the critical path:
Computational complexity grows exponentially with the radix, and the critical path of the state-metric recursion unit keeps growing as well; effective methods are therefore needed to reduce the complexity of high-radix decoding and to raise its operating frequency.
The high-radix add-compare-select (ACS) circuit of the state-metric recursion unit occupies a large share of the SISO sub-decoder's hardware and limits the maximum clock frequency, so it needs to be improved and optimized. The branch-metric and a posteriori probability computations involve no recursion and can be pipelined to raise the operating frequency, but their complexity also increases rapidly with the radix; low-complexity implementations of the branch-metric unit and the a posteriori probability unit must therefore be considered as well.
There are two main ways to implement a high-radix ACS circuit: one cascades several groups of low-radix ACS circuits based on a cascaded trellis; the other cascades the additions and comparisons based on a merged trellis. The invention proposes a novel high-radix recursion unit. It observes that the number of states is limited, so the number of distinct state-metric inputs to a high-radix ACS circuit cannot exceed the number of states. For paths that share a common starting state and a common ending state, the maximum path can be found by comparing only the corresponding branch metrics, without first computing the sum of state metric and branch metric.
In each state-metric update, the number of paths to be compared equals the radix. Because the number of state metrics is fixed, only the branch metrics of the two paths leaving the same starting state (and reaching the same ending state) need to be compared, which reduces the number of compared paths to the number of states. This is equivalent to compressing the number of paths that take part in the recursion, and it is clearly effective whenever the radix exceeds the number of states. Because Turbo codes usually have 8 states, in radix-16 decoding this method reduces the 16-way ACS units required by radix-16 to 8-way ACS units, which both lowers complexity and shortens the critical path, as shown in Fig. 1.
Fig. 2 shows a conventional high-radix state-metric recursion unit, in which the number of adders equals the radix and the maximum must be selected from the outputs of all the adders. The recursion unit provided by the invention is shown in Fig. 3: the branch metrics are first compared in 2^m groups to find the largest branch metric of each group (2^m being the number of states); each of these maxima is then added to the corresponding state metric, and the sums are compared by a 2^m-input comparator to obtain the optimal path. Because part of the branch-metric comparison is done in advance in the recursion unit of Fig. 3, the number of adders is reduced to the number of states and the number of comparator inputs drops accordingly. This lowers computational complexity and also shortens the critical path of the recursion unit, allowing a higher operating frequency.
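The equivalence behind the path-merging recursion unit can be checked numerically. The sketch below assumes the 8-state LTE constituent trellis, enumerates all 4-step (radix-16) paths, confirms that each (start, end) state pair is joined by exactly two paths, and compares a conventional 16-input ACS with the merged form (8 two-input branch-metric comparisons, then 8 additions and an 8-input comparison). The metrics are random placeholders and the function names are illustrative.

```python
import random


def rsc_next_state():
    """next_state[s][u] for the assumed 8-state RSC (feedback 1 + D^2 + D^3)."""
    nxt = [[0, 0] for _ in range(8)]
    for s in range(8):
        s1, s2, s3 = (s >> 2) & 1, (s >> 1) & 1, s & 1
        for u in (0, 1):
            a = u ^ s2 ^ s3
            nxt[s][u] = (a << 2) | (s1 << 1) | s2
    return nxt


def radix16_paths(nxt):
    """Map (start, end) -> list of 4-bit input words traversing 4 trellis steps."""
    paths = {}
    for s in range(8):
        for word in range(16):
            t = s
            for i in (3, 2, 1, 0):  # earliest bit first
                t = nxt[t][(word >> i) & 1]
            paths.setdefault((s, t), []).append(word)
    return paths


def acs_conventional(alpha, gamma, paths, end):
    """16-input ACS: add the state metric to every path, then take the max."""
    return max(alpha[s] + gamma[s][w]
               for (s, t), words in paths.items() if t == end for w in words)


def acs_merged(alpha, gamma, paths, end):
    """Pre-compare parallel paths on branch metrics, then 8 adds + 8-input max."""
    best_bm = {s: max(gamma[s][w] for w in words)
               for (s, t), words in paths.items() if t == end}
    return max(alpha[s] + bm for s, bm in best_bm.items())


if __name__ == "__main__":
    rng = random.Random(1)
    paths = radix16_paths(rsc_next_state())
    print(len(paths), {len(w) for w in paths.values()})  # 64 {2}
    alpha = [rng.uniform(-10, 10) for _ in range(8)]
    gamma = [[rng.uniform(-10, 10) for _ in range(16)] for _ in range(8)]
    print(all(acs_conventional(alpha, gamma, paths, e)
              == acs_merged(alpha, gamma, paths, e) for e in range(8)))  # True
```

The pre-comparison of branch metrics does not depend on the previous state metrics, so it can sit outside the recursion loop; only the 8 additions and the 8-input comparison remain on the critical path.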
measurement in the middle of base-16 is first calculated by ACS-4 in the left side, above 8 ACS-4 unit calculate forward state metric, below 8 ACS-4 calculate backward state measurement, 4 paths before and after comparing respectively, select optimal path and have calculated state measurement.And utilize intermediate value alpha+γ, the β+γ of the calculating of 16 ACS-4, then send into ACS-8 and be added and compare with corresponding state measurement, obtain LL uvthe value of (uv=00,01,10,11), obtains the finally LLR value of each bit according to these values.
3. for high degree of parallelism, high radix decoder, proposed the interleaver of simplifying, the interleaver in LTE and WiMAX standard all can be formed based on simple barrel shift network and two-dimensional address maker:
The interleaver of the turbo code using in distinct communication standards is conventionally different, and in the design of many standards decoder, interleaver designs is a key issue flexibly.ARP interleaver is a polynomial form, and QPP interleaver is quadratic polynomial form.In fact parallel interleaver can utilize the relation between parallel address, and recurrence relation between adjacent periods address is simplified the calculating of interleaving address.What in WiMAX standard, adopt is ARP interleaver, also can calculate interleaving address by recursive form.
Because the window length is large, writing addresses into memory and reading them back is costly. The invention therefore computes addresses in real time, using two forward address generators. The two backward addresses can either be obtained through a LIFO or produced by two reverse address generators.
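The backward direction can be sketched the same way. Both helpers below are hypothetical Python functions, shown for the QPP case only:

- `qpp_backward` runs the QPP recursion in reverse, which is one plausible form of a reverse address generator. The window's starting address is seeded directly from the closed-form formula.
- `lifo_backward` shows the LIFO alternative.

```python
def qpp_backward(start, count, f1, f2, K):
    """Generate Pi(start), Pi(start-1), ..., Pi(start-count+1) by modular subtraction.

    Seeds g(start-1) = Pi(start) - Pi(start-1) = f1 + f2*(2*start - 1), then
    steps Pi(i-1) = Pi(i) - g(i-1) and g(i-2) = g(i-1) - 2*f2 (mod K).
    """
    pi = (f1 * start + f2 * start * start) % K
    g = (f1 + f2 * (2 * start - 1)) % K
    step = (2 * f2) % K
    for _ in range(count):
        yield pi
        pi = (pi - g) % K  # Python's % keeps the result in 0..K-1
        g = (g - step) % K


def lifo_backward(forward_addresses):
    """Alternative: push a window's forward addresses on a stack, pop them in reverse."""
    stack = list(forward_addresses)
    out = []
    while stack:
        out.append(stack.pop())
    return out
```

The reverse generator needs only a seed per window. The LIFO needs storage for one window of addresses, which is the cost the text weighs against it.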
4. An optimized memory structure that avoids memory access conflicts and reduces both the size and the number of memories:
Besides total storage size, the bit width, depth, and type of a memory (single-port, dual-port, SRAM, register file, etc.) all significantly affect its area and power consumption. Beyond these factors, the read/write clock frequency and the number of reads and writes per unit time also significantly affect power consumption.
Among the storage resources, the size of the channel (intrinsic) information and extrinsic information memories is set by the system's maximum code length and lowest code rate. It does not change with sub-block parallelism.

With the sliding-window technique, the branch-metric and state-metric storage in each MAP sub-decoder is set by the window length. The boundary-condition metric storage decreases as the degree of parallelism increases.

When parallelism is low, channel and extrinsic information storage accounts for the largest share, so this memory is the natural target for optimization. As parallelism increases, the share of storage inside the MAP sub-decoders grows steadily.
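The scaling trend described above can be checked with a first-order storage model.

- **From the LTE standard:** N = 6144 (the largest LTE block size) and the three rate-1/3 input streams.
- **Hypothetical choices, not figures from the patent:** window length W = 32, 8 states, all bit widths, and 4 branch metrics per stage.

The model treats channel and extrinsic storage as fixed. Each MAP sub-decoder gets its own window buffers. One boundary-metric entry per window is split across the sub-decoders.

```python
def storage_bits(P, N=6144, streams=3, W=32, states=8, bm_per_stage=4,
                 bw_ch=6, bw_ext=7, bw_sm=10, bw_bm=8):
    """First-order storage estimate in bits; W and every bit width are hypothetical."""
    channel = N * streams * bw_ch        # systematic + two parity streams
    extrinsic = N * bw_ext               # one extrinsic value per information bit
    # Sliding-window buffers inside one MAP sub-decoder (state + branch metrics).
    per_window = W * states * bw_sm + W * bm_per_stage * bw_bm
    # Boundary metrics kept between iterations: one state vector per window.
    windows_per_decoder = N // (P * W)
    boundary_per_decoder = windows_per_decoder * states * bw_sm
    map_total = P * (per_window + boundary_per_decoder)
    return {
        "channel+extrinsic": channel + extrinsic,
        "boundary_per_decoder": boundary_per_decoder,
        "map_total": map_total,
        "map_share": map_total / (channel + extrinsic + map_total),
    }
```

With these made-up widths, the MAP share rises from about 11% at P = 1 to about 32% at P = 16. This matches the qualitative trend stated in the text.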
Extrinsic information usually needs two access ports at the same time. Implementations therefore use one of three options:

- a dual-port memory;
- two single-port memories operated in ping-pong fashion; or
- a memory running at twice the clock frequency.

A study of the ARP and QPP interleavers shows that their addresses can be divided into odd/even sets, or into four-quadrant sets, according to the least significant bits of the address. Sequential and interleaved addresses also map one-to-one between these sets. A suitable partition into memory banks therefore allows single-port memories to be used.

Normalizing the state metrics increases the storage bit width, but only by a very limited amount.
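A short sketch of why LSB-based banking works for QPP. Assume f1 is odd, f2 is even, and 4 divides K. Then f2·i² ≡ f2·i (mod 4), so Π(i) ≡ (f1 + f2)·i (mod 4), and the same holds mod 2. Because f1 + f2 is odd, any 4 consecutive indices land in 4 different banks after interleaving.

The helper below, whose name is hypothetical, checks this property exhaustively. Treating the patent's odd/even and four-quadrant sets as `address mod 2` and `address mod 4` is an assumption.

```python
def qpp(i, f1, f2, K):
    """QPP interleaver address Pi(i) = (f1*i + f2*i^2) mod K."""
    return (f1 * i + f2 * i * i) % K


def banks_conflict_free(f1, f2, K, banks):
    """True if every group of `banks` consecutive indices, once interleaved,
    falls into `banks` different banks chosen by the address LSBs (addr % banks).

    Sequential (non-interleaved) reads of such a group trivially hit every
    bank once, so only the interleaved half-iteration needs checking.
    """
    for base in range(0, K, banks):
        hit = {qpp(base + t, f1, f2, K) % banks for t in range(banks)}
        if len(hit) != banks:
            return False
    return True
```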
5. Reducing computational complexity through algorithm optimization and resource sharing:
The invention uses the (Scaling) MAX-Log-MAP decoding algorithm, whose decoding process involves only additions and comparisons. Even so, meeting high-throughput requirements means fully exploiting the parallelism in the algorithm and mapping it onto a large amount of hardware. Identifying shared data and shared operations while carrying out the necessary computations can therefore save a great deal of hardware.
The description of the MAX-Log-MAP algorithm shows that decoding consists mainly of three operations: branch-metric computation, state-metric recursion, and a posteriori probability computation. The existing literature usually describes only improvements to the state-metric recursion, and says little about branch-metric and a posteriori probability computation. In high-radix decoding circuits, however, the share of area and power taken by these two parts grows significantly. This makes them one of the keys to the decoder design.
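To make the Max-Log-MAP simplification concrete, the sketch below contrasts two ways of combining path metrics:

- the Log-MAP Jacobian logarithm, which needs a correction term; and
- its max-only approximation, which needs only a comparison.

It also adds the extrinsic scaling step of Scaling-Max-Log-MAP. The 0.75 factor is an assumed example value; the text above does not state the factor the decoder uses.

```python
import math


def max_star(a, b):
    """Log-MAP Jacobian logarithm: ln(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    """Max-Log-MAP approximation: drop the correction term, leaving a compare."""
    return max(a, b)


def scale_extrinsic(values, factor=0.75):
    """Scaling step of Scaling-Max-Log-MAP; the default factor is an assumed example."""
    return [factor * v for v in values]
```

The dropped correction term lies between 0 and ln 2. Scaling the extrinsic values is a common way to compensate for the resulting over-confidence without reintroducing it.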
The computational complexity of the branch metrics also depends on the lowest code rate and the decoding radix that the system supports. For example, CDMA2000 and EV-DO, specified by 3GPP2, define a lowest code rate of 1/5. In addition, branch-metric complexity grows rapidly as the decoding radix increases.
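The growth in branch metrics with radix can be counted directly. A radix-2^k step spans k binary trellis stages. Each stage contributes the systematic bit plus the parity bit(s) of one constituent encoder. The helper below is a hypothetical illustration of that count.

```python
def distinct_branch_metrics(outputs_per_stage, radix):
    """Number of distinct branch metrics for one radix-`radix` trellis step.

    Each step spans log2(radix) binary trellis stages; each stage contributes
    `outputs_per_stage` code bits (systematic + parity of one constituent code).
    """
    if radix < 2 or radix & (radix - 1):
        raise ValueError("radix must be a power of two")
    stages = radix.bit_length() - 1  # log2(radix)
    return 2 ** (outputs_per_stage * stages)
```

For LTE (2 bits per stage), this gives 4, 16 and 256 distinct metrics at radix 2, 4 and 16. For a constituent code with two parity outputs, as at the CDMA2000 rate of 1/5, radix 4 already needs 64.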
In the Max-Log-MAP algorithm, the computation of state metrics and a posteriori probabilities depends only on the relative values of the state and branch metrics. The branch metrics can therefore be transformed: differences between branch metrics are used as the new branch metrics. This reduces both branch-metric storage and computation. Taking radix-2 decoding as an example, the original branch-metric formula is:
(1)
Here one of the four branch metrics is subtracted from all four, giving the simplified branch-metric formula:
(2)
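Equations (1) and (2) are not reproduced on this page. The sketch below assumes a common radix-2 form:

γ(u, p) = ½[(La + Ls)·x_u + Lp·x_p], with x = 2b - 1

Here La is the a priori LLR, Ls is the systematic channel LLR, and Lp is the parity channel LLR. The sketch subtracts γ(0, 0) from all four metrics.

Every path crosses exactly one branch per stage. Subtracting the same constant from all four branch metrics of a stage therefore shifts every competing path metric equally, so max-log comparisons and LLR differences are unchanged. The function names are hypothetical.

```python
def branch_metrics_bipolar(la, ls, lp):
    """Assumed radix-2 form: gamma(u, p) = ((la + ls)*x_u + lp*x_p) / 2, x = 2b - 1."""
    return {(u, p): 0.5 * ((la + ls) * (2 * u - 1) + lp * (2 * p - 1))
            for u in (0, 1) for p in (0, 1)}


def branch_metrics_normalized(la, ls, lp):
    """Same metrics minus gamma(0, 0): gamma'(u, p) = u*(la + ls) + p*lp.

    One metric becomes 0, and the remaining three cost two additions in
    total (la + ls, then that sum + lp) with no halving.
    """
    s = la + ls
    return {(0, 0): 0.0, (0, 1): lp, (1, 0): s, (1, 1): s + lp}
```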
Table 1 shows that this transformation saves considerable computing resources.
Table 1 Branch-metric formulas for the radix-4 decoding architecture (3GPP standard, code rate 1/3)

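Table 1 itself is not reproduced here. The sketch below assumes each radix-2 stage table is normalized so that γ'(0, 0) = 0.

Under that normalization, each of the 16 radix-4 branch metrics for a rate-1/3 LTE constituent code is the sum of one metric from each of two stages. The hypothetical helper below builds them and counts how many actually need an adder.

```python
from itertools import product


def radix4_metrics(stage0, stage1):
    """Combine two normalized radix-2 tables into 16 radix-4 branch metrics.

    Keys of stage0/stage1 are (u, p) pairs and (0, 0) maps to 0, so any
    combination involving it reuses the other stage's value without an adder.
    Returns (metrics, number_of_additions).
    """
    metrics, additions = {}, 0
    for (k0, g0), (k1, g1) in product(stage0.items(), stage1.items()):
        if k0 == (0, 0):
            metrics[(k0, k1)] = g1
        elif k1 == (0, 0):
            metrics[(k0, k1)] = g0
        else:
            metrics[(k0, k1)] = g0 + g1
            additions += 1
    return metrics, additions
```

Of the 16 combinations, the 7 that include (0, 0) in either stage need no adder. That leaves 9 additions per radix-4 step on top of the per-stage tables.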
Claims (6)

1. A radix-16 high-performance Turbo decoder with a degree of parallelism of 4, compatible with LTE and WiMAX, characterized by comprising: an extrinsic information memory unit, a systematic information memory unit, a parity-1 memory unit, a parity-2 memory unit, a P-way permutation network, radix-16 soft-input soft-output (SISO) sub-decoders, a corresponding control module, an address generation module, and a hard-decision buffer; wherein the permutation network, the SISO sub-decoders, and the corresponding memory units all have a degree of parallelism of 4; the extrinsic information memory unit stores the extrinsic information obtained in the previous iteration, which is permuted by the permutation network and input to the SISO sub-decoders as the a priori information for the next iteration, the size of the extrinsic information memory unit being N/P, where N is the code length and P is the degree of parallelism; the systematic information memory unit stores the systematic information of the received codeword, which is input through the permutation network to the SISO sub-decoders and used as the systematic information during iterative decoding, the size of the systematic information memory unit likewise being N/P; the parity-1 and parity-2 memory units store the two parity sequences of the received codeword, and a selector feeds the appropriate parity sequence in each sub-iteration to complete iterative decoding; the address generation module generates, according to the progress and requirements of iterative decoding, the addresses used to access these memory units and read the corresponding extrinsic, systematic, and parity information; the SISO sub-decoders receive the corresponding extrinsic information, systematic information, and selected parity information through the permutation network, obtain the initial values required for decoding, and perform iterative decoding, and after each sub-iteration the resulting a posteriori information is written through the permutation network into the extrinsic information memory unit as the extrinsic information for the next sub-iteration; the control module provides overall control of the address generation module, the permutation network, and the iterative decoding process; after iterative decoding completes, the decoded codeword is obtained by hard decision, stored in the hard-decision buffer, and output as the decoding result.
2. The Turbo decoder according to claim 1, characterized in that the systematic information and the extrinsic information are partitioned into four quadrants to guarantee real-time storage of data in radix-16 decoding; in each clock cycle the 4 parallel soft-input soft-output sub-decoders must read a total of 16 groups of intrinsic and extrinsic information, which pass through four 4-way parallel permutation networks; after maximum a posteriori decoding, 16 extrinsic values are written back in each clock cycle; the address generation module provides 16 addresses in LTE mode and only 8 addresses in WiMAX mode, with the hardware resources fully shared between the two modes; the permutation networks are configurable, support all code lengths in LTE and WiMAX modes and three degrees of parallelism (1, 2, and 4), and are all based on simple barrel-shift networks.
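The barrel-shift permutation network of claim 2 can be modeled as a rotator with log2(P) multiplexer stages. This Python model rests on three assumptions:

- It rotates the active lanes only.
- Parallelism 1, 2 and 4 use the first 1, 2 or 4 lanes.
- It does not show how the shift amount is derived from the interleaver.

The function name is hypothetical.

```python
def barrel_shift(lanes, shift, parallelism):
    """Rotate the first `parallelism` lanes left by `shift`, as log2(P) mux stages.

    Stage t rotates by 2**t when bit t of `shift` is set, as a hardware barrel
    shifter would; lanes beyond `parallelism` pass through unchanged.
    """
    if parallelism not in (1, 2, 4):
        raise ValueError("supported degrees of parallelism are 1, 2 and 4")
    active = list(lanes[:parallelism])
    amount = 1
    while amount < parallelism:
        if shift & amount:  # this mux stage is enabled by one bit of the shift
            active = active[amount:] + active[:amount]
        amount <<= 1
    return active + list(lanes[parallelism:])
```

Only the low log2(P) bits of the shift matter, so the net rotation is `shift mod P`. This is why the same two-stage structure serves all three degrees of parallelism.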
3. The Turbo decoder according to claim 1, characterized in that the soft-input soft-output sub-decoder is a unit that computes the forward state metric α, the backward state metric β, and the branch metric γ, and comprises dual-port RAM for storing the data to be processed; after the branch-metric unit computes the branch metrics, they are held in registers for the subsequent forward and backward state-metric computations; the computed forward and backward state metrics are in turn held in registers as the starting point of the next forward and backward recursion step; at the same time, the LLR unit uses the current forward and backward metrics to compute and output the corresponding LLR values;
The soft-input soft-output sub-decoder supports both binary and duobinary turbo codes, supports both the sliding-window (SW) and bidirectional parallel-window (PW) modes, supports two ways of initializing the recursions (message passing and training), and selects the window mode and initialization method according to the code rate.
4. The Turbo decoder according to claim 3, characterized in that the address generation module is configurable for ARP and QPP interleavers; the parallel interleaver exploits the relationship between parallel addresses and the recurrence between addresses in adjacent cycles to simplify interleaving-address computation; addresses are computed in real time with two forward address generators, and the two backward addresses are obtained through a LIFO or from two reverse address generators.
5. The Turbo decoder according to claim 4, characterized in that the decoding algorithm in the soft-input soft-output sub-decoder is as follows:
States 2 and 7 of the WiMAX trellis swap positions, and states 3 and 6 swap positions, giving a reordered trellis;
According to the state transitions and state outputs of the two codes, the combinations of information bits and parity bits output by each state transition give 16 kinds of branch metrics; the transformed branch-metric formulas are given in Table 2:
Table 2 Dual-mode radix-4 decoding branch-metric formulas
6. The Turbo decoder according to claim 5, characterized in that, to further improve hardware reuse, the radix-4 log-likelihood ratio formulas for binary Turbo codes are transformed to obtain a symbol-level a posteriori probability computation identical to that of duobinary Turbo codes; according to the values of the information bits, the 32 possible state-transition branches of the trellis are divided into 4 classes of 8 branches each, and the symbol-level a posteriori probabilities are obtained by adding the forward state metrics, backward state metrics, and branch metrics;
(3)
From the 4 symbol-level a posteriori probabilities (one for each information-bit combination 00, 01, 10, 11), either of two results can be computed: the extrinsic information of a binary Turbo code (one value for each of the two information bits), or that of a duobinary Turbo code (three symbol-level values);
LTE mode requires the bit-level log-likelihood ratios and extrinsic information, while WiMAX mode requires symbol-level soft outputs. In LTE mode they are computed as:
(4)
(5)
Note that two of the sums in these expressions were already computed before the branch metrics, so they need not be recomputed;
In WiMAX mode they are computed as:
(6)
(7)
Note that three of the sums in these expressions were already computed before the branch metrics, so they need not be recomputed.
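A sketch of the extrinsic step in the two modes, assuming the usual max-log relations:

- **LTE, per bit:** Le = LLR - La - Ls.
- **WiMAX, per duobinary symbol, relative to z = 00:** Le(z) = (LL_z - LL_00) - La(z) - Ls(z).

Equations (4) to (7) are not reproduced here. These forms are therefore assumptions, as is the reading that the La + Ls sums are the terms "already computed" for the branch metrics. The function names are hypothetical.

```python
def lte_extrinsic(llr, apriori, systematic):
    """Bit-level extrinsic: Le(u) = LLR(u) - La(u) - Ls(u) for each information bit."""
    return [l - a - s for l, a, s in zip(llr, apriori, systematic)]


def wimax_symbol_extrinsic(ll, la_sym, ls_sym):
    """Symbol-level soft output and extrinsic for one duobinary symbol.

    ll: LL_z for z in "00", "01", "10", "11" (max-log symbol metrics).
    la_sym, ls_sym: a priori and systematic symbol terms for z != "00",
    both measured relative to z = "00" (an assumed normalization).
    Returns {z: Le(z)} with Le(z) = (LL_z - LL_00) - La(z) - Ls(z).
    """
    return {z: (ll[z] - ll["00"]) - la_sym[z] - ls_sym[z] for z in ("01", "10", "11")}
```

The LTE path yields two values per radix-4 step and the WiMAX path yields three per symbol. This matches the two and three precomputed sums noted in the claim.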
CN201410403907.2A 2014-08-16 2014-08-16 High-performance 16-base Turbo decoder with four degrees of parallelism and compatibility with LTE and WiMAX Pending CN104168032A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410403907.2A CN104168032A (en) 2014-08-16 2014-08-16 High-performance 16-base Turbo decoder with four degrees of parallelism and compatibility with LTE and WiMAX

Publications (1)

Publication Number Publication Date
CN104168032A (en) 2014-11-26

Family

ID=51911696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410403907.2A Pending CN104168032A (en) 2014-08-16 2014-08-16 High-performance 16-base Turbo decoder with four degrees of parallelism and compatibility with LTE and WiMAX

Country Status (1)

Country Link
CN (1) CN104168032A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050010854A1 (en) * 2003-06-26 2005-01-13 Bickerstaff Mark Andrew Unified serial/parallel concatenated convolutional code decoder architecture and method
CN101662673A (en) * 2006-01-03 2010-03-03 三星电子株式会社 Transmitter and system for transmitting/receiving digital broadcasting stream and method thereof
CN102084346A (en) * 2008-07-03 2011-06-01 诺基亚公司 Address generation for multiple access of memory
WO2012111846A1 (en) * 2011-02-18 2012-08-23 Nec Corporation Turbo decoder with qpp or arp interleaver
CN103501210A (en) * 2013-09-30 2014-01-08 复旦大学 High-performance multistandard FEC (Forward Error Correction) decoder
CN103905067A (en) * 2012-12-27 2014-07-02 中兴通讯股份有限公司 Multi-mode decoder realization method and apparatus

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XUBIN CHEN et al.: "A 691 Mbps 1.392 mm² Configurable Radix-16 Turbo Decoder ASIC for 3GPP-LTE and WiMAX Systems in 65nm CMOS", Solid-State Circuits Conference (A-SSCC), 2013 IEEE Asian *
陈绪斌 (Chen Xubin) et al.: "VLSI Design of a High-Performance Parallel Turbo Decoder" (高性能并行Turbo译码器的VLSI设计), Computer Engineering (计算机工程) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107453761A (en) * 2016-05-31 2017-12-08 展讯通信(上海)有限公司 Turbo code interpretation method and Turbo code decoder
CN108270452A (en) * 2016-12-30 2018-07-10 芯原微电子(成都)有限公司 A kind of Turbo decoders and interpretation method
CN106899313A (en) * 2017-02-27 2017-06-27 中国人民解放军国防科学技术大学 A kind of Turbo code code translator and method for supporting LTE standard
CN109217878A (en) * 2017-06-30 2019-01-15 华为技术有限公司 A kind of data processing equipment and method
CN109217878B (en) * 2017-06-30 2021-09-14 重庆软维科技有限公司 Data processing device and method
CN110299962A (en) * 2018-03-21 2019-10-01 钜泉光电科技(上海)股份有限公司 A kind of Turbo component coder and coding method, Turbo encoder and coding method
CN110299962B (en) * 2018-03-21 2022-10-14 钜泉光电科技(上海)股份有限公司 Turbo component encoder and encoding method, turbo encoder and encoding method
CN112202456A (en) * 2020-10-24 2021-01-08 青岛鼎信通讯股份有限公司 Turbo decoding method for broadband power line carrier communication
CN112398487A (en) * 2020-12-14 2021-02-23 中科院计算技术研究所南京移动通信与计算创新研究院 Implementation method and system for reducing complexity of Turbo parallel decoding
CN112398487B (en) * 2020-12-14 2024-06-04 中科南京移动通信与计算创新研究院 Implementation method and system for reducing Turbo parallel decoding complexity
CN115883065A (en) * 2022-11-26 2023-03-31 郑州信大华芯信息科技有限公司 Method, device, chip and storage medium for quickly realizing software encryption and decryption based on variable S box
CN115883065B (en) * 2022-11-26 2024-02-20 郑州信大华芯信息科技有限公司 Method, device, chip and storage medium for quickly realizing software encryption and decryption based on variable S box

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20141126
