CN109672454A

CN109672454A - High-speed Viterbi decoding method in DSP little-endian mode and receiver using the same

Info

Publication number: CN109672454A
Application number: CN201910058847.8A
Authority: CN
Inventors: 马慧; 王旭; 陈南希; 舒睿俊; 徐景; 张武雄
Original assignee: Shanghai Research Center for Wireless Communications
Current assignee: Shanghai Research Center for Wireless Communications
Priority date: 2019-01-22
Filing date: 2019-01-22
Publication date: 2019-04-23

Abstract

The invention discloses a high-speed Viterbi decoding method and a receiver using it. The method heavily optimizes Viterbi decoding for the system architecture and instruction set of the DSP processor: intrinsic functions of the C66x-series DSP core are used to directly specify the assembly instruction used in each operation, and the resource constraints of the DSP are taken into account throughout. Compared with existing DSP Viterbi decoding techniques, the invention achieves a substantial performance improvement.

Description

High-speed Viterbi decoding method in DSP little-endian mode and receiver using the same

Technical field

The present invention relates to a Viterbi decoding method, in particular to a high-speed Viterbi decoding method that runs in DSP little-endian mode, and also to a receiver using this high-speed Viterbi decoding method. It belongs to the field of wireless communication technology.

Background art

In a wireless communication system, the non-ideal transfer characteristics of the channel and the presence of noise cause errors in the signal received at the receiving end, so channel coding for error correction is a particularly important technique. In channel coding, the transmitter adds redundancy correlated with the original data, and the receiver uses this correlation to detect and correct the errors introduced during transmission.

Convolutional codes are a channel coding technique widely used in wireless communication systems; they were already applied to deep-space and satellite communication in the 1970s. Among them, the (3,1,7) convolutional code achieves a good compromise between error performance and decoding complexity, so many communication systems, such as LTE, use it. Viterbi decoding is the optimal decoding algorithm for convolutional codes and is widely used in practice.

Viterbi decoding consists of two parts: path metric calculation and path traceback. For the traceback problem, in Chinese invention patent No. ZL 200910008300.3, Huawei proposes an enhanced decoding method that reduces the amount of computation in the path traceback of an enhanced Viterbi decoding algorithm implemented on a DSP. The enhanced decoding method includes the following steps: obtaining a traceback path that starts from an unreliable node and whose length equals a predetermined traceback length; converting the traceback path into the corresponding decoding result; replacing the decoding result at the corresponding position of the maximum-likelihood decoded sequence with the decoding result of the traceback path; and outputting the decoded sequence after the replacement.

Path metrics are computed by add-compare-select (ACS) operations. The amount of computation is usually much larger than that of path traceback. US patent application US19950558745 proposes a DSP device optimized specifically for Viterbi decoding; its main idea is to design dedicated hardware to speed up Viterbi decoding. Because it relies on dedicated hardware, it cannot be applied to a general-purpose DSP. US patent application US20020243567 proposes an improved DSP decoding method whose main idea is to use special instructions such as vmshl, vitsel, vitadd, vitmax and vitmin to speed up decoding. Many general-purpose DSPs do not provide these special instructions, so the method cannot be used on them.

To speed up decoding on general-purpose processors, some implementations for x86 processors precompute part of the decoding computation offline, trading storage for decoding speed (see Phil Karn and Matthias P. Braendli, KA9Q's FEC library. [Online]. Available: https://github.com/Opendigitalradio/ka9q-fec, Feb. 2012). However, that method is optimized only for x86 processors. In embedded systems, DSPs are more widely used, and DSPs differ considerably from x86 processors in architecture. Performing Viterbi decoding on a DSP therefore requires optimizing the algorithm for the DSP's system architecture and instruction set.

Summary of the invention

The primary technical problem to be solved by the present invention is to provide a high-speed Viterbi decoding method in DSP little-endian mode.

Another technical problem to be solved by the present invention is to provide a receiver using the above high-speed Viterbi decoding method.

To achieve the above objects, the present invention adopts the following technical solutions:

According to a first aspect of the embodiments of the present invention, a high-speed Viterbi decoding method is provided. It is implemented on a C66x-series DSP core, and intrinsic functions of the C66x-series DSP core are used to directly specify the assembly instruction used in each operation.

Preferably, the high-speed Viterbi decoding method runs in the little-endian mode of the C66x-series DSP core.

Preferably, the intrinsic functions include, but are not limited to, _amem8, _dadd, _loll, _hill, _maxu4, _cmpeq4, _packh4, _packl4, _minu4, _packlh2 and _sub4.

Preferably, the high-speed Viterbi decoding method further comprises the following steps:

1) Clear the cumulative metric array cum_metrics, and reset to zero the counter of data already read, counter;

2) Read in three consecutive input data, and obtain the offsets of the path-metric increments local_metric_0 and local_metric_1 in the look-up tables lookup_0 and lookup_1; add 3 to counter;

3) For even k, in groups of eight, use the C66x intrinsic functions _amem8, _dadd, _loll and _hill in turn to calculate the candidate new cumulative metric values update_metrics0[k];

4) For odd k, in groups of eight, use the C66x intrinsic functions _amem8, _dadd, _loll and _hill in turn to calculate the candidate new cumulative metric values update_metrics0[k];

5) For even k, in groups of eight, use the C66x intrinsic functions _amem8, _dadd, _loll and _hill in turn to calculate the candidate new cumulative metric values update_metrics1[k];

6) For odd k, in groups of eight, use the C66x intrinsic functions _amem8, _dadd, _loll and _hill in turn to calculate the candidate new cumulative metric values update_metrics1[k];

7) For even k, in groups of four, use the C66x intrinsic function _maxu4 to take the maximum of update_metrics0[k] and update_metrics1[k];

8) For odd k, in groups of four, use the C66x intrinsic function _maxu4 to take the maximum of update_metrics0[k] and update_metrics1[k];

9) For even k, in groups of four, use the C66x intrinsic function _cmpeq4 to record the surviving path;

10) For odd k, in groups of four, use the C66x intrinsic function _cmpeq4 to record the surviving path;

11) For all k, in groups of eight, use the C66x intrinsic functions _packh4 and _packl4 to rearrange the format of the cumulative metrics;

12) For all k, use the C66x intrinsic functions _minu4, _packlh2, _packl4 and _packh4 to search for the minimum cumulative metric;

13) For all k, in groups of four, use the C66x intrinsic function _sub4 to subtract the minimum from the cumulative metrics;

14) For all k, in groups of four, store the cumulative metrics;

15) Judge whether counter is less than 3 × (N+6), where N is the data length before convolutional encoding; if so, go to step 2); otherwise, go to step 16);

16) Perform deepest-path traceback to estimate the transmitted data;

Here, in steps 3) to 14), k is the state index;

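In steps 3) and 5), "for even k, in groups of eight" means that the state index k is grouped according to the following table:

Group number	State index
1st group	k=0,2,4,6,8,10,12,14
2nd group	k=16,18,20,22,24,26,28,30
3rd group	k=32,34,36,38,40,42,44,46
4th group	k=48,50,52,54,56,58,60,62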

In steps 4) and 6), "for odd k, in groups of eight" means that the state index k is grouped according to the following table:

Group number	State index
1st group	k=1,3,5,7,9,11,13,15
2nd group	k=17,19,21,23,25,27,29,31
3rd group	k=33,35,37,39,41,43,45,47
4th group	k=49,51,53,55,57,59,61,63

In steps 7) and 9), "for even k, in groups of four" means that the state index k is grouped according to the following table:

Group number	State index
1st group	k=0,2,4,6
2nd group	k=8,10,12,14
3rd group	k=16,18,20,22
4th group	k=24,26,28,30
5th group	k=32,34,36,38
6th group	k=40,42,44,46
7th group	k=48,50,52,54
8th group	k=56,58,60,62

In steps 8) and 10), "for odd k, in groups of four" means that the state index k is grouped according to the following table:

Group number	State index
1st group	k=1,3,5,7
2nd group	k=9,11,13,15
3rd group	k=17,19,21,23
4th group	k=25,27,29,31
5th group	k=33,35,37,39
6th group	k=41,43,45,47
7th group	k=49,51,53,55
8th group	k=57,59,61,63

In step 11), "for all k, in groups of eight" means that the state index k is grouped according to the following table:

Group number	State index
1st group	k=0,1,2,3,4,5,6,7
2nd group	k=8,9,10,11,12,13,14,15
3rd group	k=16,17,18,19,20,21,22,23
4th group	k=24,25,26,27,28,29,30,31
5th group	k=32,33,34,35,36,37,38,39
6th group	k=40,41,42,43,44,45,46,47
7th group	k=48,49,50,51,52,53,54,55
8th group	k=56,57,58,59,60,61,62,63

In steps 13) and 14), "for all k, in groups of four" means that the state index k is grouped according to the following table:

Group number	State index	Group number	State index
1st group	k=0,1,2,3	9th group	k=32,33,34,35
2nd group	k=4,5,6,7	10th group	k=36,37,38,39
3rd group	k=8,9,10,11	11th group	k=40,41,42,43
4th group	k=12,13,14,15	12th group	k=44,45,46,47
5th group	k=16,17,18,19	13th group	k=48,49,50,51
6th group	k=20,21,22,23	14th group	k=52,53,54,55
7th group	k=24,25,26,27	15th group	k=56,57,58,59
8th group	k=28,29,30,31	16th group	k=60,61,62,63


Preferably, when calculating update_metrics0 or update_metrics1,

the intrinsic function _amem8 is first used to read from memory local_metric_0[x] or local_metric_1[x] for 8 states and cum_metrics[y] for 8 states, where y is the element index into cum_metrics;

when calculating update_metrics0, x is the element index into local_metric_0; when calculating update_metrics1, x is the element index into local_metric_1; the values of x and y are given in the table below;

the intrinsic function _dadd is then used to add cum_metrics to local_metric_0 or local_metric_1, giving update_metrics0 or update_metrics1;

finally, the intrinsic function _loll extracts the low 32 bits of the result and the intrinsic function _hill extracts the high 32 bits.

Preferably, in step 11), let even denote the cumulative metrics with even indices and odd denote the cumulative metrics with odd indices;

the intrinsic functions _packh4 and _packl4 are applied in turn to odd and even, giving the results temp_a and temp_b respectively;

the intrinsic functions _packl4 and _packh4 are then applied in turn to temp_a and temp_b, giving the cumulative metrics in consecutive index order.
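The two-stage packing described above can be traced on a host machine with portable stand-ins for the intrinsics. The sketch below is illustrative only: emu_packh4 and emu_packl4 are hypothetical helper names that follow the byte-selection behaviour of PACKH4 and PACKL4 as documented by TI (high or low byte of each halfword, first operand in the upper half of the result), and the operand order (odd, even) is an assumption, since the text does not state it.

```c
#include <stdint.h>

/* Stand-in for _packh4: result bytes, MSB first, are
 * a.byte3, a.byte1, b.byte3, b.byte1. */
uint32_t emu_packh4(uint32_t a, uint32_t b)
{
    return (a & 0xFF000000u) | ((a & 0x0000FF00u) << 8) |
           ((b & 0xFF000000u) >> 16) | ((b & 0x0000FF00u) >> 8);
}

/* Stand-in for _packl4: result bytes, MSB first, are
 * a.byte2, a.byte0, b.byte2, b.byte0. */
uint32_t emu_packl4(uint32_t a, uint32_t b)
{
    return ((a & 0x00FF0000u) << 8) | ((a & 0x000000FFu) << 16) |
           ((b & 0x00FF0000u) >> 8) | (b & 0x000000FFu);
}

/* even holds states 0,2,4,6 and odd holds states 1,3,5,7, one byte each,
 * lowest state in the least significant byte (little-endian layout).
 * Outputs: states 0..3 in *cum0_3 and states 4..7 in *cum4_7.
 * The (odd, even) operand order is an assumption of this sketch. */
void interleave_even_odd(uint32_t even, uint32_t odd,
                         uint32_t *cum0_3, uint32_t *cum4_7)
{
    uint32_t temp_a = emu_packh4(odd, even);  /* states 7,3,6,2 */
    uint32_t temp_b = emu_packl4(odd, even);  /* states 5,1,4,0 */
    *cum0_3 = emu_packl4(temp_a, temp_b);     /* states 3,2,1,0 */
    *cum4_7 = emu_packh4(temp_a, temp_b);     /* states 7,6,5,4 */
}
```

With this operand order, four pack operations turn two registers of alternating states into two registers of consecutive states, which is the format the later grouped steps operate on.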

Preferably, in step 12), assume that 4 cumulative metrics are stored in the register min_metric;

the intrinsic function _packlh2 is applied to min_metric and min_metric, giving the result min_metric2;

the intrinsic function _minu4 is applied to min_metric and min_metric2, the result is stored back in the register min_metric, and the value of min_metric is copied into the register min_metric2;

the intrinsic function _packl4 is applied to min_metric and min_metric, and the result is stored back in the register min_metric;

the intrinsic function _packh4 is applied to min_metric2 and min_metric2, and the result is stored back in the register min_metric2;

the intrinsic function _minu4 is applied to min_metric and min_metric2, giving the final result.
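The reduction above can be checked with a portable sketch. The emu_ helpers below are hypothetical stand-ins that follow the documented behaviour of PACKLH2, MINU4, PACKL4 and PACKH4; they are not the TI intrinsics themselves. Under these semantics the final word carries the minimum of the four packed metrics in every byte, which would suit a later byte-wise subtraction, although the text does not say so explicitly.

```c
#include <stdint.h>

/* Stand-in for _packlh2: low halfword of a on top, high halfword of b below. */
uint32_t emu_packlh2(uint32_t a, uint32_t b)
{
    return (a << 16) | (b >> 16);
}

/* Stand-in for _minu4: byte-wise unsigned minimum. */
uint32_t emu_minu4(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xFFu;
        uint32_t y = (b >> (8 * i)) & 0xFFu;
        r |= (x < y ? x : y) << (8 * i);
    }
    return r;
}

/* Stand-in for _packl4: low byte of each halfword of a, then of b. */
uint32_t emu_packl4(uint32_t a, uint32_t b)
{
    return ((a & 0x00FF0000u) << 8) | ((a & 0x000000FFu) << 16) |
           ((b & 0x00FF0000u) >> 8) | (b & 0x000000FFu);
}

/* Stand-in for _packh4: high byte of each halfword of a, then of b. */
uint32_t emu_packh4(uint32_t a, uint32_t b)
{
    return (a & 0xFF000000u) | ((a & 0x0000FF00u) << 8) |
           ((b & 0xFF000000u) >> 16) | ((b & 0x0000FF00u) >> 8);
}

/* Follows the register sequence of step 12) for one word of 4 metrics. */
uint32_t min_of_four(uint32_t min_metric)
{
    /* Swap the halfwords so bytes 0,1 face bytes 2,3. */
    uint32_t min_metric2 = emu_packlh2(min_metric, min_metric);
    min_metric  = emu_minu4(min_metric, min_metric2);
    min_metric2 = min_metric;
    /* Spread the two partial minima over all four bytes of each word. */
    min_metric  = emu_packl4(min_metric, min_metric);
    min_metric2 = emu_packh4(min_metric2, min_metric2);
    return emu_minu4(min_metric, min_metric2);
}
```

For example, the packed metrics 9, 3, 7, 5 (most significant byte first) reduce to a word whose four bytes are all 3.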

According to a second aspect of the embodiments of the present invention, a receiver is provided, comprising a down-conversion module, an analog-to-digital conversion module and a digital signal processing module, wherein the digital signal processing module includes a C66x-series DSP core and performs Viterbi decoding using the above high-speed Viterbi decoding method.

Compared with existing DSP Viterbi decoding techniques, the high-speed Viterbi decoding method provided by the present invention on the one hand uses the intrinsic functions of the C66x-series DSP core to directly specify the assembly instruction used in each operation; on the other hand, it makes full use of the single-instruction multiple-data (SIMD) capability of the C66x-series DSP core and takes the resource constraints of the DSP into account. It therefore achieves a substantial performance improvement.

Brief description of the drawings

Fig. 1 is a flowchart of the high-speed Viterbi decoding method provided by the present invention after initialization is complete;

Fig. 2 is a schematic diagram of the arrangement of 8 local_metric_0 data in a register pair in little-endian mode;

Fig. 3 is a schematic diagram of the arrangement of 8 cum_metrics data in a register pair in little-endian mode;

Fig. 4 is a schematic diagram of the result of adding the above two register pairs;

Fig. 5 is a schematic diagram of the data format in temp_a after the PACKH4 instruction;

Fig. 6 is a schematic diagram of the data format in temp_b after the PACKL4 instruction;

Fig. 7 is a schematic diagram of the data format in cum_metrics0_3 after the PACKL4 instruction;

Fig. 8 is a schematic diagram of the data format in cum_metrics4_7 after the PACKH4 instruction;

Fig. 9 is a schematic diagram of the data format in min_metric2;

Fig. 10 is a schematic diagram of the data format in min_metric;

Fig. 11 is a schematic diagram of the data format in the final min_metric;

Fig. 12 is a schematic diagram of the modules of a receiver using the above high-speed Viterbi decoding method;

Fig. 13 is a schematic diagram comparing the bit error rate performance of the high-speed Viterbi decoding method provided by the present invention with that of a floating-point Viterbi algorithm.

Detailed description of the embodiments

The technical content of the present invention is described in further detail below with reference to the drawings and specific embodiments.

Digital signals are widely used in many wireless communication systems because of their strong resistance to interference. Digital signal processing (abbreviated DSP) can be performed either on the x86 processors of ordinary computers or on DSP processors (digital signal processors, also abbreviated DSP) designed specifically for embedded systems. In the present invention, unless otherwise stated, DSP refers to a DSP processor.

TI (Texas Instruments) is currently the world's largest manufacturer of DSP processors. The C66x DSP core is TI's latest general-purpose DSP core and is widely used in many TI DSP products, such as the TMS320C6670, TMS320C6672, TMS320C6674 and TMS320C6678. The C66x DSP core is also widely used in the new-generation KeyStone II product family with DSP+ARM architecture, such as the TCI6638K2K, TCI6630K2L, 66AK2G12, 66AK2L06 and 66AK2H14. These devices use the KeyStone multicore architecture and integrate peripherals such as RapidIO, Gigabit Ethernet and EDMA together with a large number of hardware accelerators, so they can be widely used in communications, radar, sonar and other fields. For further information on the C66x-series DSP, see the technical documents provided by TI, such as the "TMS320C66x DSP CPU and Instruction Set Reference Guide" (No. SPRUGH7, November 2010).

Intrinsic functions (intrinsics) are a code generation technique provided by TI. They go one step further than traditional inline functions by directly specifying, in the source code, the assembly instruction to be used. Different processor hardware platforms have different intrinsic functions, which makes it possible to exploit the maximum computing performance of each processor with very high efficiency. Intrinsic functions usually correspond one-to-one to assembly instructions, which lets a DSP engineer control the generated binary precisely. When intrinsic functions are used, the compiler only needs to allocate the registers.

In one embodiment of the present invention, a heavily optimized Viterbi decoding method for the (3,1,7) convolutional code is first provided (referred to below as the high-speed Viterbi decoding method). Based on the system architecture and instruction set of the DSP, the method on the one hand uses the intrinsic functions (intrinsics) of the C66x-series DSP core to directly specify the assembly instruction used in each operation, leaving the compiler to allocate the registers used by each instruction; on the other hand, it makes full use of the single-instruction multiple-data (SIMD) capability of the C66x-series DSP core and takes the resource constraints of the DSP into account. Compared with existing Viterbi decoding techniques, the method achieves a substantial performance improvement. It is particularly suited to running in the little-endian mode of the C66x-series DSP core (in which the high-order byte of a datum is stored at the higher memory address and the low-order byte at the lower memory address). Tests on a C6678 development board show that the decoding speed of this heavily optimized method reaches 1.776 times that of TI's official implementation.

The specific implementation steps of the high-speed Viterbi decoding method are described in detail below.

First, the method must be initialized before Viterbi decoding can start. Step 2) requires two look-up tables, lookup_0 and lookup_1. These two tables are initialized when the DSP powers up and are stored in memory; during subsequent Viterbi decoding they are read directly from memory. Because initialization is performed only once, at power-up, the initialization step does not need to be optimized.

As shown in Fig. 1, after initialization is complete the high-speed Viterbi decoding method proceeds as follows:

1) Clear the cumulative metric array cum_metrics, and reset to zero the counter of data already read, counter.

2) Read in three consecutive input data. From them, obtain the offsets of local_metric_0 and local_metric_1 in the look-up tables lookup_0 and lookup_1. The table base address plus this offset gives the position in the look-up table of the local_metric_0 and local_metric_1 values corresponding to the current input data. Then add 3 to counter.

3) For even state index k, calculate update_metrics0[k], k=0,2,...,62, in groups of eight. Specifically, first calculate state indices k=0,2,4,6,8,10,12,14; then k=16,18,20,22,24,26,28,30; next k=32,34,36,38,40,42,44,46; and finally k=48,50,52,54,56,58,60,62.

4) For odd state index k, calculate update_metrics0[k], k=1,3,...,63, in groups of eight. Specifically, first calculate state indices k=1,3,5,7,9,11,13,15; then k=17,19,21,23,25,27,29,31; next k=33,35,37,39,41,43,45,47; and finally k=49,51,53,55,57,59,61,63.

5) For even state index k, calculate update_metrics1[k], k=0,2,...,62, in groups of eight. The calculation proceeds as in step 3).

6) For odd state index k, calculate update_metrics1[k], k=1,3,...,63, in groups of eight. The calculation proceeds as in step 4).

7) For even k, in groups of four, take the maximum of update_metrics0[k] and update_metrics1[k], k=0,2,...,62. Specifically, operate in turn on state indices k=0,2,4,6; k=8,10,12,14; k=16,18,20,22; k=24,26,28,30; k=32,34,36,38; k=40,42,44,46; k=48,50,52,54; and finally k=56,58,60,62.

8) For odd k, in groups of four, take the maximum of update_metrics0[k] and update_metrics1[k], k=1,3,...,63. Specifically, operate in turn on state indices k=1,3,5,7; k=9,11,13,15; k=17,19,21,23; k=25,27,29,31; k=33,35,37,39; k=41,43,45,47; k=49,51,53,55; and finally k=57,59,61,63.

9) For even k, in groups of four, record the surviving path, k=0,2,...,62. The state index k is grouped as in step 7).

10) For odd k, in groups of four, record the surviving path, k=1,3,...,63. The state index k is grouped as in step 8).

11) For all k, in groups of eight, rearrange the format of the cumulative metrics, k=0,1,2,3,...,63, so that they return to the normal consecutive-index format. Specifically, operate in turn on state indices k=0,1,2,3,4,5,6,7; k=8,9,10,11,12,13,14,15; k=16,17,18,19,20,21,22,23; k=24,25,26,27,28,29,30,31; k=32,33,34,35,36,37,38,39; k=40,41,42,43,44,45,46,47; k=48,49,50,51,52,53,54,55; and finally k=56,57,58,59,60,61,62,63.

12) For all k, search for the minimum cumulative metric. The search procedure is described in detail later.

13) For all k, in groups of four, subtract the minimum from the cumulative metrics, k=0,1,2,3,...,63. Specifically, operate in turn on state indices k=0,1,2,3; k=4,5,6,7; k=8,9,10,11; k=12,13,14,15; k=16,17,18,19; k=20,21,22,23; k=24,25,26,27; k=28,29,30,31; k=32,33,34,35; k=36,37,38,39; k=40,41,42,43; k=44,45,46,47; k=48,49,50,51; k=52,53,54,55; k=56,57,58,59; and finally k=60,61,62,63.

14) For all k, in groups of four, store the cumulative metrics. The state index k is grouped as in step 13).

15) Judge whether all input data have been read. Suppose there are N data before convolutional encoding; after (3,1,7) convolutional encoding they become 3 × (N+6) data. If the value of counter is less than 3 × (N+6), there are still unread input data, so go to step 2). Otherwise, go on to step 16).

16) Perform deepest-path traceback to estimate the transmitted data.

In this high-speed Viterbi decoding method, the amount of computation in path traceback is small, so step 16) can be implemented in conventional C.
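Steps 7) to 10) pair a byte-wise maximum with a byte-wise equality test. The sketch below shows one way the two could fit together for four packed states; it uses hypothetical portable stand-ins (emu_maxu4, emu_cmpeq4) that mimic the documented behaviour of MAXU4 and CMPEQ4, and the choice to compare the maximum against update_metrics1 is an assumption, since the text only says that _cmpeq4 records the surviving path.

```c
#include <stdint.h>

/* Stand-in for _maxu4: byte-wise unsigned maximum. */
uint32_t emu_maxu4(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xFFu;
        uint32_t y = (b >> (8 * i)) & 0xFFu;
        r |= (x > y ? x : y) << (8 * i);
    }
    return r;
}

/* Stand-in for _cmpeq4: bit i of the result is 1 when byte i of a
 * equals byte i of b; only the 4 low bits are used. */
uint32_t emu_cmpeq4(uint32_t a, uint32_t b)
{
    uint32_t mask = 0;
    for (int i = 0; i < 4; i++) {
        if (((a >> (8 * i)) & 0xFFu) == ((b >> (8 * i)) & 0xFFu))
            mask |= 1u << i;
    }
    return mask;
}

/* Compare-select for 4 packed states.
 * *survivor gets the 4 winning metrics; *decision gets a 4-bit mask in
 * which bit i = 1 means branch 1 won (or tied) for the i-th state.
 * Which operand is compared is an assumption of this sketch. */
void compare_select4(uint32_t update0, uint32_t update1,
                     uint32_t *survivor, uint32_t *decision)
{
    *survivor = emu_maxu4(update0, update1);
    *decision = emu_cmpeq4(*survivor, update1);
}
```

The 4-bit masks from successive groups can then be concatenated into one decision word per trellis step for use by the traceback in step 16).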

Some technical details of the implementation of the high-speed Viterbi decoding method are described further below:

1. Quantization of the likelihood ratio

The likelihood ratio (llr) of the received signal is the input data of the Viterbi decoder. As in common Viterbi decoders, 4-bit quantization is used in one embodiment of the present invention.

2. Construction of the look-up tables

Updating the cumulative metrics is one of the key steps of Viterbi decoding. Its core operations are as follows:

local_metric_0[k] = prev_state_0[k][0]*llr[i*3] + prev_state_0[k][1]*llr[i*3+1] + prev_state_0[k][2]*llr[i*3+2];

update_metrics0[k] = cum_metrics[k/2] + local_metric_0[k];

local_metric_1[k] = prev_state_1[k][0]*llr[i*3] + prev_state_1[k][1]*llr[i*3+1] + prev_state_1[k][2]*llr[i*3+2];

update_metrics1[k] = cum_metrics[k/2+32] + local_metric_1[k];

Here, local_metric_0 and local_metric_1 are the increments of the cumulative metric calculated from the current input data; prev_state_0 and prev_state_1 are two sets of constants with values -1 or 1, which can be precomputed from the generator polynomials of the (3,1,7) convolutional code; k is an index, an integer from 0 to 63, representing the 64 decoding states of the (3,1,7) convolutional code (k/2 denotes integer division); llr is the likelihood ratio, i.e. the input data of the Viterbi decoder; i is the index of the input data; update_metrics0 and update_metrics1 are the two candidate values of the updated cumulative metric; cum_metrics is the cumulative metric from the previous step.
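For orientation, one trellis step built from these update equations can be written as a plain scalar loop. This is a reference sketch, not the optimized SIMD code of the invention: the function name, the use of int arrays, and the tie-breaking rule (branch 1 wins ties) are assumptions, while selecting the maximum and subtracting the minimum follow steps 7), 12) and 13) of the method.

```c
#define NUM_STATES 64

/* One add-compare-select step over all 64 states (scalar reference).
 * cum_metrics    : cumulative metrics from the previous step
 * local_metric_0 : per-state increments for branch 0
 * local_metric_1 : per-state increments for branch 1
 * new_cum        : updated, minimum-normalized cumulative metrics
 * decision       : 0 or 1 per state, the branch that survived */
void acs_step_scalar(const int cum_metrics[NUM_STATES],
                     const int local_metric_0[NUM_STATES],
                     const int local_metric_1[NUM_STATES],
                     int new_cum[NUM_STATES],
                     int decision[NUM_STATES])
{
    int min = 0;
    for (int k = 0; k < NUM_STATES; k++) {
        /* Add: the two candidate metrics for state k (k/2 is integer division). */
        int update0 = cum_metrics[k / 2] + local_metric_0[k];
        int update1 = cum_metrics[k / 2 + 32] + local_metric_1[k];
        /* Compare and select: keep the larger candidate (ties go to branch 1,
         * an assumption of this sketch). */
        decision[k] = (update1 >= update0) ? 1 : 0;
        new_cum[k] = decision[k] ? update1 : update0;
        if (k == 0 || new_cum[k] < min)
            min = new_cum[k];
    }
    /* Normalize: subtract the minimum so the metrics stay small. */
    for (int k = 0; k < NUM_STATES; k++)
        new_cum[k] -= min;
}
```

A scalar version like this is mainly useful as a golden model when checking an intrinsic-based implementation state by state.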

For each combination of input data llr[i*3], llr[i*3+1], llr[i*3+2], the values of local_metric_0 and local_metric_1 are computed in advance and stored in a look-up table. During Viterbi decoding, llr[i*3], llr[i*3+1] and llr[i*3+2] are read first, and the corresponding values of local_metric_0 and local_metric_1 are then read directly from the look-up table, which eliminates the computation of local_metric_0 and local_metric_1. The (3,1,7) convolutional code has 64 decoding states, so one group of input data llr[i*3], llr[i*3+1], llr[i*3+2] corresponds to 64 values each of local_metric_0 and local_metric_1.

Each update of the cumulative metrics reads 3 consecutive likelihood ratios. Each likelihood ratio is quantized to 4 bits and therefore has 16 possible values. The (3,1,7) convolutional code has 64 decoding states in total. The size of each look-up table is therefore 64×16×16×16 = 262,144 entries.
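A small sketch of the table geometry follows. The text gives the table size but not how the three quantized inputs are combined into an offset, so the ordering below (first input most significant, 64 entries per input combination) is an assumption, as are the function names; the inputs are taken as unsigned 4-bit codes 0 to 15.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_STATES = 64, LLR_LEVELS = 16 };

/* Total number of entries in one look-up table. */
size_t lookup_table_size(void)
{
    return (size_t)NUM_STATES * LLR_LEVELS * LLR_LEVELS * LLR_LEVELS;
}

/* Offset of the 64-entry block for one triple of 4-bit inputs.
 * The index ordering is hypothetical; any fixed ordering works as long
 * as table construction and decoding agree on it. */
size_t lookup_offset(unsigned q0, unsigned q1, unsigned q2)
{
    assert(q0 < LLR_LEVELS && q1 < LLR_LEVELS && q2 < LLR_LEVELS);
    return (((size_t)q0 * LLR_LEVELS + q1) * LLR_LEVELS + q2) * NUM_STATES;
}
```

Because every block is 64 entries long, the offset is always a multiple of 64, so each block can start on an 8-byte boundary if the table base is aligned and the entries are one byte each.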

The Viterbi decoding algorithm is in essence a maximum-likelihood (ML) decoding algorithm for convolutional codes. What the look-up tables store reflects the probability that the transmitter sent 0 or 1 on the current bit, so the decoder needs two look-up tables, corresponding to a transmitted bit of 0 and of 1 respectively. Let the look-up table for local_metric_0 be lookup_0 and the look-up table for local_metric_1 be lookup_1. These two tables are initialized when the DSP powers up and are stored in memory; during subsequent Viterbi decoding they are read directly from memory.

In the present invention, local_metric_0 and local_metric_1 are represented with 8 bits, i.e. 1 byte. The assembly instruction LDDW of the C66x-series DSP reads 8 bytes from memory at a time. To reduce the cost of reading the look-up tables, in the embodiments of the present invention the 64 values of local_metric_0 corresponding to one group of input data llr[i*3], llr[i*3+1], llr[i*3+2] are placed contiguously in memory, so that local_metric_0 for 8 states can be read at once. So that the values of local_metric_0 that are needed together can be read contiguously, the storage order of the look-up table is adjusted according to the following rule:

If k is odd, the value of local_metric_0[k] is stored at position (k-1)/2+32 of the look-up table; if k is even, the value of local_metric_0[k] is stored at position k/2 of the look-up table. Here k is the state number.

local_metric_1 is handled in the same way.
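The reordering rule can be expressed as a one-line mapping. The sketch below uses hypothetical function names; it also checks that the mapping is a permutation of 0 to 63, so no two states share a slot within a 64-entry block.

```c
/* Position of state k inside a 64-entry block of the look-up table:
 * even states fill positions 0..31, odd states fill positions 32..63. */
int lut_position(int k)
{
    return (k % 2) ? (k - 1) / 2 + 32 : k / 2;
}

/* Returns 1 if lut_position maps states 0..63 onto positions 0..63
 * without collisions, 0 otherwise. */
int lut_is_permutation(void)
{
    int seen[64] = {0};
    for (int k = 0; k < 64; k++) {
        int p = lut_position(k);
        if (p < 0 || p >= 64 || seen[p])
            return 0;
        seen[p] = 1;
    }
    return 1;
}
```

With this layout, states 0, 2, ..., 14 occupy positions 0 to 7 and states 1, 3, ..., 15 occupy positions 32 to 39, so each group of eight used in steps 3) to 6) is one contiguous 8-byte read.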

On a DSP, the storage location of each variable has to be specified manually. To speed up reading the look-up tables, they need to be placed in the on-chip memory of the DSP. Given that the two look-up tables are fairly large, in the embodiments of the present invention they are placed in the DSP's 4 MB multicore shared memory.

3. Calculation based on non-negative numbers

Many SIMD assembly instructions of the C66x-series DSP core support only non-negative numbers, for example the maximum instruction MAXU4 and the minimum instruction MINU4. To avoid data type conversions during computation, the embodiments of the present invention use unsigned integers throughout. The original Viterbi algorithm, however, is designed for signed numbers, so it has to be modified.

In the embodiments of the present invention, in the expressions for local_metric_0 and local_metric_1, the input data llr is quantized to 4 bits and takes values in [-8, 7], and prev_state_0 and prev_state_1 take the values -1 or 1. Each product therefore lies in [-8, 8], so local_metric_0 and local_metric_1 lie in [-24, 24]. To turn them into non-negative numbers, a bias of 24 can be added, i.e.

local_metric_0[k] = prev_state_0[k][0]*llr[i*3] + prev_state_0[k][1]*llr[i*3+1] + prev_state_0[k][2]*llr[i*3+2] + 24;

local_metric_1[k] = prev_state_1[k][0]*llr[i*3] + prev_state_1[k][1]*llr[i*3+1] + prev_state_1[k][2]*llr[i*3+2] + 24;

For consistency, the input data llr is also made non-negative in the embodiments of the present invention. Denoting the modified input data by llr', we have

llr'[i] = llr[i] + 8

Therefore, when computing with non-negative numbers, the expressions for local_metric_0 and local_metric_1 are as follows:

local_metric_0[k] = 24 + prev_state_0[k][0]*llr'[i*3] + prev_state_0[k][1]*llr'[i*3+1] + prev_state_0[k][2]*llr'[i*3+2] - 8*(prev_state_0[k][0] + prev_state_0[k][1] + prev_state_0[k][2]);

local_metric_1[k] = 24 + prev_state_1[k][0]*llr'[i*3] + prev_state_1[k][1]*llr'[i*3+1] + prev_state_1[k][2]*llr'[i*3+2] - 8*(prev_state_1[k][0] + prev_state_1[k][1] + prev_state_1[k][2]).
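The equivalence of the biased signed form and the form written in terms of llr' can be confirmed by brute force over all inputs. The sketch below is illustrative; the function names are hypothetical, and prev stands for one row of prev_state_0 or prev_state_1.

```c
/* Biased metric computed from signed inputs llr[j] in [-8, 7]. */
int local_metric_signed(const int prev[3], const int llr[3])
{
    return prev[0] * llr[0] + prev[1] * llr[1] + prev[2] * llr[2] + 24;
}

/* The same metric computed from offset inputs llr_u[j] = llr[j] + 8. */
int local_metric_offset(const int prev[3], const int llr_u[3])
{
    return 24 + prev[0] * llr_u[0] + prev[1] * llr_u[1] + prev[2] * llr_u[2]
              - 8 * (prev[0] + prev[1] + prev[2]);
}

/* Returns 1 if both forms agree for every sign pattern and every input
 * triple, and the result always fits an unsigned byte; 0 otherwise. */
int bias_forms_agree(void)
{
    for (int s = 0; s < 8; s++) {
        int prev[3] = { (s & 1) ? 1 : -1, (s & 2) ? 1 : -1, (s & 4) ? 1 : -1 };
        for (int a = -8; a <= 7; a++)
            for (int b = -8; b <= 7; b++)
                for (int c = -8; c <= 7; c++) {
                    int llr[3]   = { a, b, c };
                    int llr_u[3] = { a + 8, b + 8, c + 8 };
                    int m = local_metric_signed(prev, llr);
                    if (m != local_metric_offset(prev, llr_u) || m < 0 || m > 255)
                        return 0;
                }
    }
    return 1;
}
```

The check passing means the tables can be built directly from the unsigned inputs without ever producing a negative intermediate result in the stored values.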

4. Calculating update_metrics

Once the look-up tables have been built, Viterbi decoding proper can begin. The typical steps of Viterbi decoding can be summarized as "add, compare, select". The first step is to calculate the two sets of candidate new cumulative metric values, update_metrics0 and update_metrics1.

In the embodiments of the present invention, cum_metrics, update_metrics0 and update_metrics1 are represented with 8 bits. The C66x-series DSP core is a 32-bit DSP in which each register has 32 bits, so one register can hold 4 states, allowing 4 states to be computed in parallel.

Specifically, when calculating update_metrics0, the intrinsic function _amem8 (corresponding to the assembly instruction LDDW) is first used to read local_metric_0 and cum_metrics for 8 states from memory, and the intrinsic function _dadd (corresponding to the assembly instruction DADD) is then used to add them, giving update_metrics0 for 8 states.

For example, for update_metrics0[0], update_metrics0[2], update_metrics0[4], update_metrics0[6], update_metrics0[8], update_metrics0[10], update_metrics0[12] and update_metrics0[14],

the sample code is as follows:

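/* Load 8 local metrics and 8 cumulative metrics, then add them in one operation. */
uint64_t update0 = _dadd(_amem8(&m0_ptr[x]), _amem8(&cum_metrics[y]));

/* Low 32 bits: update_metrics0 for states 0, 2, 4, 6. */
uint32_t update0_even0_6 = _loll(update0);

/* High 32 bits: update_metrics0 for states 8, 10, 12, 14. */
uint32_t update0_even8_14 = _hill(update0);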

_amem8 is a built-in function provided by TI; it tells the compiler to use the assembly instruction LDDW. This instruction reads 8 bytes of data from memory into a register pair in a single operation. The memory address must be aligned, that is, divisible by 8. The register pair must consist of two consecutive 32-bit registers, with the lower-numbered register even and the higher-numbered register odd, such as A0 and A1, or B2 and B3.

m0_ptr is a pointer into the look-up table lookup_0; it points to the local_metric_0 values corresponding to the current input data. x and y are two integers related to the state index k. Here, x = 0 and y = 0. Following the earlier discussion of the storage order of local_metric_0, _amem8(&m0_ptr[0]) reads the following data into the register pair: local_metric_0[0], local_metric_0[2], local_metric_0[4], local_metric_0[6], local_metric_0[8], local_metric_0[10], local_metric_0[12], local_metric_0[14]. In little-endian mode, the arrangement of these 8 values in the register pair is shown in Fig. 2, where local_0 is an abbreviation of local_metric_0.

_amem8(&cum_metrics[0]) reads cum_metrics[0], cum_metrics[1], cum_metrics[2], cum_metrics[3], cum_metrics[4], cum_metrics[5], cum_metrics[6] and cum_metrics[7] into a register pair. In little-endian mode, the arrangement of these 8 values in the register pair is shown in Fig. 3, where cum is an abbreviation of cum_metrics.

The built-in function _dadd tells the compiler to add two register pairs, each 64 bits wide, using the assembly instruction DADD. The result of the addition is shown in Fig. 4, where update_0 is an abbreviation of update_metrics0.

The result of _dadd is 64 bits wide. In the C66x series DSP core, this result is held in two registers. For convenience in later steps, the low 32 bits of the result are extracted with the built-in function _loll and named update0_even0_6; the high 32 bits are extracted with the built-in function _hill and named update0_even8_14.
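For readers without the TI toolchain, the load, add and split sequence can be modelled in portable C. This is a sketch under stated assumptions, not the TI intrinsics themselves: load8_le, add8_lanes, lo32 and hi32 are hypothetical stand-ins, and the addition is modelled lane by lane on the assumption that no per-state sum exceeds 255, in which case the result does not depend on how wide the hardware's addition lanes are.

```c
#include <stdint.h>

/* Stand-in for an 8-byte little-endian load (the text uses _amem8 / LDDW).
   Byte i of memory ends up in bits 8*i .. 8*i+7 of the result. */
static uint64_t load8_le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Stand-in for the addition step: adds eight 8-bit metrics lane by lane.
   Assumes no lane sum exceeds 255, so no carry crosses a byte boundary. */
static uint64_t add8_lanes(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t s = ((a >> (8 * i)) & 0xFF) + ((b >> (8 * i)) & 0xFF);
        r |= (s & 0xFF) << (8 * i);
    }
    return r;
}

/* Stand-ins for _loll and _hill: low and high 32 bits of a 64-bit value. */
static uint32_t lo32(uint64_t v) { return (uint32_t)v; }
static uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }
```

With eight branch metrics and eight cumulative metrics in two byte arrays, add8_lanes(load8_le(local), load8_le(cum)) yields the eight candidate metrics, and lo32 and hi32 split them into the two 4-state words used in the later steps.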

For other values of k, update_metrics0[k] and update_metrics1[k] can be calculated in a similar way. The values of x and y are given in Table 1.

Table 1: Values of x and y required to calculate update_metrics

update_metrics	State index k	x	y
update_metrics0	0,2,4,6,8,10,12,14	0	0
update_metrics0	16,18,20,22,24,26,28,30	1	2
update_metrics0	32,34,36,38,40,42,44,46	2	4
update_metrics0	48,50,52,54,56,58,60,62	3	6
update_metrics0	1,3,5,7,9,11,13,15	4	0
update_metrics0	17,19,21,23,25,27,29,31	5	2
update_metrics0	33,35,37,39,41,43,45,47	6	4
update_metrics0	49,51,53,55,57,59,61,63	7	6
update_metrics1	0,2,4,6,8,10,12,14	0	8
update_metrics1	16,18,20,22,24,26,28,30	1	10
update_metrics1	32,34,36,38,40,42,44,46	2	12
update_metrics1	48,50,52,54,56,58,60,62	3	14
update_metrics1	1,3,5,7,9,11,13,15	4	8
update_metrics1	17,19,21,23,25,27,29,31	5	10
update_metrics1	33,35,37,39,41,43,45,47	6	12
update_metrics1	49,51,53,55,57,59,61,63	7	14
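The rows of Table 1 follow a regular pattern, which the sketch below reproduces. The closed-form expressions are an observation about the table rather than something the text states, and row_t and table1_row are hypothetical names: branch selects update_metrics0 (0) or update_metrics1 (1), and group numbers the eight rows of each half from 0 to 7.

```c
#include <stdint.h>

/* One row of Table 1: indices x and y plus the first state index in the row.
   The remaining seven states of the row are first_k + 2, first_k + 4, ... */
typedef struct {
    int x;
    int y;
    int first_k;
} row_t;

/* branch: 0 for update_metrics0, 1 for update_metrics1; group: 0..7. */
static row_t table1_row(int branch, int group)
{
    row_t r;
    r.x = group;                               /* x simply counts the rows   */
    r.y = 2 * (group % 4) + 8 * branch;        /* y steps by 2, offset by 8  */
    r.first_k = 16 * (group % 4) + group / 4;  /* even rows first, then odd  */
    return r;
}
```

For example, table1_row(1, 5) gives x = 5, y = 10 and a first state index of 17, matching the corresponding update_metrics1 row.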

5. Comparing update_metrics

Next, update_metrics0[k] and update_metrics1[k] are compared, and the larger of the two is taken as the updated value of the cumulative metric.

For the comparison, the built-in function _maxu4 (corresponding to the assembly instruction MAXU4) can be used to compare update_metrics0[k] and update_metrics1[k] for 4 states at a time.

For example, when k = 0, 2, 4, 6, the sample code is as follows:

uint32_t even0_6 = _maxu4(update0_even0_6, update1_even0_6);

Here, update0_even0_6 holds the update_metrics0[k] values for k = 0, 2, 4, 6 obtained earlier, and update1_even0_6 holds the update_metrics1[k] values for k = 0, 2, 4, 6 obtained earlier. The built-in function _maxu4 uses the assembly instruction MAXU4. This instruction treats a 32-bit register as four 8-bit values and takes the maximum of each of the 4 pairs of values in that format [5]. even0_6 holds the updated cumulative metric values for k = 0, 2, 4, 6. The other cumulative metrics can be obtained in a similar way.
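The per-byte maximum described above can be modelled in portable C as follows. maxu4_model is a hypothetical stand-in written for illustration; it is not the TI intrinsic, only a sketch of the behaviour the text attributes to MAXU4.

```c
#include <stdint.h>

/* Model of an unsigned per-byte maximum over two 32-bit words,
   each treated as four independent 8-bit lanes. */
static uint32_t maxu4_model(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xFFu;
        uint32_t y = (b >> (8 * i)) & 0xFFu;
        r |= (x > y ? x : y) << (8 * i);
    }
    return r;
}
```

Because every lane is compared independently, one call selects the survivor metric for four states at once.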

During path traceback, it is necessary to know whether update_metrics0[k] or update_metrics1[k] was selected as the survivor path. To record this, the built-in function _cmpeq4 (corresponding to the assembly instruction CMPEQ4) can be used to test whether update_metrics1[k] is equal to max(update_metrics0[k], update_metrics1[k]). If the two are equal, update_metrics1[k] is not smaller than update_metrics0[k]; otherwise, update_metrics1[k] is smaller than update_metrics0[k].

For example, when k = 0, 2, 4, 6, the sample code is as follows:

paths_ptr[0] = _cmpeq4(even0_6, update1_even0_6);

Here, paths_ptr is a pointer to the array that stores the path information. The built-in function _cmpeq4 uses the assembly instruction CMPEQ4. This instruction treats a 32-bit register as four 8-bit values and tests each of the 4 pairs of values for equality in that format. The path information for the other cumulative metrics can be obtained in a similar way.
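The select-and-record step can be sketched in portable C. The functions below are hypothetical models, not TI intrinsics, and the sketch assumes that the equality comparison returns a 4-bit mask in which bit i is set when byte lane i of the two operands matches.

```c
#include <stdint.h>

/* Model of an unsigned per-byte maximum (see the comparison step). */
static uint32_t maxu4_model(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xFFu;
        uint32_t y = (b >> (8 * i)) & 0xFFu;
        r |= (x > y ? x : y) << (8 * i);
    }
    return r;
}

/* Model of a per-byte equality test; bit i of the result is 1 when
   lane i of a equals lane i of b (assumed mask layout). */
static uint32_t cmpeq4_model(uint32_t a, uint32_t b)
{
    uint32_t mask = 0;
    for (int i = 0; i < 4; i++)
        if (((a >> (8 * i)) & 0xFFu) == ((b >> (8 * i)) & 0xFFu))
            mask |= 1u << i;
    return mask;
}

/* Select survivors for four states and return the decision mask:
   a set bit means the update_metrics1 candidate was kept (ties included). */
static uint32_t select4(uint32_t update0, uint32_t update1, uint32_t *survivor)
{
    *survivor = maxu4_model(update0, update1);
    return cmpeq4_model(*survivor, update1);
}
```

Note that a tie sets the decision bit, so equal candidates are recorded as coming from the update_metrics1 branch.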

6. Arranging the format of the cumulative metrics

In the cumulative metrics obtained above, those with even indices are stored contiguously, such as k = 0, 2, 4, 6, and those with odd indices are stored contiguously, such as k = 1, 3, 5, 7. They are now rearranged into the normal consecutive-index format, such as k = 0, 1, 2, 3, 4, 5, 6, 7.

Let even denote the cumulative metrics with even indices and odd denote those with odd indices. To rearrange the format, the built-in function _packh4 (corresponding to the assembly instruction PACKH4) and the built-in function _packl4 (corresponding to the assembly instruction PACKL4) are applied to odd and even, giving the results temp_a and temp_b respectively. Then _packl4 and _packh4 are applied to temp_a and temp_b, giving the cumulative metrics in consecutive-index format.

For example, when the state index k = 0, 1, 2, 3, 4, 5, 6, 7, the sample code is as follows:

uint32_t temp_a = _packh4(odd1_7, even0_6);

uint32_t temp_b = _packl4(odd1_7, even0_6);

uint32_t cum_metrics0_3 = _packl4(temp_a, temp_b);

uint32_t cum_metrics4_7 = _packh4(temp_a, temp_b);

The built-in function _packh4 uses the assembly instruction PACKH4, and the built-in function _packl4 uses the assembly instruction PACKL4. These two assembly instructions are dedicated to rearranging data formats.

After the PACKH4 instruction, the data format in temp_a is shown in Fig. 5.

After the PACKL4 instruction, the data format in temp_b is shown in Fig. 6.

After the PACKL4 instruction, the data format in cum_metrics0_3 is shown in Fig. 7.

After the PACKH4 instruction, the data format in cum_metrics4_7 is shown in Fig. 8.

The other cumulative metrics can be rearranged in a similar way.
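Since Figs. 5 to 8 are not reproduced here, the byte movement can be traced with a portable model. The sketch assumes that the high-byte pack takes bytes 3 and 1 of each operand and the low-byte pack takes bytes 2 and 0, with the first operand supplying the two most significant bytes of the result; packh4_model, packl4_model and interleave are hypothetical names.

```c
#include <stdint.h>

/* Extract byte lane i (0 = least significant) of a 32-bit word. */
static uint32_t byte_of(uint32_t v, int i) { return (v >> (8 * i)) & 0xFFu; }

/* Model of the high-byte pack: result is {a.b3, a.b1, b.b3, b.b1}, MSB first. */
static uint32_t packh4_model(uint32_t a, uint32_t b)
{
    return (byte_of(a, 3) << 24) | (byte_of(a, 1) << 16) |
           (byte_of(b, 3) << 8)  |  byte_of(b, 1);
}

/* Model of the low-byte pack: result is {a.b2, a.b0, b.b2, b.b0}, MSB first. */
static uint32_t packl4_model(uint32_t a, uint32_t b)
{
    return (byte_of(a, 2) << 24) | (byte_of(a, 0) << 16) |
           (byte_of(b, 2) << 8)  |  byte_of(b, 0);
}

/* Interleave four odd-index and four even-index metrics into two words
   holding states 0..3 and 4..7 in little-endian byte order. */
static void interleave(uint32_t odd, uint32_t even,
                       uint32_t *states0_3, uint32_t *states4_7)
{
    uint32_t temp_a = packh4_model(odd, even);
    uint32_t temp_b = packl4_model(odd, even);
    *states0_3 = packl4_model(temp_a, temp_b);
    *states4_7 = packh4_model(temp_a, temp_b);
}
```

Labelling each byte with its own state index makes the effect visible: an even word of 0x06040200 and an odd word of 0x07050301 come out as 0x03020100 and 0x07060504.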

7. Preventing overflow

In an embodiment of the present invention, cum_metrics is represented with 8 bits. To prevent later accumulation from exceeding the range representable in 8 bits, the common part must be subtracted each time cum_metrics has been updated. This is permissible because Viterbi decoding is concerned only with the relative size of the two metrics being compared. The specific procedure is as follows: find the minimum of the 64 cum_metrics values, then subtract that minimum from every cum_metrics value. After the adjustment, the minimum value in the cum_metrics array becomes 0.

First, the minimum of the cumulative metrics is searched for. Since each register stores 4 cumulative metric values, the built-in function _minu4 (corresponding to the assembly instruction MINU4) can be applied repeatedly to the array cum_metrics, narrowing the search for the minimum down to 4 cumulative metrics, all stored in a single register min_metric.

uint32_t min_metric = _minu4(cum_metrics0_3, cum_metrics4_7);

The built-in function _minu4 uses the assembly instruction MINU4. This instruction treats a 32-bit register as four 8-bit values and takes the minimum of each of the 4 pairs of values in that format. The same operation is repeated on the remaining registers that hold cumulative metrics.

Next, the search range for the minimum is narrowed to 2 cumulative metrics. To do this, the built-in function _packlh2 (corresponding to the assembly instruction PACKLH2) is applied to min_metric and min_metric, giving the result min_metric2. Then the built-in function _minu4 (corresponding to the assembly instruction MINU4) is applied to min_metric and min_metric2, and the result is again stored in register min_metric.

The sample code is as follows:

min_metric2 = _packlh2(min_metric, min_metric);

min_metric = _minu4(min_metric, min_metric2);

The built-in function _packlh2 uses the assembly instruction PACKLH2, an assembly instruction dedicated to rearranging data formats. The data format in min_metric2 is shown in Fig. 9, where min0, min1, min2 and min3 are the 4 candidate cumulative metric values stored in min_metric.

After the MINU4 instruction, the data format in min_metric is shown in Fig. 10, where min_a = min(min1, min3) and min_b = min(min0, min2).

Next, the minimum of the cumulative metrics is obtained. The value of min_metric is copied into register min_metric2. The built-in function _packl4 (corresponding to the assembly instruction PACKL4) is applied to min_metric and min_metric, and the result is again stored in register min_metric. The built-in function _packh4 (corresponding to the assembly instruction PACKH4) is applied to min_metric2 and min_metric2, and the result is again stored in register min_metric2. Finally, the built-in function _minu4 (corresponding to the assembly instruction MINU4) is applied to min_metric and min_metric2, giving the minimum of the cumulative metrics.

The sample code is as follows:

min_metric2 = min_metric;

min_metric = _packl4(min_metric, min_metric);

min_metric2 = _packh4(min_metric2, min_metric2);

min_metric = _minu4(min_metric, min_metric2);

Finally, the data format in min_metric is shown in Fig. 11, where minimum = min(min_a, min_b).
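The whole minimum search can be followed in a portable model. The sketch assumes that the halfword pack places the lower halfword of its first operand above the upper halfword of its second, and that the byte packs behave as in the format-arranging step; all function names are hypothetical stand-ins for the intrinsics.

```c
#include <stdint.h>

static uint32_t lane(uint32_t v, int i) { return (v >> (8 * i)) & 0xFFu; }

/* Model of an unsigned per-byte minimum over four 8-bit lanes. */
static uint32_t minu4_model(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = lane(a, i), y = lane(b, i);
        r |= (x < y ? x : y) << (8 * i);
    }
    return r;
}

/* Model of the halfword pack: low half of a on top, high half of b below. */
static uint32_t packlh2_model(uint32_t a, uint32_t b)
{
    return (a << 16) | (b >> 16);
}

/* Models of the byte packs: {a.b2, a.b0, b.b2, b.b0} and {a.b3, a.b1, b.b3, b.b1}. */
static uint32_t packl4_model(uint32_t a, uint32_t b)
{
    return (lane(a, 2) << 24) | (lane(a, 0) << 16) | (lane(b, 2) << 8) | lane(b, 0);
}

static uint32_t packh4_model(uint32_t a, uint32_t b)
{
    return (lane(a, 3) << 24) | (lane(a, 1) << 16) | (lane(b, 3) << 8) | lane(b, 1);
}

/* Reduce four candidate bytes to their minimum, copied into every lane. */
static uint32_t broadcast_min(uint32_t m)
{
    uint32_t m2 = packlh2_model(m, m);  /* swap the two halfwords         */
    m = minu4_model(m, m2);             /* lanes now hold min_a and min_b */
    m2 = m;
    m = packl4_model(m, m);             /* every lane holds min_b         */
    m2 = packh4_model(m2, m2);          /* every lane holds min_a         */
    return minu4_model(m, m2);          /* every lane holds the minimum   */
}

/* Minimum over `words` packed registers (4 metrics each), in every lane. */
static uint32_t min_splat(const uint32_t *cum, int words)
{
    uint32_t m = cum[0];
    for (int i = 1; i < words; i++)
        m = minu4_model(m, cum[i]);
    return broadcast_min(m);
}
```

A useful property of this sequence is that the minimum ends up replicated in all four bytes, which is exactly the form needed for a per-byte subtraction afterwards.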

Once the minimum has been found, the cumulative metrics are adjusted. This can be done by using the built-in function _sub4 (corresponding to the assembly instruction SUB4) to subtract minimum from cum_metrics.

For example, when k = 0, 1, 2, 3, the sample code is as follows:

cum_metrics[0] = _sub4(cum_metrics0_3, min_metric);

The built-in function _sub4 uses the assembly instruction SUB4. This assembly instruction treats a 32-bit register as four 8-bit values and subtracts each of the 4 pairs of values in that format. The other cumulative metrics can be adjusted in a similar way.
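The adjustment itself can be modelled in portable C as a per-byte subtraction applied to every packed word. sub4_model and normalize are hypothetical names, and the sketch assumes the minimum has already been replicated into all four bytes of min_splat.

```c
#include <stdint.h>

/* Model of a per-byte subtraction over four 8-bit lanes (modulo 256). */
static uint32_t sub4_model(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t d = (((a >> (8 * i)) & 0xFFu) - ((b >> (8 * i)) & 0xFFu)) & 0xFFu;
        r |= d << (8 * i);
    }
    return r;
}

/* Subtract the replicated minimum from every packed word of metrics. */
static void normalize(uint32_t *cum, int words, uint32_t min_splat)
{
    for (int i = 0; i < words; i++)
        cum[i] = sub4_model(cum[i], min_splat);
}
```

Because the subtracted value is the true minimum, no lane can go negative, and at least one lane becomes 0 after the call.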

On this basis, the present invention further provides a novel receiver. In the embodiment shown in Fig. 12, the receiver comprises a down-conversion module, an analog-to-digital conversion module and a digital signal processing module. The wireless signal is fed into the down-conversion module through the antenna. After processing by the down-conversion module, the analog-to-digital conversion module performs the corresponding analog-to-digital conversion, and the result is passed to the digital signal processing module for subsequent processing. The digital signal processing module includes a C66x series DSP core, which performs channel estimation, channel equalization and Viterbi decoding; the Viterbi decoding uses the high-speed Viterbi decoding method described above.

To verify the practical effect of the high-speed Viterbi decoding method provided by the present invention, the inventors tested it on a C6678 development board. The test environment is shown in Table 2.

Table 2: Parameter values used in the test

Software environment	Code Composer Studio Version 5.5.0.00077
Hardware environment	TMDSEVM6678LE development board
Convolutional code generator polynomials	0133, 0171, 0165 (octal)
Code rate	1/3
Multiplication coefficient in quantization	3.5
Channel	Real AWGN channel
Data length before encoding	198

Bit error rate performance is tested first. This high-speed Viterbi decoding method applies 4-bit quantization to the received signal. In addition, while the cumulative metrics are being updated, overflow may occur if a value exceeds the range representable in 8 bits. Both factors may degrade bit error rate performance. In Fig. 13, this high-speed Viterbi decoding method is compared with the optimal floating-point Viterbi algorithm. The floating-point Viterbi algorithm here means that the input data are double-precision floating-point numbers (double) and that double-precision floating-point arithmetic is used throughout decoding. As can be seen from Fig. 13, the bit error rate performance loss of this high-speed Viterbi decoding method is very small.

Next, the running speed of the Viterbi decoder is measured; the detailed results are shown in Table 3. For comparison, the running speed of the algorithm in TI's technical document "Viterbi Decoding Techniques for the TMS320C55x DSP Generation" (No. SPRA776A, April 2009) is also given. It can be seen that the running speed of this high-speed Viterbi decoding method is 1.776 times that of TI's official implementation.

Table 3: Running speed of the Viterbi decoder

The high-speed Viterbi decoding method provided by the present invention and its receiver have been described in detail above. For those of ordinary skill in the art, any obvious modification made to it without departing from the true spirit of the invention will fall within the scope of protection of the patent rights of the invention.

Claims

1. A high-speed Viterbi decoding method implemented on a C66x series DSP core, characterized in that the built-in functions of the C66x series DSP core are used to directly specify the assembly instruction used in each step.

2. The high-speed Viterbi decoding method according to claim 1, characterized in that the high-speed Viterbi decoding method runs in the little-endian mode of the C66x series DSP core.

3. The high-speed Viterbi decoding method according to claim 1 or 2, characterized in that the built-in functions include but are not limited to _amem8, _dadd, _loll, _hill, _maxu4, _cmpeq4, _packh4, _packl4, _minu4, _packlh2 and _sub4.

4. The high-speed Viterbi decoding method according to claim 3, characterized by comprising the following steps:

2) reading in three consecutive input data and obtaining the offsets of the path metric increments local_metric_0 and local_metric_1 in the look-up tables lookup_0 and lookup_1; incrementing the counter counter by 3;

3) for even k, in groups of eight, computing the candidate new values update_metrics0[k] of the cumulative metrics by successively using the C66x built-in functions _amem8, _dadd, _loll and _hill;

4) for odd k, in groups of eight, computing the candidate new values update_metrics0[k] of the cumulative metrics by successively using the C66x built-in functions _amem8, _dadd, _loll and _hill;

5) for even k, in groups of eight, computing the candidate new values update_metrics1[k] of the cumulative metrics by successively using the C66x built-in functions _amem8, _dadd, _loll and _hill;

6) for odd k, in groups of eight, computing the candidate new values update_metrics1[k] of the cumulative metrics by successively using the C66x built-in functions _amem8, _dadd, _loll and _hill;

11) for all k, in groups of eight, arranging the format of the cumulative metrics using the C66x built-in functions _packh4 and _packl4;

12) for all k, searching for the minimum of the cumulative metrics using the C66x built-in functions _minu4, _packlh2, _packl4 and _packh4;

13) for all k, in groups of four, subtracting the minimum from the cumulative metrics using the C66x built-in function _sub4;

14) for all k, in groups of four, storing the cumulative metrics;

15) judging whether counter is less than 3 × (N+6), where N is the data length before convolutional encoding; if so, going to step 2); otherwise, proceeding to step 16);

wherein, in steps 3) to 14), k is the state index;

wherein, in steps 3) and 5), "for even k, in groups of eight" means that the state index k is grouped according to the following table:

Group	State index
1st group	k = 0,2,4,6,8,10,12,14
2nd group	k = 16,18,20,22,24,26,28,30
3rd group	k = 32,34,36,38,40,42,44,46
4th group	k = 48,50,52,54,56,58,60,62

wherein, in steps 4) and 6), "for odd k, in groups of eight" means that the state index k is grouped according to the following table:

wherein, in steps 7) and 9), "for even k, in groups of four" means that the state index k is grouped according to the following table:

wherein, in steps 8) and 10), "for odd k, in groups of four" means that the state index k is grouped according to the following table:

wherein, in step 11), "for all k, in groups of eight" means that the state index k is grouped according to the following table:

wherein, in steps 13) and 14), "for all k, in groups of four" means that the state index k is grouped according to the following table:

.

5. The high-speed Viterbi decoding method according to claim 4, characterized in that, when calculating update_metrics0 or update_metrics1,

in the case of calculating update_metrics0, x denotes the element index of local_metric_0; in the case of calculating update_metrics1, x denotes the element index of local_metric_1; the values of x and y are given in the table below;

cum_metrics is added to local_metric_0 or local_metric_1 using the built-in function _dadd, giving update_metrics0 or update_metrics1;

finally, the low 32 bits of the result are extracted using the built-in function _loll, and the high 32 bits of the result are extracted using the built-in function _hill.

6. The high-speed Viterbi decoding method according to claim 4, characterized in that, in step 11), the cumulative metrics with even indices are denoted even and the cumulative metrics with odd indices are denoted odd;

the built-in function _packh4 and the built-in function _packl4 are successively applied to odd and even, giving the results temp_a and temp_b respectively;

the built-in function _packl4 and the built-in function _packh4 are successively applied to the results temp_a and temp_b, giving the cumulative metrics in consecutive-index format.

7. The high-speed Viterbi decoding method according to claim 4, characterized in that, in step 12), 4 cumulative metrics are stored in register min_metric;

the built-in function _minu4 is applied to min_metric and min_metric2, the result is again stored in register min_metric, and the value of min_metric is copied into register min_metric2;

the built-in function _packl4 is applied to min_metric and min_metric, and the result is again stored in register min_metric;

the built-in function _packh4 is applied to min_metric2 and min_metric2, and the result is again stored in register min_metric2;

8. A receiver comprising a down-conversion module, an analog-to-digital conversion module and a digital signal processing module, wherein the digital signal processing module uses a C66x series DSP core and performs Viterbi decoding using the high-speed Viterbi decoding method according to claim 1.

9. The receiver according to claim 8, characterized in that it runs in the little-endian mode of the C66x series DSP core.