Summary of the invention
The invention provides a DFT/IDFT processing method and processor, to solve at least the problem in the related art that DFT/IDFT processing incurs a large delay, which compresses the processing time available to subsequent stages and poses a significant risk to the system.
According to one aspect of the invention, a DFT/IDFT processing method is provided, comprising: performing DFT/IDFT processing on a sample sequence to be transformed on the basis of a 12-point transform.
Preferably, performing DFT/IDFT processing on the sample sequence to be transformed on the basis of a 12-point transform comprises: performing 12-way parallel accumulation on the sample sequence, to implement left-multiplication by a first coefficient matrix; performing imaginary-number multiplication on the sample sequence after the parallel accumulation, to implement left-multiplication by a second coefficient matrix; and performing 3-way parallel accumulation on the sample sequence after the imaginary-number multiplication, to implement left-multiplication by a third coefficient matrix.
Preferably, after the 3-way parallel accumulation is performed on the sample sequence that has undergone imaginary-number multiplication, the method further comprises: performing block floating-point detection on the sample sequence after accumulation, and rearranging it into natural order according to the ordering properties of the output matrix.
Preferably, before DFT/IDFT processing is performed on the sample sequence to be transformed on the basis of a 12-point transform, the method further comprises: determining whether the sample sequence to be transformed is a 12-point transform sequence and, if not, performing DFT/IDFT processing on the sample sequence using a mixed-radix algorithm.
Preferably, the mixed radix comprises at least one of: radix 2, radix 3, radix 4 and radix 5.
Preferably, the mixed-radix algorithm uses butterfly operations, and the twiddle factors cover 0 degrees to 45 degrees.
Preferably, the input, output and intermediate results of the sample sequence all use interleaved writing and jump-address reading.
According to another aspect of the invention, a DFT/IDFT processor is provided, comprising: a 12-point DFT processing unit, configured to divide the sample sequence to be transformed into groups of 12 sample points and to perform 12-point DFT/IDFT processing on the divided sample sequence.
Preferably, the processor further comprises: a block floating-point detection unit, configured to perform block floating-point detection on the sample sequence after accumulation and to rearrange it into natural order according to the ordering properties of the output matrix.
Preferably, the processor further comprises: a complex multiplication unit, used for the multiplications of the mixed-radix butterfly operations; and a complex addition unit, used for the additions of the mixed-radix butterfly operations.
Preferably, the complex multiplication unit further comprises a twiddle factor storage ROM.
Preferably, the processor further comprises: a data storage unit, comprising a pair of ping-pong RAMs built from two single-port RAM groups, used for interleaved writing and jump-address reading of the input, output and intermediate results of the sample sequence.
Preferably, the processor further comprises: a scheduling control unit, configured to control the read/write enables and address generation for the data storage unit.
The invention performs DFT/IDFT processing on the sample sequence to be transformed on the basis of a 12-point transform: sample sequences of different lengths are divided into groups of 12 points, and DFT/IDFT processing is carried out with 12 points as the basic unit. This solves the problem in the related art that DFT/IDFT processing incurs a large delay, which compresses the processing time available to subsequent stages and poses a significant risk to the system. The delay is thereby reduced, the reliability of the system is improved, and the user experience is enhanced.
Detailed description of the embodiments
The invention is described in detail below with reference to the accompanying drawings and in conjunction with embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features within them may be combined with one another.
To address the problem in the related art that DFT/IDFT processing incurs a large delay, which compresses the processing time available to subsequent stages and poses a significant risk to the system, the invention provides a DFT/IDFT processing method. As shown in Fig. 1, the method comprises step S102:
Step S102: perform DFT/IDFT processing on the sample sequence to be transformed on the basis of a 12-point transform.
This embodiment of the invention performs DFT/IDFT processing on the sample sequence to be transformed on the basis of a 12-point transform: sample sequences of different lengths are divided into groups of 12 points, and DFT/IDFT processing is carried out with 12 points as the basic unit. This solves the problem in the related art that DFT/IDFT processing incurs a large delay, which compresses the processing time available to subsequent stages and poses a significant risk to the system. The delay is thereby reduced, the reliability of the system is improved, and the user experience is enhanced.
The 3GPP LTE specification gives the number of subcarriers contained in each uplink shared channel as follows:
$M_{sc}^{PUSCH} = N_{sc}^{RB} \cdot 2^{\alpha_2} \cdot 3^{\alpha_3} \cdot 5^{\alpha_5} \le N_{sc}^{RB} \cdot N_{RB}^{UL}$
This expression shows that, when DFT/IDFT processing is performed, a mixed-radix fast FFT/IFFT built from radix 2, radix 3 and radix 5 can be used to implement the $M_{sc}^{PUSCH}$-point DFT/IDFT operation.
In view of the above, the embodiment of the invention may also process the sequence to be transformed as follows before executing step S102: determine whether the sample sequence to be transformed is a 12-point transform sequence. If it is not 12 points, DFT/IDFT processing is performed on the sample sequence using a mixed-radix algorithm. The mixed radix may comprise at least one of radix 2, radix 3, radix 4 and radix 5, or a mixture of several radices; it may, of course, also include radix 12. The mixed-radix algorithm uses butterfly operations, and its twiddle factors cover 0 degrees to 45 degrees.
When step S102 is executed, as shown in Fig. 2, the 12-point DFT/IDFT processing of the divided sample sequence can be refined into steps S202 to S206:
Step S202: perform 12-way parallel accumulation on the sample sequence, to implement left-multiplication by the first coefficient matrix.
Step S204: perform imaginary-number multiplication on the sample sequence after the parallel accumulation, to implement left-multiplication by the second coefficient matrix.
Step S206: perform 3-way parallel accumulation on the sample sequence after the imaginary-number multiplication, to implement left-multiplication by the third coefficient matrix.
In carrying out the above steps, fully pipelined operation can be used throughout: while the sequence to be transformed is processed, no intermediate result is stored, and the result of each step is passed directly to the next processing stage. For example, when step S202 is executed, a 72-point input sequence is divided into groups of 12 and the divided sequence is written in. The written data undergo 12-way parallel accumulation, and this parallel processing raises the processing speed of the system. The step is equivalent to applying a transform to the 12-point sequence matrix. Step S206 further processes the matrix produced by step S204: the 12-point sequence matrix that has undergone imaginary-number multiplication is accumulated again, this time in 3 ways. The 3-way accumulation raises the operation speed of the system and further reduces its processing delay.
After the 3-way parallel accumulation has been performed on the sample sequence that has undergone imaginary-number multiplication, block floating-point detection may also be performed on the accumulated sample sequence, which is then rearranged into natural order according to the ordering properties of the output matrix.
In implementation, block floating-point detection is a design that is decisive for the precision of the system. It detects the anti-overflow sign-extension bits of a sequence whose computation has finished and deletes the extension bits found to be invalid before the sequence is stored. At the same time, the value in the current sequence that uses the most extension bits is taken as the reference, and before the next stage of computation the extension bits of every value in the sequence are aligned to that reference. This achieves dynamic scaling, reduces the loss of data precision, and improves the processing performance of the system.
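The detection step described above can be sketched in a few lines. This is a minimal illustration rather than the circuit of the embodiment: the helper names `bits_needed` and `block_normalize`, the use of plain Python integers, and the 16-bit storage width are all assumptions made for the example.

```python
def bits_needed(value: int) -> int:
    """Smallest two's-complement width (sign bit included) that holds value."""
    magnitude_bits = value.bit_length() if value >= 0 else (~value).bit_length()
    return magnitude_bits + 1


def block_normalize(block, storage_bits=16):
    """Scale a whole block by one shared shift so every value fits storage_bits.

    Returns the shifted block and the shift, which acts as the block exponent.
    """
    # The value that occupies the most bits sets the reference for the block.
    widest = max(bits_needed(v) for v in block)
    # Bits above storage_bits are real growth; all remaining upper bits are
    # redundant sign extension and can be dropped without changing any value.
    shift = max(0, widest - storage_bits)
    return [v >> shift for v in block], shift


# Hypothetical stage output that has grown beyond a 16-bit word (assumed data).
stage_output = [150000, -3, 7, -40000]
normalized, exponent = block_normalize(stage_output, storage_bits=16)
```

In this sketch the largest value needs 19 bits, so the whole block is shifted right by 3 and the shift is kept as a shared scale factor. Small values lose precision under the shared shift, which is the usual trade-off of block floating point compared with per-value exponents.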
In the above steps, the input of the sample sequence, its output and the computation of intermediate results may all use interleaved writing and jump-address reading. Interleaved writing provides associated storage of parallel multi-channel data without the conventional serial-to-parallel and parallel-to-serial conversion. Compared with the prior art, in which the mixed-radix units can only be processed in order from the smallest radix to the largest, this removes the restriction on processing order, so the radix units can be invoked in any order. It also eliminates the row/column data transposition stage that would otherwise be needed because different radix units have different input and output parallelism.
In this embodiment, a 12-point DFT/IDFT processing method is used: the sequence to be transformed is written with interleaving and read with jump addressing, and it is processed by fully pipelined computation in which no intermediate data are stored, which saves system resources. In implementation, the existing iteration of radix 3 followed by radix 4 is replaced by a fully pipelined 12-point unit. For a transform sequence with many points, for example 1200 points, and the same process technology, the existing scheme needs about 800 cycles whereas this embodiment needs only a little over 40 cycles; it also avoids secondary data storage, which lowers RAM power consumption. For a transform sequence with few points, for example the case in which one UE occupies one RB (12 points), and the same process technology, the prior art needs more than 80 cycles whereas this method needs only a little over 40 cycles, which removes the delay bottleneck for subsequent processing stages.
The embodiment of the invention also provides a DFT/IDFT processor. As shown in Fig. 3, the processor comprises: a 12-point DFT processing unit 10, configured to divide the sample sequence to be transformed into groups of 12 sample points and to perform 12-point DFT/IDFT processing on the divided sample sequence.
To further optimise the above processor, the data storage unit 20 and the scheduling control unit 30 shown in Fig. 4 may be added to it. In implementation, the scheduling control unit 30 controls the read/write enables and address generation for the data storage unit 20. The data storage unit 20 is coupled to the 12-point DFT processing unit 10, comprises a pair of ping-pong RAMs built from two single-port RAM groups, and is used for interleaved writing and jump-address reading of the input, output and intermediate results of the sample sequence.
In a preferred embodiment, if the input is not a 12-point sequence to be transformed, the above apparatus needs to be optimised further by adding the block floating-point detection unit 40, the complex multiplication unit 50 and the complex addition unit 60 shown in Fig. 5. The complex multiplication unit 50 is coupled to the data storage unit 20 and is used for the multiplications of the mixed-radix butterfly operations. The complex addition unit 60 is coupled to the complex multiplication unit 50 and is used for the additions of the mixed-radix butterfly operations. The complex multiplication unit also comprises a twiddle factor storage ROM. The block floating-point detection unit 40 is coupled to the 12-point DFT processing unit 10 and is used to perform block floating-point detection on the sample sequence after accumulation and to rearrange it into natural order according to the ordering properties of the output matrix.
Preferred embodiment
A system in which the embodiment of the invention is applied in practice is described in detail below with reference to Fig. 6 to Fig. 11.
Under the LTE specification in the related art, the 34 transform sequence lengths satisfy the following relation:
$N = 2^{a} \cdot 3^{b} \cdot 4^{c} \cdot 5^{d}$ (where a, b, c and d are each a positive integer or 0)
Each sample sequence is allocated in units of RBs, and one RB has 12 subcarriers, so the length of every sample sequence is a multiple of 12. The transform sequence length therefore also satisfies:
$N = 12 \cdot 2^{a} \cdot 3^{b} \cdot 4^{c} \cdot 5^{d}$ (where a, b, c and d are each a positive integer or 0)
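The count of 34 lengths can be checked with a short enumeration. The sketch below assumes that an allocation spans at most 100 resource blocks, which matches the 1200-point maximum used as an example in this description; the function name `supported_lengths` and its parameters are illustrative only. Because 4 is a power of 2, the radix-4 factor adds no lengths beyond those generated by 2, 3 and 5.

```python
def supported_lengths(max_rb: int = 100, subcarriers_per_rb: int = 12):
    """List every N = 12 * 2^a * 3^b * 5^d that uses at most max_rb resource blocks."""
    lengths = []
    for rb in range(1, max_rb + 1):
        remainder = rb
        for prime in (2, 3, 5):
            while remainder % prime == 0:
                remainder //= prime
        if remainder == 1:  # rb has no prime factor other than 2, 3 and 5
            lengths.append(rb * subcarriers_per_rb)
    return lengths


lengths = supported_lengths()
print(len(lengths), lengths[0], lengths[-1])  # prints: 34 12 1200
```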
This preferred embodiment uses a mixed-radix algorithm. According to the relation above, each transform sequence is implemented by a combination of radix 12, radix 2, radix 3, radix 4 and radix 5. Note that radix 2 is implemented here by the radix-4 unit, and that radix 12 is a fully pipelined 12-point DFT implemented with the Winograd algorithm.
Fig. 6 shows the LTE_DFT system architecture. The LTE_DFT processor comprises six units: a scheduling control unit (dft_ctrl), a data storage unit (dft_store), a complex multiplication unit (dft_cm_mult), a complex addition unit (dft_cm_add), a block floating-point detection unit (dft_bfp) and a 12-point DFT processing unit (dft_12dft_unit).
(1) The scheduling control unit mainly generates the storage read/write enables and addresses, and handles configuration and scheduling for the different transform sizes.
(2) The data storage unit, built from interleaved RAMs, mainly handles data storage and the RAM interleaving needed when the pipeline switches between radices. Data are stored in interleaved RAM: by interleaving the data across several RAMs, the parallel multi-channel data of the radix-3, radix-4 and radix-5 operations can be stored in association, with no need for serial-to-parallel or parallel-to-serial conversion.
(3) The complex multiplication unit and (4) the complex addition unit mainly perform the multiplications and additions of the butterfly unit. The complex multiplication unit also contains a twiddle factor storage ROM that holds the twiddle factors for 0° to 45°; twiddle factors for other angles are derived from this set by transformation. The complex multiplication and addition units are not strictly partitioned by radix 3, radix 4 and radix 5; instead they are fully shared according to how multiplications and additions are distributed in each radix unit, which reduces resource overhead while the required functions are still implemented.
(5) The block floating-point detection unit mainly provides dynamic control of the carry growth and overflow that follow the multiplications and additions. Following the block floating-point principle, it detects the anti-overflow sign-extension bits of each group of values once a stage of computation has finished and deletes the invalid extension bits before storage. It also takes the value in the group that uses the most extension bits as the reference and, before the next stage of computation, aligns the extension bits of every value in the sequence to that reference. This achieves dynamic scaling, reduces the loss of data precision, and improves the performance of the DFT processor.
(6) The 12-point DFT processing unit mainly performs a fully pipelined 12-point DFT. Its pipelined processing reduces the processing delay; for a 12-point transform in particular, the delay is nearly halved compared with a mixed-radix implementation built from radix 3 and radix 4.
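The way a ROM covering only 0° to 45° can serve every twiddle angle, as described for the complex multiplication unit, can be sketched with the usual sine/cosine symmetries. This is an illustration under stated assumptions, not the ROM layout of the embodiment: the names `build_rom` and `twiddle` are hypothetical, the forward-transform sign convention exp(-j·2πk/n) is assumed, and n is restricted to multiples of 8 so that 45° falls exactly on a table index.

```python
import cmath
import math


def build_rom(n: int):
    """Store (cos, sin) only for angles 0..45 degrees, i.e. indices 0..n/8."""
    if n % 8:
        raise ValueError("this sketch assumes n is a multiple of 8")
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n // 8 + 1)]


def twiddle(k: int, n: int, rom) -> complex:
    """Return exp(-2j*pi*k/n) using only the 0..45 degree table."""
    eighth = n // 8
    quadrant, r = divmod(k % n, 2 * eighth)
    if r <= eighth:
        c, s = rom[r]                  # angle already lies within 0..45 degrees
    else:
        s, c = rom[2 * eighth - r]     # mirror about 45 degrees: swap cos and sin
    for _ in range(quadrant):          # add 90 degrees per quadrant
        c, s = -s, c
    return complex(c, -s)


rom = build_rom(24)
worst = max(abs(twiddle(k, 24, rom) - cmath.exp(-2j * cmath.pi * k / 24))
            for k in range(24))
```

For n = 24 the table holds only 4 entries instead of 24, and `worst` measures the largest deviation from the directly computed twiddle factors.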
In a specific implementation, the butterfly operation flow of the DFT is shown in Fig. 7 and comprises steps S702 to S710:
Step S702: detect whether the sequence is a 12-point transform. After a group of N sample points (N being a multiple of 12) of signed fixed-point two's-complement data has been input to the LTE_DFT processor and computation begins, step S708 is executed if the transform is a 12-point transform; otherwise step S704 is executed.
Step S704: perform the butterfly operations. If the transform is not a 12-point transform, the scheduling control module iterates in a loop to complete the multi-stage radix-3, radix-4 and radix-5 operations, after which the 12-point DFT is performed as the final radix stage.
Step S706: determine whether the number of remaining operations is 0. If so, step S708 is executed; otherwise step S704 is executed.
Step S708: the data are input serially to the 12-point DFT processing unit, which completes the radix-12 processing. The result of this 12-point DFT processing is the final transform result and is output in natural order.
Step S710: the flow ends. Once the final radix unit has finished its operation, the butterfly operation flow ends.
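The decision structure of steps S702 to S710 can be illustrated in software. The sketch below is a recursive decimation-in-time formulation, so the order in which stages execute differs from the hardware pipeline; it uses a plain radix-2 butterfly where the embodiment reuses the radix-4 unit, and a direct O(N²) sum stands in for the pipelined 12-point unit. The function names are assumptions made for the example.

```python
import cmath


def dft_direct(x):
    """Reference O(N^2) DFT, used here as the 12-point unit and as a check."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def mixed_radix_dft(x):
    """DFT of a length 12 * 2^a * 3^b * 5^d sequence via radix stages."""
    n = len(x)
    if n == 12:                     # a 12-point input goes straight to the 12-point unit
        return dft_direct(x)
    for radix in (5, 4, 3, 2):      # choose a butterfly radix for this stage
        if n % (12 * radix) == 0:
            break
    else:
        raise ValueError("length must be 12 * 2^a * 3^b * 5^d")
    sub = n // radix
    # Decimation in time: transform the radix interleaved sub-sequences first.
    parts = [mixed_radix_dft(x[q::radix]) for q in range(radix)]
    # Butterfly: combine the sub-transforms using twiddle factors.
    return [sum(cmath.exp(-2j * cmath.pi * q * k / n) * parts[q][k % sub]
                for q in range(radix))
            for k in range(n)]


x = [complex(i % 7, -(i % 5)) for i in range(72)]
error = max(abs(a - b) for a, b in zip(mixed_radix_dft(x), dft_direct(x)))
```

For the 72-point example the sketch runs one radix-3 stage and one radix-2 stage on top of six 12-point transforms, and `error` compares the result with the direct DFT.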
To raise the throughput of the DFT, this preferred embodiment uses ping-pong RAM operation. The data storage unit comprises two identical single-port RAM groups that together form a ping-pong pair; each RAM group contains 5 identical small storage RAMs, with a capacity of 400*36. While the "ping" RAM is being read, intermediate or result data are written to the "pong" RAM, and so on in alternation. The butterfly unit operates in the order radix 12, then radix x, where the order among the radix-x stages is unrestricted. This preferred embodiment uses the order radix 3, radix 4, radix 5.
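The ping-pong arrangement can be modelled as two banks whose read and write roles swap after each stage. The class below is a behavioural sketch only: `PingPongBuffer` and its method names are hypothetical, and the five-RAM interleaving inside each group is not modelled.

```python
class PingPongBuffer:
    """Two equally sized banks: one is read while the other is written."""

    def __init__(self, depth: int):
        self._banks = [[0] * depth, [0] * depth]
        self._read_bank = 0                 # index of the bank currently being read

    def read(self, address: int):
        return self._banks[self._read_bank][address]

    def write(self, address: int, value):
        # Writes always go to the bank that is not being read.
        self._banks[1 - self._read_bank][address] = value

    def swap(self):
        """Exchange the roles of the two banks once a stage has finished."""
        self._read_bank = 1 - self._read_bank


buf = PingPongBuffer(depth=4)
for addr, sample in enumerate([10, 20, 30, 40]):
    buf.write(addr, sample)                 # load input samples into the write bank
buf.swap()                                  # the filled bank becomes the read bank
for addr in range(4):
    buf.write(addr, 2 * buf.read(addr))     # a stage result goes to the other bank
buf.swap()
result = [buf.read(addr) for addr in range(4)]
```

Because reads and writes never target the same bank within a stage, single-port memories suffice, which is the motivation the description gives for pairing two single-port RAM groups.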
The 12-point DFT processing unit is based on the principle that the DFT is a linear transform. If the relationship between each of its 12 outputs and its 12 inputs is written as formulas, the result is a system of 12 polynomials, one per output, each expressing that output as the sum of the 12 inputs multiplied by their respective coefficients. The coefficients of this 12-row linear transform form a 12 × 12 matrix. If Y is the 12 × 1 output matrix, X the 12 × 1 input matrix and W the coefficient matrix, then Y=W*X.
To simplify this computation, elementary transformations are applied to the coefficient matrix so that it is expressed as the product of a 12 × 12 diagonal matrix B and two square matrices A and C of order 12 whose entries are 0, 1 and -1. Then:
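As a concrete reading of Y=W*X, the sketch below builds the 12 × 12 coefficient matrix and applies it directly. It assumes the forward-DFT convention in which entry (k, n) of W is exp(-j·2πkn/12); for the IDFT the sign of the exponent is reversed and the result is scaled by 1/12. The name `dft12` is illustrative.

```python
import cmath

N = 12
# Coefficient matrix W: entry (k, n) is exp(-2j*pi*k*n/12).
W = [[cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)] for k in range(N)]


def dft12(x):
    """Compute Y = W * X for a 12-element input column X."""
    return [sum(W[k][n] * x[n] for n in range(N)) for k in range(N)]


impulse = [1] + [0] * 11
constant = [1] * 12
y_impulse = dft12(impulse)      # every output equals 1
y_constant = dft12(constant)    # only output 0 is non-zero and equals 12
```

Evaluated this way, each output costs 12 complex multiplications, 144 in total, which is the cost that a factored form of the coefficient matrix is meant to avoid.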
Y=A*B*C*X
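The 12 × 12 matrices A, B and C are not listed in this description, so the structure of the factorisation is illustrated here with the well-known 3-point Winograd DFT, which has the same shape: C and A contain only 0, 1 and -1, and B is diagonal. The matrices below are those of the 3-point case and are not taken from the embodiment; the function names are assumptions.

```python
import cmath
import math

U = 2 * math.pi / 3
# C and A contain only 0, 1 and -1, so applying them needs additions only.
C = [[1, 1, 1],
     [0, 1, 1],
     [0, -1, 1]]
B = [1, math.cos(U) - 1, 1j * math.sin(U)]     # diagonal entries of B
A = [[1, 0, 0],
     [1, 1, 1],
     [1, 1, -1]]


def matvec(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def dft3_factored(x):
    """Y = A * B * C * X for a 3-point DFT."""
    pre = matvec(C, x)                          # additions and subtractions only
    scaled = [b * p for b, p in zip(B, pre)]    # the only multiplications
    return matvec(A, scaled)                    # additions and subtractions only


def dft3_direct(x):
    """Direct 3-point DFT used as a reference."""
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 3) for n in range(3))
            for k in range(3)]


x = [1 + 2j, -3 + 0.5j, 4 - 1j]
error = max(abs(a - b) for a, b in zip(dft3_factored(x), dft3_direct(x)))
```

Only the diagonal B involves multiplications, and its first entry is trivially 1, so the 3-point case needs two non-trivial multiplications. The same idea, applied at 12 points, is what lets the accumulation stages run as pure add/subtract networks.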
Implementing the 12-point DFT/IDFT therefore only requires three matrix multiplications. Fig. 8 is a schematic diagram of the processing flow of the 12-point DFT processing unit, which can be divided into steps S1 to S4:
Step S1: left-multiply the input sequence X by the matrix C. This is the 12-way parallel accumulation submodule in Fig. 8, and its output is a 12 × 1 matrix. In effect, the 12 input points are added to and subtracted from one another. To reduce delay, and because adders are not costly in an ASIC implementation, the 12 ways can be processed in parallel. The matrix formula calls for the input data to be reordered, but since only simple additions are involved, the data can be input in natural order as long as the accumulation is adjusted according to the matrix coefficients.
Step S2: left-multiply the output of step S1 by B. This is the multiplication submodule in Fig. 8, and its output is again a 12 × 1 matrix. Analysis of matrix B shows that only 8 multipliers are needed, to implement the last 4 imaginary-number multiplications (these multiplications reuse the multiplier resources of the overall DFT, which reduces resource overhead); every other entry reduces to a simple addition or a direct pass-through.
Step S3: left-multiply the result of step S2 by the matrix A. This is the 3-way parallel accumulation submodule in Fig. 8. Given the pipelined nature of the step S2 output, and to simplify the computation, the 12 parallel data paths are divided in turn into 4 groups of three, and the computation uses the accumulation unit module of Fig. 7, which is the same as that of step S1. After accumulation, the output is four consecutive data items on three parallel paths.
Step S4: perform block floating-point detection on the output data and output them in natural order. This is the block floating-point detection module in Fig. 6. Each output of the 12-point DFT/IDFT is essentially the sum of 12 data items, each multiplied by a factor no greater than 1, so the overflow cannot exceed 4 bits; the sign bit is therefore extended during accumulation. After the computation, overflow-bit detection is performed according to the block floating-point principle. Based on the ordering properties of the output matrix Y, the data are rearranged into natural order as they are written to the cache RAM, which simplifies output.
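The 4-bit bound stated in step S4 follows from a short estimate. Writing $w_{kn}$ for the coefficient that multiplies input $X_n$ in output $Y_k$ (notation introduced here for illustration), and assuming $|w_{kn}| \le 1$ as the text states:

```latex
|Y_k| = \left|\sum_{n=0}^{11} w_{kn}\,X_n\right|
      \le \sum_{n=0}^{11} |w_{kn}|\,|X_n|
      \le 12 \max_{n} |X_n|
      < 2^{4} \max_{n} |X_n|
```

The output magnitude therefore grows by a factor below 16, so $\lceil \log_2 12 \rceil = 4$ additional sign-extension bits are enough to hold any result of the accumulation.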
Data are stored by interleaved writing and read by jump addressing. In the radix-3, radix-4 and radix-5 operations, the input is distributed as 3, 4 or 5 parallel data paths according to the radix unit, and the output is likewise 3, 4 or 5 paths in parallel. Under the DFT butterfly principle, data must be cached so that the following radix module can conveniently fetch them in parallel. The interleaved storage of data in RAM is illustrated here using the caching of the radix-4 unit's results. The radix-4 unit has three main caching cases: radix 4 handles a radix-2 operation and is followed by a radix-3 operation; radix 4 is followed by another radix-4 operation; and radix 4 is followed by a radix-5 operation. The interleaved storage is shown in Fig. 9 to Fig. 11.
Fig. 9 shows the radix-2 to radix-3 case. Radix 4 handles radix 2 and, once the operation completes, the data are stored for the pipeline switch. As Fig. 9 shows, the radix-4 unit completes two radix-2 operations at once; its 4 parallel outputs are written to four small RAMs within the RAM group, with the write address incrementing in sequence. If the next stage is a radix-3 operation, the data must be written as illustrated: three consecutive data items go to the same address of the RAM group, distinguished by chip select, so that they can be output in parallel for the next-stage radix-3 operation.
Fig. 10 shows the radix-4 to radix-4 case. After the radix-4 operation, the 4 parallel data paths are written to four small RAMs of the RAM group; because the next stage is radix 4, 4 consecutive data items must be written to the same address in different RAMs of the group.
Fig. 11 shows the radix-4 to radix-5 case. After the radix-4 operation, the 4 parallel data paths are written to four small RAMs of the RAM group; because the next stage is radix 5, 5 consecutive data items must be written to the same address in different RAMs of the group. Data are read from RAM by jump addressing, following the rules of the butterfly operation.
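The rule that consecutive data items are written to the same address in different RAMs can be sketched as follows. This is a simplified model under stated assumptions: banks are Python lists, the number of outputs is a multiple of the next radix, and the exact address patterns of Fig. 9 to Fig. 11 are not reproduced. The names `write_interleaved` and `read_butterfly` are hypothetical.

```python
def write_interleaved(outputs, next_radix):
    """Spread consecutive outputs over next_radix banks, one address per group."""
    banks = [[] for _ in range(next_radix)]
    for index, value in enumerate(outputs):
        # Bank select cycles through the banks; the address is index // next_radix.
        banks[index % next_radix].append(value)
    return banks


def read_butterfly(banks, address):
    """Fetch one operand from every bank at the same address (one parallel read)."""
    return [bank[address] for bank in banks]


# Hypothetical stage producing 12 results that feed a radix-3 stage next.
banks = write_interleaved(list(range(12)), next_radix=3)
first = read_butterfly(banks, 0)
last = read_butterfly(banks, 3)
```

Each call to `read_butterfly` returns the operands of one next-stage butterfly without any bank being accessed twice, which is why no serial-to-parallel conversion is needed. In a jump-address scheme the sequence of addresses passed in would follow the butterfly pattern rather than simply incrementing.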
The transformed data are output in natural order, together with the scale factor obtained from block floating-point detection.
As can be seen from the above description, the invention achieves the following technical effects:
The above embodiments of the invention raise the throughput of the DFT processor: for the smallest transform size the throughput is almost doubled, for the largest size it rises by nearly one third, and for the other sequence lengths it also improves considerably. The row/column transposition step between different radix units is eliminated, giving seamless switching between the mixed-radix units, which reduces delay, resources and power consumption and saves cost.
Those skilled in the art will understand that the modules and steps of the invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by several computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases the steps shown or described can be executed in an order different from the one given here. Alternatively, they can each be fabricated as a separate integrated circuit module, or several of the modules or steps can be fabricated as a single integrated circuit module. The invention is thus not restricted to any specific combination of hardware and software.
The above are merely preferred embodiments of the invention and are not intended to limit it; for those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.