CN102567282A - Device and method for realizing FFT (Fast Fourier Transform) calculation in universal DSP (Digital Signal Processor) - Google Patents

Device and method for realizing FFT (Fast Fourier Transform) calculation in universal DSP (Digital Signal Processor)

Publication number: CN102567282A
Authority: CN (China)
Application number: CN201010607219XA
Other languages: Chinese (zh)
Other versions: CN102567282B (en)
Inventor: 樊广超
Current Assignee: BEIJING GUORUI ZHONGSHU TECHNOLOGY CO LTD
Original Assignee: BEIJING GUORUI ZHONGSHU TECHNOLOGY CO LTD
Legal status: Granted; Expired - Fee Related (the legal status is an assumption and is not a legal conclusion)
Priority: CN201010607219.XA
Prior art keywords: fft, calculation, data, basic, calculating
Classification: Complex Calculations

Abstract

The invention discloses a device and a method for performing FFT (Fast Fourier Transform) computation in a general-purpose DSP (Digital Signal Processor). The device comprises an FFT computation-point processing control unit, a data input control unit, a mixed-radix FFT computation unit, a data output control unit and a twiddle factor computation unit, wherein the mixed-radix FFT computation unit performs both radix-4 and radix-2 FFT butterfly computations. The method comprises the following steps: converting an FFT input whose number of points is greater than or equal to a threshold into two-dimensional data I × J; performing radix-4 FFT butterfly computation on the first-dimension I points; and, for the second-dimension J points or for an FFT input whose number of points is less than the threshold, performing pure radix-4 FFT butterfly computation if that is possible, and otherwise performing radix-4 FFT butterfly computation first and radix-2 FFT butterfly computation afterwards. The scheme offers high computational efficiency and high speed.

Description

Device and method for implementing FFT computation in a general-purpose DSP processor
Technical field
The present invention relates to the fields of signal processing and general-purpose DSP processors, and in particular to a device and a method for implementing FFT computation in a general-purpose DSP processor.
Background art
Digital signal processing (DSP) mainly refers to the processing of digital signals. It is essential to many applications, such as digital signal and image filtering, speech synthesis, high-speed modems, speech recognition, and signal and image compression.
A general-purpose DSP processor is a microprocessor that handles large volumes of digital signal information, with powerful data-processing capability and high operating speed. In such processors the DFT (Discrete Fourier Transform) is commonly used to process discrete signals, but its computational complexity is high. The FFT (Fast Fourier Transform), which appeared subsequently, not only reduces the computational complexity of the DFT from $N^2$ to the order of $N \log_2 N$, but also has relatively simple control and a simple memory structure and consumes very few hardware resources, which facilitates low-power system design. The FFT is therefore very widely used in digital signal processing, especially in digital communication, image processing and radio communication. Many FFT processors based on ASICs or FPGAs exist at present.
FFT processors usually have one of two structures: cascaded or single-stage. A cascaded structure has multiple stages of computation units, and the result of each stage serves as the input of the next. A single-stage structure has only one stage of computation units, whose result is fed back as the input of the next pass, so the computation iterates. FFT processors commonly adopt a pipelined structure based on the Cooley-Tukey algorithm, because Cooley-Tukey supports in-place computation and is therefore easy to implement in hardware.
Commonly used FFT processors adopt a multi-stage structure, which is fast and supports pipelined computation, but they lack generality and are expensive. To complete various conventional signal-processing tasks they also need the assistance of a general-purpose DSP core. A general-purpose DSP processor does not have the large amount of memory that FFT computation requires and has relatively few computation units, so it usually adopts the single-stage structure. This structure mainly comprises a radix-2 or radix-4 FFT computation unit and can perform only one stage of FFT butterfly computation at a time.
A radix-2 FFT can only handle $2^N$ points, where $N = 1, \ldots, n$. A radix-4 FFT can only handle $4^N$ points, where $N = 1, \ldots, n$, for example 16, 256 or 1024 points; it cannot handle $2 \times 4^N$ points such as 32, 128, 512 or 2048 points.
As can be seen from the above, the radix-2 FFT applies to a wider range of sizes than the radix-4 FFT, but the radix-4 FFT offers a better parallelism-to-cost ratio and is faster than the radix-2 FFT.
Summary of the invention
The invention discloses a device and a method for implementing FFT computation with high computational efficiency in a general-purpose DSP processor.
To solve the above problem, the technical scheme of the FFT computation device in a general-purpose DSP processor according to the invention is as follows:
A device for implementing FFT computation in a general-purpose DSP processor, used to perform an FFT on computation points stored in RAM, characterized in that it comprises an FFT computation-point processing control unit, a data input control unit, a mixed-radix FFT computation unit, a data output control unit and a twiddle factor computation unit, wherein:
The FFT computation-point processing control unit converts an input whose number of points is greater than or equal to a threshold into two-dimensional data I × J, where I is a power of 4 and I × J equals the number of points of the FFT; for an input with fewer points than the threshold, and for the data after two-dimensional conversion, if pure radix-4 computation is not possible it controls a mixed-stage computation using radix 4 and radix 2;
The data input control unit generates, according to the result produced by the FFT computation-point processing control unit, the bit-reversed addresses of the FFT computation points in RAM, reads the data using DMA, and delivers them to the mixed-radix FFT computation unit;
The twiddle factor computation unit computes the twiddle factors and outputs them to the mixed-radix FFT computation unit;
The mixed-radix FFT computation unit uses the twiddle factors to perform radix-4 FFT butterfly computation or radix-2 FFT butterfly computation on the input points;
The data output control unit outputs the result produced by the mixed-radix FFT computation unit to its original addresses in RAM.
The mixed-radix FFT computation unit comprises:
A data cache, which buffers the data delivered by the data input control unit or intermediate results;
Four multipliers and eight adders connected to one another, which use the twiddle factors to perform one stage of radix-4 FFT butterfly computation or radix-2 FFT butterfly computation on the data in the data cache and output the result to the output control unit.
The computation-point processing control unit comprises:
A two-dimensional data transform logic unit, which groups an input with more points than the threshold into two dimensions to produce the corresponding two-dimensional array information, including addresses, row numbers and column numbers;
A mixed-radix FFT computation control logic unit, which, for an input with fewer points than the threshold and for the data after two-dimensional grouping, controls radix-4 FFT butterfly computation if pure radix-4 FFT butterfly computation is possible, and otherwise controls radix-4 and radix-2 FFT butterfly computation.
The data input control unit comprises:
Input address generation logic, which generates the bit-reversed addresses of the operands according to the information about the data to be computed.
The data output control unit comprises:
Output address generation logic, which generates the original storage addresses of the output data according to the information about the computation points.
The value of I is determined by the capacity of the data cache, which is preferably 64 points.
The threshold may be preset or set by the user.
Correspondingly, the method for implementing FFT computation in a general-purpose DSP processor according to the invention comprises:
Converting an FFT input whose number of points is greater than or equal to a threshold into two-dimensional data I × J, where I is a power of 4 and I × J equals the number of points of the FFT;
For a two-dimensional FFT input, first performing radix-4 FFT butterfly computation on the first-dimension I points; then, for the second-dimension J points, performing radix-4 FFT butterfly computation if pure radix-4 computation is possible, and otherwise performing radix-4 FFT butterfly computation first and radix-2 FFT butterfly computation afterwards.
For an FFT input with fewer points than the threshold, performing radix-4 FFT butterfly computation if pure radix-4 computation is possible, and otherwise performing radix-4 FFT butterfly computation first and radix-2 FFT butterfly computation afterwards.
The value of I is determined by the capacity of the data cache.
The threshold may be preset or set by the user.
Compared with the prior art, the device and method for implementing FFT computation in a general-purpose DSP processor according to the invention have the following beneficial effects:
First, the invention supports mixed-radix FFT computation in a general-purpose DSP processor; that is, it can perform both radix-4 and radix-2 FFT computation and gives priority to radix 4, which improves the efficiency of FFT computation in a general-purpose DSP processor.
Second, the invention computes large inputs in two dimensions, which effectively reduces computation time, keeps programming relatively simple, and increases computation speed.
Description of drawings
The invention will be understood more clearly from the following description, given by way of example only, with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the FFT computation device in a DSP processor according to the invention;
Fig. 2 is a schematic diagram of a 16-point radix-4 FFT butterfly computation;
Fig. 3 is a schematic diagram of an 8-point radix-2 FFT butterfly computation;
Fig. 4 is an example schematic diagram of the computation-point processing control unit of the FFT computation device according to the invention;
Fig. 5 is an example schematic diagram of the mixed-radix FFT computation unit of the FFT computation device according to the invention;
Fig. 6 is an example schematic diagram of the data input control unit of the FFT computation device according to the invention;
Fig. 7 is an example schematic diagram of the data output control unit of the FFT computation device according to the invention.
Embodiment
Before the specific embodiments of the device and method are described, radix-2 and radix-4 FFT butterfly computations are introduced first.
The DFT of a finite-length sequence x(n) of length N is
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1,$$
where $W_N = e^{-j 2\pi / N}$ is the twiddle factor.
An N-point DFT requires $N^2$ complex multiplications. Clearly, decomposing the N-point DFT into several shorter DFTs greatly reduces the number of multiplications.
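As a point of reference for the multiplication count above, here is a minimal sketch of the direct DFT in Python. The function names `dft` and `dft_multiply_count` are illustrative assumptions, not names from the patent.

```python
import cmath

def dft(x):
    """Direct N-point DFT: X(k) = sum over n of x(n) * W_N^(n*k)."""
    n_points = len(x)
    w = cmath.exp(-2j * cmath.pi / n_points)  # twiddle factor W_N
    return [sum(x[n] * w ** (n * k) for n in range(n_points))
            for k in range(n_points)]

def dft_multiply_count(n_points):
    """Complex multiplications used by the direct evaluation above (N * N)."""
    return n_points * n_points
```

For a 2048-point input the direct form needs 2048 × 2048 = 4194304 complex multiplications, which is the cost the decomposition into shorter transforms is meant to avoid.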
For the radix-4 FFT:
Divide the N points into 4 parts and let $W_N = e^{-j 2\pi / N}$ denote the twiddle factor.
[Equations: definitions of the four sub-sequences and the four radix-4 butterfly outputs; the original equation images are not available in this text.]
For a 4-point radix-2 FFT, the corresponding butterfly formulas are:
[Equations: radix-2 butterfly formulas; the original equation images are not available in this text.]
It can be seen that both radix-2 and radix-4 FFT computation need multipliers and adders, so radix-4 FFT computation can be implemented by adding multipliers and some adders to the hardware for radix-2 FFT computation. An 8-point radix-2 FFT needs 4 multiplications and 8 additions per stage. One radix-4 butterfly needs 3 multiplications and 8 additions. The FFT computation unit of the invention therefore uses 4 multipliers and 8 adders, which the DSP core can also use for other computations.
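The count of 3 multiplications and 8 additions per radix-4 (base 4) butterfly can be illustrated with the standard decimation-in-time form. This is a sketch of the textbook butterfly, not a reconstruction of the patent's own equations; the function name and argument names are assumptions.

```python
def radix4_butterfly(a, b, c, d, w1, w2, w3):
    """One radix-4 DIT butterfly: 3 complex multiplications, 8 complex additions.

    a, b, c, d are the four inputs; w1, w2, w3 are the twiddle factors
    applied to b, c and d (hypothetical argument names).
    """
    a = complex(a)
    # Three twiddle multiplications.
    tb = b * w1
    tc = c * w2
    td = d * w3
    # First layer: four additions/subtractions.
    s0 = a + tc
    s1 = a - tc
    s2 = tb + td
    s3 = tb - td
    # Multiplying by -j only swaps real and imaginary parts,
    # so it needs no multiplier.
    rot = complex(s3.imag, -s3.real)
    # Second layer: four additions/subtractions.
    return (s0 + s2, s1 + rot, s0 - s2, s1 - rot)
```

With all three twiddle factors equal to 1, the butterfly reduces to a plain 4-point DFT, which gives a quick way to check it.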
Two-dimensional FFT computation means dividing a large input into a two-dimensional array, which is more convenient to compute than a one-dimensional FFT. If the data cache holds 64 points and the input has 64 points, the mixed-radix computation unit can perform all three stages of radix-4 FFT computation on the 64 points internally. If the input has more than 64 points, for example 128 points, and is computed in one dimension, three radix-4 stages are performed first, followed by one radix-2 stage. Every stage must then load 64 points at a time in two passes, and every pass needs the cooperation of the input control and output control units. Two-dimensional computation, by contrast, handles a small amount of data at a time, has low programming complexity and a high computational density.
As Fig. 2 and Fig. 3 show, the order of the data before FFT computation differs from the order of the data after it. A decimation-in-time (DIT) FFT is computed in decimation order; that is, the input data are in bit-reversed order and the final result is in natural order. In the present invention the result of every stage is restored to natural order, so that the next stage can read its data easily.
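The reordering of the input can be sketched as an address generator. The example below assumes plain bit reversal for a power-of-two length, as used by a radix-2 DIT FFT such as the 8-point case of Fig. 3; the patent does not spell out the exact address logic, and the function name is hypothetical.

```python
def bit_reversed_addresses(n_points):
    """Return the bit-reversed index for every position 0..n_points-1.

    Assumes n_points is a power of two (radix-2 DIT input ordering).
    """
    bits = n_points.bit_length() - 1
    if n_points < 1 or (1 << bits) != n_points:
        raise ValueError("n_points must be a power of two")
    addresses = []
    for index in range(n_points):
        reversed_index = 0
        for b in range(bits):
            if (index >> b) & 1:
                reversed_index |= 1 << (bits - 1 - b)
        addresses.append(reversed_index)
    return addresses
```

For 8 points this yields the order 0, 4, 2, 6, 1, 5, 3, 7, so the data at address 4 is read second, the data at address 2 third, and so on.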
The same data can be computed in one dimension or in two dimensions. Here, larger inputs are computed in two dimensions to improve computational efficiency and reduce programming complexity.
As shown in Fig. 1, the FFT computation device in a general-purpose DSP processor according to the invention comprises an FFT computation-point processing control unit 1, a data input control unit 2, a mixed-radix FFT computation unit 3, a data output control unit 4 and a twiddle factor computation unit 5, wherein:
The FFT computation-point processing control unit 1 converts an input whose number of points is greater than or equal to a threshold into two-dimensional data I × J, where I is a power of 4 and I × J equals the number of points of the FFT; for an input with fewer points than the threshold, and for the data after two-dimensional conversion, it controls pure radix-4 FFT butterfly computation if that is possible, and otherwise controls a mixed-stage computation using radix 4 and radix 2.
The data input control unit 2 generates, according to the result produced by the FFT computation-point processing control unit 1, the bit-reversed addresses of the FFT computation points in RAM, reads the data using DMA, and delivers them to the mixed-radix FFT computation unit 3;
The twiddle factor computation unit 5 computes the twiddle factors and outputs them to the mixed-radix FFT computation unit 3;
The mixed-radix FFT computation unit 3 uses the twiddle factors to perform radix-4 FFT butterfly computation or radix-2 FFT butterfly computation on the input points;
The data output control unit 4 outputs the result produced by the mixed-radix FFT computation unit 3 to its original addresses in RAM.
As described above, the device converts an FFT input whose number of points is greater than or equal to the threshold into two-dimensional data I × J, where I is a power of 4 and I × J equals the number of points of the FFT. For a two-dimensional FFT input, radix-4 FFT butterfly computation is first performed on the first-dimension I points. Then, for the second-dimension J points, or for an input with fewer points than the threshold, radix-4 FFT butterfly computation is performed if pure radix-4 computation is possible; otherwise radix-4 FFT butterfly computation is performed first and radix-2 FFT butterfly computation afterwards.
The threshold may be preset or set by the user, for example to 512 or 1024. Pure radix-4 FFT butterfly computation means that the transform can be completed with radix-4 butterflies alone. For example, a 64-point input satisfies $64 = 4^3$, so three stages of radix-4 butterflies suffice. A 32-point input cannot be expressed as a power of 4, because $32 = 4^2 \times 2$, so it requires 2 stages of radix-4 butterflies and 1 stage of radix-2 butterflies.
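The rule for choosing stages can be sketched as a small planner that returns the radix of each stage, radix 4 first and a single radix-2 stage last when the size is not a power of 4. The function name `plan_stages` is an assumption made for illustration.

```python
def plan_stages(n_points):
    """Return the radix of each butterfly stage for a power-of-two size.

    Radix-4 stages come first; one radix-2 stage is appended when
    n_points is not a power of 4 (for example 32 = 4**2 * 2).
    """
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two, at least 2")
    log2_n = n_points.bit_length() - 1
    stages = [4] * (log2_n // 2)
    if log2_n % 2:
        stages.append(2)
    return stages
```

Under this sketch, 64 points map to three radix-4 stages, 32 points to two radix-4 stages plus one radix-2 stage, and 128 points to three radix-4 stages plus one radix-2 stage, matching the examples in the text.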
As shown in Fig. 4, the FFT computation-point processing control unit 1 comprises:
A two-dimensional data transform logic unit 11, which groups an input with more points than the threshold into two dimensions to produce the corresponding two-dimensional array information, including addresses, row numbers and column numbers.
A mixed-radix FFT computation control logic unit 12, which, for an input with fewer points than the threshold and for the data after two-dimensional grouping, controls radix-4 FFT butterfly computation if pure radix-4 computation is possible, and otherwise controls a mixed-stage computation, that is, radix-4 and radix-2 FFT butterfly computation. For example, suppose the data cache holds 64 points. A 64-point input can be computed with 3 stages of radix-4 butterflies, so the control logic unit 12 generates three control signals: stage-1, stage-2 and stage-3 radix-4 FFT butterfly control. The corresponding data information (via the data input control unit 2) and the control signals are sent to the mixed-radix FFT computation unit 3 in a defined timing sequence, and unit 3 performs the stage-1, stage-2 and stage-3 radix-4 butterfly computations in that sequence, which completes the FFT of the 64 points. For a 32-point input, since $32 = 4^2 \times 2$, the control logic unit 12 generates stage-1 radix-4, stage-2 radix-4 and stage-1 radix-2 FFT butterfly control signals, and again sends the corresponding data information (via the data input control unit 2) and the control signals to unit 3 in a defined timing sequence. Unit 3 performs whichever stage the current control signal selects: the stage-1 radix-4 butterflies, then the stage-2 radix-4 butterflies, then the stage-1 radix-2 butterflies. The FFT of the 32 points is then complete.
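The stage sequence for the 32-point and 64-point examples can be mimicked in software by a recursive decimation-in-time FFT that computes the radix-4 (base 4) stages first and, when the size is not a power of 4, finishes with one radix-2 stage. This is a sketch under that assumption, not the hardware control flow of the patent; `mixed_radix_fft` and `is_power_of_four` are hypothetical names.

```python
import cmath

def is_power_of_four(n):
    """True when n is 1, 4, 16, 64, ..."""
    return n > 0 and (n & (n - 1)) == 0 and (n.bit_length() - 1) % 2 == 0

def mixed_radix_fft(x):
    """FFT of a power-of-two length using radix-4 stages, then one radix-2 stage."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [0j] * n
    if not is_power_of_four(n):
        # Final radix-2 stage: the two halves are pure radix-4 transforms.
        even = mixed_radix_fft(x[0::2])
        odd = mixed_radix_fft(x[1::2])
        half = n // 2
        for k in range(half):
            t = odd[k] * cmath.exp(-2j * cmath.pi * k / n)
            out[k] = even[k] + t
            out[k + half] = even[k] - t
        return out
    # Radix-4 stage: combine four quarter-length transforms.
    quarter = n // 4
    subs = [mixed_radix_fft(x[r::4]) for r in range(4)]
    for k in range(quarter):
        a = subs[0][k]
        b = subs[1][k] * cmath.exp(-2j * cmath.pi * k / n)
        c = subs[2][k] * cmath.exp(-2j * cmath.pi * 2 * k / n)
        d = subs[3][k] * cmath.exp(-2j * cmath.pi * 3 * k / n)
        s0, s1, s2, s3 = a + c, a - c, b + d, b - d
        rot = complex(s3.imag, -s3.real)  # -j * s3 without a multiplier
        out[k] = s0 + s2
        out[k + quarter] = s1 + rot
        out[k + 2 * quarter] = s0 - s2
        out[k + 3 * quarter] = s1 - rot
    return out
```

In the recursion, the outermost call corresponds to the last stage computed, so a 32-point input runs two radix-4 stages on each 16-point half before the closing radix-2 stage.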
As shown in Fig. 6, the data input control unit 2 comprises input address generation logic, which generates the bit-reversed addresses of the operands according to the information (address, size and so on) about the points to be computed.
As shown in Fig. 5, the mixed-radix FFT computation unit 3 comprises:
A data cache 31, which buffers the computation points located at the bit-reversed RAM addresses generated by the input address generation logic, or intermediate results. The data cache can use the general-purpose registers of the DSP core and can, for example, hold 64 points; if the input has 64 points or fewer, the multi-stage FFT butterfly computation can be carried out directly. In general, the two-dimensional data transform logic unit 11 groups the computation points into two dimensions according to the size of the data cache 31. If the data cache 31 holds 64 points, the computation points are usually divided into groups of 64.
Four multipliers and eight adders connected to one another, which, under the control of the mixed-radix FFT computation control logic unit 12, use the twiddle factors to perform one stage of 4-point radix-4 FFT butterfly computation or 8-point radix-2 FFT butterfly computation on the points in the data cache 31 and output the result to the data output control unit 4.
As shown in Fig. 7, the data output control unit 4 comprises output address generation logic, which generates the original storage addresses of the output data according to the information about the computation points, so that the next stage can read its data easily.
The twiddle factor computation unit 5 belongs to the prior art and can be implemented with any method of computing twiddle factors known to those skilled in the art, so it is not described further here.
The technical scheme of the FFT computation device is described in detail below, taking the FFT of a 2048-point input as an example.
In this example the threshold is set to 1024.
First, the computation-point processing control unit 1 receives information such as the size and address of the input and preprocesses it. Because 2048 is greater than 1024, the input undergoes the two-dimensional transform and becomes two-dimensional data of 64 × 32, so the FFT is first computed on 32 groups of 64 points each.
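One way to arrive at the 64 × 32 split is to take I as the largest power of 4 that fits in the data cache and divides the number of points. The patent only states that I is a power of 4 determined by the cache capacity, so this selection rule, the function name and the default values below are assumptions.

```python
def split_two_dimensional(n_points, threshold=1024, cache_points=64):
    """Return (I, J) for inputs at or above the threshold, else None.

    I is taken as the largest power of 4 that fits in the data cache and
    divides n_points (an assumed selection rule); J = n_points // I.
    """
    if n_points < threshold:
        return None  # small inputs are computed in one dimension
    i_len = 1
    while i_len * 4 <= cache_points and n_points % (i_len * 4) == 0:
        i_len *= 4
    return i_len, n_points // i_len
```

With a threshold of 1024 and a 64-point cache, a 2048-point input yields (64, 32), while a 512-point input is left unsplit.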
Because 64 points can be computed with three stages of radix-4 FFT butterflies, the mixed-radix FFT computation control logic unit 12 generates 3 control signals: stage-1, stage-2 and stage-3 radix-4 FFT butterfly control. The FFT computation-point processing control unit 1 then passes the data information of each group, including address and size, to the data input control unit 2, which generates the bit-reversed address of each datum from the group address and uses DMA to load 64 points into the mixed-radix FFT computation unit 3. Unit 3 divides the 64 points into 16 parts; under the control of the above control signals, each part of 4 points undergoes one stage of radix-4 FFT butterfly computation together with the twiddle factors.
The mixed-radix FFT computation unit 3 performs butterfly computation on the 16 four-point parts at every stage. After three stages of FFT butterfly computation, the result is saved to the corresponding RAM addresses through the data output control unit 4, which yields the FFT result of one group of 64 points.
After the above process has been carried out 32 times in total, the resulting 32 groups of 64 points are each multiplied by the corresponding twiddle factors to obtain the intermediate data.
The intermediate data are regrouped as 32 × 64 two-dimensional data, that is, 64 groups of 32 points. For each group, the data input control unit 2 generates the corresponding data addresses from the information provided by the FFT computation-point processing control unit 1 and uses DMA to load 32 points at a time.
Because $32 = 4^2 \times 2$, the mixed-radix FFT computation unit 3 first performs 2 stages of radix-4 FFT butterfly computation on the 32 points and then 1 stage of radix-2 FFT butterfly computation; the result is the final result. The data output control unit 4 reorders the results of unit 3, converting the two-dimensional grouping order back into one-dimensional order, and saves them in RAM in the order of the final result.
After the above process has been carried out 64 times and the results have been saved, the FFT of the 2048-point input is obtained and the computation ends.
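The whole two-dimensional procedure (J groups of I-point transforms, a twiddle multiplication, then I groups of J-point transforms) can be sketched as follows. A direct DFT stands in for the radix-4 (base 4) and radix-2 butterfly stages to keep the example short, and the indexing convention and function names are assumptions rather than the patent's address logic.

```python
import cmath

def dft(x):
    """Direct DFT, used here as a stand-in for the butterfly stages."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def fft_two_dimensional(x, i_len, j_len):
    """Compute an (i_len * j_len)-point FFT as a two-dimensional transform."""
    n = i_len * j_len
    if len(x) != n:
        raise ValueError("len(x) must equal i_len * j_len")
    # Step 1: j_len groups of i_len points (group n2 holds x[n2], x[n2 + j_len], ...).
    groups = [dft(x[n2::j_len]) for n2 in range(j_len)]
    # Step 2: multiply each intermediate value by its twiddle factor W_N^(n2 * k1).
    for n2 in range(j_len):
        for k1 in range(i_len):
            groups[n2][k1] *= cmath.exp(-2j * cmath.pi * n2 * k1 / n)
    # Step 3: i_len groups of j_len points, written back in one-dimensional order.
    out = [0j] * n
    for k1 in range(i_len):
        column = dft([groups[n2][k1] for n2 in range(j_len)])
        for k2 in range(j_len):
            out[k1 + i_len * k2] = column[k2]
    return out
```

For the example in the text, `fft_two_dimensional(x, 64, 32)` would run 32 transforms of 64 points, apply the twiddle factors, and then run 64 transforms of 32 points.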
In another aspect, the invention also discloses a method for implementing FFT computation in a general-purpose DSP processor, the method comprising:
Converting an FFT input whose number of points is greater than or equal to a threshold into two-dimensional data I × J, where I is a power of 4 and I × J equals the number of points of the FFT;
For a two-dimensional FFT input, first performing radix-4 FFT butterfly computation on the first-dimension I points; then, for the second-dimension J points, performing radix-4 FFT butterfly computation if pure radix-4 computation is possible, and otherwise performing radix-4 FFT butterfly computation first and radix-2 FFT butterfly computation afterwards.
For an FFT input with fewer points than the threshold, performing radix-4 FFT butterfly computation if pure radix-4 computation is possible, and otherwise performing radix-4 FFT butterfly computation first and radix-2 FFT butterfly computation afterwards.
Preferably, the value of I is determined by the capacity of the data cache; if the data cache holds 64 points, the value of I is preferably 64.
The threshold may be preset or set by the user.
For example, a 32-point input can be transformed with 2 stages of radix-4 FFT butterflies followed by 1 stage of radix-2 FFT butterflies. A 64-point input can be transformed with 3 stages of radix-4 FFT butterflies. A 2048-point input is first divided into two-dimensional data of 64 × 32. Three stages of radix-4 FFT butterflies are performed on each of the 32 groups of 64 points; the intermediate data, after multiplication by the twiddle factors, are regrouped as two-dimensional data of 32 × 64; then 2 stages of radix-4 FFT butterflies and 1 stage of radix-2 FFT butterflies are performed on each of the 64 groups of 32 points, which completes the 2048-point FFT.
As the above embodiments show, both the two-dimensional computation and the mixed-radix FFT computation speed up the FFT, so although some FFT computation logic is added, the invention significantly improves the FFT speed of a general-purpose DSP core.
Finally, it should be noted that the above embodiments only illustrate, and do not limit, the technical scheme of the invention. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the invention may still be modified or equivalently replaced, and any modification or partial replacement made without departing from the spirit and scope of the invention shall fall within the scope of the claims of the invention.

Claims (10)

1. A device for implementing FFT calculation in a general-purpose DSP processor, used to perform an FFT on calculation points stored in RAM, characterized in that it comprises an FFT calculation point processing control unit, a data input control unit, a mixed-radix FFT calculation unit, a data output control unit and a twiddle factor calculation unit, wherein:
The FFT calculation point processing control unit converts calculation points whose point count is greater than or equal to a threshold into two-dimensional data I × J, where I is a power of 4 and I × J equals the point count of the FFT calculation points; for calculation points below the threshold and for the data after the two-dimensional conversion, if a pure radix-4 calculation cannot be performed, it controls a mixed-stage calculation in which each stage is radix-4 or radix-2;
The data input control unit generates the reversed-order addresses of the FFT calculation points in RAM according to the processing result of the FFT calculation point processing control unit, reads the data by DMA, and delivers them to the mixed-radix FFT calculation unit;
The twiddle factor calculation unit calculates the twiddle factors and outputs them to the mixed-radix FFT calculation unit;
The mixed-radix FFT calculation unit performs, in combination with the twiddle factors, radix-4 FFT butterfly calculation or radix-2 FFT butterfly calculation on the input calculation points;
The data output control unit outputs the results processed by the mixed-radix FFT calculation unit to their original addresses in RAM.
2. The device for implementing FFT calculation in a general-purpose DSP processor as claimed in claim 1, characterized in that the FFT calculation point processing control unit comprises:
A data two-dimensional conversion logic unit, used to group calculation points exceeding the threshold in two dimensions so as to produce the corresponding two-dimensional array information, including addresses, row numbers and column numbers;
A mixed-radix FFT calculation control logic unit, used, for calculation points below the threshold and for the data after the two-dimensional grouping, to control radix-4 FFT butterfly calculation if pure radix-4 FFT butterfly calculation is possible, and otherwise to control radix-4 or radix-2 FFT butterfly calculation.
3. The device for implementing FFT calculation in a general-purpose DSP processor as claimed in claim 2, characterized in that the mixed-radix FFT calculation unit comprises:
A metadata cache, used to buffer the data delivered by the data input control unit or intermediate results;
Four multipliers and eight adders connected to one another, used to perform, in combination with the twiddle factors, one stage of radix-4 FFT butterfly calculation or radix-2 FFT butterfly calculation on the data in the metadata cache, and to output the calculation result to the data output control unit.
4. The device for implementing FFT calculation in a general-purpose DSP processor as claimed in claim 3, characterized in that the data input control unit comprises:
Input address generation logic, used to generate the reversed-order addresses of the operands according to the information of the data to be calculated.
5. The device for implementing FFT calculation in a general-purpose DSP processor as claimed in claim 4, characterized in that the data output control unit comprises:
Output address generation logic, used to generate the original storage addresses of the output data according to the data information of the calculation points.
6. The device for implementing FFT calculation in a general-purpose DSP processor as claimed in claim 3, characterized in that the value of I is determined according to the capacity of the metadata cache.
7. The device for implementing FFT calculation in a general-purpose DSP processor as claimed in any one of claims 1 to 6, characterized in that the threshold may be preset or set by the user.
8. A method for implementing FFT calculation in a general-purpose DSP processor, characterized in that the method comprises:
Converting FFT calculation points whose point count is greater than or equal to a threshold into two-dimensional data I × J, where I is a power of 4 and I × J equals the point count of the FFT calculation points;
For the two-dimensional FFT calculation points, first performing radix-4 FFT butterfly calculation on the I points of the first dimension; for the J points of the second dimension, performing radix-4 FFT butterfly calculation if pure radix-4 FFT butterfly calculation is possible, and otherwise performing radix-4 FFT butterfly calculation first and then radix-2 FFT butterfly calculation;
For FFT calculation points whose point count is below the threshold, performing radix-4 FFT butterfly calculation if pure radix-4 FFT butterfly calculation is possible, and otherwise performing radix-4 FFT butterfly calculation first and then radix-2 FFT butterfly calculation.
9. The method for implementing FFT calculation in a general-purpose DSP processor as claimed in claim 8, characterized in that the value of I is determined according to the capacity of the metadata cache.
10. The method for implementing FFT calculation in a general-purpose DSP processor as claimed in claim 9, characterized in that the threshold may be preset or set by the user.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010607219.XA CN102567282B (en) 2010-12-27 2010-12-27 Device and method for implementing FFT calculation in a general-purpose DSP processor

Publications (2)

Publication Number Publication Date
CN102567282A true CN102567282A (en) 2012-07-11
CN102567282B CN102567282B (en) 2016-03-30

Family

ID=46412728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010607219.XA Expired - Fee Related CN102567282B (en) 2010-12-27 2010-12-27 Device and method for implementing FFT calculation in a general-purpose DSP processor

Country Status (1)

Country Link
CN (1) CN102567282B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101454772A (en) * 2006-04-28 2009-06-10 高通股份有限公司 Multi-port mixed-radix fft
CN101136891A (en) * 2007-08-09 2008-03-05 复旦大学 3780-point quick Fourier transformation processor of pipelining structure
CN101231632A (en) * 2007-11-20 2008-07-30 西安电子科技大学 Method for processing floating-point FFT by FPGA
CN101504638A (en) * 2009-03-19 2009-08-12 北京理工大学 Point-variable assembly line FFT processor
CN101587469A (en) * 2009-06-03 2009-11-25 北京大学深圳研究生院 Rapid Fourier transform device with variable length

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440228A (en) * 2013-08-20 2013-12-11 中国人民解放军国防科学技术大学 Method for accelerating FFT calculation based on fused multiplying and adding instructions
CN103440228B (en) * 2013-08-20 2017-12-22 中国人民解放军国防科学技术大学 Method for accelerating FFT calculation based on fused multiply-add instructions
CN103685127A (en) * 2013-12-19 2014-03-26 武汉虹信通信技术有限责任公司 FFT asynchronous overlapping conversion device and method applied to LTE
CN103685127B (en) * 2013-12-19 2016-08-17 武汉虹信通信技术有限责任公司 FFT asynchronous lapped transform device and method for LTE
CN103955447A (en) * 2014-04-28 2014-07-30 中国人民解放军国防科学技术大学 FFT accelerator based on DSP chip
CN103955447B (en) * 2014-04-28 2017-04-12 中国人民解放军国防科学技术大学 FFT accelerator based on DSP chip
CN103955446B (en) * 2014-04-28 2017-04-19 中国人民解放军国防科学技术大学 DSP-chip-based FFT computing method with variable length
CN104615582A (en) * 2015-02-06 2015-05-13 中国人民解放军国防科学技术大学 Large-point one-dimensional FFT vectorization calculational method for GPDSP
CN104615582B (en) * 2015-02-06 2018-02-02 中国人民解放军国防科学技术大学 Large-point one-dimensional FFT vectorization calculation method for GPDSP
CN105403769A (en) * 2015-09-30 2016-03-16 中国农业大学 Circuit structure based on FFT short-time Fourier analysis and control method thereof
CN105403769B (en) * 2015-09-30 2018-05-11 中国农业大学 Circuit structure based on FFT short-time Fourier analysis and control method thereof

Also Published As

Publication number Publication date
CN102567282B (en) 2016-03-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160330

Termination date: 20211227