CN106415526A - FFT processor and operation method
- Publication number: CN106415526A (application CN201680000901.8A)
- Authority: CN (China)
- Prior art keywords
- data
- read
- twiddle factor
- input data
- processing element
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Algebra (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Complex Calculations (AREA)
- Discrete Mathematics (AREA)
Abstract
The invention relates to the field of signal processing and discloses an FFT processor and an operation method. The FFT processor comprises two data storage units, a twiddle factor storage unit, a plurality of butterfly operation units, a data read-write unit, and a twiddle factor read-write unit. The data read-write unit is connected to the two data storage units and to the butterfly operation units; the two data storage units are used to uniformly store the N input data and the N output data of the butterfly operation units. The twiddle factor read-write unit is connected to the twiddle factor storage unit and to the butterfly operation units; the twiddle factor storage unit is used to store N/2 twiddle factors. The twiddle factor read-write unit reads the N/2 twiddle factors one by one and feeds them in sequence into the butterfly operation units, and the data read-write unit also stores the N output data one by one. The invention also discloses an FFT operation method. The FFT processor and the FFT operation method support multi-point FFT operation and reduce circuit area.
Description
Technical field
The present invention relates to the field of signal processing, and in particular to an FFT processor and an operation method.
Background art
The Fourier transform converts a signal from the time domain to the frequency domain and is an important analysis tool in signal processing. The discrete Fourier transform (DFT) is the form the Fourier transform takes in discrete systems, but computing the DFT directly is very expensive. The fast Fourier transform (FFT) is an efficient algorithm for the DFT: it exploits the odd, even, imaginary, and real symmetry properties of the DFT to greatly reduce the number of operations the DFT requires.
An FFT processor is a hardware implementation of the FFT algorithm. The prior art offers many ways to implement an FFT processor, but most have limitations: some implementations support FFT operation for only a single point count, while others consume a large amount of resources and need a large hardware circuit area.
Summary of the invention
The purpose of the embodiments of the present invention is to provide an FFT processor and an operation method that support multi-point FFT operation, broadening the application scenarios of the FFT processor while occupying a smaller circuit area, lowering circuit power consumption, and reducing circuit cost.
To solve the above technical problem, an embodiment of the present invention provides an FFT processor comprising: two data storage units, a twiddle factor storage unit, a plurality of butterfly operation units, a data read-write unit, and a twiddle factor read-write unit;
The data read-write unit is connected to the two data storage units and to each butterfly operation unit; the two data storage units are respectively used to uniformly store the N input data and the N output data of the butterfly operation units, where $N = 2^k$, k ≥ 3, and k is an integer;
The twiddle factor read-write unit is connected to the twiddle factor storage unit and to each butterfly operation unit; the twiddle factor storage unit is used to store N/2 twiddle factors;
The data read-write unit is used to read the N input data one by one and to feed them, in the order read, into the butterfly operation units; the twiddle factor read-write unit is used to read the N/2 twiddle factors one by one and to feed them, in the order read, into the butterfly operation units; the data read-write unit is also used to store the N output data one by one.
An embodiment of the present invention also provides an FFT operation method, including:
The data read-write unit uniformly stores the N input data received from outside in one of the data storage units;
The twiddle factor read-write unit stores the N/2 twiddle factors received from outside in the twiddle factor storage unit;
The data read-write unit reads the N input data one by one and feeds them, in the order read, into the butterfly operation units;
The twiddle factor read-write unit reads the N/2 twiddle factors one by one and feeds them, in the order read, into the butterfly operation units;
Each butterfly operation unit computes its output data from the input data and twiddle factor it receives;
The data read-write unit stores the output data one by one in the other data storage unit;
The output data of each stage serve as the input data of the next stage, and the operation is repeated for k stages.
Compared with the prior art, in the embodiments of the present invention the data are stored uniformly in the two data storage units, so input data of different point counts can be read with the same read rule, which makes multi-point operation possible. Moreover, the data read-write unit reads the input data one by one, feeds them in sequence into the butterfly operation units, and stores the output data of each butterfly operation unit one by one. At any given time there is therefore one input datum and one output datum, so only two data storage units are needed to store the data, which saves circuit area.
In addition, the number of butterfly operation units is 4. Cycling the work among 4 butterfly operation units reuses them to the greatest extent and keeps the circuit area as small as possible. The 4 butterfly operation units read data from the data storage unit continuously and in turn, which avoids idle operation units and keeps the output continuously valid, effectively improving the utilization of the butterfly operation units.
In addition, each butterfly operation unit includes 1 multiplier and 2 adders and implements a radix-2 butterfly operation. The structure of each butterfly operation unit in this embodiment is relatively simple, which greatly reduces circuit area.
In addition, k satisfies k ≤ 10. Configuring different values of k supports FFT operations of different point counts.
In addition, the storage addresses of the input data increase successively. Each data storage unit includes 1024 addresses. When k = 10, $N = 2^{10} = 1024$ and the input data are stored at consecutive addresses; when k ≤ 9, the address gaps between adjacent input data are equal. The input and output data uniformly occupy the whole storage address space, which simplifies the calculation: FFT operations of different point counts do not need different calculation schemes.
In addition, for the i-th stage of the operation, where i = 0, 1, ..., k, when the data read-write unit reads the N input data one by one, the read address of each input datum is generated as follows: obtain the binary counter value corresponding to the input datum; reverse the order of the last i+1 bits of the counter value; then reverse the order of all bits of the result and use it as the read address of the input datum. Allocating the data storage addresses in this way allows the FFT operation to be completed correctly.
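The address rule above can be sketched as follows. This is a minimal illustration rather than the patented circuit: it assumes that "inverting" bits means reversing their order, that the counter is k bits wide, and it ignores the address stride used when N is smaller than the memory depth. The names `reverse_bits` and `read_address` are hypothetical.

```python
def reverse_bits(value, width):
    """Reverse the order of the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def read_address(counter, stage, k):
    """Read address for a k-bit counter value at stage `stage` (0-based).

    Step 1: reverse the order of the last stage+1 bits of the counter.
    Step 2: reverse the order of all k bits of the result.
    """
    low_width = stage + 1
    low_mask = (1 << low_width) - 1
    low = reverse_bits(counter & low_mask, low_width)
    swapped = (counter & ~low_mask) | low
    return reverse_bits(swapped, k)


if __name__ == "__main__":
    k = 3  # 8-point example
    for stage in range(k):
        print(stage, [read_address(c, stage, k) for c in range(1 << k)])
```

Under these assumptions, for k = 3 the first two addresses read are 0 and 4 in stage 0, 0 and 2 in stage 1, and 0 and 1 in the last stage, so the distance between the two inputs of a butterfly halves from one stage to the next.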
Brief description of the drawings
Fig. 1 is a structural diagram of the FFT processor according to the first embodiment of the present invention;
Fig. 2 is a schematic diagram of the internal operation process of a butterfly operation unit of the FFT processor according to the first embodiment of the present invention;
Fig. 3 is a flow chart of the FFT operation method according to the third embodiment of the present invention;
Fig. 4 is a flow chart of how the read addresses of the input data are generated in the FFT operation method according to the fourth embodiment of the present invention;
Fig. 5 is a flow chart of how the read address sequence of the twiddle factors is generated in the FFT operation method according to the fifth embodiment of the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are explained in detail below with reference to the accompanying drawings. Those skilled in the art will understand that many technical details are set out in each embodiment so that the reader can better understand the application. However, the technical solutions claimed in this application can be realized even without these technical details and with various changes and modifications based on the following embodiments.
The first embodiment of the present invention relates to an FFT processor. As shown in Fig. 1, it includes: two data storage units 11 and 12, a twiddle factor storage unit 13, a plurality of butterfly operation units 161 to 164, a data read-write unit 14, and a twiddle factor read-write unit 15. The data read-write unit 14 is connected to the two data storage units 11 and 12 and to each of the butterfly operation units 161 to 164; the twiddle factor read-write unit 15 is connected to the twiddle factor storage unit 13 and to each of the butterfly operation units 161 to 164.
The two data storage units 11 and 12 in this embodiment may be random access memories (RAM). The two RAMs are respectively used to uniformly store the N input data and the N output data of the butterfly operation units 161 to 164, where $N = 2^k$, k ≥ 3, and k is an integer. The data read-write unit 14 reads the N input data one by one according to a generated address sequence and feeds them in sequence into the butterfly operation units 161 to 164. The data read-write unit 14 is also used to store the N output data one by one.
Specifically, before the calculation starts, the data that the butterfly operation units 161 to 164 need for the calculation are imported into one of the data storage units, for example data storage unit 11. After the enable signal of the FFT processor is set high, an appropriate address sequence is generated according to the current stage. The data read-write unit 14 reads the data at those addresses in data storage unit 11 as input data for the FFT operation. After the calculation finishes, the data read-write unit 14 stores the output data in data storage unit 12 at storage addresses that match the read addresses in data storage unit 11; that is, the read address of the data entering a calculation and the storage address of the data leaving it are the same.
Note that the data read-write unit 14 can read input data from data storage unit 11 and can also write output data to data storage unit 11; the same holds for data storage unit 12. The data storage units are organized as ping-pong RAM to reduce resource consumption: either data storage unit 11 or 12 can serve as the storage unit for input data or for output data. The data are stored uniformly in data storage units 11 and 12, which can be understood to mean that input and output data are stored at correspondingly equal addresses in data storage unit 11 or 12.
The FFT processor in this embodiment uses the radix-2 algorithm. N is the number of points the FFT processor supports, where $N = 2^k$, k ≥ 3, and k is an integer. The minimum value of N is therefore 8, meaning that the smallest supported FFT has 8 points, and the radix-2 algorithm has k stages in total. Different values of k give different point counts N and different numbers of stages. In every stage, the data read-write unit 14 reads data from one data storage unit and stores the output data of the butterfly units in the other data storage unit. In the next stage, the data read-write unit 14 reads from the data storage unit that was written in the previous stage and stores the butterfly outputs in the data storage unit that was read in the previous stage. For example, the data read-write unit 14 reads data from data storage unit 11 and stores the computed butterfly outputs in data storage unit 12; in the next stage it reads data from data storage unit 12 and stores the computed butterfly outputs in data storage unit 11.
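The alternation between the two data storage units can be sketched as a simple schedule. This is only an illustration of the ping-pong idea; the function name `pingpong_schedule` is hypothetical, and the numbers 11 and 12 are the reference numerals of the two data storage units, with unit 11 assumed to hold the initial input data.

```python
def pingpong_schedule(k):
    """Return (stage, source_unit, destination_unit) for each of the k stages."""
    schedule = []
    src, dst = 11, 12  # assumption: the input data start in data storage unit 11
    for stage in range(k):
        schedule.append((stage, src, dst))
        src, dst = dst, src  # the unit just written becomes the next source
    return schedule


if __name__ == "__main__":
    for stage, src, dst in pingpong_schedule(3):
        print(f"stage {stage}: read unit {src}, write unit {dst}")
```

With this scheme the final result lands in unit 12 when k is odd and in unit 11 when k is even.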
In addition, the parameters used during the operation of the FFT processor can be configured by the user.
In this embodiment, the twiddle factor storage unit 13 is used to store N/2 twiddle factors.
Before the calculation starts, the twiddle factor table that the butterfly operation units 161 to 164 need for the calculation is imported into the corresponding twiddle factor storage unit 13. More specifically, since $W_N^{n} = e^{-j 2\pi n / N}$ implies $W_N^{n} = W_{1024}^{1024 n / N}$, for an FFT processor that supports at most 1024 points every twiddle factor it needs can be converted into the form $W_{1024}^{n}$, where n ranges from 0 to 511. The real and imaginary parts are each quantized to 16-bit signed numbers, and the quantized results are stored in the twiddle factor storage unit 13. Every two input data need only one twiddle factor to compute the output data, so an FFT processor that supports N points needs N/2 twiddle factors.
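A twiddle factor table of this kind could be generated as in the sketch below. The source only states that the real and imaginary parts are quantized to 16-bit signed numbers; the scale factor 32767 and the rounding mode are assumptions, and `build_twiddle_table` is a hypothetical name.

```python
import math

TABLE_SIZE = 512    # n ranges from 0 to 511
MAX_POINTS = 1024   # largest supported FFT size
SCALE = 32767       # assumption: full-scale value of a 16-bit signed number


def build_twiddle_table():
    """Return a list of (real, imag) pairs for W_1024^n = exp(-j*2*pi*n/1024)."""
    table = []
    for n in range(TABLE_SIZE):
        angle = 2.0 * math.pi * n / MAX_POINTS
        real = round(SCALE * math.cos(angle))
        imag = round(-SCALE * math.sin(angle))
        table.append((real, imag))
    return table


if __name__ == "__main__":
    table = build_twiddle_table()
    print(table[0], table[128], table[256])
```

Because $W_N^{n} = W_{1024}^{1024 n / N}$, a smaller N-point FFT can index the same table at `n * 1024 // N` instead of needing a table of its own.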
In general, the twiddle factor storage unit 13 can be divided into two blocks, one storing the real parts of the twiddle factors and the other storing the imaginary parts, with the data at each identical address corresponding to each other. The invention is not limited to this: in practice, the high and low bits of the twiddle factor storage unit 13 can store the real and imaginary parts of a twiddle factor respectively. The twiddle factors can also be stored in the twiddle factor storage unit 13 in advance as a table; when a twiddle factor is needed during the operation, the corresponding entry is read from the table previously imported into the twiddle factor storage unit 13 and used in the subsequent calculation.
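The high/low-bit storage option can be illustrated with a small packing sketch. The source does not say which half holds which part; placing the real part in the high 16 bits of a 32-bit word is an assumption, and `pack_twiddle` and `unpack_twiddle` are hypothetical names.

```python
def pack_twiddle(real, imag):
    """Pack two 16-bit signed values into one 32-bit word (real in the high half)."""
    return ((real & 0xFFFF) << 16) | (imag & 0xFFFF)


def _sign_extend_16(value):
    """Interpret a 16-bit field as a two's-complement signed number."""
    return value - 0x10000 if value & 0x8000 else value


def unpack_twiddle(word):
    """Recover the (real, imag) pair from a packed 32-bit word."""
    real = _sign_extend_16((word >> 16) & 0xFFFF)
    imag = _sign_extend_16(word & 0xFFFF)
    return real, imag


if __name__ == "__main__":
    word = pack_twiddle(23170, -23170)
    print(hex(word), unpack_twiddle(word))
```

A single read of the packed word then delivers both parts of a twiddle factor at once.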
Further, the twiddle factor read-write unit 15 reads the N/2 twiddle factors one by one according to a generated address sequence and feeds them in sequence into the butterfly operation units 161 to 164.
Specifically, when a twiddle factor is needed during the operation, its value is read using the supplied twiddle factor address signal and read enable. The twiddle factor address changes once every two cycles, and the twiddle factor read-write unit 15 is implemented with control logic. The twiddle factor read-write unit 15 reads the N/2 twiddle factors one by one according to the generated twiddle factor address sequence and feeds them in sequence into the butterfly operation units 161 to 164 for the subsequent calculation.
The N input data of the butterfly operation units 161 to 164 are read by the data read-write unit 14 from data storage unit 11 or 12. Each butterfly operation unit, for example butterfly operation unit 161, needs 2 cycles to obtain its input data. The N/2 twiddle factors are read by the twiddle factor read-write unit 15 and written into butterfly operation unit 161; because the real and imaginary parts of the twiddle factors are stored in the high and low bits of the twiddle factor storage unit 13, each twiddle factor can be obtained at the same time as the input data. In this embodiment, a cycle means a clock cycle.
A butterfly operation unit, for example butterfly operation unit 161, performs the corresponding operation internally and produces its output data according to formulas (1) and (2):
$x(k) = x_1(k) + W_N^{k}\, x_2(k)$ (1)
$x(k + N/2) = x_1(k) - W_N^{k}\, x_2(k)$ (2)
where $x_1(k)$ and $x_2(k)$ are the input data, $W_N^{k}$ is the twiddle factor, and $x(k)$ and $x(k+N/2)$ are the output data of butterfly operation unit 161.
The operation process of the whole butterfly operation unit is shown in Fig. 2, and its output data are $x(k)$ and $x(k+N/2)$. Writing each output in terms of its real and imaginary parts gives formulas (3) and (4):
$\mathrm{Out1} = (x_a + x_b x_c - y_b y_c) + (y_a + x_b y_c + x_c y_b)\,j$ (3)
$\mathrm{Out2} = (x_a - x_b x_c + y_b y_c) + (y_a - x_b y_c - x_c y_b)\,j$ (4)
where $x_a$, $x_b$, $x_c$ are the real parts of $x_1(k)$, $x_2(k)$, and $W_N^{k}$ respectively, and $y_a$, $y_b$, $y_c$ are the corresponding imaginary parts.
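Formulas (3) and (4) can be checked against ordinary complex arithmetic with a short sketch. The function name `butterfly_real_imag` is hypothetical; the variable names follow the formulas.

```python
def butterfly_real_imag(xa, ya, xb, yb, xc, yc):
    """Radix-2 butterfly written out in real and imaginary parts.

    (xa, ya) is x1(k), (xb, yb) is x2(k), (xc, yc) is the twiddle factor.
    Returns ((out1_re, out1_im), (out2_re, out2_im)) per formulas (3) and (4).
    """
    out1 = (xa + xb * xc - yb * yc, ya + xb * yc + xc * yb)
    out2 = (xa - xb * xc + yb * yc, ya - xb * yc - xc * yb)
    return out1, out2


if __name__ == "__main__":
    x1, x2, w = complex(1, 2), complex(3, 4), complex(5, 6)
    out1, out2 = butterfly_real_imag(1, 2, 3, 4, 5, 6)
    # Compare with x1 + w*x2 and x1 - w*x2 computed directly.
    print(out1, x1 + w * x2)
    print(out2, x1 - w * x2)
```

The real-part terms come from the real part of the product $W \cdot x_2$, namely $x_b x_c - y_b y_c$, and the imaginary-part terms from its imaginary part, $x_b y_c + x_c y_b$.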
Note that each butterfly operation unit, for example butterfly operation unit 161, includes 1 multiplier and 2 adders, and computing the real and imaginary parts of both outputs out1 and out2 takes 7 cycles. The calculation proceeds as follows:
In the first cycle, the multiplier computes xb*xc, and the result is stored in mul_out;
In the second cycle, adder 1 computes xa+xb*xc, adder 2 computes xa-xb*xc, and the multiplier computes yb*yc, with the result again stored in mul_out;
In the third cycle, adder 1 computes xa+xb*xc-yb*yc, which is the real part of the first output out1. At the same time, adder 2 computes xa-xb*xc+yb*yc, which is the real part of the second output out2, and the multiplier computes xb*yc;
In the fourth cycle, adder 1 computes ya+xb*yc, adder 2 computes ya-xb*yc, and the multiplier computes xc*yb;
In the fifth cycle, adder 1 computes ya+xb*yc+xc*yb, completing the imaginary part of out1, and adder 2 computes ya-xb*yc-xc*yb, completing the imaginary part of out2.
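The five arithmetic cycles listed above can be traced in software as follows. This is a behavioural sketch only: it models one multiplier register (`mul_out`) and two adder registers, updating the adders before the multiplier in each cycle so that the adders see the product from the previous cycle. The function name `butterfly_schedule` and the register names `add1` and `add2` are hypothetical.

```python
def butterfly_schedule(xa, ya, xb, yb, xc, yc):
    """Step through the multiplier/adder schedule of one butterfly operation."""
    # Cycle 1: the multiplier produces the first product.
    mul_out = xb * xc
    # Cycle 2: both adders use xb*xc; the multiplier produces yb*yc.
    add1 = xa + mul_out
    add2 = xa - mul_out
    mul_out = yb * yc
    # Cycle 3: the real parts are complete; the multiplier produces xb*yc.
    out1_re = add1 - mul_out
    out2_re = add2 + mul_out
    mul_out = xb * yc
    # Cycle 4: the adders start the imaginary parts; the multiplier produces xc*yb.
    add1 = ya + mul_out
    add2 = ya - mul_out
    mul_out = xc * yb
    # Cycle 5: the imaginary parts are complete.
    out1_im = add1 + mul_out
    out2_im = add2 - mul_out
    return (out1_re, out1_im), (out2_re, out2_im)


if __name__ == "__main__":
    print(butterfly_schedule(1, 2, 3, 4, 5, 6))
```

Reusing a single multiplier across four products is what trades extra cycles for a smaller circuit.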
From the above analysis, the real parts of the two outputs out1 and out2 are completed at the same time, as are their imaginary parts. Once computed, the outputs of the butterfly operation unit must be stored in data storage unit 11 or 12 by the data read-write unit 14, and data storage unit 11 or 12 can accept only one write per cycle. The output of adder 2 in the third and fifth cycles is therefore delayed by one clock cycle, so that the real part of out1 is available in the third cycle, the real part of out2 in the fourth cycle, the imaginary part of out1 in the fifth cycle, and the imaginary part of out2 in the sixth cycle, which satisfies the storage requirement of data storage unit 11 or 12.
Because the calculation result is a 17-bit value while data storage unit 11 or 12 can store only 16 bits, the result must be truncated by discarding the lowest bit, which is equivalent to dividing the output by 2. This is done for the result of every stage. With k stages in total, the final result is scaled down by a factor of N, but because the relative magnitudes are unaffected, the frequency can still be determined from the final spectral distribution.
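The per-stage truncation can be sketched with an arithmetic right shift. How the hardware treats negative values is not stated in the source; the sketch assumes two's-complement values and a shift that rounds toward negative infinity, and the function names are hypothetical.

```python
def truncate_17_to_16(value):
    """Drop the lowest bit of a 17-bit signed result so it fits in 16 bits."""
    assert -(1 << 16) <= value < (1 << 16), "expects a 17-bit signed value"
    return value >> 1  # arithmetic shift: divides by 2, rounding toward -infinity


def total_scale(k):
    """Overall scaling after k stages of halving: 2**k, which equals N."""
    return 1 << k


if __name__ == "__main__":
    print(truncate_17_to_16(32767 + 32767))    # largest sum of two 16-bit values
    print(truncate_17_to_16(-32768 - 32768))   # most negative sum
    print(total_scale(10))                     # 1024-point FFT
```

Halving after every stage keeps each stored value within 16 bits, at the cost of one bit of precision per stage.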
In addition, because the write address of the output data must match the read address of the input data, each butterfly operation unit, such as butterfly operation unit 161, caches the addresses of its 2 input data until the calculation finishes. The sole exception is the last stage, where the output data sequence must be rearranged to obtain the correct storage order. For example, for a 1024-point FFT, the input addresses of the first butterfly operation unit in the last stage are 0 and 1, while the output addresses should be 0 and 512 (k and k+N/2 always occur in pairs). An extra stage of decision logic is therefore needed to produce the output sequence in the correct order.
Note that the number of butterfly operation units is 4.
Specifically, in this embodiment the whole butterfly operation part of the FFT processor is made up of 4 butterfly operation units, and each butterfly operation unit takes two values from data storage unit 11 or 12 for its calculation. If the butterfly operation units 161 to 164 take part in the operation as an array, then by the time the fourth butterfly operation unit starts working, the first has finished and can prepare for its next fetch. The 4 butterfly operation units therefore satisfy the requirement of fetching from data storage unit 11 or 12 continuously and in turn, so the butterfly operations required by each stage can be completed by cycling through the 4 butterfly operation units. This approach has the smallest circuit area. The operation of each stage consists of N/2 butterfly operations.
In addition, because the whole butterfly operation part of the FFT processor in this embodiment is made up of 4 butterfly operation units, every 8 cycles form one loop (each butterfly operation unit needs 2 cycles to read its input data, so 4 butterfly operation units working in turn need 8 cycles). After N/8 loops, the N/2 butterfly operations of a stage are finished. Because 7 more cycles of computation are needed after the last operation unit has read its data before the final output data are obtained, the time for each stage can be set to N (time to read the data) + 7 (computation time after the last read) + 1 (the one-cycle delay between the read address and the read data) = N + 8 cycles.
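The per-stage cycle budget above is simple arithmetic and can be written down directly. The constants come from the text; the function names `loops_per_stage` and `stage_cycles` are hypothetical.

```python
READ_CYCLES_PER_BUTTERFLY = 2   # each butterfly operation unit reads 2 inputs
NUM_BUTTERFLY_UNITS = 4
COMPUTE_TAIL_CYCLES = 7         # computation after the last read
READ_LATENCY_CYCLES = 1         # delay between read address and read data


def loops_per_stage(n):
    """Number of 8-cycle loops needed to read all N inputs of one stage."""
    return n // (READ_CYCLES_PER_BUTTERFLY * NUM_BUTTERFLY_UNITS)


def stage_cycles(n):
    """Cycles budgeted for one stage of an N-point FFT: N + 7 + 1."""
    return n + COMPUTE_TAIL_CYCLES + READ_LATENCY_CYCLES


if __name__ == "__main__":
    for n in (8, 256, 1024):
        print(n, loops_per_stage(n), stage_cycles(n))
```

For N = 1024 this gives 128 loops and 1032 cycles per stage.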
According to the above analysis, for an FFT processor supporting 1024 points, one stage needs 1032 cycles, which a counter counting from 0 to 1031 represents. Cycles 0 to 1023 carry the read signals for data storage unit 11 or 12, during which the read enable is high. In the cyclic mode with 4 butterfly operation units, every 8 cycles form one loop, so the first input enable occurs once every 8 cycles and each of the other input enables is delayed by one further cycle in turn, which yields all 8 input enables of the 4 butterfly operation units.
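The rotation of the 8 input enables can be sketched as a function of the stage counter. This assumes that enable j is high exactly when the counter modulo 8 equals j during the read window, and that consecutive enables alternate between the two inputs of one butterfly operation unit; the name `active_input_enable` and the (unit, port) numbering are hypothetical.

```python
def active_input_enable(cycle, n=1024):
    """Return (unit, port) of the input enable that is high in this cycle.

    unit is 0..3 (butterfly operation units 161..164), port is 0 or 1
    (first or second input). Returns None outside the read window 0..n-1.
    """
    if not 0 <= cycle < n:
        return None
    slot = cycle % 8          # position inside the 8-cycle loop
    return slot // 2, slot % 2


if __name__ == "__main__":
    for cycle in range(10):
        print(cycle, active_input_enable(cycle))
```

Each unit therefore receives its two inputs in consecutive cycles and is revisited every 8 cycles.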
As for the output data, according to the analysis above, the output of a butterfly operation unit is divided into a real part and an imaginary part, and the real-part output enable and the imaginary-part output enable each last 2 cycles. Since a butterfly operation unit needs exactly 2 cycles to read its input data, the result of each butterfly operation appears exactly 2 cycles after the result of the previous butterfly operation unit. Overall, therefore, the output is always valid. For both the real-part and imaginary-part outputs, every 8 cycles form one loop that takes in turn the results of the first, second, third, and fourth butterfly operation units, until the current stage is finished.
Note that in this embodiment, which cycles the work among 4 butterfly operation units, the 4 butterfly operation units form the smallest basic processing unit, and every 2 input data points need one butterfly operation. The smallest FFT that can be supported therefore has N = 8 points; combined with $N = 2^k$, this means k is an integer with k ≥ 3.
Compared with the prior art, the main differences and effects of this embodiment are as follows. The data are stored uniformly in the two data storage units, so input data of different point counts can be read with the same rule, which makes multi-point operation possible. The data read-write unit reads the input data one by one according to the generated address sequence and feeds them in sequence into the butterfly operation units, and it stores the input and output data one by one. At any given time there is one input datum and one output datum, so only two data storage units are needed to store the data, which saves circuit area.
Note that each module involved in this embodiment is a logic module. In practice, a logic unit may be a physical unit, part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present invention, units that are not closely related to solving the technical problem addressed by the present invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
The second embodiment of the present invention relates to an FFT processor. The second embodiment is a further optimization of the first embodiment. The main optimization is that in the second embodiment k satisfies k ≤ 10 and the storage addresses of the input data increase successively. Each data storage unit includes 1024 addresses. When k = 10, $N = 2^{10} = 1024$ and the input data are stored at consecutive addresses; when k ≤ 9, the address gaps between adjacent input data are equal. Depending on the value of k, FFTs of 8 to 1024 points can be supported. That is, without changing the existing hardware environment, for example without changing the capacity of the data memory or the bit width of the address signal, FFT operations across the maximum range of point counts can be supported.
Specifically, when an FFT with fewer points is calculated, the addresses at which the data are written into the data storage unit are not contiguous. For example, for a 512-point FFT the write addresses in the data storage unit step through 0, 2, 4, 6, ..., 1022, while for a 256-point FFT they step through 0, 4, 8, 12, ..., 1020. The core idea is that the data must uniformly occupy the whole address space rather than being written contiguously into one block of it. Constraining how the data are uniformly stored in this way lets the data be read with the same rule, which supports multi-point FFT operation.
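The uniform address layout can be sketched as a stride computation. The memory depth of 1024 comes from the text; the function name `uniform_addresses` is hypothetical.

```python
MEMORY_DEPTH = 1024  # each data storage unit has 1024 addresses


def uniform_addresses(n, depth=MEMORY_DEPTH):
    """Addresses at which the n data of an n-point FFT are stored.

    The data are spread evenly over the whole address space, so adjacent
    data are depth // n addresses apart.
    """
    if n < 8 or n > depth or depth % n != 0:
        raise ValueError("n must be a power of two between 8 and depth")
    stride = depth // n
    return list(range(0, depth, stride))


if __name__ == "__main__":
    print(uniform_addresses(512)[:4], uniform_addresses(512)[-1])
    print(uniform_addresses(256)[:4], uniform_addresses(256)[-1])
```

For N = 1024 the stride is 1, which is the case where the data are stored at consecutive addresses.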
Note that this embodiment is not limited to FFTs of at most 1024 points; FFTs of more than 1024 points can also be supported. Supporting a higher point count requires only defining a larger number of stages k, together with a larger data storage unit and an address signal with a larger bit width.
Compared with the prior art, the main difference and effect of this embodiment is that, without changing the existing hardware environment, for example without changing the capacity of the data memory or the bit width of the address signal, FFT operations across the maximum range of point counts can be supported.
The third embodiment of the present invention relates to an FFT operation method which, as shown in Fig. 3, includes:
Step 301: The data read-write unit uniformly stores the input data received from outside in a data storage unit.
Specifically, the data read-write unit uniformly stores the N input data received from outside in one of the data storage units. N is the number of points the FFT processor in this embodiment supports, and N can take any value within the maximum range of point counts allowed without changing the existing equipment. The N input data on which the FFT is to be performed are imported into the data storage unit in advance and stored uniformly in it. Uniform storage can be understood to mean that the gaps between the storage addresses of the data in the data storage unit are all the same, which ensures that the data are read with the same rule and supports multi-point operation.
Step 302: The twiddle factor read-write unit stores the twiddle factors received from outside in the twiddle factor storage unit.
Specifically, the twiddle factor read-write unit stores the N/2 twiddle factors received from outside in the twiddle factor storage unit. In the FFT calculation every two input data share one twiddle factor, so N input data require N/2 twiddle factors for the FFT operation. The N/2 twiddle factors are stored in advance in the twiddle factor storage unit by the twiddle factor read-write unit in the form of a twiddle factor table.
Note that there is no strict order between step 301 and step 302, and the two can be exchanged. The order in which the data read-write unit stores the input data in the data storage unit and the twiddle factor read-write unit stores the twiddle factors in the twiddle factor storage unit has no effect on the result of the FFT operation.
Step 303:Date read-write cell reads input data one by one, and is sequentially input butterfly processing element.
Specifically, date read-write cell reads N number of input data one by one according to the address sequence producing, and will be according to product
N number of input data that raw address sequence reads one by one sequentially inputs multiple butterfly processing elements.Date read-write cell is from wherein
Obtain N number of input data in one data storage cell, and the N number of input data obtaining is stored in multiple butterfly processing elements.
Wherein, basic butterfly processing element is 4 butterfly processing elements.Because date read-write cell reads and writes data one by one, therefore, same
One time, only one of which input and an output data, then only need to two data storage cells and carry out data storage, save electricity
Road area occupied.
Step 304:Twiddle factor read-write cell reads twiddle factor one by one, and is sequentially input butterfly processing element.
Specifically, twiddle factor read-write cell reads N/2 twiddle factor one by one according to the address sequence producing, and will
The N/2 twiddle factor that address sequence according to producing reads one by one sequentially inputs multiple butterfly processing elements.By rotation because
Sub- read-write cell, in twiddle factor storage unit, reads out N/2 twiddle factor, and by N/2 read out twiddle factor
Input multiple butterfly processing elements.
It should be noted that not having strict logical order between step 303 and step 304, order can be carried out
Exchange, date read-write cell reads input data and is stored in butterfly processing element and twiddle factor read-write cell reading twiddle factor
And it is stored in the execution sequence of butterfly processing element successively, the result of FFT computing can't be caused any impact.
Step 305: The butterfly processing elements compute the output data.
Specifically, each butterfly processing element computes its output data from the input data and twiddle factors it receives. The 4 butterfly processing elements serve as one basic unit that is used cyclically. The data read-write unit and the twiddle factor read-write unit continuously read from and write to the data storage units and the twiddle factor storage unit, so that input data and twiddle factors are continuously selected, computed on, and output. Only the read and write addresses need to be changed for each stage, so the entire FFT operation can be completed by continuously reusing the basic operation unit, which also reduces idle waiting time.
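Claim 3 describes each butterfly processing element as one multiplier and two adders implementing a radix-2 butterfly. A minimal software sketch of such a butterfly is given below. It assumes the decimation-in-time form (multiply first, then add and subtract); the source does not say which form the hardware uses, and the name `radix2_butterfly` is hypothetical.

```python
def radix2_butterfly(a, b, w):
    """Return the two outputs of one radix-2 butterfly.

    Assumption: decimation-in-time form, where the twiddle factor w is
    applied to the second input before the add and the subtract.
    """
    product = w * b                   # the single complex multiplication
    return a + product, a - product   # the two additions


if __name__ == "__main__":
    print(radix2_butterfly(1 + 0j, 1 + 0j, 1 + 0j))
```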
Step 306: The data read-write unit stores the output data in a data storage unit.
Specifically, the data read-write unit stores the output data one by one in the other data storage unit. The addresses at which data are read from the data storage unit in step 301 and the addresses at which data are written to the other data storage unit in this step must be consistent, so that reads and writes can be handled conveniently.
Step 307: The counter records the current loop stage.
Specifically, each time the data read-write unit completes the storage of output data, the counter automatically records the current loop stage. The initial value of the counter is 0, meaning that the loop stage is 0 when the operation starts; each time output data storage is completed, the counter increments by one and the incremented value is saved back into the counter.
Step 308: Determine whether the current loop stage equals k.
Specifically, determine whether the value currently held in the counter equals k. If it does, the entire FFT operation has been completed and the method proceeds to step 309. Otherwise the current loop stage is less than k and the k-stage loop has not yet finished, so the method returns to step 303, fetches the input data and twiddle factors of the next stage, and feeds them to the butterfly processing elements, until all k stages have been computed.
The output data of each stage serve as the input data of the next stage. In steps 301 and 302 the input data and twiddle factors received from outside are stored in advance in the data storage unit and the twiddle factor storage unit respectively, so these two steps are not part of the k-stage loop.
Step 309: Clear the counter.
Specifically, when the loop stage equals k, that is, when the k-stage operation has been completed, the counter is cleared so that counting of the operation stages starts afresh in the next FFT operation.
This embodiment contains 2 identical data storage units. Either data storage unit can store input data or output data, and the 2 data storage units store data in the manner of a ping-pong RAM. That is, for each stage of the FFT operation, input data are read from the first data storage unit and the computed results are written to the second data storage unit; the next stage then reads its input data from the second data storage unit and writes its computed output data to the first data storage unit.
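The ping-pong arrangement and the k-stage loop can be modelled in software as sketched below. This is not the patent's address generator: the sketch swaps in the textbook decimation-in-time ordering (bit-reversed load, in-order output) and computes twiddle factors on the fly instead of reading a table, purely to show two buffers exchanging roles once per stage. The function names are hypothetical.

```python
import cmath


def reverse_bits(value, width):
    """Reverse the bit order of a width-bit unsigned integer."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def fft_ping_pong(samples):
    """Radix-2 FFT that alternates between two buffers, one swap per stage."""
    n = len(samples)
    k = n.bit_length() - 1
    if n < 2 or n != 1 << k:
        raise ValueError("length must be a power of two and at least 2")

    # Two equally sized buffers play the role of the two data storage units.
    buffers = [[0j] * n, [0j] * n]
    for index, sample in enumerate(samples):
        buffers[0][reverse_bits(index, k)] = sample  # bit-reversed load

    source = 0  # index of the buffer that is read in the current stage
    stage = 0   # plays the role of the stage counter, starting at 0
    while stage != k:
        half = 1 << stage
        read, write = buffers[source], buffers[1 - source]
        for start in range(0, n, 2 * half):
            for j in range(half):
                w = cmath.exp(-2j * cmath.pi * j / (2 * half))
                a, b = read[start + j], read[start + j + half]
                write[start + j] = a + w * b
                write[start + j + half] = a - w * b
        source = 1 - source  # swap roles: this stage's output feeds the next
        stage += 1
    return buffers[source]
```

Because each stage reads only from one buffer and writes only to the other, no stage overwrites data it still needs, which is why two storage units are sufficient in this model.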
The value chosen for k determines the number of points the FFT operation supports, and the FFT operation runs k loop stages in total.
Compared with the prior art, the main differences and effects of this embodiment are as follows. Data are stored uniformly in the two data storage units, so input data with different point counts can be read by the same read rule, and multiple point counts can therefore be supported. The data read-write unit reads the input data one by one according to a generated address sequence, feeds them in order to the plurality of butterfly processing elements, and stores the output data of each butterfly processing element one by one; that is, there is one input datum and one output datum at any moment. Consequently only two data storage units are needed for data storage, which saves circuit area.
It is readily seen that this embodiment is the method embodiment corresponding to the first embodiment, and the two can be implemented in cooperation with each other. The relevant technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here to avoid redundancy. Correspondingly, the relevant technical details mentioned in this embodiment also apply to the first embodiment.
The fourth embodiment of the invention relates to an FFT operation method. The fourth embodiment is a further optimization of the third embodiment, the main improvement being that it provides a way of generating the read addresses of the input data. That is, for the i-th stage of the operation, where i = 0, 1, …, k, when the data read-write unit reads the N input data one by one according to the generated address sequence, the read-address generation scheme provided by this embodiment ensures that the correct data are read. The flow by which the address of each input datum is generated when the data read-write unit reads the N input data one by one in step 303 of the third embodiment is shown in Figure 4 and includes:
Step 401: Obtain the binary counter value corresponding to an input datum.
Specifically, obtain the binary counter value corresponding to each input datum. The data address obtained by the data read-write unit is a decimal number; once obtained, it is converted to binary. For example, suppose the decimal data address obtained is "1". In this embodiment k is less than or equal to 10; if k is 10, then N = 1024, and the decimal data address "1" converts to the binary data address "0000000001".
Step 402: Reverse the bit order of the last i+1 bits of the binary counter value.
Specifically, for stage i = 0, the binary data address obtained in step 401 is output from this step as "0000000001";
for stage i = 1, the binary data address obtained in step 401 is output from this step as "0000000010".
Step 403: Reverse the bit order of the whole word obtained after reversing the last i+1 bits, and use the result as the read address of the input datum.
Specifically, for stage i = 0, the binary data address obtained in step 402 is output from this step as "1000000000", which corresponds to the decimal data address "512";
for stage i = 1, the binary data address obtained in step 402 is output from this step as "0100000000", which corresponds to the decimal data address "256".
The selection rule for the read addresses is thus: for stage i, the input data address sequence is obtained by reversing the bit order of the last i+1 bits of the binary counter value and then reversing the bit order of the whole word.
Taking a 1024-point FFT operation as an example, a counter first counts from 0 to 1023, and the read address conversion is applied to this sequence. The converted sequence is 0, 512, 256, 768, …, which means that the data sent to the first butterfly processing element are those at address 0 and address 512, the data sent to the second butterfly processing element are those at address 256 and address 768, and so on. Data are sent to the butterfly processing elements in sequence according to this rule and are taken out in sequence once the computation is finished.
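The two-step address conversion of steps 401 to 403 can be sketched as follows. The sketch assumes a 10-bit address word (k = 10, N = 1024) by default and clamps the partial reversal to the word width; the function names `reverse_bits` and `read_address` are hypothetical.

```python
def reverse_bits(value, width):
    """Reverse the bit order of a width-bit unsigned integer."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def read_address(counter, stage, k=10):
    """Read address of an input datum for the given stage.

    Step 402: reverse the bit order of the last stage+1 bits.
    Step 403: reverse the bit order of the whole k-bit word.
    Assumption: the partial reversal is clamped to at most k bits.
    """
    low_width = min(stage + 1, k)
    low_mask = (1 << low_width) - 1
    low = reverse_bits(counter & low_mask, low_width)
    partially_reversed = (counter & ~low_mask) | low
    return reverse_bits(partially_reversed, k)


if __name__ == "__main__":
    # Stage 0 of a 1024-point FFT, counter values 0..3.
    print([read_address(c, 0) for c in range(4)])
    # Counter value 1 at stage 1, as in the worked example above.
    print(read_address(1, 1))
```

For stage 0 the partial reversal touches a single bit and changes nothing, so the result is a plain bit reversal of the counter value.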
Compared with the prior art, the main difference and effect of this embodiment is that the data storage addresses are allocated rationally, which ensures that the FFT operation is carried out correctly.
The fifth embodiment of the invention relates to an FFT operation method. The fifth embodiment is a further optimization of the third embodiment, the main improvement being that it provides, for the i-th stage of the operation, a way of generating the read address sequence of the N/2 twiddle factors when the twiddle factor read-write unit reads them one by one according to the generated address sequence. The flow by which the address of each twiddle factor is generated when the twiddle factor read-write unit reads the N/2 twiddle factors one by one in step 304 of the third embodiment is shown in Figure 5 and includes:
Step 501: Generate a counting sequence.
Specifically, generate a counting sequence expressed as 0, 1, 2, 3, …, $2^i - 1$. That is:
for stage 0, i.e. i = 0, the counting sequence is 0, 0, 0, …;
for stage 1, the counting sequence is 0, 1, 0, 1, …;
for stage 2, the counting sequence is 0, 1, 2, 3, 0, 1, 2, 3, …;
for stage i, the counting sequence is 0, 1, 2, 3, …, $2^i - 1$, 0, 1, 2, 3, …
Step 502: Reverse the bit order of the counting sequence and use the result as the twiddle factor read address sequence.
Specifically, the counting sequence 0, 1, 2, 3, …, $2^i - 1$ after bit reversal is expressed as 0, 512, 256, 768, …, which serves as the read address sequence of the N/2 twiddle factors. For example, for stage 2 the counting sequence is 0, 1, 2, 3, 0, 1, 2, 3, …, which after bit reversal is expressed as 0, 512, 256, 768, 0, 512, 256, 768, …
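Steps 501 and 502 can be sketched as below. The 10-bit reversal width is an assumption chosen only because it reproduces the address values listed in the text (0, 512, 256, 768); the source does not state the width, and the sketch does not attempt to reconcile these values with the twiddle index range given elsewhere in the description. The function names are hypothetical.

```python
def reverse_bits(value, width):
    """Reverse the bit order of a width-bit unsigned integer."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def twiddle_read_addresses(stage, count, width=10):
    """First `count` twiddle factor read addresses for the given stage.

    Step 501: counting sequence 0, 1, ..., 2**stage - 1, repeated.
    Step 502: bit-reverse each count (assumed width: 10 bits).
    """
    period = 1 << stage
    return [reverse_bits(n % period, width) for n in range(count)]


if __name__ == "__main__":
    for stage in range(3):
        print(stage, twiddle_read_addresses(stage, 8))
```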
The address selection for the twiddle factors is based on the following principle: the twiddle factor values required at every stage can all be converted into the form $W_N^k$, where k ranges from 0 to 511. A single 1 KB storage unit is therefore enough to hold all the twiddle factors, and the appropriate twiddle factor is read out as the address changes and sent to the butterfly processing element for computation. By analyzing the values the twiddle factors take at each stage of the FFT algorithm and converting them into $W_N^k$, with k used as the address for reading the required twiddle factor from the storage unit, the address sequence is found to follow a regular pattern; summarizing that pattern yields the address generation scheme described in this embodiment.
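The conversion described above is presumably the usual scaling property of twiddle factors. Assuming the common definition $W_N^k = e^{-j 2\pi k / N}$ (the sign convention and the symbol m, a power-of-two divisor of N, are assumptions not stated in the source), a factor defined for a shorter transform can be rewritten as an entry of the full-length table:

```latex
W_{N/m}^{k} \;=\; e^{-j 2\pi k/(N/m)} \;=\; e^{-j 2\pi (mk)/N} \;=\; W_{N}^{mk},
\qquad 0 \le k < \frac{N}{2m}.
```

Under these assumptions every index mk is smaller than N/2, so for N = 1024 it stays within 0 to 511, which agrees with the index range stated in the text.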
The division of the above methods into steps is only for clarity of description. In implementation, steps may be merged into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, they fall within the protection scope of this patent. Adding insignificant modifications to, or introducing insignificant designs into, an algorithm or flow without changing its core design also falls within the protection scope of this patent.
Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that the above embodiments are specific examples of implementing the invention, and that in practical applications various changes may be made to them in form and detail without departing from the spirit and scope of the invention.
Claims (8)
1. An FFT processor, characterized by comprising: two data storage units, a twiddle factor storage unit, a plurality of butterfly processing elements, a data read-write unit, and a twiddle factor read-write unit;
the data read-write unit is connected to the two data storage units and to each butterfly processing element; the two data storage units are respectively used to uniformly store the N input data and the N output data of the plurality of butterfly processing elements; wherein $N = 2^k$, k ≥ 3, and k is an integer;
the twiddle factor read-write unit is connected to the twiddle factor storage unit and to each butterfly processing element; the twiddle factor storage unit is used to store N/2 twiddle factors;
wherein the data read-write unit is used to read the N input data one by one and to feed the N input data so read in sequence to the plurality of butterfly processing elements; the twiddle factor read-write unit is used to read the N/2 twiddle factors one by one and to feed the N/2 twiddle factors so read in sequence to the plurality of butterfly processing elements; the data read-write unit is further used to store the N output data one by one.
2. The FFT processor according to claim 1, characterized in that the number of butterfly processing elements is 4.
3. The FFT processor according to claim 1, characterized in that each butterfly processing element comprises 1 multiplier and 2 adders, and each butterfly processing element is used to implement a radix-2 butterfly operation.
4. The FFT processor according to claim 1, characterized in that k ≤ 10.
5. The FFT processor according to claim 4, characterized in that the storage addresses of the input data increase successively; each data storage unit comprises 1024 addresses; when k = 10 and $N = 2^{10} = 1024$, the input data are stored consecutively; when k ≤ 9, the address gaps between adjacent input data are equal.
6. An FFT operation method, characterized in that it is applied to the FFT processor according to any one of claims 1 to 5, the FFT operation method comprising:
the data read-write unit stores the N input data received from outside uniformly in one of the data storage units;
the twiddle factor read-write unit stores the N/2 twiddle factors received from outside in the twiddle factor storage unit;
the data read-write unit reads the N input data one by one and feeds the N input data so read in sequence to the plurality of butterfly processing elements;
the twiddle factor read-write unit reads the N/2 twiddle factors one by one and feeds the N/2 twiddle factors so read in sequence to the plurality of butterfly processing elements;
each butterfly processing element computes its output data from the input data and twiddle factors it receives;
the data read-write unit stores the output data one by one in the other data storage unit;
wherein the output data serve as the input data of the next stage of the operation, and a k-stage loop operation is carried out.
7. The FFT operation method according to claim 6, characterized in that, for the i-th stage of the operation, where i = 0, 1, …, k, when the data read-write unit reads the N input data one by one, the read address of each input datum is generated by:
obtaining the binary counter value corresponding to each input datum;
reversing the bit order of the last i+1 bits of the binary counter value;
reversing the bit order of the whole word obtained after reversing the last i+1 bits, and using the result as the read address of the input datum.
8. The FFT operation method according to claim 6, characterized in that, for the i-th stage of the operation, when the twiddle factor read-write unit reads the N/2 twiddle factors one by one, the read address sequence of the N/2 twiddle factors is generated by:
generating a counting sequence expressed as 0, 1, 2, 3, …, $2^i - 1$;
reversing the bit order of the counting sequence, the result being expressed as 0, 512, 256, 768, …, and using it as the read address sequence of the N/2 twiddle factors.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2016/094465 WO2018027706A1 (en) | 2016-08-10 | 2016-08-10 | Fft processor and algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106415526A true CN106415526A (en) | 2017-02-15 |
CN106415526B CN106415526B (en) | 2019-05-24 |
Family
ID=58087900
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201680000901.8A Active CN106415526B (en) | 2016-08-10 | 2016-08-10 | Fft processor and operation method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106415526B (en) |
WO (1) | WO2018027706A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108062289A (en) * | 2018-01-25 | 2018-05-22 | 天津芯海创科技有限公司 | Fast Fourier Transform (FFT) FFT changes sequence method, signal processing method and device in address |
CN110347968A (en) * | 2019-07-08 | 2019-10-18 | 河海大学常州校区 | A kind of optimization fft algorithm and device based on FPGA |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108319804B (en) * | 2018-04-17 | 2023-08-08 | 福州大学 | 8192 point base 2 DIT ASIC design method for low resource call |
CN112307423B (en) * | 2020-11-19 | 2023-09-22 | 天津大学 | FFT processor based on base 2SDF pipeline type and implementation method thereof in ACO-OFDM system |
CN113569189B (en) * | 2021-07-02 | 2024-03-15 | 星思连接(上海)半导体有限公司 | Fast Fourier transform calculation method and device |
CN117591784B (en) * | 2024-01-19 | 2024-05-03 | 武汉格蓝若智能技术股份有限公司 | FPGA-based twiddle factor calculation method and FPGA chip |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101290613A (en) * | 2007-04-16 | 2008-10-22 | 卓胜微电子(上海)有限公司 | FFT processor data storage system and method |
CN103176950A (en) * | 2011-12-20 | 2013-06-26 | 中国科学院深圳先进技术研究院 | Circuit and method for achieving fast Fourier transform (FFT) / inverse fast Fourier transform (IFFT) |
CN103605636A (en) * | 2013-12-09 | 2014-02-26 | 中国科学院微电子研究所 | Device and method for realizing FFT operation |
CN103970718A (en) * | 2014-05-26 | 2014-08-06 | 苏州威士达信息科技有限公司 | Quick Fourier transformation implementation device and method |
CN104268122A (en) * | 2014-09-12 | 2015-01-07 | 安徽四创电子股份有限公司 | Point-changeable floating point FFT (fast Fourier transform) processor |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7047268B2 (en) * | 2002-03-15 | 2006-05-16 | Texas Instruments Incorporated | Address generators for mapping arrays in bit reversed order |
WO2004004265A1 (en) * | 2002-06-27 | 2004-01-08 | Samsung Electronics Co., Ltd. | Modulation apparatus using mixed-radix fast fourier transform |
TWI298448B (en) * | 2005-05-05 | 2008-07-01 | Ind Tech Res Inst | Memory-based fast fourier transformer (fft) |
CN101072218B (en) * | 2007-03-01 | 2011-11-30 | 华为技术有限公司 | FFT/IFFI paired processing system, device and method |
-
2016
- 2016-08-10 CN CN201680000901.8A patent/CN106415526B/en active Active
- 2016-08-10 WO PCT/CN2016/094465 patent/WO2018027706A1/en active Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108062289A (en) * | 2018-01-25 | 2018-05-22 | 天津芯海创科技有限公司 | Fast Fourier Transform (FFT) FFT changes sequence method, signal processing method and device in address |
CN108062289B (en) * | 2018-01-25 | 2021-09-03 | 天津芯海创科技有限公司 | Fast Fourier Transform (FFT) address order changing method, signal processing method and device |
CN110347968A (en) * | 2019-07-08 | 2019-10-18 | 河海大学常州校区 | A kind of optimization fft algorithm and device based on FPGA |
CN110347968B (en) * | 2019-07-08 | 2023-06-13 | 河海大学常州校区 | FPGA-based FFT optimization algorithm and device |
Also Published As
Publication number | Publication date |
---|---|
WO2018027706A1 (en) | 2018-02-15 |
CN106415526B (en) | 2019-05-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106415526A (en) | FET processor and operation method | |
CN110515739B (en) | Deep learning neural network model load calculation method, device, equipment and medium | |
CN105022670B (en) | Heterogeneous distributed task processing system and its processing method in a kind of cloud computing platform | |
Demmel et al. | Avoiding communication in sparse matrix computations | |
Shu et al. | A parallel transient stability simulation for power systems | |
CN104765589B (en) | Grid parallel computation preprocess method based on MPI | |
CN103955447B (en) | FFT accelerator based on DSP chip | |
Melab et al. | A GPU-accelerated branch-and-bound algorithm for the flow-shop scheduling problem | |
CN106775594A (en) | A kind of Sparse Matrix-Vector based on the domestic processor of Shen prestige 26010 multiplies isomery many-core implementation method | |
CN102135951A (en) | FPGA (Field Programmable Gate Array) implementation method based on LS-SVM (Least Squares-Support Vector Machine) algorithm restructured at runtime | |
CN106933777B (en) | The high-performance implementation method of the one-dimensional FFT of base 2 based on domestic 26010 processor of Shen prestige | |
Shi et al. | Efficient sparse-dense matrix-matrix multiplication on GPUs using the customized sparse storage format | |
CN109240644A (en) | A kind of local search approach and circuit for Yi Xin chip | |
CN109522127A (en) | A kind of fluid machinery simulated program isomery accelerated method based on GPU | |
CN104572588B (en) | Matrix inversion process method and apparatus | |
Wei et al. | Reconstructing permutation table to improve the Tabu Search for the PFSP on GPU | |
CN108647007A (en) | Arithmetic system and chip | |
CN102722472A (en) | Complex matrix optimizing method | |
CN103493039A (en) | Data processing method and related device | |
Kumar et al. | Massively parallel simulations for disordered systems | |
CN113112084B (en) | Training plane rear body research and development flow optimization method and device | |
CN108920097A (en) | A kind of three-dimensional data processing method based on Laden Balance | |
CN102968388B (en) | Data layout's method and device thereof | |
CN115328440A (en) | General sparse matrix multiplication implementation method and device based on 2D systolic array | |
Giles | Jacobi iteration for a Laplace discretisation on a 3D structured grid |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |