CN106415526A - FFT processor and operation method - Google Patents


Info

Publication number
CN106415526A
Authority
CN
China
Prior art keywords
data
read
twiddle factor
input data
processing element
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201680000901.8A
Other languages
Chinese (zh)
Other versions
CN106415526B (en
Inventor
李帆
李一帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Goodix Technology Co Ltd
Original Assignee
Shenzhen Huiding Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huiding Technology Co Ltd
Publication of CN106415526A
Application granted
Publication of CN106415526B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/14: Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141: Discrete Fourier transforms
    • G06F17/142: Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)
  • Discrete Mathematics (AREA)

Abstract

The invention relates to the field of signal processing and discloses an FFT processor and an operation method. The FFT processor comprises two data storage units, a twiddle factor storage unit, a plurality of butterfly operation units, a data read-write unit, and a twiddle factor read-write unit. The data read-write unit is connected to the two data storage units and to the butterfly operation units; the two data storage units are used for uniformly storing the N input data and the N output data of the butterfly operation units. The twiddle factor read-write unit is connected to the twiddle factor storage unit and to the butterfly operation units; the twiddle factor storage unit is used for storing N/2 twiddle factors. The twiddle factor read-write unit reads the N/2 twiddle factors one by one and feeds them into the butterfly operation units in sequence, and the data read-write unit is also used for storing the N output data one by one. The invention also discloses an FFT operation method. With the FFT processor and the FFT operation method, FFT operations of multiple point counts are realized and the circuit area is reduced.

Description

FFT processor and operation method
Technical field
The present invention relates to the field of signal processing, and in particular to an FFT processor and an operation method.
Background
The Fourier transform converts a signal from the time domain to the frequency domain and is an important analysis tool in signal processing. The discrete Fourier transform (Discrete Fourier Transform, "DFT") is the form the Fourier transform takes in discrete systems, but its computational cost is very high. The fast Fourier transform (Fast Fourier Transform, "FFT") is an efficient algorithm for the DFT: it improves on the DFT algorithm by exploiting properties of the DFT such as its odd, even, imaginary and real symmetries, and thereby greatly reduces the number of operations.
An FFT processor is a hardware implementation of the FFT algorithm. In the prior art there are many ways to implement an FFT processor, but most have limitations: some implementations support an FFT of only a single point count, while others occupy a large amount of resources and result in a large hardware circuit area.
Summary of the invention
The purpose of the embodiments of the present invention is to provide an FFT processor and an operation method that realize FFT operations of multiple point counts and broaden the application scenarios of the FFT processor, while occupying a small circuit area, lowering circuit power consumption, and reducing circuit cost.
To solve the above technical problem, an embodiment of the present invention provides an FFT processor comprising: two data storage units, a twiddle factor storage unit, a plurality of butterfly operation units, a data read-write unit, and a twiddle factor read-write unit.
The data read-write unit is connected to the two data storage units and to each butterfly operation unit. The two data storage units are respectively used for uniformly storing the N input data and the N output data of the butterfly operation units, where N = 2^k, k ≥ 3, and k is an integer.
The twiddle factor read-write unit is connected to the twiddle factor storage unit and to each butterfly operation unit. The twiddle factor storage unit is used for storing N/2 twiddle factors.
The data read-write unit is used for reading the N input data one by one and feeding the input data so read into the butterfly operation units in sequence. The twiddle factor read-write unit is used for reading the N/2 twiddle factors one by one and feeding the twiddle factors so read into the butterfly operation units in sequence. The data read-write unit is also used for storing the N output data one by one.
An embodiment of the present invention also provides an FFT operation method, comprising:
the data read-write unit uniformly stores N input data received from outside in one of the data storage units;
the twiddle factor read-write unit stores N/2 twiddle factors received from outside in the twiddle factor storage unit;
the data read-write unit reads the N input data one by one and feeds the input data so read into the butterfly operation units in sequence;
the twiddle factor read-write unit reads the N/2 twiddle factors one by one and feeds the twiddle factors so read into the butterfly operation units in sequence;
each butterfly operation unit computes its output data from the input data and the twiddle factor it receives;
the data read-write unit stores the output data one by one in the other data storage unit;
wherein the output data of each stage serve as the input data of the next stage, and the operation is repeated for k stages.
Compared with the prior art, the embodiments of the present invention store the data uniformly in two data storage units, so that input data of different point counts can be read with the same read rule, and operations of multiple point counts can therefore be supported. Moreover, the data read-write unit reads the input data one by one, feeds them into the butterfly operation units in sequence, and stores the output data of each butterfly operation unit one by one. At any given time there is thus one input datum and one output datum, so only two data storage units are needed to store the data, which saves circuit area.
In addition, the number of butterfly operation units is 4. With 4 butterfly operation units working cyclically, the butterfly operation units are reused to the greatest extent and the circuit area is kept as small as possible. The 4 butterfly operation units read data from the data storage unit continuously in turn, which avoids idle operation units and keeps the output result valid at all times, effectively improving the utilization of the butterfly operation units.
In addition, each butterfly operation unit includes 1 multiplier and 2 adders, and each butterfly operation unit implements a radix-2 butterfly operation. The structure of each butterfly operation unit in this embodiment is relatively simple, which greatly reduces circuit area.
In addition, k satisfies k ≤ 10. Configuring different values of k supports FFT operations of different point counts.
In addition, the storage addresses of the input data increase successively. Each data storage unit includes 1024 addresses. When k = 10, N = 2^10 = 1024 and the input data are stored at consecutive addresses; when k ≤ 9, the address gaps between adjacent input data are equal. The input and output data uniformly occupy the whole storage address space, which simplifies calculation: no separate calculation scheme needs to be configured for FFT operations of different point counts.
In addition, for the i-th stage of the operation, where i = 0, 1, ..., k-1, when the data read-write unit reads the N input data one by one, the read address of each input datum is generated as follows: obtain the binary value of the counter corresponding to the input datum; reverse the order of the last i+1 bits of the counter value; then reverse the order of the whole word obtained after that step, and use the result as the read address of the input datum. With the data storage addresses allocated in this way, a correct FFT operation can be completed.
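The read-address rule above can be sketched in Python. This is only an illustration under stated assumptions: the counter is taken to be k bits wide for an N = 2^k point transform, the stage index is 0-based, and "inverting" is read as reversing bit order. The names `bit_reverse` and `read_address` are hypothetical and do not come from the patent.

```python
def bit_reverse(value, width):
    """Reverse the order of the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def read_address(counter, stage, width):
    """Read address for a counter value in a given 0-based stage."""
    low = stage + 1                       # number of trailing bits reversed first
    mask = (1 << low) - 1
    partially_reversed = (counter & ~mask) | bit_reverse(counter & mask, low)
    return bit_reverse(partially_reversed, width)   # then reverse the whole word


# Assumed example: a 1024-point transform, so the counter is 10 bits wide.
first_stage = [read_address(c, 0, 10) for c in range(4)]
last_stage = [read_address(c, 9, 10) for c in range(4)]
```

Under these assumptions, consecutive counter values in the first stage address data that lie N/2 apart, while in the last stage they address neighbouring data.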
Brief description of the drawings
Fig. 1 is a structural diagram of the FFT processor according to the first embodiment of the invention;
Fig. 2 is a schematic diagram of the internal operation process of a butterfly operation unit of the FFT processor according to the first embodiment of the invention;
Fig. 3 is a flow chart of an FFT operation method according to the third embodiment of the invention;
Fig. 4 is a flow chart of the way read addresses of the input data are generated in an FFT operation method according to the fourth embodiment of the invention;
Fig. 5 is a flow chart of the way the read address sequence of the twiddle factors is generated in an FFT operation method according to the fifth embodiment of the invention.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are explained in detail below with reference to the accompanying drawings. Those skilled in the art will understand that many technical details are given in the embodiments so that the reader can better understand the application. However, the technical solutions claimed in the application can be realized even without these technical details, and with various changes and modifications based on the following embodiments.
The first embodiment of the present invention relates to an FFT processor. Its structure is shown in Fig. 1 and includes: two data storage units 11 and 12, a twiddle factor storage unit 13, a plurality of butterfly operation units 161 to 164, a data read-write unit 14, and a twiddle factor read-write unit 15. The data read-write unit 14 is connected to the two data storage units 11 and 12 and to each of the butterfly operation units 161 to 164. The twiddle factor read-write unit 15 is connected to the twiddle factor storage unit 13 and to each of the butterfly operation units 161 to 164.
The two data storage units 11 and 12 in this embodiment may be random access memories (Random Access Memory, "RAM"). The two RAMs are respectively used for uniformly storing the N input data and the N output data of the butterfly operation units 161 to 164, where N = 2^k, k ≥ 3 and k is an integer. The data read-write unit 14 is used for reading the N input data one by one according to a generated address sequence and feeding the input data so read into the butterfly operation units 161 to 164 in sequence. The data read-write unit 14 is also used for storing the N output data one by one.
Specifically, before calculation starts, the data needed by the butterfly operation units 161 to 164 must be imported into one of the data storage units, for example data storage unit 11. After the enable signal of the FFT processor is set high, an address sequence suited to the current stage is generated. The data at the addresses of this sequence in data storage unit 11 are read by the data read-write unit 14 as input data, and the FFT operation proceeds. After the calculation finishes, the data read-write unit 14 stores each output datum in data storage unit 12 at a storage address that is the same as the read address in data storage unit 11. In other words, the read address of a datum taking part in the calculation and the storage address of the computed result remain identical.
It should be noted that the data read-write unit 14 can read input data from data storage unit 11 and can also write output data to data storage unit 11; the same holds for data storage unit 12. The data storage units take the form of ping-pong RAM to reduce resource consumption: data storage unit 11 or 12 can serve either as the storage unit for input data or as the storage unit for output data. The data are stored uniformly in data storage units 11 and 12, which can be understood to mean that the addresses at which input or output data are stored in data storage unit 11 and in data storage unit 12 are correspondingly equal.
The FFT processor in this embodiment uses a radix-2 algorithm. N is the number of points the FFT processor supports, where N = 2^k, k ≥ 3 and k is an integer. The minimum value of N is therefore 8, meaning that the smallest supported FFT is 8 points, and the radix-2 algorithm has k stages in total. A different value of k means a different point count N and a different number of stages. In every stage, the data read-write unit 14 reads data from one of the data storage units and stores the output data of the butterfly units in the other data storage unit. In the next stage, the data read-write unit 14 reads data from the data storage unit into which the previous stage stored its results, and stores the output data of the butterfly units in the data storage unit from which the previous stage read. For example, the data read-write unit 14 reads data from data storage unit 11 and stores the computed output data of the butterfly units in data storage unit 12; in the next stage, the data read-write unit 14 reads data from data storage unit 12 and stores the computed output data of the butterfly units in data storage unit 11.
In addition, the parameters used in the operation of the FFT processor can be configured by the user.
In this embodiment, the twiddle factor storage unit 13 is used for storing N/2 twiddle factors.
Before calculation starts, the twiddle factor table needed by the butterfly operation units 161 to 164 must be imported into the twiddle factor storage unit 13. More specifically, from the definition of the twiddle factor, W_N^k = e^{-j2πk/N}, it follows that for an FFT processor supporting at most 1024 points, all the twiddle factors it requires can be converted into the form W_1024^n, where n ranges from 0 to 511. The real part and the imaginary part are each quantized as 16-bit signed numbers, and the quantized results are stored in the twiddle factor storage unit 13. Every two input data need only one twiddle factor to compute the output data, so an FFT processor supporting N points needs N/2 twiddle factors.
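A twiddle factor table of this kind could be generated as in the following Python sketch. The text only states that the real and imaginary parts are quantized to 16-bit signed numbers; the Q15 scale factor, the forward-transform sign convention exp(-j2πn/1024), and the names `build_twiddle_table` and `twiddle_index` are assumptions made for illustration.

```python
import math

MAX_POINTS = 1024            # largest transform size named in the text
SCALE = (1 << 15) - 1        # assumed Q15 scaling for 16-bit signed storage


def build_twiddle_table():
    """Return 512 (real, imag) integer pairs approximating W_1024^n, n = 0..511."""
    table = []
    for n in range(MAX_POINTS // 2):
        angle = -2.0 * math.pi * n / MAX_POINTS
        table.append((round(math.cos(angle) * SCALE),
                      round(math.sin(angle) * SCALE)))
    return table


def twiddle_index(k, n_points):
    """Index of W_N^k in the 1024-point table, using W_N^k = W_1024^(k*1024/N)."""
    return k * (MAX_POINTS // n_points)


table = build_twiddle_table()
```

For a smaller transform, for example 512 points, `twiddle_index(k, 512)` simply steps through the same table with a stride of 2, which is why one table suffices for every supported point count.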
In general, the twiddle factor storage unit 13 can be divided into two blocks, one storing the real parts of the twiddle factors and the other storing the imaginary parts, with the data at each identical address corresponding one to one. It is not limited to this, however: in practical applications, the high-order and low-order bits of the twiddle factor storage unit 13 may instead store the real part and the imaginary part of a twiddle factor respectively. In addition, the twiddle factors can be stored in the twiddle factor storage unit 13 in advance in the form of a table; when a twiddle factor is needed during the operation, the corresponding twiddle factor is read from the table previously imported into the twiddle factor storage unit 13 and used in the subsequent calculation.
Further, the twiddle factor read-write unit 15 is used for reading the N/2 twiddle factors one by one according to a generated address sequence and feeding the twiddle factors so read into the butterfly operation units 161 to 164 in sequence.
Specifically, when a twiddle factor is needed during the calculation, its value is read by supplying the twiddle factor address signal and the read enable. The twiddle factor address changes once every two cycles, and the twiddle factor read-write unit 15 is implemented with control logic. The twiddle factor read-write unit 15 reads the N/2 twiddle factors one by one according to the generated twiddle factor address sequence and feeds them into the butterfly operation units 161 to 164 in sequence for the subsequent calculation.
In addition, the N input data of the butterfly operation units 161 to 164 are read by the data read-write unit 14 from data storage unit 11 or 12. Each butterfly operation unit, for example butterfly operation unit 161, needs 2 cycles to obtain its input data. The N/2 twiddle factors are read by the twiddle factor read-write unit 15 and written to butterfly operation unit 161; because the real and imaginary parts of the N/2 twiddle factors are stored in the high-order and low-order bits of the twiddle factor storage unit 13 respectively, they can be obtained at the same time as the N input data. The cycle referred to in this embodiment is specifically the clock cycle.
A butterfly operation unit, for example butterfly operation unit 161, performs the following operation internally to obtain its output data, as in formulas (1) and (2):
x(k) = x1(k) + W_N^k · x2(k) (1)
x(k+N/2) = x1(k) - W_N^k · x2(k) (2)
where x1(k) and x2(k) are the input data, W_N^k is the twiddle factor, and x(k) and x(k+N/2) are the output data of butterfly operation unit 161.
The operation process of the whole butterfly operation unit is shown in Fig. 2, and its output data are x(k) and x(k+N/2). Writing each output datum in terms of its real and imaginary parts gives formulas (3) and (4):
Out1 = (xa + xb·xc - yb·yc) + (ya + xb·yc + xc·yb)j (3)
Out2 = (xa - xb·xc + yb·yc) + (ya - xb·yc - xc·yb)j (4)
where xa, xb and xc are the real parts of x1(k), x2(k) and W_N^k respectively, and ya, yb and yc are the imaginary parts of x1(k), x2(k) and W_N^k respectively.
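Formulas (3) and (4) can be checked numerically against ordinary complex arithmetic. The following Python sketch is only an illustration; the function name `butterfly` and the sample values are arbitrary choices, not taken from the patent.

```python
def butterfly(xa, ya, xb, yb, xc, yc):
    """Radix-2 butterfly on real/imaginary parts, following formulas (3) and (4)."""
    out1 = (xa + xb * xc - yb * yc, ya + xb * yc + xc * yb)
    out2 = (xa - xb * xc + yb * yc, ya - xb * yc - xc * yb)
    return out1, out2


# Cross-check against Python's built-in complex type with arbitrary sample values.
a, b, w = complex(3, -2), complex(1, 4), complex(0, -1)
out1, out2 = butterfly(a.real, a.imag, b.real, b.imag, w.real, w.imag)
expected1, expected2 = a + b * w, a - b * w
```

The four distinct products xb·xc, yb·yc, xb·yc and xc·yb are shared between the two outputs, which is what allows a single multiplier to serve both.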
It is worth noting that each butterfly operation unit, for example butterfly operation unit 161, includes 1 multiplier and 2 adders, and that computing the real and imaginary parts of both outputs out1 and out2 takes 7 cycles in total. The calculation process is:
In the 1st cycle, the multiplier computes xb*xc, and the result is recorded as mul_out;
In the 2nd cycle, adder No. 1 computes xa+xb*xc, adder No. 2 computes xa-xb*xc, and the multiplier computes yb*yc, the result again being stored in mul_out;
In the 3rd cycle, adder No. 1 computes xa+xb*xc-yb*yc, which is the real part of the first output out1. At the same time, adder No. 2 computes xa-xb*xc+yb*yc, which is the real part of the second output out2, and the multiplier computes xb*yc;
In the 4th cycle, adder No. 1 computes ya+xb*yc, adder No. 2 computes ya-xb*yc, and the multiplier computes xc*yb;
In the 5th cycle, adder No. 1 computes ya+xb*yc+xc*yb, completing the imaginary part of out1, and adder No. 2 computes ya-xb*yc-xc*yb, completing the imaginary part of out2.
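The five compute cycles listed above can be traced in software to confirm that one multiplier and two adders are enough. This Python sketch only models the order of operations, not the hardware timing or the output delay; the function name `scheduled_butterfly` and the `usage` bookkeeping are hypothetical.

```python
def scheduled_butterfly(xa, ya, xb, yb, xc, yc):
    """Follow the five listed cycles; record (multiplies, additions) used per cycle."""
    usage = []

    # Cycle 1: the multiplier produces xb*xc; both adders are idle.
    mul_out = xb * xc
    usage.append((1, 0))

    # Cycle 2: the adders consume the previous product; the multiplier produces yb*yc.
    add1, add2 = xa + mul_out, xa - mul_out
    mul_out = yb * yc
    usage.append((1, 2))

    # Cycle 3: both real parts complete; the multiplier produces xb*yc.
    re1, re2 = add1 - mul_out, add2 + mul_out
    mul_out = xb * yc
    usage.append((1, 2))

    # Cycle 4: the imaginary partial sums form; the multiplier produces xc*yb.
    add1, add2 = ya + mul_out, ya - mul_out
    mul_out = xc * yb
    usage.append((1, 2))

    # Cycle 5: both imaginary parts complete; the multiplier is idle.
    im1, im2 = add1 + mul_out, add2 - mul_out
    usage.append((0, 2))

    return (re1, im1), (re2, im2), usage


out1, out2, usage = scheduled_butterfly(3, -2, 1, 4, 2, -1)
```

In every cycle the trace uses at most one multiplication and two additions, matching the stated resources of a butterfly operation unit.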
From the above analysis it can be seen that the real parts of the two outputs out1 and out2 are completed at the same time. After the calculation, the output data of the butterfly operation unit must be stored in data storage unit 11 or 12 by the data read-write unit 14, and only one datum can be written to data storage unit 11 or 12 per cycle. The output of adder No. 2 in the 3rd and 5th cycles is therefore delayed by one beat, so that the real part of out1 is obtained in the 3rd cycle, the real part of out2 in the 4th cycle, the imaginary part of out1 in the 5th cycle, and the imaginary part of out2 in the 6th cycle, which meets the storage requirement of data storage unit 11 or 12.
Because the calculation result is 17 bits wide while data storage units 11 and 12 can store only 16 bits, the result must be truncated: the lowest bit is discarded, which is equivalent to dividing the output by 2. This is done for the result of every stage. With k stages in total, the final result is reduced by a factor of N, but since the relative magnitudes of the values are unaffected, the frequency can still be determined from the final spectrum distribution.
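The effect of the per-stage truncation can be illustrated with a short Python sketch. It assumes the truncation is an arithmetic right shift by one bit, which is one common way to discard the lowest bit of a signed value; the function names are hypothetical.

```python
def truncate_17_to_16(value):
    """Discard the least significant bit of a 17-bit signed result."""
    # Python's >> on integers is an arithmetic shift, so negative values round down.
    return value >> 1


def overall_scale(k):
    """Scale factor accumulated over k stages, each of which halves the result."""
    return 1 / (1 << k)


# Assumed example: a 1024-point transform has k = 10 stages.
scale_1024 = overall_scale(10)
```

With k = 10 the accumulated factor is 1/1024, that is 1/N, which is the reduction by N times described above.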
In addition, because the write address of the output data must be the same as the read address of the input data, each butterfly operation unit, for example butterfly operation unit 161, caches the addresses of its 2 input data until the calculation finishes. The sole exception is the last stage, in which the output data sequence must be rearranged once to obtain the correct storage order. For example, for a 1024-point FFT processor, the input addresses of the first butterfly operation in the last stage are 0 and 1, while the output addresses should be 0 and 512 (k and k+N/2 always appear in pairs). An extra stage of decision logic is therefore needed to put the output sequence in the correct order.
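The data flow described so far (ping-pong storage, write address equal to read address, and a reordering in the last stage) can be modelled end to end in Python and compared with a direct DFT. This is a floating-point sketch under several assumptions that are not stated in this part of the text: the counter is k bits wide, the last-stage reordering is a full bit reversal of the address (consistent with the 0, 1 to 0, 512 example), and the twiddle exponent follows the standard rule for a natural-order-in, bit-reversed-out radix-2 flow. It does not model quantization or the patent's own twiddle address sequence, and all names are hypothetical.

```python
import cmath


def bit_reverse(value, width):
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def read_address(counter, stage, width):
    low = stage + 1
    mask = (1 << low) - 1
    return bit_reverse((counter & ~mask) | bit_reverse(counter & mask, low), width)


def fft_model(samples):
    n = len(samples)
    k = n.bit_length() - 1
    assert n == 1 << k, "length must be a power of two"
    rams = [list(samples), [0j] * n]          # two data storage units (ping-pong)
    src = 0
    for stage in range(k):
        dst = 1 - src
        dist = n >> (stage + 1)               # address distance within a pair
        for pair in range(n // 2):
            addr_a = read_address(2 * pair, stage, k)
            addr_b = read_address(2 * pair + 1, stage, k)   # addr_a + dist
            block = addr_a // (2 * dist)
            exponent = bit_reverse(block, stage) * dist     # assumed twiddle rule
            w = cmath.exp(-2j * cmath.pi * exponent / n)
            a, b = rams[src][addr_a], rams[src][addr_b]
            out1, out2 = a + w * b, a - w * b
            if stage < k - 1:
                # Write address equals read address.
                rams[dst][addr_a], rams[dst][addr_b] = out1, out2
            else:
                # Last stage: reorder (assumed to be a full bit reversal).
                rams[dst][bit_reverse(addr_a, k)] = out1
                rams[dst][bit_reverse(addr_b, k)] = out2
        src = dst                             # swap roles of the two storage units
    return rams[src]


def naive_dft(samples):
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * m * t / n)
                for t, x in enumerate(samples)) for m in range(n)]
```

Comparing `fft_model(x)` with `naive_dft(x)` for a short test vector is a quick way to check that the addressing and reordering assumptions are mutually consistent.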
It is worth noting that the number of butterfly operation units is 4.
Specifically, in this embodiment the whole butterfly operation part of the FFT processor is made up of 4 butterfly operation units, each of which takes two numbers from data storage unit 11 or 12 and performs its calculation. If butterfly operation units 161 to 164 take part in the operation in turn, it can be seen that by the time the 4th butterfly operation unit starts working, the 1st butterfly operation unit has finished and can prepare for its next data fetch. 4 butterfly operation units thus satisfy the requirement of fetching data from data storage unit 11 or 12 continuously in turn, so the butterfly operations required by each stage can be completed with 4 butterfly operation units working cyclically. This arrangement has the smallest circuit area. The operation of each stage consists of N/2 butterfly operations.
In addition, because the whole butterfly operation part of the FFT processor in this embodiment is made up of 4 butterfly operation units, every 8 cycles form one loop (each butterfly operation unit needs 2 cycles to read its input data, so 4 butterfly operation units working in turn need 8 cycles). After N/8 loops, the N/2 butterfly operations of a stage are done. Since 7 more cycles are needed after the last operation unit reads its data before the last output data are obtained, the time of each stage can be set to N (the time to read the data) + 7 (the time needed for computation after the final read) + 1 (the one-beat delay between the read address and the read data) = N + 8 cycles.
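The per-stage timing above reduces to simple arithmetic, sketched here in Python. The constants are the ones given in the text; the function and parameter names are hypothetical.

```python
def cycles_per_stage(n_points):
    """Cycles for one stage: N reads + 7 compute cycles + 1 address-to-data delay."""
    read_cycles = n_points       # one input datum is read per cycle
    compute_tail = 7             # cycles the last butterfly needs after its final read
    address_latency = 1          # one-beat delay between read address and read data
    return read_cycles + compute_tail + address_latency


def loops_per_stage(n_points, units=4, read_cycles_per_unit=2):
    """Number of 8-cycle loops in one stage when 4 units each read for 2 cycles."""
    return n_points // (units * read_cycles_per_unit)


stage_cycles_1024 = cycles_per_stage(1024)
stage_loops_1024 = loops_per_stage(1024)
```

For 1024 points this gives 1032 cycles per stage, made up of 128 eight-cycle loops followed by the 8-cycle tail.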
According to the above analysis, for an FFT processor supporting 1024 points, one stage needs 1032 cycles, and a counter counts from 0 to 1031 to represent these cycles. During cycles 0 to 1023, data are read from data storage unit 11 or 12 and the read enable is high. In the mode in which 4 butterfly operation units work cyclically, every 8 cycles form one loop, so a given input enable is asserted once every 8 cycles, and the other input enables are each delayed by one further cycle in turn. In this way all of the input enables of the 4 butterfly operation units are obtained.
As for the output data, according to the analysis above the output of one butterfly operation unit is divided into a real part and an imaginary part, and the real-part output enable and the imaginary-part output enable each last 2 cycles. A butterfly operation unit needs exactly 2 cycles to read its input data, so the result of the next butterfly operation comes out exactly 2 cycles later than the result of the previous butterfly operation unit. Overall, therefore, the output result is always valid. For both the real-part and the imaginary-part outputs, every 8 cycles form one loop in which the operation results of the 1st, 2nd, 3rd and 4th butterfly operation units are output in turn, until this round of calculation finishes.
It should be noted that this embodiment uses 4 butterfly operation units working cyclically, so the 4 butterfly operation units can be regarded as the minimum basic operation unit. Since every 2 input data points need one butterfly operation, the smallest FFT operation that can be supported has N = 8 points; combined with N = 2^k, it follows that k is an integer with k ≥ 3.
Compared with the prior art, the main differences and effects of this embodiment are as follows. The data are stored uniformly in two data storage units, so input data of different point counts can be read with the same rule, and operations of multiple point counts can therefore be supported. The data read-write unit reads the input data one by one according to the generated address sequence, feeds the input data so read into the butterfly operation units in sequence, and stores the input and output data one by one. At any given time there is one input datum and one output datum, so only two data storage units are needed to store the data, which saves circuit area.
It is worth noting that each module involved in this embodiment is a logic module. In practical applications, a logic unit may be one physical unit, a part of one physical unit, or a combination of several physical units. In addition, to highlight the innovative part of the present invention, units that are less closely related to solving the technical problem addressed by the present invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
The second embodiment of the present invention relates to an FFT processor. The second embodiment is a further optimization of the first embodiment, the main optimization being that in the second embodiment k satisfies k ≤ 10 and the storage addresses of the input data increase successively. Each data storage unit includes 1024 addresses. When k = 10, N = 2^10 = 1024 and the input data are stored at consecutive addresses; when k ≤ 9, the address gaps between adjacent input data are equal. Depending on the value of k, FFT operations of 8 to 1024 points can be supported. That is, without changing the existing hardware environment, for example without changing the capacity of the data memory or the bit width of the address signal, FFT operations over the largest range of point counts can be supported.
Specifically, when an FFT with a lower point count is calculated, the addresses at which the data to be processed are written into the data storage unit are not consecutive. For example, for a 512-point FFT the data are written to addresses 0, 2, 4, 6, ..., 1022, and for a 256-point FFT the data are written to addresses 0, 4, 8, 12, ..., 1020. The core idea is that the data must occupy the whole address space uniformly rather than being written contiguously into one block of the address space. Restricting the data to this uniform storage pattern allows the data to be read with the same rule, which is what makes FFT operations of multiple point counts possible.
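The uniform storage pattern amounts to spacing the N data evenly over the 1024 available addresses. A minimal Python sketch follows, assuming the stride is simply 1024/N; the names `ADDRESS_SPACE` and `storage_addresses` are hypothetical.

```python
ADDRESS_SPACE = 1024     # number of addresses in each data storage unit


def storage_addresses(n_points):
    """Addresses occupied by an N-point data set spread uniformly over the memory."""
    stride = ADDRESS_SPACE // n_points    # gap between adjacent input data
    return list(range(0, ADDRESS_SPACE, stride))


addresses_512 = storage_addresses(512)
addresses_256 = storage_addresses(256)
```

Because every point count uses the same 1024-entry address space in this way, the same read-address logic can serve all of them.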
It should be noted that this embodiment is not limited to FFT operations of at most 1024 points; FFT operations of more than 1024 points can also be supported. To support a higher point count, it is only necessary to define a larger number of stages k and to use larger data storage units and an address signal with a larger bit width.
Compared with the prior art, the main difference and effect of this embodiment is that, without changing the existing hardware environment, for example without changing the capacity of the data memory or the bit width of the address signal, FFT operations over the largest range of point counts can be supported.
The third embodiment of the present invention relates to an FFT operation method which, as shown in Fig. 3, includes:
Step 301: The data read/write unit stores the input data received from outside uniformly in a data storage unit.
Specifically, the data read/write unit stores the N input data received from outside uniformly in one of the data storage units. N is the number of points supported by the FFT processor of the present embodiment, and can take any value within the maximum permitted range of point counts without changing the existing device. The N input data on which the FFT is to be performed are loaded into the data storage unit in advance and are stored uniformly in a single data storage unit. Here, "uniformly" means that the gaps between the storage addresses of adjacent data are all equal, which ensures that the data can be read with the same rule and that multiple point counts are supported.
Step 302: The twiddle factor read/write unit stores the twiddle factors received from outside in the twiddle factor storage unit.
Specifically, the twiddle factor read/write unit stores the N/2 twiddle factors received from outside in the twiddle factor storage unit. In the FFT computation every two input data share one twiddle factor, so N input data require N/2 twiddle factors for the FFT to be carried out. The N/2 twiddle factors are stored in the twiddle factor storage unit in advance, in the form of a twiddle factor table, by the twiddle factor read/write unit.
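As a rough illustration of such a twiddle factor table, the sketch below builds N/2 entries. It assumes the standard forward-DFT convention W_N^m = exp(-2*pi*j*m/N); the text does not state the convention or the number format, and the name `twiddle_table` is hypothetical.

```python
import cmath


def twiddle_table(n_points):
    """Build a table of N/2 twiddle factors W_N^m for m = 0 .. N/2 - 1.

    Assumption: W_N^m = exp(-2*pi*j*m / N), the usual forward-DFT
    convention; a hardware table would hold fixed-point values instead.
    """
    return [cmath.exp(-2j * cmath.pi * m / n_points)
            for m in range(n_points // 2)]


table = twiddle_table(1024)  # 512 entries for a 1024-point FFT
```

Only half a period is tabulated because each butterfly, which consumes two input data, needs a single twiddle factor.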
It should be noted that there is no strict logical order between step 301 and step 302, and their order can be exchanged. Whether the data read/write unit stores the input data in the data storage unit before or after the twiddle factor read/write unit stores the twiddle factors in the twiddle factor storage unit has no effect on the result of the FFT.
Step 303: The data read/write unit reads the input data one by one and feeds them in sequence to the butterfly units.
Specifically, the data read/write unit reads the N input data one by one according to a generated address sequence and feeds them, in the order read, to the plurality of butterfly units. The N input data are obtained from one of the data storage units. The basic butterfly stage consists of 4 butterfly units. Because the data read/write unit reads and writes data one at a time, at any moment there is only one input datum and one output datum; only two data storage units are therefore needed to store the data, which saves circuit area.
Step 304: The twiddle factor read/write unit reads the twiddle factors one by one and feeds them in sequence to the butterfly units.
Specifically, the twiddle factor read/write unit reads the N/2 twiddle factors one by one from the twiddle factor storage unit according to a generated address sequence and feeds them, in the order read, to the plurality of butterfly units.
It should be noted that there is no strict logical order between step 303 and step 304, and their order can be exchanged. The order in which the data read/write unit feeds the input data to the butterfly units and the twiddle factor read/write unit feeds the twiddle factors to the butterfly units has no effect on the result of the FFT.
Step 305: The butterfly units compute the output data.
Specifically, each butterfly unit computes its output data from the input data and the twiddle factor it receives. The 4 butterfly units serve as one basic unit that is reused in a loop: the data read/write unit and the twiddle factor read/write unit continuously read from and write to the data storage units and the twiddle factor storage unit, so that input data and twiddle factors are continuously selected, computed on, and output. Only the read and write addresses need to change from stage to stage. The whole FFT is thus completed by reusing the basic processing unit without pause, which also reduces idle waiting time.
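A single radix-2 butterfly can be sketched as follows. This is the textbook decimation-in-time form, chosen here as an assumption because it uses one complex multiplication and two additions; the text itself does not spell out the butterfly equations, and the name `butterfly` is illustrative.

```python
def butterfly(a, b, w):
    """Radix-2 butterfly in the decimation-in-time form (an assumption).

    One complex multiplication (w * b) followed by one addition and one
    subtraction yields the two outputs.
    """
    t = w * b          # the single multiplication
    return a + t, a - t


upper, lower = butterfly(1 + 0j, 2 + 0j, -1j)
```

The two inputs `a` and `b` share the single twiddle factor `w`, which is why N input data need only N/2 twiddle factors per stage.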
Step 306: The data read/write unit stores the output data in a data storage unit.
Specifically, the data read/write unit stores the output data one by one in the other data storage unit. The data addresses used in the data storage unit in step 301 and the data addresses used when storing into the other data storage unit in this step must be consistent, so that the data can be read and written conveniently.
Step 307: The counter records the current loop stage.
Specifically, each time the data read/write unit completes the storage of the output data, the counter automatically records the current loop stage. The initial value of the counter is 0, meaning that the loop stage is 0 when the computation starts. Each time the storage of the output data is completed, the counter is incremented by one and the incremented value is saved back in the counter.
Step 308: Judge whether the current loop stage is equal to the value of k.
Specifically, it is judged whether the value currently held in the counter is equal to k. If it is, the whole FFT has been completed and the flow proceeds to step 309. Otherwise the current loop stage is less than k and the k-stage loop is not yet finished, so the flow returns to step 303, where the input data and twiddle factors of the next stage are obtained and fed to the butterfly units, until all k stages have been computed.
Here, the output data of each stage serve as the input data of the next stage. Because in step 301 and step 302 the input data and twiddle factors received from outside are stored in advance in the data storage unit and the twiddle factor storage unit respectively, these two steps are not part of the k-stage loop.
Step 309: Clear the counter.
Specifically, when the loop stage is equal to k, that is, when all k stages have been computed, the value of the counter is cleared, so that in the next FFT the count, and hence the stage number, starts again from zero.
The present embodiment contains two identical data storage units. Each of them can store either input data or output data, and the two store data in the manner of a ping-pong RAM. That is, for the FFT computation of one stage, the input data are obtained from the first data storage unit and the computed results are output to the second data storage unit; the next stage then obtains its input data from the second data storage unit and stores its output data in the first.
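The k-stage loop with two ping-pong buffers can be sketched in software as below. This is only an illustration under stated assumptions: it uses the textbook decimation-in-time ordering with bit-reversed loading rather than the address generation of the later embodiments, it processes one butterfly at a time instead of 4 in parallel, and the names `fft_ping_pong` and `bit_reverse` are hypothetical.

```python
import cmath


def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def fft_ping_pong(samples):
    """k-stage radix-2 FFT that alternates between two buffers."""
    n = len(samples)
    k = n.bit_length() - 1                      # number of stages, N = 2^k
    if n < 2 or (1 << k) != n:
        raise ValueError("length must be a power of two, at least 2")
    # Assumption: textbook DIT, so the first buffer is loaded bit-reversed.
    src = [complex(samples[bit_reverse(i, k)]) for i in range(n)]
    dst = [0j] * n
    stage = 0                                   # plays the role of the counter
    while stage != k:                           # compare counter with k
        half = 1 << stage
        for start in range(0, n, 2 * half):
            for j in range(half):
                w = cmath.exp(-2j * cmath.pi * j / (2 * half))
                a, b = src[start + j], src[start + j + half]
                t = w * b
                dst[start + j] = a + t
                dst[start + j + half] = a - t
        src, dst = dst, src                     # ping-pong: swap the two roles
        stage += 1                              # one stage stored, count up
    return src                                  # buffer written by the last stage
```

Because each stage reads only from one buffer and writes only to the other, no stage overwrites data it still needs, and two buffers suffice for any number of stages.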
The value chosen for k determines the number of points the FFT can support, and the FFT runs k loop stages in total.
Compared with the prior art, the main difference and effect of the present embodiment are as follows. The data are stored uniformly in the two data storage units, so input data of different point counts can be read with the same reading rule, and multiple point counts can therefore be supported. The data read/write unit reads the input data one by one according to a generated address sequence, feeds them in that order to the plurality of butterfly units, and stores the output data of the butterfly units one by one. At any moment there is thus one input datum and one output datum, so only two data storage units are needed to store the data, which saves circuit area.
It can be seen that the present embodiment is the method embodiment corresponding to the first embodiment, and the two can be implemented in cooperation with each other. The relevant technical details mentioned in the first embodiment remain valid in the present embodiment and, to avoid repetition, are not described again here. Correspondingly, the relevant technical details mentioned in the present embodiment also apply to the first embodiment.
The fourth embodiment of the present invention relates to an FFT operation method. The fourth embodiment is a further optimization of the third embodiment, the main improvement being that it provides a way of generating the read addresses of the input data. That is, for the computation of stage i, where i = 0, 1, ..., k, when the data read/write unit reads the N input data one by one according to the generated address sequence, the address generation method provided by this embodiment ensures that the correct data are read. The flow for generating the address of each input datum when the data read/write unit reads the N input data one by one in step 303 of the third embodiment is shown in Fig. 4 and includes:
Step 401: Obtain the counter binary sequence corresponding to an input datum.
Specifically, the counter binary sequence corresponding to each input datum is obtained. The input data address obtained by the data read/write unit is a decimal value; after it is obtained, it is converted to binary. For example, suppose the decimal input data address obtained is "1". In the present embodiment k is less than or equal to 10; if k is 10, then N = 1024, and the decimal data address "1" converts to the binary input data address "0000000001".
Step 402: Reverse the order of the last i+1 bits of the counter binary sequence.
Specifically, for stage 0 (i = 0), the binary input data address obtained in step 401 is output by this step as "0000000001";
for stage 1 (i = 1), the binary input data address obtained in step 401 is output by this step as "0000000010".
Step 403: Reverse the order of the entire word obtained after reversing the last i+1 bits, and use the result as the read address of each input datum.
Specifically, for stage 0 (i = 0), the binary input data address obtained in step 402 is output by this step as the binary data address "1000000000", which corresponds to the decimal data address "512";
for stage 1 (i = 1), the binary input data address obtained in step 402 is output by this step as the binary data address "0100000000", which corresponds to the decimal data address "256".
The rule for choosing the read addresses can thus be seen: for stage i, the input data address sequence is obtained from the counter binary sequence by reversing the order of its last i+1 bits and then reversing the order of the entire word.
Taking a 1024-point FFT as an example, a counter first counts from 0 to 1023, and the read address conversion is applied to this sequence. The converted sequence is 0, 512, 256, 768, ..., which means that the data fed to the first butterfly unit are those at address 0 and address 512, the data fed to the second butterfly unit are those at address 256 and address 768, and so on. The data are fed to the butterfly units in sequence according to this rule and are taken out in sequence once the computation is finished.
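Steps 401 to 403 can be sketched as two bit reversals. The function names `reverse_bits` and `read_address` and the default word width of 10 bits (matching the 1024-address storage unit in the text) are illustrative assumptions.

```python
def reverse_bits(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def read_address(counter, stage, width=10):
    """Read address of one input datum for stage i = `stage`.

    Step 401: `counter` is taken as a `width`-bit binary word.
    Step 402: reverse the order of its last stage + 1 bits.
    Step 403: reverse the order of the whole word.
    """
    low = stage + 1
    mask = (1 << low) - 1
    after_402 = (counter & ~mask) | reverse_bits(counter & mask, low)
    return reverse_bits(after_402, width)


# Counter value 1 gives address 512 at stage 0 and address 256 at stage 1.
stage0_start = [read_address(c, 0) for c in range(4)]  # 0, 512, 256, 768
```

In this sketch, a counter that only runs up to N - 1 for a smaller N = 2^k produces addresses that are multiples of 1024 / N, which agrees with the uniform storage of the input data described earlier.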
Compared with the prior art, the main difference and effect of the present embodiment are as follows: the data storage addresses are allocated in a reasonable way, which ensures that the FFT is computed correctly.
The fifth embodiment of the present invention relates to an FFT operation method. The fifth embodiment is a further optimization of the third embodiment, the main improvement being that, for the computation of stage i, it provides a way of generating the read address sequence of the N/2 twiddle factors that the twiddle factor read/write unit reads one by one according to the generated address sequence. The flow for generating the address of each twiddle factor when the twiddle factor read/write unit reads the N/2 twiddle factors one by one in step 304 of the third embodiment is shown in Fig. 5 and includes:
Step 501: Generate a counting sequence.
Specifically, a counting sequence is generated, expressed as 0, 1, 2, 3, ..., 2^i - 1. That is, for stage 0 (i = 0), the counting sequence is 0, 0, 0, ...;
for stage 1, the counting sequence is 0, 1, 0, 1, ...;
for stage 2, the counting sequence is 0, 1, 2, 3, 0, 1, 2, 3, ...;
for stage i, the counting sequence is 0, 1, 2, 3, ..., 2^i - 1, 0, 1, 2, 3, ...
Step 502: Bit-reverse the counting sequence and use the result as the twiddle factor read address sequence.
Specifically, the counting sequence 0, 1, 2, 3, ..., 2^i - 1 is bit-reversed to give 0, 512, 256, 768, ..., which serves as the read address sequence of the N/2 twiddle factors. For example, for stage 2 the counting sequence is 0, 1, 2, 3, 0, 1, 2, 3, ..., which after bit reversal becomes 0, 512, 256, 768, 0, 512, 256, 768, ...
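Steps 501 and 502 can be sketched as a repeating counter followed by a bit reversal. The reversal width of 10 bits is an assumption chosen only because it reproduces the example values 0, 512, 256, 768 given in the text; the names `reverse_bits` and `twiddle_read_addresses` are hypothetical.

```python
def reverse_bits(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def twiddle_read_addresses(stage, count, width=10):
    """First `count` twiddle read addresses for stage i = `stage`.

    Step 501: counting sequence 0, 1, ..., 2^i - 1, repeated.
    Step 502: bit-reverse each count (the width is an assumption).
    """
    period = 1 << stage
    return [reverse_bits(n % period, width) for n in range(count)]


stage2 = twiddle_read_addresses(2, 8)  # 0, 512, 256, 768, 0, 512, 256, 768
```

At stage 0 the counting sequence is constantly 0, so every butterfly of that stage reads the twiddle factor at address 0.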
The twiddle factor addresses are chosen on the principle that the twiddle factor values required at every stage can all be converted into the form W_N^k, where for N = 1024 the exponent k ranges from 0 to 511. Consequently only one 1 KB storage unit is needed to hold all the twiddle factors, and as the address changes the appropriate twiddle factor is read out and sent to the butterfly unit for computation. By analyzing the values that the twiddle factors take at each stage of the FFT algorithm, converting them into this form, and using k as the address for reading the required twiddle factor from the storage unit, the address sequence is found to follow a regular pattern; summarizing that pattern yields the address generation method described in this embodiment.
The division of the above methods into steps is made only for clarity of description. In implementation, steps may be merged into one step, or a step may be split into several steps; as long as the same logical relationship is included, they all fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the protection scope of this patent.
Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes a number of instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media capable of storing program code.
Those skilled in the art will understand that the above embodiments are specific embodiments for realizing the present invention, and that in practical applications various changes may be made to them in form and detail without departing from the spirit and scope of the present invention.

Claims (8)

1. An FFT processor, characterized by comprising: two data storage units, a twiddle factor storage unit, a plurality of butterfly units, a data read/write unit and a twiddle factor read/write unit;
the data read/write unit is connected to the two data storage units and to each butterfly unit; the two data storage units are respectively used to uniformly store the N input data and the N output data of the plurality of butterfly units; wherein N = 2^k, k ≥ 3 and k is an integer;
the twiddle factor read/write unit is connected to the twiddle factor storage unit and to each butterfly unit; the twiddle factor storage unit is used to store N/2 twiddle factors;
wherein the data read/write unit is used to read the N input data one by one and to feed the N input data so read in sequence to the plurality of butterfly units; the twiddle factor read/write unit is used to read the N/2 twiddle factors one by one and to feed the N/2 twiddle factors so read in sequence to the plurality of butterfly units; and the data read/write unit is further used to store the N output data one by one.
2. The FFT processor according to claim 1, characterized in that the number of butterfly units is 4.
3. The FFT processor according to claim 1, characterized in that each butterfly unit includes 1 multiplier and 2 adders, and each butterfly unit is used to implement a radix-2 butterfly operation.
4. The FFT processor according to claim 1, characterized in that the value of k satisfies k ≤ 10.
5. The FFT processor according to claim 4, characterized in that the storage addresses of the input data increase successively;
each data storage unit includes 1024 addresses; when k = 10 and N = 2^10 = 1024, the input data are stored consecutively; when k ≤ 9, the address gaps between adjacent input data are equal.
6. An FFT operation method, characterized in that it is applied to the FFT processor according to any one of claims 1 to 5, the FFT operation method comprising:
the data read/write unit storing the N input data received from outside uniformly in one of the data storage units;
the twiddle factor read/write unit storing the N/2 twiddle factors received from outside in the twiddle factor storage unit;
the data read/write unit reading the N input data one by one and feeding the N input data so read in sequence to the plurality of butterfly units;
the twiddle factor read/write unit reading the N/2 twiddle factors one by one and feeding the N/2 twiddle factors so read in sequence to the plurality of butterfly units;
each butterfly unit computing its output data from the input data and the twiddle factor it receives;
the data read/write unit storing the output data one by one in the other data storage unit;
wherein the output data serve as the input data of the next stage, and a k-stage loop computation is performed.
7. The FFT operation method according to claim 6, characterized in that, for the computation of stage i, where i = 0, 1, ..., k, when the data read/write unit reads the N input data one by one, the way of generating the read address of each input datum includes:
obtaining the counter binary sequence corresponding to each input datum;
reversing the order of the last i+1 bits of the counter binary sequence;
reversing the order of the entire word obtained after reversing the last i+1 bits, and using the result as the read address of each input datum.
8. The FFT operation method according to claim 6, characterized in that, for the computation of stage i, when the twiddle factor read/write unit reads the N/2 twiddle factors one by one, the way of generating the read address sequence of the N/2 twiddle factors includes:
generating a counting sequence, the counting sequence being expressed as 0, 1, 2, 3, ..., 2^i - 1;
bit-reversing the counting sequence, expressed as 0, 512, 256, 768, ..., to serve as the read address sequence of the N/2 twiddle factors.
CN201680000901.8A 2016-08-10 2016-08-10 Fft processor and operation method Active CN106415526B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/094465 WO2018027706A1 (en) 2016-08-10 2016-08-10 Fft processor and algorithm

Publications (2)

Publication Number Publication Date
CN106415526A true CN106415526A (en) 2017-02-15
CN106415526B CN106415526B (en) 2019-05-24

Family

ID=58087900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680000901.8A Active CN106415526B (en) 2016-08-10 2016-08-10 Fft processor and operation method

Country Status (2)

Country Link
CN (1) CN106415526B (en)
WO (1) WO2018027706A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062289A (en) * 2018-01-25 2018-05-22 天津芯海创科技有限公司 Fast Fourier Transform (FFT) FFT changes sequence method, signal processing method and device in address
CN110347968A (en) * 2019-07-08 2019-10-18 河海大学常州校区 A kind of optimization fft algorithm and device based on FPGA

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319804B (en) * 2018-04-17 2023-08-08 福州大学 8192 point base 2 DIT ASIC design method for low resource call
CN112307423B (en) * 2020-11-19 2023-09-22 天津大学 FFT processor based on base 2SDF pipeline type and implementation method thereof in ACO-OFDM system
CN113569189B (en) * 2021-07-02 2024-03-15 星思连接(上海)半导体有限公司 Fast Fourier transform calculation method and device
CN117591784B (en) * 2024-01-19 2024-05-03 武汉格蓝若智能技术股份有限公司 FPGA-based twiddle factor calculation method and FPGA chip

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101290613A (en) * 2007-04-16 2008-10-22 卓胜微电子(上海)有限公司 FFT processor data storage system and method
CN103176950A (en) * 2011-12-20 2013-06-26 中国科学院深圳先进技术研究院 Circuit and method for achieving fast Fourier transform (FFT) / inverse fast Fourier transform (IFFT)
CN103605636A (en) * 2013-12-09 2014-02-26 中国科学院微电子研究所 Device and method for realizing FFT operation
CN103970718A (en) * 2014-05-26 2014-08-06 苏州威士达信息科技有限公司 Quick Fourier transformation implementation device and method
CN104268122A (en) * 2014-09-12 2015-01-07 安徽四创电子股份有限公司 Point-changeable floating point FFT (fast Fourier transform) processor

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7047268B2 (en) * 2002-03-15 2006-05-16 Texas Instruments Incorporated Address generators for mapping arrays in bit reversed order
WO2004004265A1 (en) * 2002-06-27 2004-01-08 Samsung Electronics Co., Ltd. Modulation apparatus using mixed-radix fast fourier transform
TWI298448B (en) * 2005-05-05 2008-07-01 Ind Tech Res Inst Memory-based fast fourier transformer (fft)
CN101072218B (en) * 2007-03-01 2011-11-30 华为技术有限公司 FFT/IFFI paired processing system, device and method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062289A (en) * 2018-01-25 2018-05-22 天津芯海创科技有限公司 Fast Fourier Transform (FFT) FFT changes sequence method, signal processing method and device in address
CN108062289B (en) * 2018-01-25 2021-09-03 天津芯海创科技有限公司 Fast Fourier Transform (FFT) address order changing method, signal processing method and device
CN110347968A (en) * 2019-07-08 2019-10-18 河海大学常州校区 A kind of optimization fft algorithm and device based on FPGA
CN110347968B (en) * 2019-07-08 2023-06-13 河海大学常州校区 FPGA-based FFT optimization algorithm and device

Also Published As

Publication number Publication date
WO2018027706A1 (en) 2018-02-15
CN106415526B (en) 2019-05-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant