CN104679721A - Operation method of FFT (Fast Fourier Transformation) processor - Google Patents

Operation method of FFT (Fast Fourier Transformation) processor

Info

Publication number
CN104679721A
Authority
CN
China
Prior art keywords
data
point
fft
grade
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510116879.0A
Other languages
Chinese (zh)
Other versions
CN104679721B (en)
Inventor
黄建喜
刘宇波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Jing Da Micro Electronics Technology Co., Ltd.
Original Assignee
CHENGDU GOLDENWAY TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHENGDU GOLDENWAY TECHNOLOGY Co Ltd
Priority to CN201510116879.0A priority Critical patent/CN104679721B/en
Publication of CN104679721A publication Critical patent/CN104679721A/en
Application granted granted Critical
Publication of CN104679721B publication Critical patent/CN104679721B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention provides an operation method for an FFT (Fast Fourier Transformation) processor. The method comprises the following steps: setting up multiple stages of butterfly operations in an FPGA (Field Programmable Gate Array) program, where each stage shares one block floating-point shift factor; before each stage of operation, judging the data according to the judgement state of the previous block floating-point factor to decide the shift applied when the data memory outputs; and controlling the final output gain through the sum of the shifts of all stages. The invention provides a floating-point operation method that resolves the conflict between fixed-point and floating-point algorithms, improves floating-point operation efficiency and reduces cost.

Description

Operation method of an FFT processor
Technical field
The present invention relates to programmable processors, and in particular to an operation method of an FFT processor.
Background art
In communications and radar signal processing the FFT is a commonly used tool, and when the rate requirement or the degree of integration is high it is mostly implemented with an FPGA. Most processors process data in a fixed-point format. Although this keeps the processing structure relatively simple, overflow problems are relatively serious, and simple fixed-point truncation submerges small signals among large signals, so the result data lose the necessary precision. As the requirements on data precision become higher and higher, general fixed-point algorithms cannot meet high-precision requirements, and one has to resort to a floating-point processor to avoid overflow problems in applications. Foreign FFT cores mostly adopt fixed-point or block floating-point arithmetic, generally with fewer than 24 fixed-point bits or fewer than 24 block floating-point bits. However, a floating-point processor consumes more resources and contains complicated hardware (floating-point execution units), which greatly increases design cost and power consumption and reduces computational efficiency. At the same processing speed, a floating-point processor is relatively expensive and consumes more power. The floating-point execution unit can only be designed in-house, and operational precision, operation speed, resource occupation and design complexity must be traded off during the design process. Therefore, compared with fixed-point arithmetic, floating-point arithmetic has the drawbacks of high development difficulty, a long development cycle and high development cost.
Therefore, no effective solution has yet been proposed for the above problems in the related art.
Summary of the invention
To solve the problems of the above prior art, the present invention proposes an operation method of an FFT processor, comprising:
Setting up multiple stages of butterfly operations in the FPGA program, each stage sharing one block floating-point shift factor; before each stage of operation, judging the data according to the judgement state of the previous block floating-point factor to decide the shift selection applied when the data memory outputs at this stage; and controlling the final output gain through the sum of the shifts of all stages.
Preferably, the method further comprises:
Before the FFT operation, all data addresses are arranged in order, and for every stage of butterfly operation the address from which data are fetched before the operation is identical to the address to which data are stored after the operation ends. In the first stage, the two addresses accessed by each butterfly are separated by half of the total number of points, and in every subsequent stage the separation is half that of the previous stage. Every stage of butterfly operation is pipelined and data are read from the memory on each clock. A clock counter and a stage counter are defined in the main program: the stage counter increments as the stage number increases and is reset after each complete FFT, and the clock counter increments on every clock and is reset after each completed FFT stage. The consistency between the read-data address and the store-data address is achieved by subtracting from the clock counter the number of clocks consumed by the butterfly operation, and the address gap between each pair of data is produced by circularly shifting the clock counter to the right.
Preferably, the method further comprises:
In the radix-2 DIF DFT operation, the raw data at the butterfly input of every stage first undergo simple additions and subtractions, and all data of every stage may need to be shifted left or right when entering the butterfly. If all data are shifted right by one bit, the output data format of this stage has one more fractional bit than the input data format of this stage, so after every stage the data are 1/2 of the previous stage; this is referred to as the floating-point factor. After m stages of butterfly iteration, if the data have been shifted right by m bits in total, i.e. the floating-point factor is m, the computed result data are amplified by 2^m to obtain the final result.
Compared with the prior art, the present invention has the following advantages:
The present invention proposes a floating-point operation method that resolves the conflict between fixed-point and floating-point algorithms, improves floating-point operation efficiency and reduces cost.
Brief description of the drawings
Fig. 1 is a flow chart of the operation method of the FFT processor according to an embodiment of the present invention.
Fig. 2 is a structural diagram of the block floating-point FFT according to an embodiment of the present invention.
Fig. 3 is a flow chart of the three-bit block floating-point factor judgement procedure according to an embodiment of the present invention.
Detailed description of the embodiments
A detailed description of one or more embodiments of the present invention is provided below together with the accompanying drawings that illustrate the principles of the invention. The present invention is described in conjunction with such embodiments, but the invention is not limited to any particular embodiment. The scope of the invention is defined only by the claims, and the invention covers many alternatives, modifications and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may also be practised according to the claims without some or all of these details.
One aspect of the present invention provides an operation method of an FFT processor. Fig. 1 is a flow chart of the operation method of the FFT processor according to an embodiment of the present invention.
The FFT decomposes the original N-point sequence into two or more shorter sequences whose DFTs can be recombined into the DFT of the original sequence; the total number of operations is much smaller than that of a direct DFT, which greatly reduces the amount of computation and thus improves the operation speed. The radix-2 DIF DFT separates the frequency domain X(k) by the parity of the index k. Suppose N = 2^M; the first separation yields the DFTs of two N/2-point sequences, called the first stage (stage M); decomposing these again yields the DFTs of four N/4-point sequences, called the second stage (stage M-1); and so on, until 2-point DFTs are obtained. The elementary unit of the FFT operation is the butterfly unit, and the butterfly expressions of the radix-2 DIF are as follows:
x'_a + j·y'_a = x_a + x_b + j(y_a + y_b)
x'_b + j·y'_b = (x_a - x_b)·w_r - (y_a - y_b)·w_i + j[(x_a - x_b)·w_i + (y_a - y_b)·w_r]
That is:
x'_a = x_a + x_b
y'_a = y_a + y_b
x'_b = (x_a - x_b)·w_r - (y_a - y_b)·w_i
y'_b = (x_a - x_b)·w_i + (y_a - y_b)·w_r
It can be seen from the above formulas that a radix-2 butterfly operation needs only one complex multiplication and two complex additions. For a DFT of N = 2^n points, the number of complex multiplications is therefore reduced from N^2 to (N/2)·log2(N), and the number of complex additions is reduced from N(N-1) to N·log2(N). Hence, for DFT operations over a large number of points, using the FFT greatly reduces the amount of computation and improves operation efficiency.
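For illustration, the following Python sketch implements the radix-2 DIF butterfly and stage structure described above. It is a minimal reference model (the function and variable names are chosen here, not taken from the patent), and its output is left in bit-reversed order, as discussed later in the description.

```python
import cmath

def dif_butterfly(a, b, w):
    """Radix-2 DIF butterfly: a' = a + b, b' = (a - b) * w, with twiddle w = w_r + j*w_i."""
    return a + b, (a - b) * w

def dif_fft(x):
    """Radix-2 decimation-in-frequency FFT; the result is in bit-reversed order."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    x = list(x)
    half = n // 2
    while half >= 1:                          # one pass per stage
        for start in range(0, n, 2 * half):   # one group per sub-DFT
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))
                x[start + k], x[start + k + half] = dif_butterfly(
                    x[start + k], x[start + k + half], w)
        half //= 2
    return x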
A number system in which the binary point position is fixed is called a fixed-point system. In a fixed-point system, the bits to the right of the binary point represent the fractional part of a number and the bits to the left represent the integer part. For convenience, two conventions are generally used: one represents all data as integers, and the other restricts the values to fractions between -1.0 and +1.0. In the second convention, the binary point is fixed after the first bit, the integer bit serves as the sign bit, and the number itself consists only of the fractional part. The second convention is the more common one, and in fixed-point arithmetic the absolute value of every result must not exceed 1. If the absolute value of a number exceeds 1 during the computation, the sign bit receives an erroneous carry from the integer part; this is called overflow, and it cannot be avoided in fixed-point algorithms. Simple truncation is generally adopted to deal with it, but this submerges some small data among the large data.
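The overflow behaviour described above can be reproduced with a short sketch; the Q1.15 format and the helper names below are illustrative assumptions, not part of the patent.

```python
def to_q15(value):
    """Quantize a real value in [-1, 1) to the Q1.15 fixed-point format (sign bit + 15 fraction bits)."""
    return max(-32768, min(32767, int(value * 32768)))

def q15_add(a, b):
    """Fixed-point addition with 16-bit two's-complement wrap-around (no saturation)."""
    s = (a + b) & 0xFFFF                # the erroneous carry reaches the sign bit
    return s - 0x10000 if s & 0x8000 else s

# 0.75 + 0.5 = 1.25 cannot be represented in [-1, 1): the sum overflows and wraps negative
print(q15_add(to_q15(0.75), to_q15(0.5)) / 32768.0)   # prints -0.75 instead of 1.25
```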
A floating-point system divides the representation of a number into an exponent part and a mantissa part. The exponent is a signed integer indicating how far, and in which direction, the binary point indicated in the mantissa must be moved to obtain the magnitude of the original number. The mantissa has two parts: a one-bit integer (also referred to as the J-bit) and a binary fraction; the J-bit is usually not stored explicitly but treated as an implied value. The four floating-point formats are single precision, extended single precision, double precision and extended double precision; the range of values they can represent is much wider, so overflow problems can be avoided in most applications. In most cases the processor represents real numbers in normalized form, that is, except for zero, the mantissa always consists of a one-bit integer part and a fraction.
For values less than 1, the leading zeros are eliminated so that the number becomes normalized (for every leading zero eliminated, the exponent is decremented by 1). Representing data in normalized form maximizes the number of significant bits that a mantissa of the given width can hold. In short, the mantissa of a normalized number represents a real number between 1 and 2, and the exponent gives the position of the actual binary point.
The floating-point format represents data with a larger dynamic range than fixed-point numbers; the exponent can be incremented or decremented to scale the data, so higher precision can be kept for small data during computation, which helps protect small data in the computing process. In hardware, however, a dedicated floating-point execution unit must be provided for floating-point addition, subtraction, multiplication and division, and since the exponent and the mantissa of the data must be handled separately, the structure and control of the floating-point execution unit are complicated, which seriously affects the efficiency of floating-point operations.
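As a quick illustration of the normalized exponent/mantissa split (using Python's standard library as a stand-in; this is not the hardware format of the patent):

```python
import math

# math.frexp returns (mantissa, exponent) with 0.5 <= |mantissa| < 1 and
# value == mantissa * 2**exponent; the IEEE convention instead keeps the
# mantissa in [1, 2) with an exponent one smaller, but the idea is the same.
for value in (0.15625, 12.375, 3.0e-05):
    mantissa, exponent = math.frexp(value)
    print(f"{value} = {mantissa} * 2**{exponent}")
```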
Regardless of which hardware processor is used to implement the algorithm, most processors use fixed-point arithmetic and process data in a fixed-point format. Although this keeps the processing structure relatively simple, overflow is more serious, and simple fixed-point truncation submerges small signals among large signals, so the result data lose the necessary precision. As the requirements on data precision become higher and higher, general fixed-point algorithms cannot meet high-precision requirements, and one has to resort to a floating-point processor to avoid overflow problems in applications. However, a floating-point processor consumes more resources and contains complicated hardware (floating-point execution units), which greatly increases design cost and power consumption and reduces computational efficiency. At the same processing speed, a floating-point processor is relatively expensive and consumes more power than a fixed-point processor. When implementing the design with ASIC devices, the floating-point execution circuit is rather complicated and most EDA software still does not support floating-point operations, so the floating-point execution unit can only be designed in-house, and operational precision, operation speed, resource occupation and design complexity must be traded off during the design process. Therefore, compared with fixed-point arithmetic, floating-point arithmetic has the drawbacks of high development difficulty, a long development cycle and high development cost.
To resolve the conflict between fixed-point and floating-point algorithms, the present invention combines the ideas of the fixed-point and floating-point algorithms and adopts a block floating-point algorithm, in which a group of data sharing the same exponent but having different mantissas is processed as one data block.
All data in the computation are represented in fixed-point format, but during the computation a group of data sharing the same exponent with different mantissas is shifted left or right as one data block: shifting left adjusts the data precision, while shifting right prevents overflow errors in the fixed-point arithmetic. These shifts introduce a gain into the processed data, so after the computation ends the result data are divided by the accumulated gain to obtain the correct data. This is the idea of the block floating-point algorithm, whose structural block diagram is shown in Fig. 1.
Because the block floating-point algorithm is similar to the fixed-point algorithm, it is convenient to implement, and its computational accuracy for small data is far better than that of fixed-point truncation; it is therefore also called the shifted fixed-point algorithm. It can both guarantee that the data meet a certain precision and avoid the complexity of standard floating-point operations.
The block floating-point FFT algorithm is based on the idea of per-block automatic gain control: block floating point not only adjusts the signal power at the FFT input but also adjusts the data according to the computation result of every internal stage. The block floating-point FFT implements floating point within a data block, that is, one group of data shares one shift factor, which is stored in an independent data field in hardware. In hardware implementation this costs less than conventional floating-point arithmetic and is a good compromise. In block floating point, the shift factor of a data block depends on the maximum value of all data in the whole block: if the block contains a large number, the block shares a larger factor; if all data in the block are small, the block shares a smaller factor.
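A minimal Python sketch of this idea is given below, assuming a block of real samples and a hypothetical 16-bit mantissa width; the shared exponent is derived from the largest magnitude in the block, as described above.

```python
def block_float(block, mantissa_bits=16):
    """Represent a block of samples as fixed-point mantissas sharing one exponent.

    The shared exponent is chosen from the largest magnitude in the block, so a
    block containing a large value gets a larger factor, while a block of small
    values keeps a smaller factor and therefore more precision.
    """
    peak = max(abs(v) for v in block)
    exponent = 0
    while peak >= 1.0:                  # scale the block so the peak fits in [-1, 1)
        peak /= 2.0
        exponent += 1
    scale = 2.0 ** exponent
    # truncate toward zero, mimicking a fixed-point cut of the mantissa
    mantissas = [int(v / scale * (1 << (mantissa_bits - 1))) for v in block]
    return mantissas, exponent

print(block_float([0.01, -0.2, 3.7, 1.5]))   # every sample shares the exponent of the block maximum
```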
In terms of data representation, block floating point still uses a fixed-point representation: the data memory RAM is M bits wide, the most significant bit (bit M) is the sign bit, and the remaining bits are valid data bits. A fixed-point arithmetic unit is used in the computation, and intermediate addition results keep full precision without truncation. To prevent overflow in the additions and subtractions, a judgement is made after every stage of operation to detect whether the data are within the valid representation range, so as to determine the bit selection used when the next stage of butterfly operation reads the data. Fig. 2 is the structural diagram of the block floating-point FFT.
In the radix-2 DIF DFT operation, the raw data at the butterfly input of every stage (not including the twiddle factor) first undergo simple additions and subtractions, so to avoid overflow while guaranteeing the precision of the data operation, all data of every stage may need to be shifted left or right by several bits when entering the butterfly. If all data are shifted right by one bit, the output data format of this stage has one more fractional bit than the input data format of this stage, so after every stage the data are 1/2 of the previous stage; this is referred to as the floating-point factor. After m stages of butterfly iteration, if the data have been shifted right by m bits in total, i.e. the floating-point factor is m, the computed result data must be amplified by 2^m to obtain the true result.
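The scaling rule stated above follows from the linearity of the DFT; the small, self-contained check below (direct DFT, illustrative only) confirms that results computed from data right-shifted once per stage are recovered by multiplying by 2^m:

```python
import cmath

def dft(x):
    """Direct DFT, used here only to check the 2**m scaling rule."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

data = [3, -1, 4, 1, -5, 9, 2, 6]
m = 3                                       # an 8-point radix-2 FFT has three stages
scaled = [v / 2**m for v in data]           # equivalent to one 1-bit right shift per stage
restored = [v * 2**m for v in dft(scaled)]  # amplify the results by 2**m
print(all(abs(a - b) < 1e-9 for a, b in zip(restored, dft(data))))  # True
```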
The precision of the block floating-point FFT is better than that of the fixed-point FFT, and this advantage becomes more obvious as the amount of data increases. The block floating-point implementation is simple, its hardware overhead is essentially the same as that of fixed-point arithmetic, and the data operation of every stage is completed by a dedicated block floating-point arithmetic module.
The main basis of the block floating-point FFT algorithm is the block floating-point factor. A three-bit block floating-point factor judgement and extraction is proposed here, i.e. the detection and judgement allow a shift of at most three bits.
The specific procedure of the three-bit block floating-point factor judgement is as follows: after each butterfly operation of every stage, the top four bits of each result datum are checked. If the top four bits of all result data of this stage are identical, all data can be shifted left by three bits without overflow: the most significant bit is the sign bit and shifting left loses one sign bit, but since the following bits are all equal to the most significant bit, the shift does not change the value of the data. If the top three bits of all data are identical, all data can be shifted left by two bits; if the top two bits of all data are identical, all data can be shifted left by one bit.
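The check on the leading bits can be sketched in software as follows (under an assumed 16-bit data width; the patent performs this test in hardware): counting how many bits below the sign bit equal the sign bit gives the number of left shifts the whole block can tolerate.

```python
def safe_left_shifts(block, width=16, max_shift=3):
    """Number of bits (0..max_shift) the whole block can be shifted left without overflow."""
    def redundant_sign_bits(value):
        value &= (1 << width) - 1
        sign = (value >> (width - 1)) & 1
        count = 0
        for bit in range(width - 2, -1, -1):   # walk down from just below the sign bit
            if (value >> bit) & 1 == sign:
                count += 1
            else:
                break
        return count
    return min(max_shift, min(redundant_sign_bits(v) for v in block))

# the top four bits of every sample are identical, so three left shifts are safe
print(safe_left_shifts([0x0005, 0x0FF0, 0xF234 - 0x10000]))   # 3
```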
In the FPGA program, 4096 data points form one data block and there are 12 stages of butterfly operations, each stage sharing one block floating-point shift factor. Before every stage of operation, the 4096 data points are judged according to the judgement state of the previous block floating-point factor, which decides the shift selection applied when the data memory outputs at this stage. This guarantees the data precision of every butterfly operation, and the final output gain is controlled by the sum of the shifts of all stages. Fig. 3 is the flow chart of the three-bit block floating-point factor judgement procedure.
For the three-bit block floating-point factor judgement and extraction, there is a waiting period after each butterfly operation is completed. During this period the block floating-point factor of the result data of this butterfly operation is judged. A state machine structure is adopted for the judgement of the block floating-point factor of each stage of data, where S0, S1, S2 and S3 are the different states of the block floating-point factor:
S0 - the data can be shifted left by 0 bits in the next FFT stage;
S1 - the data can be shifted left by 1 bit in the next FFT stage;
S2 - the data can be shifted left by 2 bits in the next FFT stage;
S3 - the data can be shifted left by 3 bits in the next FFT stage.
At the start of every stage of the FFT operation, the block floating-point factor is set to state S3. A one-vote veto scheme is used in the butterfly operation module: when data whose top four bits are not identical are detected, the state of the block floating-point factor is set to S2; likewise, when data whose top three bits are not identical are detected, the state becomes S1, and so on. When the block floating-point factor of a group of data is detected to be in state S0, the remaining butterfly operations of this stage no longer perform the block floating-point factor judgement, because this stage does not satisfy the block floating-point shift condition. If the block floating-point factor judgement in every stage starts from state S2, it is a two-bit block floating-point judgement and extraction; starting from S1, it is a one-bit block floating-point judgement and extraction. The three-bit block floating-point factor judgement and extraction costs two more clock cycles per butterfly operation than the fixed-point FFT, whereas a one-bit block floating-point factor has the same processing speed as the fixed-point FFT.
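A software model of this state machine is sketched below (the names and the 16-bit width are assumptions made for illustration; the patent implements it in the FPGA butterfly module). The state starts at S3 for each stage and can only be demoted by the one-vote veto:

```python
def identical_top_bits(value, count, width=16):
    """True if the top `count` bits of a `width`-bit two's-complement word are all identical."""
    top = (value & ((1 << width) - 1)) >> (width - count)
    return top == 0 or top == (1 << count) - 1

def judge_stage(results, width=16):
    """S3->S0 judgement over the butterfly results of one stage.

    Returns the number of safe left shifts (3, 2, 1 or 0) to apply when the
    next stage reads the data memory.
    """
    state = 3                                    # every stage starts in S3
    for value in results:
        while state > 0 and not identical_top_bits(value, state + 1, width):
            state -= 1                           # one-vote veto: a single datum lowers the state
        if state == 0:
            break                                # S0: the remaining butterflies skip the judgement
    return state

print(judge_stage([0x0005, 0x0FF0, 0x1234]))     # 2: one datum vetoes the three-bit shift
```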
To further speed up the block floating-point factor judgement and extraction, parallel processing is adopted: after each butterfly operation module completes a butterfly operation, the top four bits, top three bits and top two bits of the four data are compared simultaneously, and the comparison results are then combined. In this way the extra clock cycles of the three-or-more-bit block floating-point butterfly operation relative to the fixed-point butterfly operation are reduced, at the cost of additional processor resources.
After all FFT stages of the 4096-point data are completed, a shift summing unit accumulates the shift count of every stage to give the total shift count of this group of data after the block floating-point FFT operation, and the gain of the final result can be determined from it.
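Under the assumption that each stage reports the net number of right shifts it applied, the shift summing unit reduces to an accumulator; a trivial sketch (the 12-stage shift record below is hypothetical):

```python
# hypothetical per-stage shift record for a 12-stage, 4096-point block
stage_shifts = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

total_shift = sum(stage_shifts)      # total shift count after the block floating-point FFT
output_gain = 2 ** total_shift       # the final results carry this gain
print(total_shift, output_gain)      # 9 512
```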
Block floating-point factor   Top four bits   Top three bits   Top two bits   Resulting block floating-point
(initial state)               identical       identical        identical      factor state
S3                            Y               X                X              S3
S3                            N               Y                X              S2
S3                            N               N                Y              S1
S3                            N               N                N              S0
S2                            X               Y                X              S2
S2                            X               N                Y              S1
S2                            X               N                N              S0
S1                            X               X                Y              S1
S1                            X               X                N              S0
S0                            X               X                X              S0

Truth table of the three-bit block floating-point factor judgement and extraction
* Y: the condition holds; N: the condition does not hold; X: don't care
Before the FFT operation, all data addresses are arranged in order, and for every stage of butterfly operation the address from which data are fetched before the operation is identical to the address to which data are stored after the operation ends, so the read and store addresses can share a unified timing. In addition, in the first stage the two addresses accessed by each butterfly are separated by half of the total number of points, in the second stage the separation is half of that of the previous stage, and so on. Because every stage of butterfly operation is pipelined and data must be read from the memory on every clock, storing every fetch address in dedicated registers would hurt the operation speed and waste storage resources. Instead, a clock counter and a stage counter are defined in the main program: the stage counter increments as the stage number increases and is reset after each complete FFT, while the clock counter increments on every clock and is reset after each completed FFT stage. The consistency between the read-data address and the store-data address is achieved by subtracting from the clock counter the number of clocks consumed by the butterfly operation, and the address gap between each pair of data is produced by circularly shifting the clock counter to the right.
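The address pattern those counters produce can be modelled in software as follows (a simplified sketch of the pairing described above, not a reproduction of the patent's counter arithmetic): in the first stage the two operands of each butterfly are half the transform length apart, and the gap halves with every stage.

```python
def butterfly_address_pairs(n, stage):
    """Yield (addr_a, addr_b) for every butterfly of the given stage (stage 1 is the first)."""
    gap = n >> stage                          # n/2 in stage 1, n/4 in stage 2, ...
    for group_start in range(0, n, 2 * gap):
        for k in range(gap):
            yield group_start + k, group_start + k + gap

print(list(butterfly_address_pairs(8, 1)))    # [(0, 4), (1, 5), (2, 6), (3, 7)]
print(list(butterfly_address_pairs(8, 2)))    # [(0, 2), (1, 3), (4, 6), (5, 7)]
```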
After all butterfly stages of a group of data are completed, the output data addresses are no longer in the original natural order, because of the odd/even separation applied to x(n) during the computation. For example, for an 8-point radix-2 DIF FFT the first-stage input data addresses are in natural order 0, 1, 2, 3, 4, 5, 6, 7, while the last stage outputs the data at the bit-reversed addresses 0, 4, 2, 6, 1, 5, 3, 7. To obtain the correct final output, the bit-reversed order must be converted back to natural order, which is done by bit-reversing the binary code of each address. In the program all data addresses are represented in binary: the bit-reversed addresses 0, 4, 2, 6, 1, 5, 3, 7 are written as the three-bit binary numbers 000, 100, 010, 110, 001, 101, 011, 111; exchanging bit 2 with bit 0 of every number and keeping bit 1 unchanged gives 000, 001, 010, 011, 100, 101, 110, 111, i.e. 0, 1, 2, 3, 4, 5, 6, 7, converting the bit-reversed order back to natural order. For FFTs of other sizes, if the data address is represented by n bits, the bit-reversal rule is: exchange bit n-1 with bit 0, bit n-2 with bit 1, bit n-3 with bit 2, and so on, which converts the bit-reversed order back to natural order.
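The bit-reversal rule above maps directly to a short routine; a sketch (the function name is chosen here for illustration):

```python
def bit_reverse(index, bits):
    """Reverse the `bits`-bit binary representation of `index` (bit n-1 <-> bit 0, bit n-2 <-> bit 1, ...)."""
    reversed_index = 0
    for _ in range(bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index

# reorder the 8-point DIF output addresses 0,4,2,6,1,5,3,7 back to natural order
print([bit_reverse(i, 3) for i in [0, 4, 2, 6, 1, 5, 3, 7]])   # [0, 1, 2, 3, 4, 5, 6, 7]
```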
In summary, the present invention proposes a floating-point operation method that resolves the conflict between fixed-point and floating-point algorithms, improves floating-point operation efficiency and reduces cost.
Obviously, those skilled in the art should appreciate that each of the above modules or steps of the present invention can be implemented with a general-purpose computing system; they can be concentrated in a single computing system or distributed over a network formed by multiple computing systems; optionally, they can be implemented with program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system. Thus, the present invention is not restricted to any specific combination of hardware and software.
It should be understood that the above specific embodiments of the present invention are only intended to exemplarily illustrate or explain the principles of the present invention and do not constitute a limitation of the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention shall be included within the protection scope of the present invention. Furthermore, the appended claims of the present invention are intended to cover all changes and modifications that fall within the scope and boundary of the claims, or the equivalents of such scope and boundary.

Claims (3)

1. An operation method of an FFT processor, characterized by comprising:
Setting up multiple stages of butterfly operations in the FPGA program, each stage sharing one block floating-point shift factor; before each stage of operation, judging the data according to the judgement state of the previous block floating-point factor to decide the shift selection applied when the data memory outputs at this stage; and controlling the final output gain through the sum of the shifts of all stages.
2. The method according to claim 1, characterized in that the method further comprises:
Before the FFT operation, all data addresses are arranged in order, and for every stage of butterfly operation the address from which data are fetched before the operation is identical to the address to which data are stored after the operation ends. In the first stage, the two addresses accessed by each butterfly are separated by half of the total number of points, and in every subsequent stage the separation is half that of the previous stage. Every stage of butterfly operation is pipelined and data are read from the memory on each clock. A clock counter and a stage counter are defined in the main program: the stage counter increments as the stage number increases and is reset after each complete FFT, and the clock counter increments on every clock and is reset after each completed FFT stage. The consistency between the read-data address and the store-data address is achieved by subtracting from the clock counter the number of clocks consumed by the butterfly operation, and the address gap between each pair of data is produced by circularly shifting the clock counter to the right.
3. The method according to claim 2, characterized in that the method further comprises:
In the radix-2 DIF DFT operation, the raw data at the butterfly input of every stage first undergo simple additions and subtractions, and all data of every stage may need to be shifted left or right when entering the butterfly. If all data are shifted right by one bit, the output data format of this stage has one more fractional bit than the input data format of this stage, so after every stage the data are 1/2 of the previous stage; this is referred to as the floating-point factor. After m stages of butterfly iteration, if the data have been shifted right by m bits in total, i.e. the floating-point factor is m, the computed result data are amplified by 2^m to obtain the final result.
CN201510116879.0A 2015-03-17 2015-03-17 A kind of operation method of fft processor Active CN104679721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510116879.0A CN104679721B (en) 2015-03-17 2015-03-17 A kind of operation method of fft processor


Publications (2)

Publication Number Publication Date
CN104679721A true CN104679721A (en) 2015-06-03
CN104679721B CN104679721B (en) 2017-12-19

Family

ID=53314787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510116879.0A Active CN104679721B (en) 2015-03-17 2015-03-17 A kind of operation method of fft processor

Country Status (1)

Country Link
CN (1) CN104679721B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354693A (en) * 2016-08-29 2017-01-25 北京理工大学 Block-floating-point method for FFT (fast Fourier transform) processor
CN111580867A (en) * 2020-04-30 2020-08-25 中国科学院微电子研究所 Block floating point processing method and device for FFT (fast Fourier transform) operation
CN112163185A (en) * 2020-09-30 2021-01-01 中国科学院计算技术研究所 FFT/IFFT operation device and FFT/IFFT operation method based on the same
CN113111300A (en) * 2020-01-13 2021-07-13 上海大学 Fixed point FFT implementation architecture with optimized resource consumption


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063411A (en) * 2009-11-17 2011-05-18 中国科学院微电子研究所 802.11n based FFT/IFFT (Fast Fourier Transform)/(Inverse Fast Fourier Transform) processor
CN102298570A (en) * 2011-09-13 2011-12-28 浙江大学 Hybrid-radix fast Fourier transform (FFT)/inverse fast Fourier transform (IFFT) implementation device with variable counts and method thereof
US20140129601A1 (en) * 2012-05-19 2014-05-08 Eric B. Olsen Conversion apparatus for a residue number arithmetic logic unit

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵敏玲, 等: "基于FPGA的块浮点FFT实现" (Implementation of block floating-point FFT based on FPGA), 《兰州理工大学学报》 (Journal of Lanzhou University of Technology) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354693A (en) * 2016-08-29 2017-01-25 北京理工大学 Block-floating-point method for FFT (fast Fourier transform) processor
CN106354693B (en) * 2016-08-29 2019-09-17 北京理工大学 A kind of block floating point method of fft processor
CN113111300A (en) * 2020-01-13 2021-07-13 上海大学 Fixed point FFT implementation architecture with optimized resource consumption
CN111580867A (en) * 2020-04-30 2020-08-25 中国科学院微电子研究所 Block floating point processing method and device for FFT (fast Fourier transform) operation
CN111580867B (en) * 2020-04-30 2023-07-18 中国科学院微电子研究所 Block floating point processing method and device for FFT operation
CN112163185A (en) * 2020-09-30 2021-01-01 中国科学院计算技术研究所 FFT/IFFT operation device and FFT/IFFT operation method based on the same
CN112163185B (en) * 2020-09-30 2023-11-28 中国科学院计算技术研究所 FFT/IFFT operation device and FFT/IFFT operation method based on same

Also Published As

Publication number Publication date
CN104679721B (en) 2017-12-19

Similar Documents

Publication Publication Date Title
CN104679720A (en) Operation method for FFT
CN104679719A (en) Floating point calculation method based on FPGA
CN106951211B (en) A kind of restructural fixed and floating general purpose multipliers
CN109146067B (en) Policy convolution neural network accelerator based on FPGA
CN104679721A (en) Operation method of FFT (Fast Fourier Transformation) processor
US9996345B2 (en) Variable length execution pipeline
CN101859241B (en) Full-flow 128-bit-accuracy floating-point accumulator based on full expansion
CN103914276A (en) Fixed point division circuit utilizing floating point architecture
US10642578B2 (en) Approximating functions
CN104254833A (en) Vector and scalar based modular exponentiation
CN103412851A (en) High-precision and low-power-consumption FFT (fast Fourier transform) processor
CN106933777B (en) The high-performance implementation method of the one-dimensional FFT of base 2 based on domestic 26010 processor of Shen prestige
CN106951394A (en) A kind of general fft processor of restructural fixed and floating
CN110222305A (en) A kind of logarithmic function computing system and method based on hyperbolic CORDIC
CN102541813B (en) Method and corresponding device for multi-granularity parallel FFT (Fast Fourier Transform) butterfly computation
CN103049716B (en) First moment-based convolver
CN106168941B (en) A kind of FFT butterfly computation hardware circuit implementation for supporting complex multiplication
CN104951279A (en) Vectorized Montgomery modular multiplier design method based on NEON engine
CN114595215A (en) Data processing method and device, electronic equipment and storage medium
CN103823789A (en) Low-complexity universal mixed-radix FFT design method
CN111323761B (en) Echo system function construction method and device and echo simulator
CN110750249B (en) Method and device for generating fast Fourier transform code
WO2022126630A1 (en) Reconfigurable processor and method for computing multiple neural network activation functions thereon
US20200167327A1 (en) System and method for self-resizing associative probabilistic hash-based data structures
Sacco et al. Large multicore ffts: Approaches to optimization

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20181108

Address after: 210000 C4 717-718, green window of Jinxiu street, Yuhuatai District, Nanjing, Jiangsu

Patentee after: Nanjing Jing Da Micro Electronics Technology Co., Ltd.

Address before: 610000 Sichuan science and technology incubator in Chengdu hi tech Zone

Patentee before: CHENGDU GOLDENWAY TECHNOLOGY CO., LTD.
