CN112256236A

CN112256236A - FFT circuit based on approximate constant complex multiplier and implementation method

Info

Publication number: CN112256236A
Application number: CN202011200130.1A
Authority: CN
Inventors: 单伟伟; 朱励轩
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2020-10-30
Filing date: 2020-10-30
Publication date: 2021-01-22

Abstract

The invention discloses an FFT circuit based on an approximate constant complex multiplier and an implementation method thereof, belonging to the field of signal processing and integrated circuit design. The invention first uses the symmetric mapping property of the twiddle factors to reduce the number of twiddle factors, then represents the remaining twiddle factors of the FFT algorithm in CSD code, and finally approximates the CSD codes of the twiddle factors so as to increase the common part of the twiddle factors in the complex multiplication. Compared with a conventional fast Fourier transform circuit, and provided that a certain error can be tolerated, the invention effectively reduces the area and computational power consumption of the circuit and improves its energy efficiency.

Description

FFT circuit based on approximate constant complex multiplier and implementation method

Technical Field

The invention discloses a fast Fourier transform circuit based on an approximate constant complex multiplier, belonging to the technical field of signal processing and integrated circuit design.

Background

The fast Fourier transform (FFT) is a fast algorithm for computing the Fourier transform and is widely used in signal processing, for example in signal modulation and demodulation and in speech feature extraction. As a method of transforming a signal from the time domain to the frequency domain, its hardware implementation mainly comprises butterfly operations and twiddle factor multiplications. FFT circuits generally fall into two classes of implementation, serial FFT algorithms and parallel FFT algorithms, and both consist essentially of butterfly operation modules and twiddle factor product modules. A butterfly in the FFT algorithm is essentially a set of complex additions, whereas a twiddle factor product module is usually synthesized as a set of general complex multipliers, each consisting of 4 constant multipliers. The twiddle factor product module therefore accounts for a large share of the computation and chip area in an FFT chip implementation, and optimizing the constant multipliers can effectively reduce the area and power consumption of the FFT processor.

Many optimization algorithms exist for conventional constant multipliers. Digit coding: the multiplier constant is first converted into a corresponding code and the product is then computed from that code; such algorithms include canonical signed digit (CSD) encoding and Booth encoding. Common sub-expression elimination: the multiplier constant is converted into a code by a digit coding algorithm, identical sub-expressions are then found in the code, and the constant is decomposed into these sub-expressions and the remaining terms. However, these methods still incur a relatively high hardware cost in cases where the accuracy of the result is not critical. The invention provides a fast Fourier transform circuit using an approximate constant complex multiplier. By reasonably approximating the twiddle factors and thereby increasing the number of common sub-expressions in the multiple-constant multiplier, data reuse is increased and the number of adders in the circuit is reduced, achieving the goal of reducing area and power consumption.

Disclosure of Invention

The object of the present invention is to overcome the above drawbacks of the prior art by providing a fast Fourier transform circuit based on an approximate constant complex multiplier. Within the FFT circuit architecture, the symmetric mapping relationship of the twiddle factors is used to effectively reduce the number of twiddle factor constants, and reasonable approximation of the twiddle factors increases the number of common sub-expressions in the multiple-constant multiplier. Data reuse is thereby increased, partial products are shared in the complex multiplication, the number of operations is reduced, and the number of adders in the circuit is reduced, achieving the goal of reducing area and power consumption.

The invention adopts the following technical scheme for realizing the aim of the invention:

A method for implementing an FFT circuit based on an approximate constant complex multiplier, characterized in that the real part and the imaginary part of each twiddle factor complex constant in the FFT algorithm are CSD-coded and the CSD coding results are approximated, that is, within the allowed error, zero-one replacement is performed on some digits of the CSD code of at least one of the real part and the imaginary part, on the principle of increasing the common part of the real part and the imaginary part, so as to obtain approximate constants for the real part and the imaginary part; the common part of the approximate real-part and imaginary-part constants of each twiddle factor then participates in the twiddle factor multiplication as a common sub-expression, and the twiddle factor multiplication is implemented by a combination of shift operations and additions.

An FFT circuit based on an approximate constant complex multiplier, comprising a twiddle factor product module, characterized in that the real part and the imaginary part of each twiddle factor complex constant in the twiddle factor product module are separately CSD-coded and the CSD coding results are approximated, that is, within the allowed error, zero-one replacement is performed on some digits of the CSD code of at least one of the real part and the imaginary part, on the principle of increasing the common part of the real part and the imaginary part, so as to obtain approximate constants for the real part and the imaginary part; the common part of the approximate real-part and imaginary-part constants of each twiddle factor participates in the twiddle factor multiplication as a common sub-expression, and the twiddle factor product module is implemented by a combination of shift operations and additions.

A serial fast Fourier algorithm of the Radix-2² single-path delay feedback type (Radix-2² SDF for short) with T-point input (T is an integer and is always 2 to the power N, in order to satisfy the requirements of the subsequent FFT modules) comprises N/2 Radix-2² SDF operation units, where N is the exponent of 2 in T above. If N/2 is not an integer, the number of units is rounded up to the next integer, and the last Radix-2² SDF operation unit contains only one butterfly operation unit. Each Radix-2² SDF operation unit is composed of two butterfly operation units and a twiddle factor product module.

First butterfly unit (BF1 for short): the first butterfly operation unit BF1 in the first Radix-2² SDF operation unit contains a memory of T/2 × A bits (A is an integer greater than 1). One frame of T data is input. The first T/2 data are stored into the memory, then the first butterfly operation is performed on the last T/2 data and the first T/2 data to obtain two groups of data of length T/2, and the second group is fed back and stored into the memory.
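The fill, compute, and drain behaviour of BF1 described above can be sketched in software. The following is a minimal model only, assuming plain numeric samples and a FIFO standing in for the feedback memory; the function name sdf_bf1 is hypothetical, and real hardware would overlap the drain phase with the fill phase of the next frame.

```python
from collections import deque


def sdf_bf1(frame):
    """Model one single-path delay-feedback butterfly over a frame of T samples."""
    half = len(frame) // 2
    fifo = deque()          # stands in for the T/2-word feedback memory
    out = []
    for i, x in enumerate(frame):
        if i < half:
            fifo.append(x)              # fill phase: store the first T/2 samples
        else:
            a = fifo.popleft()          # matching sample from the first half
            out.append(a + x)           # first group (sums) goes downstream
            fifo.append(a - x)          # second group (differences) is fed back
    out.extend(fifo)                    # drain phase: differences flow out last
    return out


print(sdf_bf1([1, 2, 3, 4, 5, 6, 7, 8]))
```

In this model the first T/2 outputs are the sums and the last T/2 outputs are the differences, which is the order in which the next butterfly would receive them.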

Second butterfly unit (BF2 for short): the second butterfly operation unit BF2 in the first Radix-2² SDF operation unit contains a memory of T/4 × A bits. Of the first group of data (length T/2) output by the first butterfly operation unit BF1, the first T/4 data are stored into the memory, in the same way as in BF1. The BF2 butterfly operation is then performed on the next T/4 data and the T/4 data in the memory, the resulting second group of data is fed back into the memory, and the first group of data is output.

Twiddle factor product module: the serial output data of the BF2 unit are multiplied by their corresponding twiddle factors, whose values are complex numbers of B bits. An A-bit FFT intermediate datum C + Dj (C and D are A-bit signed binary numbers) is multiplied by a twiddle factor E + Fj (E and F are B-bit signed binary numbers). Because E and F are constants, the complex multiplication consists of two groups of constant multiplications: C multiplied by E and by F, and D multiplied by E and by F. Since a constant multiplication by E or F can be regarded as shift-and-add operations on the FFT intermediate data, the common part can be reused to avoid repeated shift accumulation. The final result serves as the input of the next Radix-2² SDF unit.
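As an illustration of how a shared shift-and-add term serves both constant multiplications, the sketch below multiplies C + Dj by a constant E + Fj using only shifts, additions, and subtractions. The constants E = 49 (CSD 1 0 -1 0 0 0 1) and F = 125 (CSD 1 0 0 0 0 -1 0 1) are hypothetical small integers chosen so that both contain the pattern 1 0 -1 up to sign; they are not twiddle factors from the patent, and the function names are assumptions.

```python
def times_e_and_f(x):
    """Return (49 * x, 125 * x) using shifts and adds with one shared term."""
    shared = (x << 2) - x        # common sub-expression "1 0 -1" = 3 * x
    e = (shared << 4) + x        # 49 * x  = 48 * x + x
    f = (x << 7) - shared        # 125 * x = 128 * x - 3 * x
    return e, f


def complex_const_mul(c, d):
    """(c + d*j) * (49 + 125*j) as a (real, imag) pair of integers."""
    ce, cf = times_e_and_f(c)    # C multiplied by E and by F
    de, df = times_e_and_f(d)    # D multiplied by E and by F
    return ce - df, cf + de


print(complex_const_mul(5, -7))
```

Without sharing, each input needs two additions for E and two for F. With the shared term it needs three in total, which is the kind of adder saving the text refers to.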

The next Radix-2² SDF unit again comprises a BF1 unit, a BF2 unit, and a twiddle factor product module, but the memory size of each successive butterfly operation module in the Radix-2² SDF units is halved. Finally, in the (N/2)-th Radix-2² SDF unit, the memory of BF1 is 2A bits, BF2 directly outputs the complex result of the FFT module, and the final data flow out in bit-reversed order.
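The unit count and the halving memory sizes can be tabulated with a short sketch. It assumes, as the text states, a memory of (depth × A) bits per butterfly; the function name sdf_memory_schedule is hypothetical.

```python
def sdf_memory_schedule(T, A):
    """List the feedback-memory size in bits of every butterfly, grouped by unit."""
    if T < 2 or T & (T - 1):
        raise ValueError("T must be a power of two")
    N = T.bit_length() - 1                        # T == 2 ** N
    depths = [T >> (i + 1) for i in range(N)]     # T/2, T/4, ..., 1 words
    # Two butterflies (BF1, BF2) per unit; an odd N leaves one butterfly in the last unit.
    units = [depths[i:i + 2] for i in range(0, N, 2)]
    return [[depth * A for depth in unit] for unit in units]


for index, unit in enumerate(sdf_memory_schedule(128, 16), start=1):
    print(index, unit)
```

For T = 128 and A = 16 this gives four units, the first with 64 × 16 and 32 × 16 bits and the last with a single butterfly. For an even N such as T = 256, the BF1 memory of the last unit is 2A bits, in line with the description above.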

In a further preferred scheme of the invention, the symmetric mapping relationship of the twiddle factors is used to reduce the T twiddle factors of a T-point FFT to T/8, in order to reduce the number of twiddle factors actually called and the optimization complexity of the constant complex multiplication. For an FFT algorithm with T-point input, at most T twiddle factors are called, and these T twiddle factors can be viewed as a set of points equally spaced on the unit circle of the complex plane. The unit circle is divided into 8 equal parts by central angle. By the symmetry of the unit circle, every twiddle factor in the second, third, and fourth quadrants has a counterpart in the first quadrant with equal absolute values. Within the first quadrant, the real part of a twiddle factor in the 0 to 45 degree region equals the imaginary part of a twiddle factor in the 45 to 90 degree region, and the imaginary part of a twiddle factor in the 0 to 45 degree region equals the real part of a twiddle factor in the 45 to 90 degree region. Thus, for an FFT system, only the twiddle factors in the 0 to 45 degree region need to be obtained in order to derive all the called twiddle factors, and the number of twiddle factors is ultimately reduced to 1/8.
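The symmetric mapping can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the common convention W_T^k = exp(-2πjk/T), floating-point values rather than fixed-point constants, and hypothetical function names. The table holds indices 0 to T/8, where index 0 is the trivial value 1 + 0j.

```python
import cmath
import math


def build_octant_table(T):
    """(cos, sin) pairs for angles 2*pi*i/T, i = 0 .. T/8 (0 to 45 degrees)."""
    if T % 8:
        raise ValueError("T must be a multiple of 8")
    return [(math.cos(2 * math.pi * i / T), math.sin(2 * math.pi * i / T))
            for i in range(T // 8 + 1)]


def twiddle(k, T, table):
    """Rebuild W_T^k from the octant table using swaps and sign changes only."""
    quadrant, r = divmod(k % T, T // 4)
    if r <= T // 8:
        c, s = table[r]                 # 0 to 45 degrees: read directly
    else:
        s, c = table[T // 4 - r]        # 45 to 90 degrees: swap real and imaginary
    for _ in range(quadrant):
        c, s = -s, c                    # advance by 90 degrees per quadrant
    return complex(c, -s)


table = build_octant_table(128)
worst = max(abs(twiddle(k, 128, table) - cmath.exp(-2j * cmath.pi * k / 128))
            for k in range(128))
print(len(table), worst)
```

Only the stored octant would need CSD coding and approximation; every other twiddle factor is reached by exchanging or negating the stored real and imaginary parts.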

In the invention, the purpose of optimizing the approximate constant multiplier is to approximate the twiddle factors so as to increase the common part in the complex multiplication. The twiddle factor product module is essentially a constant multiplier. To optimize a fixed constant multiplier, the real part and the imaginary part of each B-bit twiddle factor are first represented in canonical signed digit (CSD) code, and the common part between the two fixed constants, the real part and the imaginary part, of each twiddle factor is found. A common part is a CSD code segment of length K (K is less than B/2) whose first and last digits are non-zero and whose middle K-2 digits are zeros. In general, the more common sub-expressions there are, the lower the hardware cost of the twiddle factor product module. The twiddle factor is then approximated: an isolated non-zero digit (a non-zero digit that cannot form a common sub-expression) in the lower B/2 digits of the CSD code of the twiddle factor is taken as the head or tail of a common part and matched against a common part already present in the code, and the zero digit at the corresponding tail or head position is located and subjected to zero-one replacement (for CSD code, the "one" in zero-one replacement may be 1 or -1, that is, a digit that was originally 0 is changed to 1 or -1, or a digit that was originally 1 or -1 is changed to 0), thereby increasing the number of common parts in the twiddle factor.
For example, assume that the CSD codes of the real part and the imaginary part are 1000-10-1001000001 and 010-100000-10-1 respectively, so that the only common part is -10-1. Following the above principle, the two numbers are modified to 1000-10-1001000-101 and 010-100000-10-1, and the common parts become 10-1 and -10-1 (10-1 and -101 are treated as the same common part in the hardware implementation, since -101 is the negation of 10-1), which is one more common part than before the approximation. In an FFT a certain error in the result can be tolerated, and within the tolerated approximation error the method increases the common part in the twiddle factors as much as possible. The common part in the twiddle factors is then extracted and computed first as a common sub-expression. The common part can therefore be reused by the constant multipliers for the two constants without being recomputed.
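The example can be replayed with a small sketch that parses the compact digit strings and looks for a pattern or its negation. The helper names are hypothetical, and reading the real-part string as a 16-digit integer is an assumption made only to show the size of the change introduced by the zero-one replacement.

```python
def parse_csd(text):
    """Parse a compact CSD string such as '10-1' into digits [1, 0, -1]."""
    digits, i = [], 0
    while i < len(text):
        if text[i] == "-":
            digits.append(-1)       # "-1" is a single digit
            i += 2
        else:
            digits.append(int(text[i]))
            i += 1
    return digits


def csd_value(digits):
    """Integer value of an MSB-first signed-digit list."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


def has_pattern(digits, pattern):
    """True if the pattern or its negation occurs as a contiguous run."""
    negated = [-d for d in pattern]
    n = len(pattern)
    return any(digits[i:i + n] in (pattern, negated)
               for i in range(len(digits) - n + 1))


real_before = parse_csd("1000-10-1001000001")
real_after = parse_csd("1000-10-1001000-101")
imag = parse_csd("010-100000-10-1")

print(has_pattern(real_before, [1, 0, -1]), has_pattern(real_after, [1, 0, -1]))
print(csd_value(real_before), csd_value(real_after))
```

Before the replacement only the pattern -1 0 -1 appears in both codes. After it, the real part also contains -1 0 1, the negation of the 1 0 -1 found in the imaginary part, at the cost of changing the real-part integer by 4.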

Advantageous effects:

1) Compared with a conventional fast Fourier transform circuit, the invention implements the constant complex multiplication by a combination of shift operations and additions, and reuses in the shift operations the common part shared by the real-part constant and the imaginary-part constant of each twiddle factor complex constant, thereby greatly reducing the area.

2) Compared with a conventional fast Fourier transform circuit, the invention CSD-codes the twiddle factors in the twiddle factor product module, which reduces the number of non-zero digits in the twiddle factors (the number of non-zero digits determines the number of additions in the shift-and-add operation), and uses approximation to reduce the intermediate operations of the multiplication and to increase the shared partial products, further reducing the computational power consumption.
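The reduction in non-zero digits from CSD coding can be seen with a small encoder. This is a sketch using the standard non-adjacent-form recurrence, not a procedure taken from the patent; the function name to_csd is hypothetical, and the test value 30273 is the integer obtained by reading the real-part code of the earlier example as a 16-digit number.

```python
def to_csd(n):
    """Return the canonical signed-digit form of integer n, MSB first."""
    digits = []                      # collected LSB first
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)          # n mod 4 == 1 gives +1, n mod 4 == 3 gives -1
            n -= d                   # leaves n divisible by 4, so no adjacent non-zeros
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits[::-1] or [0]


csd = to_csd(30273)
print(csd)
print(bin(30273).count("1"), sum(1 for d in csd if d))
```

For this value the plain binary form has 7 non-zero bits while the CSD form has 5 non-zero digits, so a shift-and-add multiplier built from the CSD form needs fewer additions.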

Drawings

FIG. 1 is a diagram of an exemplary hardware architecture for a serial FFT of the present invention.

FIG. 2 is a schematic diagram of the twiddle factor symmetric mapping compression of the present invention.

FIG. 3 is a graph of the result of twiddle factor approximation processing according to the present invention.

FIG. 4 is a hardware diagram of the twiddle factor product module of the present invention.

FIG. 5 is a graph of an exemplary 128-point FFT output error of the present invention.

FIG. 6 is a table of area optimizations for a 128-point FFT at 28nm process of the present invention.

FIG. 7 is a flow chart of the method of the present invention.

Detailed Description

The overall technical solution of the invention is described in further detail below, taking as an example a 128-point R2²SDF fast Fourier transform algorithm with a 16-bit data stream and a 16-bit twiddle factor data width (T = 128, N = 7, A = 16, B = 16) to illustrate a specific implementation of the invention, without limiting the scope of the invention.

FIG. 1 shows the 128-point R2²SDF FFT of the present invention, and the implementation method is shown in FIG. 7.

Referring to FIG. 1, the R2²SDF fast Fourier transform circuit based on the constant complex multiplier according to the invention mainly consists of 3 Radix-2² SDF units, and the operation of the circuit can be divided into the following steps:

Step 1: the number of twiddle factors is compressed, and the compressed twiddle factors are then approximated. As shown in FIG. 2, a 128-point FFT calls at most 128 twiddle factors; using the symmetric mapping of the twiddle factors, the number of twiddle factors finally used is 16, and their resulting CSD representations are shown in FIG. 3.

Step 2: the 128 16-bit input data enter the FFT module in a pipelined manner. Since the number of points of the Fourier transform is 128, 4 stages of Radix-2² SDF units are required; each Radix-2² SDF unit comprises a BF1 operation unit, a BF2 operation unit, and a twiddle factor product module, and the operation formula of the serial FFT module is as follows:

[Formula not reproduced in the source text.]

In the above formula, (k₁ + 2k₂ + 4k₃) indicates the order of the output signals, where k₁ takes the values 0 and 1, k₂ takes the values 0 and 1, and k₃ takes integer values from 0 to 15. The expression inside the summation symbol on the right-hand side of the equation is, in practical terms, the mathematical description of the butterfly operations: one term (not reproduced in the source text) is the result of the BF1 butterfly operation, and another (not reproduced in the source text) is the result of the BF2 butterfly operation.
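The formula itself is not reproduced in this text. As a hedged illustration, the sketch below checks the textbook radix-2² decimation-in-frequency decomposition, which uses the same output index k₁ + 2k₂ + 4k₃ and the same split into a BF1 result, a BF2 result, and a twiddle factor product. Whether it matches the patent's formula term by term is an assumption. The function names and the small size T = 16 are chosen only for the demonstration.

```python
import cmath


def w(N, e):
    """Twiddle factor W_N^e = exp(-2*pi*j*e/N)."""
    return cmath.exp(-2j * cmath.pi * e / N)


def dft(x):
    """Direct DFT, used as the reference and for the remaining T/4-point sums."""
    T = len(x)
    return [sum(x[n] * w(T, n * k) for n in range(T)) for k in range(T)]


def radix22_first_stage(x):
    """One radix-2^2 stage: BF1, BF2, twiddle product, then T/4-point DFTs."""
    T = len(x)
    X = [0j] * T
    for k1 in (0, 1):
        for k2 in (0, 1):
            h = []
            for n3 in range(T // 4):
                bf1_a = x[n3] + (-1) ** k1 * x[n3 + T // 2]               # BF1
                bf1_b = x[n3 + T // 4] + (-1) ** k1 * x[n3 + 3 * T // 4]  # BF1
                bf2 = bf1_a + (-1j) ** (k1 + 2 * k2) * bf1_b              # BF2
                h.append(bf2 * w(T, n3 * (k1 + 2 * k2)))                  # twiddle
            for k3, value in enumerate(dft(h)):
                X[k1 + 2 * k2 + 4 * k3] = value
    return X


x = [complex(n % 5, (3 * n) % 7) for n in range(16)]
error = max(abs(a - b) for a, b in zip(radix22_first_stage(x), dft(x)))
print(error)
```

The multiplication by a power of -j inside BF2 only swaps and negates real and imaginary parts, so the only true multiplications in the stage are the twiddle factor products, which is why that module is the target of the optimization.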

Step 3: after the first 64 input data have filled the 64 × 16-bit memory of the first Radix-2² SDF unit, BF1 butterfly operations are performed in sequence on the last 64 data and the 64 data in the memory, as shown in the figure. The sum part of each result is output to the BF2 butterfly operation unit, and the difference part is stored in the 64 × 16-bit memory, overwriting the original data. After the operations on the last 64 data are finished, the data in the 64 × 16-bit memory flow out in sequence to the BF2 butterfly operation unit.

Step 4: after the 32 × 16-bit memory of the BF2 unit has been filled, BF2 butterfly operations are performed in sequence on the subsequent incoming data and the stored data, as shown in Step 2. Part of each result is output to the twiddle factor product module and part is fed back into the 32 × 16-bit memory. When the operations are finished, the data in the 32 × 16-bit memory flow out in sequence.

Step 5: the data output from BF2 enter the twiddle factor product module, and the twiddle factor corresponding to each datum is determined by its position in the FFT algorithm. FIG. 3 lists some of the twiddle factors called in the 128-point R2²SDF algorithm together with their approximate coded representations; shaded entries in the numeric part are twiddle factors that have been approximated, and shaded digits in the code part are common sub-expressions. When the twiddle factor product is computed, the common sub-expression is first evaluated on the data by shift-and-add, and the final output is then obtained by shift-and-add of the common sub-expression result and the data itself, as shown in FIG. 4. The output data serve as the input of the next Radix-2² SDF unit.

Step 6: the data pass through a total of 4 rounds of Radix-2² SDF unit operations, and the FFT results are finally output in sequence in bit-reversed order. The FFT outputs obtained with the approximate constant multiplier were analyzed and compared, and the error results are shown in FIG. 5. Panels (a) and (b) show the root mean square error of the approximated FFT over all data sets, and the vertical-axis value of the red line in each graph is the average root mean square error. At the real and imaginary outputs, the average root mean square error is only 0.2943 for the real part and only 0.2948 for the imaginary part. In panels (c) and (d) of FIG. 5, the Z axis represents all errors occurring in the real part and the imaginary part of the output, and the XOY plane represents the position at which each error occurs within a frame and the frame to which the error belongs. For all input data the maximum error of the real and imaginary parts of the FFT output does not exceed 10, and since the FFT output is a 16-bit signed complex number, the maximum relative error is 3.05 × 10^-4, which is relatively low. Under this error condition, the 128-point FFT hardware circuit was simulated in a 28 nm process; the area gain is shown in FIG. 6, and the total area can be reduced by 13.7%.
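The quoted relative error follows from simple arithmetic. The sketch below assumes that the reference magnitude is the full scale of a 16-bit signed value, 2^15 = 32768, which reproduces the figure given in the text; the variable names are hypothetical.

```python
max_abs_error = 10            # largest real or imaginary output error reported
full_scale = 2 ** 15          # magnitude range of a 16-bit signed value (assumed reference)

max_relative_error = max_abs_error / full_scale
print(f"{max_relative_error:.3e}")
```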

Claims

1. A method for implementing an FFT circuit based on an approximate constant complex multiplier, characterized in that the real part and the imaginary part of each twiddle factor complex constant in the FFT algorithm are CSD-coded and the CSD coding results are approximated, that is, within the allowed error, zero-one replacement is performed on some digits of the CSD code of at least one of the real part and the imaginary part, on the principle of increasing the common part of the real part and the imaginary part, so as to obtain approximate constants for the real part and the imaginary part; the common part of the approximate real-part and imaginary-part constants of each twiddle factor then participates in the twiddle factor multiplication as a common sub-expression, and the twiddle factor multiplication is implemented by a combination of shift operations and additions.

2. The method of claim 1, wherein T twiddle factors corresponding to the T-point FFT are reduced to T/8 twiddle factors by using a symmetric mapping relationship of the twiddle factors before CSD coding.

3. The FFT circuit implementation method based on an approximate constant complex multiplier of claim 2, wherein the T twiddle factors are compressed to T/8 as follows: the T twiddle factors are regarded as a set of points equally spaced on the unit circle of the complex plane, and the unit circle is divided into 8 equal parts by central angle; by the symmetry of the unit circle, the twiddle factors of the second, third, and fourth quadrants have counterparts with equal absolute values in the first quadrant; the real part of a twiddle factor in the 0 to 45 degree region equals the imaginary part of a twiddle factor in the 45 to 90 degree region, and the imaginary part of a twiddle factor in the 0 to 45 degree region equals the real part of a twiddle factor in the 45 to 90 degree region; all the called twiddle factors can be derived from the twiddle factors in the 0 to 45 degree region, and the number of twiddle factors is finally reduced to T/8.

4. The FFT circuit implementation method based on an approximate constant complex multiplier of claim 1, wherein:

in a serial FFT circuit, the twiddle factor multiplication is implemented by a multiplexer combined with the shift-accumulate logic of all the twiddle factors, and the multiplexer selects the operation of the data with the designated twiddle factor;

in a parallel FFT circuit, the twiddle factor multiplication is implemented by an approximate constant multiplier without a multiplexer.

5. The FFT circuit implementation method based on an approximate constant complex multiplier of claim 1, wherein the approximation is performed as follows: an isolated non-zero digit in the lower B/2 digits of the CSD code is taken as the head or tail of a common part and matched against a common part already present in the CSD code, and the zero digit at the corresponding tail or head position is located and subjected to zero-one replacement, where B is the number of bits of the real part and of the imaginary part of the twiddle factor.

6. An FFT circuit based on an approximate constant complex multiplier, comprising a twiddle factor product module, characterized in that the real part and the imaginary part of each twiddle factor complex constant in the twiddle factor product module are separately CSD-coded and the CSD coding results are approximated, that is, within the allowed error, zero-one replacement is performed on some digits of the CSD code of at least one of the real part and the imaginary part, on the principle of increasing the common part of the real part and the imaginary part, so as to obtain approximate constants for the real part and the imaginary part; the common part of the approximate real-part and imaginary-part constants of each twiddle factor participates in the twiddle factor multiplication as a common sub-expression, and the twiddle factor product module is implemented by a combination of shift operations and additions.

7. The FFT circuit based on an approximate constant complex multiplier of claim 6, wherein the number of twiddle factors is T/8, where T is the number of points of the FFT.

8. The FFT circuit based on an approximate constant complex multiplier of claim 6, wherein:

in a serial FFT circuit, the twiddle factor product module is implemented by a multiplexer combined with the shift-accumulate logic of all the twiddle factors, and the multiplexer selects the operation of the data with the designated twiddle factor;

in a parallel FFT circuit, the twiddle factor product module is implemented by an approximate constant multiplier without a multiplexer.

9. The FFT circuit based on an approximate constant complex multiplier of claim 6, wherein the approximation is performed as follows: an isolated non-zero digit in the lower B/2 digits of the CSD code is taken as the head or tail of a common part and matched against a common part already present in the CSD code, and the zero digit at the corresponding tail or head position is located and subjected to zero-one replacement, where B is the number of bits of the real part and of the imaginary part of the twiddle factor.