CN118051709A - FFT processor and operation method - Google Patents

Publication number
CN118051709A
Authority
CN
China
Prior art keywords
fft
data
columns
complex
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410193134.3A
Other languages
Chinese (zh)
Inventor
杜力
李承睿
邵壮
杜源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202410193134.3A
Publication of CN118051709A

Landscapes

  • Complex Calculations (AREA)

Abstract

The application provides an FFT processor and an operation method. The FFT processor comprises an acquisition module, a serial-parallel FFT module, and a restoration module; the serial-parallel FFT module is electrically connected to the acquisition module and the restoration module. The acquisition module is configured to: acquire real time-domain data, decompose the complex sequence formed from the even-indexed and odd-indexed sequences of the real time-domain data into 4 paths, rearrange the data points of the complex sequence from one-dimensional data into two-dimensional data, and input them to the serial-parallel FFT module. The serial-parallel FFT module is configured to: obtain the FFT calculation result of the complex sequence using the Cooley-Tukey algorithm and input it to the restoration module. The restoration module is configured to: generate the frequency-domain result of the real time-domain data from the FFT calculation result of the complex sequence. This addresses the long processing time and high power consumption of current FFT processors when processing large-point real FFTs in real time.

Description

FFT processor and operation method
Technical Field
The present application relates to the field of digital signal processing technologies, and in particular, to an FFT processor and an operation method.
Background
The fast Fourier transform (FFT) is an efficient algorithm for computing the discrete Fourier transform (DFT). The Fourier transform is one of the most basic methods of time-domain and frequency-domain analysis, and the discrete Fourier transform used in digital processing underlies many digital signal processing methods. The Fourier transform is widely used in digital signal processing and has become a key technology in optical communication, radar and electronic countermeasures, and satellite image processing.
However, in these application scenarios, computing an FFT with a large number of points often takes a long time and consumes a lot of power. Moreover, for CPU/FPGA/GPU hardware platforms, generality conflicts with performance: a general-purpose hardware platform performs poorly (in required time and power consumption) when computing a large-point FFT.
Disclosure of Invention
The application provides an FFT processor and an operation method to solve the technical problem that conventional FFT processors require long processing times and consume considerable power when processing large-point real FFTs in real time.
The first aspect of the present application provides an FFT processor comprising: an acquisition module, a serial-parallel FFT module, and a restoration module; the serial-parallel FFT module is electrically connected to the acquisition module and the restoration module;
the acquisition module is configured to:
acquire real time-domain data, decompose the complex sequence formed from the even-indexed and odd-indexed sequences of the real time-domain data into 4 paths, rearrange the data points of the complex sequence from one-dimensional data into two-dimensional data, and input them to the serial-parallel FFT module; the even-indexed and odd-indexed sequences are the real part and the imaginary part of the complex sequence, respectively;
the serial-parallel FFT module is configured to:
obtain the FFT calculation result of the complex sequence using the Cooley-Tukey algorithm, and input the FFT calculation result of the complex sequence to the restoration module;
the restoration module is configured to:
generate the frequency-domain result of the real time-domain data based on the FFT calculation result of the complex sequence.
In some embodiments, the serial-parallel FFT module comprises: four groups of serial FFT units and a parallel FFT unit; the parallel FFT unit is electrically connected to the four groups of serial FFT units;
the serial FFT unit is configured to:
perform a row FFT operation on the complex sequence to generate an intermediate sequence;
the parallel FFT unit is configured to:
multiply each data point of the intermediate sequence by the twiddle factor for its position, and perform a column FFT operation to obtain the FFT calculation result of the complex sequence.
In some embodiments, each group of the serial FFT units is a 2048-point FFT unit consisting of an 11-stage SDF structure.
In some embodiments, the restoration module comprises: a data rearrangement unit, a real FFT calculation unit, and a butterfly unit; the real FFT calculation unit is electrically connected to the data rearrangement unit and the butterfly unit;
the data rearrangement unit is configured to:
buffer the FFT calculation result of the complex sequence, calculate from it the FFT calculation results of the even-indexed and odd-indexed sequences of the real time-domain data, and output them to the real FFT calculation unit;
the real FFT calculation unit is configured to:
generate component data from the FFT calculation results of the even-indexed and odd-indexed sequences of the real time-domain data, following the basic idea of the radix-2 FFT, and input the component data to the butterfly unit; the component data are: an 8-point even component and an 8-point odd component;
the butterfly unit is configured to:
perform radix-2 butterfly calculation on the component data to obtain the frequency-domain result of the real time-domain data.
In some embodiments, the data rearrangement unit comprises: a cache subunit, a matching subunit, and a calculating subunit; the matching subunit is electrically connected to the cache subunit and the calculating subunit;
the cache subunit is configured to:
cache the FFT calculation result of the complex sequence;
the matching subunit is configured to:
match a first parameter and a second parameter in the FFT calculation result of the complex sequence, and output the matched first parameter and second parameter to the calculating subunit;
the calculating subunit is configured to:
calculate the FFT calculation results of the even-indexed and odd-indexed sequences of the real time-domain data from the matched first parameter and second parameter, and output them to the real FFT calculation unit.
In some embodiments, the FFT calculation result of the even-indexed sequence is conjugate symmetric, and the FFT calculation result of the odd-indexed sequence is conjugate symmetric.
In some embodiments, the FFT processor further comprises:
a square integration module electrically connected to the restoration module and configured to:
obtain the sum-of-squares result of the frequency-domain result of the real time-domain data.
In some embodiments, the FFT processor further comprises: a result SRAM module electrically connected to the square integration module and configured to:
integrate the sum-of-squares result of the frequency-domain result of the real time-domain data over a preset time, and generate and output the integrated sum-of-squares result.
A second aspect of the present application provides an operation method of an FFT processor, applied to the FFT processor of any embodiment of the first aspect, comprising:
acquiring real time-domain data;
obtaining, using the Cooley-Tukey algorithm, the FFT calculation result of the complex sequence formed from the even-indexed and odd-indexed sequences of the real time-domain data; the even-indexed and odd-indexed sequences are the real part and the imaginary part of the complex sequence, respectively;
and generating the frequency-domain result of the real time-domain data based on the FFT calculation result of the complex sequence.
In some embodiments, obtaining the FFT calculation result of the complex sequence using the Cooley-Tukey algorithm comprises:
rearranging the data points of the complex sequence formed from the even-indexed and odd-indexed sequences of the real time-domain data from one-dimensional data into two-dimensional data, and performing a row FFT operation on the complex sequence to generate an intermediate sequence;
multiplying each data point of the intermediate sequence by the twiddle factor for its position, and performing a column FFT operation to obtain the FFT calculation result of the complex sequence.
The application provides an FFT processor and an operation method. The FFT processor comprises an acquisition module, a serial-parallel FFT module, and a restoration module; the serial-parallel FFT module is electrically connected to the acquisition module and the restoration module. The acquisition module is configured to: acquire real time-domain data, decompose the complex sequence formed from the even-indexed and odd-indexed sequences of the real time-domain data into 4 paths, rearrange the data points of the complex sequence from one-dimensional data into two-dimensional data, and input them to the serial-parallel FFT module; the even-indexed and odd-indexed sequences are the real part and the imaginary part of the complex sequence, respectively. The serial-parallel FFT module is configured to: obtain the FFT calculation result of the complex sequence using the Cooley-Tukey algorithm and input it to the restoration module. The restoration module is configured to: generate the frequency-domain result of the real time-domain data from the FFT calculation result of the complex sequence. This reduces the processing time and power consumption of the FFT processor when processing large-point real FFTs in real time.
Drawings
In order to more clearly illustrate the technical solution of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of an FFT processor according to the present application;
FIG. 2 is a schematic diagram of a data rearrangement unit according to the present application;
FIG. 3 is a flowchart of an operation method of the FFT processor in the present application;
FIG. 4 is a flowchart illustrating the operation of the FFT processor according to the present application;
FIG. 5 is a schematic diagram of the Cooley-Tukey algorithm in the present application;
FIG. 6 is a schematic diagram of a serial FFT operation structure in the present application;
FIG. 7 is a schematic diagram of a single stage SDF configuration in accordance with the present application;
FIG. 8 is a schematic diagram of a single stage SDF structure in accordance with one embodiment of the present application;
FIG. 9 is a schematic diagram of a single-stage SDF structure in accordance with another embodiment of the present application;
FIG. 10 is a schematic diagram of the real FFT calculation on the bit-reversed data stream in the present application.
Reference numerals illustrate:
1 - acquisition module; 2 - serial-parallel FFT module; 21 - serial FFT unit; 22 - parallel FFT unit; 3 - restoration module; 31 - data rearrangement unit; 311 - cache subunit; 312 - matching subunit; 313 - calculating subunit; 32 - real FFT calculation unit; 33 - butterfly unit; 4 - square integration module; 5 - result SRAM module.
Detailed Description
In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
In some technologies, FFT processors require long processing times and consume considerable power when processing large-point real FFTs in real time. To solve these technical problems, the application provides an FFT processor and an operation method, described below:
As can be seen from FIG. 1, a first aspect of the present application provides an FFT processor comprising: an acquisition module 1, a serial-parallel FFT module 2, and a restoration module 3; the serial-parallel FFT module 2 is electrically connected to the acquisition module 1 and the restoration module 3. The acquisition module 1 is configured to: acquire real time-domain data, decompose the complex sequence formed from the even-indexed and odd-indexed sequences of the real time-domain data into 4 paths, rearrange the data points of the complex sequence from one-dimensional data into two-dimensional data, and input them to the serial-parallel FFT module 2; the even-indexed and odd-indexed sequences are the real part and the imaginary part of the complex sequence, respectively. The serial-parallel FFT module 2 is configured to: obtain the FFT calculation result of the complex sequence using the Cooley-Tukey algorithm and input it to the restoration module 3. The restoration module 3 is configured to: generate the frequency-domain result of the real time-domain data based on the FFT calculation result of the complex sequence.
In the FFT processor provided by the application, the data received by the computing chip are real-valued, while the intermediate and final results of the FFT calculation are represented and computed as complex numbers. The collected 2N-point data are therefore split into two parts that serve as the real part and the imaginary part of the complex input to the FFT processor, and the real FFT result of the 2N-point input is then recovered from the FFT processor's output. In this way an N-point FFT circuit produces the result of a 2N-point real FFT.
In this embodiment, the frequency-domain result of the real time-domain data is generated mathematically as follows. First, the even-indexed sequence x1[n] and the odd-indexed sequence x2[n] of the 2N-point data x[2N] form y[n] = x1[n] + j*x2[n], which is input to the N-point FFT circuit to obtain FFT(y[n]) = Y(k), the FFT calculation result of the sequence y[n]. The FFT result X1(k) of the even-indexed sequence x1[n] and the FFT result X2(k) of the odd-indexed sequence x2[n] are then obtained according to the following formula:
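The formula itself is not reproduced in this text. A reconstruction from the standard conjugate-symmetry identities X1(k) = [Y(k) + Y*(N-k)]/2 and X2(k) = [Y(k) - Y*(N-k)]/(2j), written in RE/IM notation and taking the index N-k modulo N, would read as follows; it is a sketch consistent with the surrounding description, not a verbatim copy of the patent's formula:

```latex
\begin{aligned}
X1_{RE}(k) &= \tfrac{1}{2}\left[\,Y_{RE}(k) + Y_{RE}(N-k)\,\right], &
X1_{IM}(k) &= \tfrac{1}{2}\left[\,Y_{IM}(k) - Y_{IM}(N-k)\,\right],\\
X2_{RE}(k) &= \tfrac{1}{2}\left[\,Y_{IM}(k) + Y_{IM}(N-k)\,\right], &
X2_{IM}(k) &= \tfrac{1}{2}\left[\,Y_{RE}(N-k) - Y_{RE}(k)\,\right].
\end{aligned}
```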
where the subscript RE denotes the real part and the subscript IM denotes the imaginary part. Once the FFT calculation result Y(k) of the sequence y[n] is obtained, the FFT results of the odd-indexed and even-indexed sequences of x[2N] follow from the real and imaginary parts of Y(k). The FFT result of x[2N] is then restored from these two results by the following formula, which uses the basic idea of the radix-2 FFT.
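The restoration formula is likewise missing from this text. Assuming the usual radix-2 decimation-in-time butterfly with twiddle factor W_{2N} = e^{-j 2\pi /(2N)} (an assumption that agrees with the outputs X(k) and X(k+N) named later in the description), it would be:

```latex
\begin{aligned}
X(k)   &= X1(k) + W_{2N}^{\,k}\, X2(k),\\
X(k+N) &= X1(k) - W_{2N}^{\,k}\, X2(k),
\qquad W_{2N} = e^{-j\,2\pi/(2N)} .
\end{aligned}
```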
In the formula, 0 ≤ k < N; that is, the most basic radix-2 butterfly calculation is performed on X1(k) and X2(k), which restores the final X(k) result of length 2N.
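The two steps above (split Y(k) into X1(k) and X2(k), then apply one radix-2 butterfly) can be sketched in software. This is a minimal illustration under stated assumptions, not the patent's circuit: the function names `dft` and `real_fft_by_packing` are hypothetical, and a direct DFT stands in for the N-point FFT hardware.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, a stand-in for the N-point FFT circuit."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def real_fft_by_packing(x):
    """2N-point FFT of a real sequence using a single N-point complex transform."""
    n = len(x) // 2
    # Even-indexed samples form the real part, odd-indexed samples the imaginary part.
    y = [complex(x[2 * i], x[2 * i + 1]) for i in range(n)]
    big_y = dft(y)
    result = [0j] * (2 * n)
    for k in range(n):
        y_k = big_y[k]
        y_nk_conj = big_y[(n - k) % n].conjugate()  # Y(N-k); index wraps for k = 0
        x1 = (y_k + y_nk_conj) / 2       # FFT of the even-indexed sequence
        x2 = (y_k - y_nk_conj) / 2j      # FFT of the odd-indexed sequence
        w = cmath.exp(-2j * cmath.pi * k / (2 * n))  # twiddle factor W_{2N}^k
        result[k] = x1 + w * x2          # radix-2 butterfly, upper output
        result[k + n] = x1 - w * x2      # radix-2 butterfly, lower output
    return result
```

Comparing `real_fft_by_packing(x)` with `dft(x)` on a short real list is a quick way to confirm that the N-point transform plus restoration matches a direct 2N-point transform.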
As can be seen from FIGS. 1 and 4, the serial-parallel FFT module 2 includes: four groups of serial FFT units 21 and a parallel FFT unit 22; the parallel FFT unit 22 is electrically connected to the four groups of serial FFT units 21. As shown in FIG. 5, the serial FFT unit 21 is configured to: perform a row FFT operation on the complex sequence to generate an intermediate sequence. The parallel FFT unit 22 is configured to: multiply each data point of the intermediate sequence by the twiddle factor for its position, and perform a column FFT operation to obtain the FFT calculation result of the complex sequence.
For example, the basic principle of the Cooley-Tukey algorithm is as follows. For an N-point DFT, the frequency-domain transform is:
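The formula is not reproduced in this text; the standard N-point DFT definition, which is presumably what is meant, is:

```latex
X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{\,nk},
\qquad W_N = e^{-j\,2\pi/N},
\qquad 0 \le k < N .
```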
According to the formula, an N-point DFT requires on the order of N^2 operations. The FFT achieves the same time-domain to frequency-domain transform while reducing the operation count to the order of N log N; that is, the FFT is a fast implementation of the DFT. Let N = M×L, i.e., the natural number N is decomposed into the product of two natural numbers M and L; for Fourier transforms, N is typically an integer power of 2. The Cooley-Tukey algorithm decomposes the N-point DFT into M L-point DFTs and L M-point DFTs. When N is large (e.g., 16384), this decomposition matters for hardware implementation: hardware overhead (power consumption/area) grows linearly with the number of points N, so decomposing a large-point DFT into several small-point DFTs executed sequentially in time allows a chip with less computing power to calculate a Fourier transform with a larger number of points. The specific formula of the Cooley-Tukey algorithm is as follows:
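The formula is not reproduced in this text. One index mapping that matches the description (L rows of M points each, row index l, input index n = Lm + l, output index k = k2 + M k1) gives the following; the mapping is an assumption chosen to agree with the ranges of k1 and k2 stated next:

```latex
X(k_2 + M k_1)
  = \sum_{l=0}^{L-1} W_L^{\,l k_1}
    \left[\, W_N^{\,l k_2} \sum_{m=0}^{M-1} x(Lm + l)\, W_M^{\,m k_2} \right]
```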
where 0 ≤ k1 < L and 0 ≤ k2 < M.
FIG. 5 illustrates the implementation of the above formula. The N-point one-dimensional data are rearranged into M×L-point two-dimensional data. First, an M-point DFT (or FFT) is performed on each of the L rows; then all M×L data points are multiplied by the twiddle factors at their positions; finally, an L-point DFT (or FFT) is performed on each of the M columns. The resulting M×L-point frequency-domain data equal the result of a direct DFT (or FFT) of the N-point data; the two methods are mathematically equivalent, as derived in the above formula.
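The row-transform, twiddle, column-transform sequence can be checked numerically. The sketch below is illustrative only: `cooley_tukey` and `dft` are hypothetical names, a direct DFT replaces the small FFTs, and the index mapping (row l holds x[l], x[l+L], ...; output index k2 + M*k1) is an assumption consistent with the ranges 0 ≤ k1 < L and 0 ≤ k2 < M.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT used for the small row and column transforms."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def cooley_tukey(x, num_rows):
    """N-point DFT computed as L row DFTs (M points), twiddles, M column DFTs (L points)."""
    n = len(x)
    m = n // num_rows
    # Rearrange 1-D data into 2-D: row l holds x[l], x[l + L], x[l + 2L], ...
    rows = [dft(x[l::num_rows]) for l in range(num_rows)]
    # Multiply every point by the twiddle factor W_N^(l * k2) for its position.
    for l in range(num_rows):
        for k2 in range(m):
            rows[l][k2] *= cmath.exp(-2j * cmath.pi * l * k2 / n)
    # L-point DFT down each of the M columns.
    out = [0j] * n
    for k2 in range(m):
        column = dft([rows[l][k2] for l in range(num_rows)])
        for k1 in range(num_rows):
            out[k2 + m * k1] = column[k1]
    return out
```

With `num_rows=4` and a 16-point input, the result agrees with `dft(x)` to rounding error, mirroring the 4×2048 split of the 8192-point transform at a smaller size.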
As shown in FIG. 4, for example, the 8192-point FFT is first decomposed into 4×2048, i.e., 4 paths of parallel input data. Each path first undergoes a 2048-point serial FFT operation; the "point-by-point twiddle factor multiplication" and the "column FFT operation" of the Cooley-Tukey formula are then completed together in the parallel FFT unit 22. In the present application the two calculations are merged: the weights of the next-stage column FFT operation are folded into the twiddle factors of the preceding point-by-point multiplication, so a single complex multiplication completes both of the later steps shown in FIG. 5, improving operation efficiency.
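One way to read this merging, offered as an interpretation rather than the patent's exact datapath: the twiddle factor W_N^(l*k2) and the column-DFT weight W_L^(l*k1) multiply to the single coefficient W_N^(l*(k2 + M*k1)), so each row output needs only one complex multiplication per output point. The names `dft` and `fused_column_stage` below are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, standing in for the serial row FFT units."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def fused_column_stage(rows, n):
    """Combine twiddle multiplication and the L-point column DFT in one coefficient.

    rows[l][k2] holds the row-FFT outputs before any twiddle multiplication.
    """
    num_rows = len(rows)
    m = n // num_rows
    out = [0j] * n
    for k1 in range(num_rows):
        for k2 in range(m):
            k = k2 + m * k1
            # W_N^(l*k2) * W_L^(l*k1) == W_N^(l*k): one multiply per row output.
            out[k] = sum(rows[l][k2] * cmath.exp(-2j * cmath.pi * l * k / n)
                         for l in range(num_rows))
    return out
```

Feeding it `rows = [dft(x[l::4]) for l in range(4)]` for a 16-point `x` reproduces `dft(x)`, which confirms that the fused coefficient is mathematically equivalent to the two separate steps.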
In this embodiment, each group of the serial FFT units 21 is a 2048-point FFT unit consisting of an 11-stage SDF (single-path delay feedback) structure. FIG. 6 shows the structure that implements the 2048-point FFT in the present application: 11 SDF stages in total, each made up of a storage unit and a butterfly unit. An SDF stage has a single input port and a single output port, i.e., one datum enters per cycle and one datum leaves per cycle. Because of the way the FFT is calculated, each input datum must wait a different number of cycles in each stage until its paired datum arrives and the butterfly operation can be completed. For example, in the first stage of the 2048-point FFT, the first 1024 input data wait 1024 cycles to be operated on with their counterparts; in the second stage, the first 512 data wait 512 cycles to be operated on with the second 512 data, and the third 512 data wait 512 cycles to be operated on with the fourth 512 data; and so on, until in the eleventh stage each datum waits only one cycle to be operated on with the next datum before being output.
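The waiting pattern can be reproduced with a small software model of a radix-2 SDF pipeline. This is a sketch under assumptions: a decimation-in-frequency butterfly is assumed (the text does not say which variant is used), the delay memory is modeled as a FIFO, and `sdf_stage`, `sdf_fft`, and `bit_reverse` are hypothetical names.

```python
import cmath
from collections import deque


def sdf_stage(stream, depth):
    """One SDF stage: a FIFO of `depth` entries plus a radix-2 butterfly."""
    fifo = deque([0j] * depth)
    out = []
    for n, x in enumerate(stream):
        phase = n % (2 * depth)
        old = fifo.popleft()
        if phase < depth:
            # First half of a block: store the input, emit what was buffered earlier.
            fifo.append(x)
            out.append(old)
        else:
            # Second half: the partner has waited `depth` cycles; run the butterfly.
            w = cmath.exp(-2j * cmath.pi * (phase - depth) / (2 * depth))
            out.append(old + x)            # first output goes to the next stage
            fifo.append((old - x) * w)     # second output is buffered for later
    return out


def sdf_fft(x):
    """Chain stages with delays N/2, N/4, ..., 1; result is in bit-reversed order."""
    n = len(x)
    stream = list(x) + [0j] * (n - 1)      # extra cycles to flush the pipeline
    depth = n // 2
    while depth >= 1:
        stream = sdf_stage(stream, depth)
        depth //= 2
    return stream[n - 1:]                  # total latency is N - 1 cycles


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    return int(format(i, "0{}b".format(bits))[::-1], 2)
```

For N = 2048 the loop in `sdf_fft` visits the delays 1024, 512, ..., 1, i.e., 11 stages, and output position p holds the frequency bin whose index is the bit reversal of p.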
For example, FIG. 7 shows the basic SDF structure. Because the butterfly operation has two inputs and two outputs, a data selector at the input decides whether input data go directly into the butterfly unit for computation or into the SRAM for buffering; similarly, a data selector at the output chooses whether each of the two butterfly outputs goes to the next-stage SDF operation unit or is stored in the SRAM for buffering. FIGS. 8 and 9 mainly illustrate the access mechanism for single-port SRAM in the serial FFT operation circuit of the present application. In general, the access mechanism uses two single-port SRAMs instead of an N-point dual-port SRAM. This is significant because N is generally large in a serial FFT; when N = 2048, the mechanism can reduce memory area and power consumption by 30%-40%. The SDF computes the FFT in three phases. In the first phase, the SDF operation unit continuously receives the data to be operated on from the previous stage; no read-write conflict arises, and in each cycle the SRAM only performs a write to the corresponding location. In the second phase, the SDF operation unit reads the data written to the SRAM buffer in the first phase and also writes the second output of the butterfly operation unit to the SRAM (the first output goes directly to the next-stage SDF operation unit). A single SRAM therefore faces a read-write conflict, which arises from simultaneous "read" and "write" demands on the SRAM within one FFT calculation.
In the third phase, the SDF operation unit reads out the second butterfly outputs previously written to the SRAM buffer and must at the same time write the data of the next FFT round (serial FFTs usually run as a continuous pipeline). A single SRAM again faces a read-write conflict, but its source differs from the second phase: it arises from the "read" demand of the current FFT operation and the "write" demand of the next FFT operation, i.e., from two consecutive FFT operations. The present application proposes an SRAM access mechanism in which two single-port SRAMs replace an N-point dual-port SRAM. Alternating reads and writes resolves the read-write conflicts that arise for different reasons in different phases, and guarantees that data already written are not overwritten before they are read out. The dual-port SRAM is thus avoided, and the larger N is, the greater the savings in memory area and power consumption.
As can be seen from FIGS. 1 and 4, the restoration module 3 comprises: a data rearrangement unit 31, a real FFT calculation unit 32, and a butterfly unit 33; the real FFT calculation unit 32 is electrically connected to the data rearrangement unit 31 and the butterfly unit 33. The data rearrangement unit 31 is configured to: buffer the FFT calculation result of the complex sequence, calculate from it the FFT calculation results of the even-indexed and odd-indexed sequences of the real time-domain data, and output them to the real FFT calculation unit 32. As shown in FIG. 2, the data rearrangement unit 31 includes: a cache subunit 311, a matching subunit 312, and a calculating subunit 313; the matching subunit 312 is electrically connected to the cache subunit 311 and the calculating subunit 313. The cache subunit 311 is configured to: cache the FFT calculation result of the complex sequence. The matching subunit 312 is configured to: match a first parameter and a second parameter in the FFT calculation result of the complex sequence, and output the matched first parameter and second parameter to the calculating subunit 313; the first parameter is the value of the k-th item of the FFT calculation result of the complex sequence, and the second parameter is the value of the (N-k)-th item. The calculating subunit 313 is configured to: calculate the FFT calculation results of the even-indexed and odd-indexed sequences of the real time-domain data from the matched first parameter and second parameter, and output them to the real FFT calculation unit 32.
The real FFT calculation unit 32 is configured to: generate component data from the FFT calculation results of the even-indexed and odd-indexed sequences of the real time-domain data, following the basic idea of the radix-2 FFT, and input the component data to the butterfly unit 33; the component data are: an 8-point even component and an 8-point odd component. The butterfly unit 33 is configured to: perform radix-2 butterfly calculation on the component data to obtain the frequency-domain result of the real time-domain data.
In this embodiment, as the formulas for X1(k) and X2(k) show, computing X1(k) and X2(k) requires Y(k) and Y(N-k) at the same time. However, the preceding serial-parallel FFT structure produces only one Y(k) per cycle on each parallel data stream. The data rearrangement unit 31 is therefore required: whichever member of the pair is output first is held in the cache subunit 311 until the preceding FFT circuit outputs its counterpart; the two are then input together to the real FFT calculation unit 32, which evaluates the formulas for X1(k) and X2(k).
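The buffering-and-matching behaviour can be sketched as a generator that holds each bin until its partner arrives. This is an illustrative model only: `pair_bins` is a hypothetical name, the stream is assumed to carry (index, value) tuples in whatever order the FFT emits them, and a dictionary stands in for the cache subunit.

```python
def pair_bins(stream, n):
    """Yield (k, Y_k, Y_{n-k}) as soon as both members of a pair have arrived."""
    pending = {}                      # plays the role of the cache subunit
    for k, value in stream:
        partner = (n - k) % n
        if partner == k:
            # k = 0 (and k = n/2 for even n) is its own partner.
            yield k, value, value
        elif partner in pending:
            # The earlier arrival was cached; release the matched pair.
            yield partner, pending.pop(partner), value
        else:
            pending[k] = value
```

Because the model never assumes a particular arrival order, it works equally for natural-order and bit-reversed output streams; only the peak size of `pending` changes.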
Note that in the formulas for X1(k) and X2(k), k and N-k are interchangeable: letting k' = N-k gives N-k' = k. Once Y(k) and Y(N-k) are known, the formulas yield not only X1_IM(k) but also X1_IM(N-k), via X1_IM(N-k) = -X1_IM(k). This relation is the conjugate symmetry of a real-sequence FFT: for a real sequence x[n], the FFT result X(k) is conjugate symmetric for every k. That is, the FFT calculation result of the even-indexed sequence is conjugate symmetric, and so is that of the odd-indexed sequence. The real FFT calculation unit 32 receives four paths of Y(k) and Y(N-k) and, using this conjugate symmetry, calculates 8 paths of complex results X1(k) and X2(k). The 8 butterfly units 33 then complete 8 paths of basic radix-2 butterfly operations and finally output 8 points of X(k) and X(k+N), i.e., the frequency-domain result of the real time-domain data.
As can be seen from FIG. 1, the FFT processor further comprises: a square integration module 4, electrically connected to the restoration module 3 and configured to: obtain the sum-of-squares result of the frequency-domain result of the real time-domain data. Because the Fourier transform result is complex, spectrum analysis generally converts its real and imaginary parts into the complex modulus. After the complex FFT result is obtained, the square integration module 4 therefore computes the squared modulus of the result, and the sum of squares is stored in the result SRAM module 5.
As can be seen from FIG. 1, the FFT processor further comprises: a result SRAM module 5, electrically connected to the square integration module 4 and configured to: integrate the sum-of-squares result of the frequency-domain result of the real time-domain data over a preset time, and generate and output the integrated sum-of-squares result. In practical application scenarios, the results of successive rounds of the 16384-point real FFT must be integrated over 10 ms or 20 ms. That is, according to external configuration information, the integrated result of about 3000 rounds (20 ms integration) or about 1600 rounds (10 ms integration) of 16384-point FFTs is output externally through the result SRAM module 5, so that spectrum analysis of the input data is performed over a fixed time and the calculation result is output.
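The square-and-integrate step amounts to accumulating the squared modulus of every bin across rounds. A minimal sketch, with the hypothetical name `integrate_power` and a plain list standing in for the result SRAM:

```python
def integrate_power(spectra):
    """Accumulate |X(k)|^2 per bin over successive FFT rounds."""
    acc = [0.0] * len(spectra[0])     # one accumulator per frequency bin
    for spectrum in spectra:
        for k, value in enumerate(spectrum):
            # Sum of squares of the real and imaginary parts (squared modulus).
            acc[k] += value.real ** 2 + value.imag ** 2
    return acc
```

Each inner list is one round's frequency-domain result; the number of rounds passed in corresponds to the configured integration time.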
In the FFT processor provided by the application, 4 paths of time domain complex data are input in parallel. Each path first undergoes a 2048-point serial FFT, and the 4 paths then jointly complete a basic 4-point parallel FFT operation, so that this first part of the FFT calculation circuit completes an 8192-point FFT operation on the input data. Because the parallel FFT circuit can output only 4 data in parallel per cycle, the data output first must be temporarily stored in the cache subunit 311 to implement data rearrangement, so that the data with frequency domain numbers k and N-k can be processed together in the subsequent real FFT resolving. A state machine in the data rearranging unit 31 controls the current calculation sequence; when data with numbers k and N-k are identified as paired, 8 paths of data are output to the operation module to complete the real FFT resolving, which yields the FFT results of the odd and even sequences of the original 16384-point data. The radix-2 butterfly operation is then completed in the butterfly unit 33, and the 16384-point real FFT result is finally obtained.
As can be seen from fig. 3, a second aspect of the present application provides an operation method of an FFT processor, which is applied to an FFT processor according to any of the foregoing embodiments, including:
S100: acquiring real time domain data;
S200: based on a complex sequence formed by the even sequence and the odd sequence of the real time domain data, acquiring an FFT calculation result of the complex sequence by using the Cooley-Tukey algorithm; the even sequence and the odd sequence are respectively the real part and the imaginary part of the complex sequence;
The acquiring of the FFT calculation result of the complex sequence by using the Cooley-Tukey algorithm, based on the complex sequence formed by the even sequence and the odd sequence of the real time domain data, comprises:
S210: rearranging the data points of the complex sequence formed by the even sequence and the odd sequence of the real time domain data from one-dimensional data into two-dimensional data, and performing a row FFT operation on the complex sequence to generate an intermediate sequence;
S220: multiplying each data point of the intermediate sequence by the twiddle factor of its position, and performing a column FFT operation to obtain the FFT calculation result of the complex sequence.
S300: generating a frequency domain result of the real time domain data based on the FFT calculation result of the complex sequence. For the effects of the above method embodiment, reference may be made to the FFT processor embodiments above, which are not repeated here. For the operation method, it should be added that, after the FFT result of the complex sequence is obtained, the FFT result of its real part and the FFT result of its imaginary part are obtained by calculation; these are the FFT result of the even component and the FFT result of the odd component of the real time domain data. The basic idea of the radix-2 FFT is then applied, and the FFT result of the original real time domain data is restored through butterfly operations.
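Steps S210 and S220 can be sketched as follows. The sketch assumes that row r of the two-dimensional arrangement holds samples r, r + R, r + 2R, and so on (R being the number of rows); this mapping is an assumption chosen because it produces output numbers offset by the row length from lane to lane, as in the 0/2048/4096/6144 numbering of this description. The names dft and cooley_tukey_2d are hypothetical, and naive DFTs replace the serial and parallel FFT circuits.

```python
import cmath


def dft(seq):
    """Naive O(n^2) DFT; stands in for the row and column FFT circuits."""
    n = len(seq)
    return [
        sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def cooley_tukey_2d(x, rows):
    """N-point FFT as row FFTs, twiddle multiplication, then column FFTs."""
    n = len(x)
    cols = n // rows

    def twiddle(exponent):
        return cmath.exp(-2j * cmath.pi * exponent / n)

    # S210: rearrange into 2-D (assumed mapping) and FFT each row.
    intermediate = [
        dft([x[r + rows * c] for c in range(cols)]) for r in range(rows)
    ]

    # S220: multiply by the twiddle factor of each position, then FFT columns.
    X = [0j] * n
    for k2 in range(cols):
        column = [intermediate[r][k2] * twiddle(r * k2) for r in range(rows)]
        for k1, value in enumerate(dft(column)):
            X[k2 + cols * k1] = value  # lane k1 carries index k2 + cols * k1
    return X
```

With rows = 4 and a length of 8192, the row transforms would be 2048 points each and the column transform 4 points, which corresponds to the serial and parallel parts described above.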
In the FFT processor and operation method provided by the application, the real FFT resolving flow under a bit-reversed data stream is as follows. As shown in fig. 4, the first stage of the application is a serial FFT circuit; owing to the fundamental characteristics of FFT computation, its input data sequence is in consecutive order in the time domain and its output data sequence is in bit-reversed order in the frequency domain. As shown in fig. 8, the output of the 8192-point FFT circuit is 4-way parallel, and each way is produced by a 2048-point serial FFT. Taking the first row as an example, 0 is "00000000000" in 11-bit binary, 1024 is "10000000000", and 512 is "01000000000". Writing these 11 bits in reverse order, 0 becomes "00000000000", 1024 becomes "00000000001", and 512 becomes "00000000010"; that is, in bit-reversed order, reading each binary number from back to front yields the normal binary counting sequence. Since the serial FFT outputs only 1 data per cycle, in the matrix shown in fig. 8 the four parallel ways output the first column, with frequency domain numbers 0/2048/4096/6144, in the first cycle, the second column, with frequency domain numbers 1024/3072/5120/7168, in the second cycle, and so on. The first row of data is arranged in bit-reversed order within the 11-bit range, and the other three parallel ways add 2048, 4096, and 6144 respectively to the first way; for convenience, the four parallel data streams are collectively referred to as a bit-reversed data stream.
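The output numbering described above can be reproduced with a short Python sketch. The function names bit_reverse and lane_indices are hypothetical; the 11-bit width and the 4 lanes follow the 2048-point serial FFT and the 4 parallel ways of this description.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of a non-negative integer."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def lane_indices(cycle, bits=11, lanes=4):
    """Frequency domain numbers output by the parallel lanes in one cycle."""
    base = bit_reverse(cycle, bits)           # first lane, bit-reversed order
    return [base + (lane << bits) for lane in range(lanes)]


if __name__ == "__main__":
    for cycle in range(4):
        print(cycle, lane_indices(cycle))
```

Cycle 0 gives 0/2048/4096/6144 and cycle 1 gives 1024/3072/5120/7168, matching the first two columns named in the text.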
In the "data rearrangement" module after the parallel FFT, four paths of parallel data are received in bit-reversed order in each cycle. In the subsequent real FFT resolving unit 32, the two groups of data with frequency domain numbers k and N-k must be resolved together according to the formulas for the FFT result X1(k) of the even sequence x1[n] and the FFT result X2(k) of the odd sequence x2[n].
In the application, a rule for resolving the two groups of data k and N-k in a bit-reversed data stream is provided. As shown in fig. 10, the bit-reversed data stream within the 11-bit range is divided into stages "stage0", "stage1", and so on. Stage0 and stage1 each have only one column of data; when n ≥ 2, stage n has 2^(n-1) columns of data, and within the same stage (n ≥ 2) the m-th column and the (2^(n-1)+1-m)-th column, that is, the m-th column from the front and the m-th column from the back, together provide exactly the data needed to resolve k and N-k. As shown for stage4 in fig. 8, stage4 has 2^(4-1) = 8 columns of data, where the first column forms resolving pairs of k and N-k with the eighth column and the second column forms resolving pairs with the seventh column, as indicated by the double-headed arrows in fig. 8. Similarly, stage5 (not shown) has 16 columns of data; the first column is resolved with the sixteenth, the second with the fifteenth, and so on. This method of resolving a bit-reversed data stream is general: FFT sizes are basically powers of 2, any power-of-2 stream can be divided into such stages, and the real FFT resolving of k and N-k is carried out within each independent stage. In the bit-reversed FFT data stream of any power-of-2 size, stage0 and stage1 have only one column of data, and a single column can complete the resolving of k and N-k by itself. As shown in fig. 8, in stage0, 2048 and 6144 form a resolving pair of k and N-k, 4096 forms a pair with itself, and 0 forms a pair with itself. In stage1, 1024 and 7168 form a resolving pair, as do 3072 and 5120. For stage n with n ≥ 2, the resolving is performed in the manner described above.
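The pairing rule can be checked numerically with a small Python sketch. It assumes that the stage of an output cycle equals the bit length of the cycle counter (cycle 0 is stage0, cycle 1 is stage1, cycles 2 to 3 are stage 2, cycles 4 to 7 are stage 3, and so on); this indexing and the names column, partner_cycle, and pairing_holds are assumptions made for illustration.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of a non-negative integer."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def column(cycle, bits=11, lanes=4):
    """Frequency domain numbers output in one cycle of the parallel stream."""
    base = bit_reverse(cycle, bits)
    return [base + (lane << bits) for lane in range(lanes)]


def partner_cycle(cycle):
    """Cycle whose column holds the N-k partners of this cycle's column."""
    stage = cycle.bit_length()          # assumed stage numbering
    if stage <= 1:
        return cycle                    # stage0 and stage1 pair with themselves
    first = 1 << (stage - 1)            # first cycle of the stage
    last = (1 << stage) - 1             # last cycle of the stage
    return first + last - cycle         # m-th from front <-> m-th from back


def pairing_holds(bits=11, lanes=4):
    """True if every k finds N-k in the partner column of the same stage."""
    n_points = lanes << bits
    for cycle in range(1 << bits):
        partners = set(column(partner_cycle(cycle), bits, lanes))
        for k in column(cycle, bits, lanes):
            if (n_points - k) % n_points not in partners:
                return False
    return True
```

For the 8192-point case, cycles 8 to 15 form stage4 under this numbering, and partner_cycle maps the first of them to the last, the second to the second from last, and so on.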
The real FFT resolving method for a bit-reversed data stream provided by the application is very friendly to hardware. In the hardware circuit design, the paired columns are resolved within their respective stages, and the behavior of the hardware circuit is regular in every stage. For stage n, only the first half of the data needs to be stored in the SRAM; as the second half arrives cycle by cycle, the corresponding first-half data is read from the SRAM to complete the resolving. In terms of storage resource consumption, when real FFT resolving is performed on an N-point data stream in natural order, the first half of the data must be stored so that the resolving of k and N-k can be completed when the second half arrives, and the required SRAM size is [formula not reproduced in the source text]. For the N-point bit-reversed data stream of the application, the maximum SRAM required is determined by the last stage: the first half of the columns of the last stage is stored in the SRAM until the subsequent data arrives, so the SRAM size required by the bit-reversed data stream during real FFT resolving is [formula not reproduced in the source text]. When the number of points N is large, the saving achieved by the above resolving method is significant.
Through joint optimization at the mathematical level and the circuit level, the application realizes a serial-parallel combined large-point real FFT processor based on the Cooley-Tukey algorithm, which is applicable to fields such as optical communication, radar, and satellite image processing. The pipelined architecture allows every input datum to be accepted and processed in real time, without losing or discarding any time domain data. Meanwhile, the circuit structure and the data-stream-level optimization reduce the consumption of memory hardware resources as far as possible in the circuit implementation, which in turn saves the power consumption and area of the hardware circuit. The application provides a circuit that computes a 16384-point real FFT using an 8192-point FFT, but the memory access optimization method and the data stream optimization method are general and are not limited to the 8192-point FFT calculation circuit and the 16384-point real FFT calculation circuit; the reduction in hardware power consumption and area grows as the number of FFT points increases.
The foregoing detailed description of the embodiments of the present application further illustrates the purposes, technical solutions and advantageous effects of the embodiments of the present application, and it should be understood that the foregoing is merely a specific implementation of the embodiments of the present application, and is not intended to limit the scope of the embodiments of the present application, and any modifications, equivalent substitutions, improvements, etc. made on the basis of the technical solutions of the embodiments of the present application should be included in the scope of the embodiments of the present application.

Claims (10)

1. An FFT processor, comprising: an acquisition module (1), a serial-parallel FFT module (2) and a reduction module (3); the serial-parallel FFT module (2) is electrically connected with the acquisition module (1) and the reduction module (3);
the acquisition module (1) is configured to:
acquire real time domain data, decompose a complex sequence formed by the even sequence and the odd sequence of the real time domain data into 4 paths, rearrange the data points of the complex sequence from one-dimensional data into two-dimensional data, and input the two-dimensional data into the serial-parallel FFT module (2); the even sequence and the odd sequence are respectively the real part and the imaginary part of the complex sequence;
the serial-parallel FFT module (2) is configured to:
based on the complex sequence, acquire an FFT calculation result of the complex sequence by using the Cooley-Tukey algorithm, and input the FFT calculation result of the complex sequence to the reduction module (3);
the reduction module (3) is configured to:
generate a frequency domain result of the real time domain data based on the FFT calculation result of the complex sequence.
2. An FFT processor according to claim 1, characterized in that the serial-parallel FFT module (2) comprises: four groups of serial FFT units (21) and a parallel FFT unit (22); the parallel FFT unit (22) is electrically connected with the four groups of serial FFT units (21);
the serial FFT unit (21) is configured to:
perform a row FFT operation on the complex sequence to generate an intermediate sequence;
the parallel FFT unit (22) is configured to:
multiply each data point of the intermediate sequence by the twiddle factor of its position, and perform a column FFT operation to obtain the FFT calculation result of the complex sequence.
3. An FFT processor according to claim 2, characterized in that each set of said serial FFT units (21) is a 2048-point FFT unit, said serial FFT units (21) being composed of 11-stage SDF structures.
4. An FFT processor according to claim 1, characterized in that the reduction module (3) comprises: a data rearranging unit (31), a real FFT resolving unit (32), and a butterfly unit (33); the real FFT resolving unit (32) is electrically connected with the data rearranging unit (31) and the butterfly unit (33);
the data rearranging unit (31) is configured to:
cache the FFT calculation result of the complex sequence, calculate from it the FFT calculation results of the even sequence and the odd sequence of the real time domain data, and output them to the real FFT resolving unit (32);
the real FFT resolving unit (32) is configured to:
based on the FFT calculation results of the even sequence and the odd sequence of the real time domain data, generate component data by using the basic idea of the radix-2 FFT and input the component data to the butterfly unit (33); the component data are: an 8-point even component and an 8-point odd component;
the butterfly unit (33) is configured to:
based on the component data, perform radix-2 butterfly calculation to obtain the frequency domain result of the real time domain data.
5. An FFT processor according to claim 4, characterized in that the data rearranging unit (31) comprises: a cache subunit (311), a matching subunit (312), and a computing subunit (313); the matching subunit (312) is electrically connected with the cache subunit (311) and the computing subunit (313);
the cache subunit (311) is configured to:
cache the FFT calculation result of the complex sequence;
the matching subunit (312) is configured to:
match a first parameter and a second parameter in the FFT calculation result of the complex sequence, and output the matched first parameter and second parameter to the computing subunit (313);
the computing subunit (313) is configured to:
calculate the FFT calculation results of the even sequence and the odd sequence of the real time domain data according to the matched first parameter and second parameter, and output them to the real FFT resolving unit (32).
6. The FFT processor of claim 5, wherein the FFT calculation result of the even sequence is conjugate symmetric and the FFT calculation result of the odd sequence is conjugate symmetric.
7. The FFT processor of claim 1, wherein the FFT processor further comprises:
a squaring module (4), the squaring module (4) being electrically connected with the reduction module (3) and configured to:
obtain a square sum result of the frequency domain result of the real time domain data based on the frequency domain result of the real time domain data.
8. The FFT processor of claim 1, wherein the FFT processor further comprises: a result SRAM module (5), the result SRAM module (5) being electrically connected with the squaring module (4) and configured to:
integrate the square sum result of the frequency domain result of the real time domain data over a preset time, and generate and output the square sum integration result of the frequency domain result of the real time domain data.
9. An operation method of an FFT processor, applied to an FFT processor as claimed in any one of claims 1 to 8, comprising:
acquiring real time domain data;
based on a complex sequence formed by the even sequence and the odd sequence of the real time domain data, acquiring an FFT calculation result of the complex sequence by using the Cooley-Tukey algorithm; the even sequence and the odd sequence are respectively the real part and the imaginary part of the complex sequence;
and generating a frequency domain result of the real time domain data based on the FFT calculation result of the complex sequence.
10. The method according to claim 9, wherein the acquiring of the FFT calculation result of the complex sequence by using the Cooley-Tukey algorithm, based on the complex sequence formed by the even sequence and the odd sequence of the real time domain data, comprises:
rearranging the data points of the complex sequence formed by the even sequence and the odd sequence of the real time domain data from one-dimensional data into two-dimensional data, and performing a row FFT operation on the complex sequence to generate an intermediate sequence;
multiplying each data point of the intermediate sequence by the twiddle factor of its position, and performing a column FFT operation to obtain the FFT calculation result of the complex sequence.

Priority Applications (1)

Application Number: CN202410193134.3A; Priority Date: 2024-02-21; Filing Date: 2024-02-21; Title: FFT processor and operation method; Status: Pending; Publication: CN118051709A (en)

Publications (1)

Publication Number Publication Date
CN118051709A (en) 2024-05-17

Family

ID=91049777

Country Status (1)

Country Link
CN (1) CN118051709A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination