CN117194861A - Reconfigurable mixed-base FFT device supporting output pruning - Google Patents


Info

Publication number
CN117194861A
Authority
CN
China
Prior art keywords
base
stage
address
data
bank
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310921249.5A
Other languages
Chinese (zh)
Inventor
黄凯
夏榕
熊东亮
蒋小文
郑丹丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202310921249.5A priority Critical patent/CN117194861A/en
Publication of CN117194861A publication Critical patent/CN117194861A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Complex Calculations (AREA)

Abstract

The application belongs to the technical field of FFT hardware design and application, and discloses a reconfigurable mixed-base FFT device supporting output pruning. The device comprises an FFT processor supporting calculation modes with an arbitrary 5 × 2^k number of input points and an arbitrary 2 × 2^k number of output points. The FFT processor comprises at least a reconfigurable operation unit supporting base-5 and base-2 operations, a control unit supporting the arbitrary 5 × 2^k input-point and arbitrary 2 × 2^k output-point calculation modes, a memory access management unit supporting both 5-point and 4-point parallel reading and writing, and an address generation unit corresponding to the memory access management unit. The application provides a reconfigurable hardware design for the base-5/base-2 mixed-base algorithm with partial output pruning, supports 6 reconfigurable operation modules, reads and writes data without delay and at high bandwidth, and greatly reduces resource overhead and improves hardware utilization while maintaining operation performance.

Description

Reconfigurable mixed-base FFT device supporting output pruning
Technical Field
The application belongs to the technical field of FFT hardware design and application, and particularly relates to a reconfigurable mixed-base FFT device supporting output pruning.
Background
The fast Fourier transform (FFT) is a fast algorithm for the discrete Fourier transform (DFT) and is better suited to implementation in computer systems than the direct DFT, so it is widely used across digital signal processing applications. New application scenarios, with their differing requirements on processing precision and processing speed, also place new demands on FFT algorithms and their implementations. The radix-2 FFT algorithm applies to sample sequences whose length is a power of 2 and has the advantages of simple implementation and flexible length support, but for large-point FFTs its disadvantage in operation speed becomes apparent. To address this, higher-radix algorithms such as the radix-4 and radix-8 FFT were proposed in turn. They likewise apply to sample sequences whose length is a power of 2, and the higher radix significantly improves efficiency and saves operation time for large-point transforms. In many scenarios, however, the total number of sample points is not a power of 2. The plain radix-2, radix-4, and radix-8 algorithms cannot process such sequences directly: either the input sequence must be zero-padded to the next power of 2 before the FFT, or a power-of-2 FFT over a smaller number of points must be followed by a small-point DFT. The former adds a large amount of redundant computation and greatly prolongs the operation time; the latter slows down markedly in the DFT stage, which also prolongs the operation time.
Given these requirements, mixed-base algorithms have broad application in many fields. In power quality monitoring, fully digital substation automation systems conforming to the IEC 61850 series of standards specify several sampling rates, such as 80 points or 200 points per cycle, or 1280 points or 2560 points over multiple cycles, so sample sequences whose length is not a power of 2 do occur. Moreover, conventional zero padding based on the radix-2 FFT algorithm is not suitable for every scenario, for example signal processing in the beam-forming mode of synthetic aperture radar. Mixed-base FFT algorithms and the corresponding hardware can also be applied in LTE systems, audio processing, image processing, and other fields, so their range of application is wide. Existing mixed-base designs are mostly based on base-2/base-4 and base-2/base-3 rather than on the base-5/base-2 combination needed by an FFT processor for 5 × 2^k points.
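To make the cost of zero padding concrete, the following minimal Python sketch computes the padded length and the share of extra points for the sampling lengths cited above. The helper names `next_pow2` and `padding_overhead` are illustrative only and do not come from the application.

```python
def next_pow2(n: int) -> int:
    """Return the smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def padding_overhead(n: int) -> float:
    """Fraction of extra points introduced by zero padding n up to a power of two."""
    return (next_pow2(n) - n) / n


if __name__ == "__main__":
    # Sampling lengths mentioned in the text for IEC 61850 systems.
    for points in (80, 200, 1280, 2560):
        print(points, next_pow2(points), f"{padding_overhead(points):.0%}")
```

For every one of these lengths the padded transform processes well over a quarter more points than the original sequence, which is the redundancy a mixed-base processor is meant to avoid.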
In some scenarios that require narrow-band analysis, such as power quality harmonic analysis, seismic data acquisition systems, and certain signal processing tasks, only part of the results needs to be output. In these scenarios, the conventional FFT property that the number of output points equals the number of input points entails a large amount of redundant computation. To address this, the output points can be pruned, which further increases the operation speed and shortens the operation time.
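The idea of output pruning can be illustrated in software, under the assumption that only a few frequency bins are of interest. The sketch below evaluates the DFT definition directly for a chosen subset of bins; it is a generic illustration of "compute only the outputs that are needed" and is not the pruning datapath of the device. The function name `pruned_dft` is hypothetical.

```python
import cmath


def pruned_dft(x, bins):
    """Evaluate the DFT of x only at the requested output bins.

    Cost is len(x) complex multiply-adds per requested bin, so the work
    scales with the number of outputs kept rather than with len(x) bins.
    """
    n = len(x)
    w = -2j * cmath.pi / n
    return {k: sum(x[t] * cmath.exp(w * t * k) for t in range(n)) for k in bins}


if __name__ == "__main__":
    # 20 input points (5 * 2^2), but only the first 4 output bins are kept.
    samples = [0.0, 1.0] + [0.0] * 18
    for k, value in pruned_dft(samples, range(4)).items():
        print(k, value)
```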
Therefore, across the scenarios above, there remains a significant gap in mixed-base FFT processors that support output pruning for point counts of the form 5 × 2^k, and there is considerable room for improvement in operation speed, support for output-point pruning modes, and reuse of the corresponding hardware resources.
Disclosure of Invention
The application aims to provide a reconfigurable mixed-base FFT device supporting output pruning so as to solve the technical problems.
In order to solve the technical problems, the application provides a reconfigurable mixed base FFT device supporting output pruning, which comprises the following specific technical scheme:
a reconfigurable mixed-base FFT device supporting output pruning comprises an FFT processor supporting calculation modes with an arbitrary 5 × 2^k number of input points and an arbitrary 2 × 2^k number of output points, wherein the FFT processor comprises at least a reconfigurable operation unit supporting base-5 and base-2 operations, a control unit supporting the arbitrary 5 × 2^k input-point and arbitrary 2 × 2^k output-point calculation modes, a memory access management unit supporting both 5-point and 4-point parallel reading and writing, and an address generation unit corresponding to the memory access management unit;
the reconfigurable operation unit is used for executing all operations, and the operation mode and the operation are completed under the instruction of a control signal sent by the control unit;
the control unit determines a calculation mode according to the configured points and completes the control of the whole operation, address generation and storage access process;
the address generation unit generates a bank number and an address corresponding to an operand required by each operation under the instruction of a control signal generated by the control unit;
the access management unit generates the chip selection enabling operation, the read-write control operation and the assignment operation of address data of the corresponding memory block under the instruction of the control signal generated by the control unit.
Further, the reconfigurable operation unit comprises two operation stages and 8 sets of inter-stage registers, the two stages using 4 complex multipliers and 9 complex adders. After the first-stage operation, a group of intermediate data is stored in the inter-stage registers; when the second-stage operation starts in the next cycle, the corresponding operands are read from the inter-stage registers for the second-stage operation. The second-stage results are likewise stored in the inter-stage registers, and in the following cycle the result data are read from the inter-stage registers and written to the corresponding memory addresses.
Further, by multiplexing the same set of hardware resources, the reconfigurable operation unit implements the base-5 operation, the 2-way-parallel, 2-stage-cascaded base-2 operation, and the output-point pruning mode operation.
Further, the control unit determines a calculation mode through the combined control of the state machine and the counter, controls the operation data flow and completes the whole mixed base FFT operation.
Further, the control unit comprises a state machine and a three-stage counter, which jointly control the working mode and all actions of the hardware. The state machine comprises 16 states in 3 groups. R5_L, R5_C1, R5_C2LR5, R5_SC1 and R5_C2LR2 are the base-5 calculation states; these states respectively complete the four operations of a base-5 operation (reading input data, first-stage operation, second-stage operation, and writing result data), forming a four-stage pipeline overall. In the last base-5 state, R5_C2LR2, the second-stage operation of the last base-5 operation is completed and the operands of the base-2 operation of the next cycle are read in advance. R2_SC1R2, R2_C2LR2, R2_SC1 and R2_C2LP are the base-2 calculation states; these states respectively complete the same four operations for a base-2 operation. R2_SC1R2 is the first base-2 state: it completes the first-stage operation of the first base-2 operation while the last group of base-5 results from the previous cycle is written to memory. In the last base-2 state, R2_C2LP, the second-stage operation of the last base-2 operation is completed and the operands of the pruning operation of the next cycle are read in advance. P_SC1P, P_C2L, P_SC1, P_C2 and P_S are the pruning operation states; these states respectively complete the same four operations for a pruning operation. P_SC1P is the first pruning state: it completes the first-stage operation of the first pruning operation while the last group of base-2 results from the previous cycle is written to memory. P_C2 and P_S correspond to the last pruning operation, and after the calculation is completed a calculation-complete signal is issued in the FINISH state.
Further, transitions between states are decided by counter values. The three-stage counter counts the stage number, the group number, and the butterfly operation number of the current operation. The counting thresholds of the three counters are determined from the configured N, MAX_N, A, stage_num and a_num, where MAX_N is the total number of points of the current operation, N = MAX_N/5, and A is the number of output points required by the operation, with A = 2^a. This information is configured in advance: at run time, software configures the corresponding information registers through an AHB bus interface to complete initialization and then writes a start signal to start the operation, after which the hardware completes the entire operation in sequence according to the configured point information.
Further, the address generating unit supports generating the bank number and address of the data memory accessed in 5-point or 4-point parallel at the same time, and generating the bank number and address of the 4-way twiddle factor at the same time, so that data preparation of each operation is completed.
Further, the address generation unit comprises two functional parts: a data address generation module and a twiddle factor address generation module. The data memory is divided into 5 banks, and the bank numbers are generated from the three-stage counter information in the control unit by accessing a lookup table with three index values. index_s comes from the last two bits of the stage counter (stage_cnt) and corresponds to the outermost index of the lookup table, that is, the range of the sub-table for the current stage. index_b comes from the group counter, represents the current group offset, and corresponds to the second-layer index of the lookup table, that is, the start index of the sub-table. For the base-5 operation, these two indexes suffice to generate the required bank numbers. For the base-2 operation and the pruning operation, index_o is also required: the last entry of the group of sub-table contents fetched previously serves as the inner-layer head index for the currently required bank numbers. Combining the three indexes, the required input data bank numbers are generated by table lookup before the operation of each stage. The addresses within each bank follow the same rule as in an ordinary base-2 algorithm: every 5 adjacent data items form a group, and their addresses are arranged in bit-reversed order with respect to N. Four raw-address generation counters are used; each increments in natural order after every operation with an initial step of 1, and the step doubles with each successive stage. The initial values of the 4 counters correspond respectively to the group counters in the bank generation unit, and reversing the bits of the raw address yields the actual data address. The output data address, that is, the write data address, is identical to the input data address: the input data address is stored in an output address register following the operation data flow, and two cycles later the result is written to the corresponding address of the corresponding bank according to the bank number and address held in the address register.
Further, the main body of the twiddle factor address generation module is an address counter. It increments by 1 in each base-5 operation, and the current value is saved when the operands of the last base-5 operation are read, to serve as the initial value of the twiddle factor address in the base-2 operation. In the base-2 operation, the base address of the twiddle factor address counter is this saved value, and two steps are needed. Step 1, like index_b in bank number generation, comes from the group counter, records the current group, and serves as the second-layer initial offset. Step 2 is the per-operation offset: it is incremented after the operands of each operation are fetched, with an initial increment of MAX_N/20 that is halved at each successive stage. The sum of the three values is the twiddle factor address in the base-2 operation mode, and the bank numbers are fixed at 0 to 3. In the pruning mode the corresponding address generation module is bypassed.
Further, the memory access management unit supports 5-way or 4-way parallel read or write operations on the data memory and 4-way parallel reads on the twiddle factor memory. It is the module that converts the bank numbers and address values from the address generation unit into chip-select and read/write access operations on specific memories. The group of bank numbers to be accessed is selected according to the contents of the bank number lookup table. If the group contains the number of a given RAM bank, the chip-select signal of that bank is pulled low to select it, and the address value associated with that bank number is assigned to the read data address raddr of the RAM bank; otherwise the chip select is kept high. The read/write selection of the RAM bank is determined by the load/store signals from the control unit: a read is performed when load is high, and a write is performed, with the we_n signal of the RAM bank at low level, when store is high. The chip select of the twiddle factor memory is asserted each time the load signal goes high, and the twiddle factor address values are assigned to the read data addresses of the corresponding banks.
The reconfigurable mixed base FFT device and method supporting output pruning have the following advantages: the application carries out reconfigurable hardware design aiming at the base-5 base-2 mixed base algorithm and part of output pruning, supports 6 reconfigurable operation modules, has no delay and high bandwidth for data read-write, greatly reduces resource expenditure while ensuring operation performance, and improves hardware utilization rate.
Drawings
FIG. 1 is a block diagram of the reconfigurable mixed-base FFT device of the present application;
FIG. 2 is a schematic diagram of the reconfigurable operation unit of the present application;
FIG. 3 is a table of the input data of each element of the operation unit of the present application;
FIG. 4 is a state transition diagram of the control unit of the present application;
FIG. 5 is a schematic diagram of the three-stage control counter of the present application;
FIG. 6 is a diagram of the conflict-free access scheme for 20 points according to the present application;
FIG. 7 is the bank number lookup table for conflict-free data access according to the present application;
FIG. 8 is a schematic diagram of the address generation unit of the present application;
FIG. 9 is a schematic diagram of the conflict-free twiddle factor address generation module of the present application.
Detailed Description
For a better understanding of the purpose, structure and function of the present application, the reconfigurable mixed-base FFT device supporting output pruning is described in further detail below with reference to the accompanying drawings.
As shown in FIG. 1, the reconfigurable mixed-base FFT device supporting output pruning comprises an FFT processor supporting calculation modes with an arbitrary 5 × 2^k number of input points and an arbitrary 2 × 2^k number of output points. The FFT processor comprises at least a reconfigurable operation unit supporting base-5 and base-2 operations, a control unit supporting the arbitrary 5 × 2^k input-point and arbitrary 2 × 2^k output-point calculation modes, a memory access management unit supporting both 5-point and 4-point parallel reading and writing, and an address generation unit corresponding to the memory access management unit.
The reconfigurable operation unit is used for executing all operations, and the operation mode and the operation are completed under the instruction of the control signal sent by the control unit;
the control unit determines a calculation mode according to the configured points and completes the control of the whole operation, address generation and storage access process;
the address generation unit generates a bank number and an address corresponding to an operand required by each operation under the instruction of a control signal generated by the control unit;
and the access management unit generates operations such as chip selection enabling, read-write control and address data assignment of the corresponding memory block under the instruction of the control signal generated by the control unit.
The reconfigurable operation unit is built around an optimized base-5 operation and multiplexes the same set of hardware resources to implement the base-2 operation. By adding several selectors and the corresponding control signals, it supports both the base-5 operation and the 2-way-parallel, 2-stage-cascaded base-2 operation, and it also supports an output-point pruning mode that reduces the amount of computation and the operation time. As shown in FIG. 2, the reconfigurable operation unit includes 4 complex multipliers, 9 complex adders (the position of add/sub8 in the path is dynamically configured according to the pipeline stage), and 8 sets of inter-stage registers. The complete base-5 operation is divided into two stages, and the operation elements participating in the stages are the 4 complex multipliers and 9 complex adders. After the first-stage operation, a group of intermediate data is stored in the inter-stage registers; when the second-stage operation starts in the next cycle, the corresponding operands are read from the inter-stage register set for the second-stage operation. The second-stage results are likewise stored in the inter-stage registers, and in the following cycle the result data are read from the inter-stage registers and written to the corresponding memory addresses. This avoids memory read/write resource conflicts throughout the calculation.
The 2-way-parallel, 2-stage-cascaded base-2 operation is also divided into two stages in order to optimize timing and shorten the critical path. Each stage uses 2 complex multipliers and 4 complex adders. After the first-stage operation, a group of intermediate data is stored in the inter-stage registers, of which only 4 are needed. When the second-stage operation starts, the corresponding operands are read from the inter-stage register set for the second-stage operation. The second-stage results are likewise stored in the 4 inter-stage registers and are written back to the corresponding memory addresses in the next cycle. This avoids memory read/write resource conflicts throughout the calculation.
In pruning mode, the operation is still divided into two stages, the first stage uses 2 complex adders, the second stage uses 1 complex adder, 2 inter-stage registers are used when storing intermediate result data of the first stage operation, only 1 inter-stage register is needed when storing the second stage result data, and the second stage result data is written into a corresponding memory address in the next period.
Input data of each element of the reconfigurable operation unit in each operation mode is shown in fig. 3.
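As a software reference for the arithmetic that such a unit performs, the following minimal Python sketch computes an FFT for lengths of the form 5 × 2^k by combining a base-5 step with base-2 steps. It is a plain recursive decimation-in-time formulation chosen for readability; the stage ordering, the optimized base-5 butterfly, and the multiplier/adder sharing of the device are not modelled, and the function names are illustrative.

```python
import cmath


def naive_dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def mixed_base_fft(x):
    """FFT for lengths whose only prime factors are 5 and 2 (e.g. 5 * 2^k)."""
    n = len(x)
    if n == 1:
        return list(x)
    r = 5 if n % 5 == 0 else 2          # use the base-5 step while a factor 5 remains
    if n % r != 0:
        raise ValueError("length must factor into 5s and 2s")
    m = n // r
    subs = [mixed_base_fft(x[j::r]) for j in range(r)]   # r interleaved sub-FFTs
    out = [0j] * n
    for k in range(m):
        # Apply twiddle factors W_n^(j*k), then an r-point butterfly.
        t = [subs[j][k] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(r)]
        for q in range(r):
            out[k + q * m] = sum(t[j] * cmath.exp(-2j * cmath.pi * j * q / r)
                                 for j in range(r))
    return out


if __name__ == "__main__":
    data = [complex(i, -0.5 * i) for i in range(20)]   # 20 = 5 * 2^2
    err = max(abs(a - b) for a, b in zip(mixed_base_fft(data), naive_dft(data)))
    print("max abs error vs naive DFT:", err)
```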
The control unit determines the calculation mode through the joint control of a state machine and counters, controls the operation data flow, and completes the whole mixed-base FFT operation. The control unit comprises a state machine and a three-stage counter, which jointly control the working mode and all actions of the hardware. As shown in FIG. 4, the state machine comprises 16 states in 3 groups. R5_L, R5_C1, R5_C2LR5, R5_SC1 and R5_C2LR2 are the base-5 calculation states; these states respectively complete the four operations of a base-5 operation (reading input data, first-stage operation, second-stage operation, and writing result data), forming a four-stage pipeline overall. In the last base-5 state, R5_C2LR2, the second-stage operation of the last base-5 operation is completed and the operands of the base-2 operation of the next cycle are read in advance. R2_SC1R2, R2_C2LR2, R2_SC1 and R2_C2LP are the base-2 calculation states; these states respectively complete the same four operations for a base-2 operation. R2_SC1R2 is the first base-2 state: it completes the first-stage operation of the first base-2 operation while the last group of base-5 results from the previous cycle is written to memory. In the last base-2 state, R2_C2LP, the second-stage operation of the last base-2 operation is completed and the operands of the pruning operation of the next cycle are read in advance. P_SC1P, P_C2L, P_SC1, P_C2 and P_S are the pruning operation states; these states respectively complete the same four operations for a pruning operation.
P_SC1P is the first pruning state: it completes the first-stage operation of the first pruning operation while the last group of base-2 results from the previous cycle is written to memory. P_C2 and P_S correspond to the last pruning operation, and after the calculation is completed a calculation-complete signal is issued in the FINISH state.
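The four-stage pipelining described above (read input data, first-stage operation, second-stage operation, write result data) can be visualised with a small timing model. The sketch below is an assumption-laden illustration: it treats every operation as occupying each stage for exactly one cycle and ignores the named states and the hand-over between base-5, base-2 and pruning phases. The names `STAGES` and `pipeline_schedule` are hypothetical.

```python
STAGES = ("load", "compute1", "compute2", "store")


def pipeline_schedule(num_ops):
    """Return, per cycle, which operation index occupies each pipeline stage."""
    total_cycles = num_ops + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, name in enumerate(STAGES):
            op = cycle - depth            # operation that entered the pipeline `depth` cycles ago
            if 0 <= op < num_ops:
                row[name] = op
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(4)):
        print(cycle, row)
```

In the steady state every cycle overlaps one read, both operation stages and one write, which is why the read and write of different operations must target different memory banks.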
Transitions between states are decided by counter values. The three-stage counter, shown in FIG. 5, counts the stage number, the group number, and the butterfly operation number of the current operation. The counting thresholds of the three counters are determined from the configured N, MAX_N, A, stage_num and a_num. MAX_N is the total number of points of the operation, and N = MAX_N/5; for example, MAX_N = 1280 gives N = 256. A is the number of output points required by the operation, with A = 2^a; for example, A = 32 gives a = 5. This information is configured in advance. At run time, software configures the corresponding information registers through the AHB bus interface to complete initialization and then writes a start signal to start the operation, after which the hardware completes the entire operation in sequence according to the configured point information.
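The relationships among these configuration values can be checked with a short sketch. It derives N and a from MAX_N and A and validates that both are powers of two; the exponent `k` (with N = 2^k) is included for convenience. How stage_num and a_num are encoded in the actual registers is not stated in the text, so they are deliberately left out, and the function name `derive_counts` is hypothetical.

```python
def derive_counts(max_n: int, a_points: int) -> dict:
    """Derive N = MAX_N / 5 and a = log2(A) for a 5 * 2^k point operation."""
    if max_n % 5 != 0:
        raise ValueError("MAX_N must be a multiple of 5")
    n = max_n // 5
    for name, value in (("N", n), ("A", a_points)):
        if value < 1 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two")
    return {
        "MAX_N": max_n,
        "N": n,
        "k": n.bit_length() - 1,       # N = 2^k
        "A": a_points,
        "a": a_points.bit_length() - 1,  # A = 2^a
    }


if __name__ == "__main__":
    print(derive_counts(1280, 32))
```

With MAX_N = 1280 and A = 32 this reproduces the values given in the text, N = 256 and a = 5.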
The address generation unit supports simultaneous generation of the bank numbers and addresses for 5-point or 4-point parallel access to the data memory, and simultaneous generation of the bank numbers and addresses of the 4-way twiddle factors, thereby completing the data preparation for each operation. The address generation unit comprises two functional parts: a data address generation module and a twiddle factor address generation module. The data memory is divided into 5 banks. Taking 20 points as an example, the storage rule for conflict-free memory access is shown in FIG. 6, where data in blocks of the same color are accessed in parallel as one group. The access rule can be implemented as a lookup table, shown in FIG. 7. The bank numbers are generated from the three-stage counter information in the control unit by accessing the lookup table with three index values. index_s comes from the last two bits of the stage counter (stage_cnt) and corresponds to the outermost index of the lookup table, that is, the range of the sub-table for the current stage. index_b comes from the group counter, represents the current group offset, and corresponds to the second-layer index of the lookup table, that is, the start index of the sub-table. For the base-5 operation, these two indexes suffice to generate the required bank numbers. For the base-2 operation and the pruning operation, index_o is also required: the last entry of the group of sub-table contents fetched previously serves as the inner-layer head index for the currently required bank numbers. Combining the three indexes, the required input data bank numbers are generated by table lookup before the operation of each stage.
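Since the lookup table of FIG. 7 is not reproduced here, the following sketch uses a different, well-known technique, a digit-sum bank mapping, purely to illustrate what conflict-free access means for 20 points in 5 banks. A point is indexed by hypothetical mixed-base digits (n1, n2, n3) with n1 in 0..4 and n2, n3 in 0..1, and the mapping `bank_of` is an assumption, not the table used by the device.

```python
from itertools import product

BANKS = 5


def bank_of(n1: int, n2: int, n3: int) -> int:
    """Hypothetical digit-sum mapping of a 20-point index to one of 5 banks."""
    return (n1 + n2 + 2 * n3) % BANKS


def conflict_free() -> bool:
    """Check that every parallel access group touches distinct banks."""
    # Base-5 groups: the 5 operands differ only in n1.
    for n2, n3 in product(range(2), repeat=2):
        if len({bank_of(n1, n2, n3) for n1 in range(5)}) != 5:
            return False
    # Base-2 groups (2-way parallel, 2-stage cascade): the 4 operands differ in (n2, n3).
    for n1 in range(5):
        if len({bank_of(n1, n2, n3) for n2, n3 in product(range(2), repeat=2)}) != 4:
            return False
    return True


if __name__ == "__main__":
    print("conflict-free:", conflict_free())
```

Any mapping with this property lets all 5 (or 4) operands of one operation be read in the same cycle, which is the requirement the lookup table has to satisfy.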
The rule for generating the address within each bank is more straightforward and resembles that of an ordinary base-2 algorithm: every 5 adjacent data items form a group, and their addresses are arranged in bit-reversed order with respect to N. Accordingly, 4 raw-address generation counters are used, as shown in FIG. 8. Each increments in natural order after every operation with an initial step of 1, and the step doubles with each successive stage. The initial values of the 4 counters correspond respectively to the group counters in the bank generation unit. Reversing the bits of the raw address yields the actual data address.
Because the application retains the in-place property of the FFT algorithm, the output data address, that is, the write data address, is identical to the input data address. The input data address is stored in the output address register following the operation data flow, and two cycles later the result is written to the corresponding address of the corresponding bank according to the bank number and address held in the address register.
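The bit-reversal step can be sketched as follows. This is a generic helper, assuming the raw address is reversed over a fixed bit width (for N = 256 that would be 8 bits); how the device wires the reversal and the counters together is not modelled, and the name `bit_reverse` is illustrative.

```python
def bit_reverse(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)   # shift the next low bit into the result
        value >>= 1
    return result


if __name__ == "__main__":
    # Natural-order counter values mapped to bit-reversed addresses for a 3-bit example.
    print([bit_reverse(i, 3) for i in range(8)])
```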
As shown in FIG. 9, the main body of the twiddle factor address generation module is an address counter. It increments by 1 in each base-5 operation, and the current value is saved when the operands of the last base-5 operation are read, to serve as the initial value of the twiddle factor address for the base-2 operation. In the base-2 operation, the base address of the twiddle factor address counter is this saved value, and two steps are needed. Step 1, like index_b in bank number generation, comes from the group counter, records the current group, and serves as the second-layer initial offset. Step 2 is the per-operation offset: it is incremented after the operands of each operation are fetched, with an initial increment of MAX_N/20 that is halved at each successive stage. The sum of the three values is the twiddle factor address in the base-2 operation mode, and the bank numbers are fixed at 0 to 3. The pruning mode involves no multiplication, so no twiddle factors need to be read, and the corresponding address generation module is bypassed to reduce power consumption.
The memory management unit supports 5-way or 4-way parallel read or write operations on the data memory and 4-way parallel reads of the twiddle factor memory. It converts the bank numbers and address values from the address generation unit into chip-select and read/write accesses to specific memories. The group of bank numbers to be accessed is selected according to the contents of the bank number lookup table. If the group contains the number of a given RAM bank, that bank's chip-select signal is pulled low (active low) to select it, and the address value paired with the bank number is assigned to the bank's read data address (raddr); otherwise the chip select stays high. Read/write selection for a RAM bank follows the load/store signals from the control unit: the access is a read when load is high, and a write, with the bank's active-low we_n signal asserted, when store is high. The chip select of the twiddle factor memory is asserted whenever the load signal goes high, and the twiddle factor address values are assigned to the read data addresses of the corresponding banks.
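The chip-select decoding described above can be sketched behaviourally in Python. This is a simplified model under assumptions: the function name `decode_access`, the list-based signal representation, and the reading that we_n is driven low for a write and high for a read are illustrative choices, not a description of the actual RTL.

```python
NUM_BANKS = 5  # the data memory is divided into 5 banks


def decode_access(accesses, load: bool, store: bool):
    """Turn (bank, address) pairs into per-bank chip selects and addresses.

    Returns (cs_n, we_n, addr): cs_n[i] is 0 when bank i is selected (active low),
    we_n is 0 for a write and 1 for a read (assumed active low), and addr[i] is
    the address driven to bank i, or None when the bank is idle.
    """
    if load == store:
        raise ValueError("exactly one of load/store must be high")
    cs_n = [1] * NUM_BANKS
    addr = [None] * NUM_BANKS
    for bank, address in accesses:
        cs_n[bank] = 0        # pull the chip select low to select this bank
        addr[bank] = address  # address paired with this bank number
    we_n = 1 if load else 0
    return cs_n, we_n, addr


# A 4-way parallel read that leaves bank 4 idle.
print(decode_access([(0, 3), (1, 9), (2, 7), (3, 1)], load=True, store=False))
```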
It will be understood that the application has been described in terms of several embodiments, and that various changes and equivalents may be made to these features and embodiments by those skilled in the art without departing from the spirit and scope of the application. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the application without departing from the essential scope thereof. Therefore, it is intended that the application not be limited to the particular embodiment disclosed, but that the application will include all embodiments falling within the scope of the appended claims.

Claims (10)

1. A reconfigurable mixed-base FFT device supporting output pruning is characterized by comprising an FFT processor supporting arbitrary 5 x 2 x k input points and arbitrary 2 x 2 k output points calculation modes, wherein the FFT processor at least comprises a reconfigurable operation unit supporting base-5 and base-2 operations, a control unit supporting arbitrary 5 x 2 k input points and arbitrary 2 x 2 k output points calculation modes, a memory access management unit supporting 5-point and 4-point parallel reading and writing at the same time, and an address generation unit corresponding to the memory access management unit supporting 5-point and 4-point parallel reading and writing at the same time;
the reconfigurable operation unit is used for executing all operations, and the operation mode and the operation are completed under the instruction of a control signal sent by the control unit;
the control unit determines a calculation mode according to the configured points and completes the control of the whole operation, address generation and storage access process;
the address generation unit generates a bank number and an address corresponding to an operand required by each operation under the instruction of a control signal generated by the control unit;
the access management unit generates the chip selection enabling operation, the read-write control operation and the assignment operation of address data of the corresponding memory block under the instruction of the control signal generated by the control unit.
2. The reconfigurable hybrid-based FFT apparatus of claim 1, wherein the reconfigurable operation unit comprises a two-stage operation and 8 sets of inter-stage registers, the two-stage operation comprises 4 complex multipliers and 9 complex adders, one set of intermediate data is stored into the inter-stage registers after the operation of the first stage, and after the second-stage operation of the next cycle begins, the corresponding operands are read from the inter-stage register set for the second-stage operation; the second-stage operation result is still stored in the inter-stage register, and the result data is read from the inter-stage register in the next period and then written into the corresponding memory address.
3. The reconfigurable hybrid-base FFT apparatus according to claim 1, wherein the reconfigurable operation unit implements a base-5 operation and a 2-way parallel-2-way concatenated base-2 operation and an output point pruning mode operation by multiplexing the same set of hardware resources.
4. The reconfigurable hybrid-base FFT apparatus of claim 1, wherein the control unit determines the calculation mode by the joint control of the state machine and the counter, controls the operation data flow, and completes the entire hybrid-base FFT operation.
5. The reconfigurable hybrid-based FFT apparatus of claim 1, wherein the control unit comprises a state machine and a three-stage counter, which jointly control the operation mode and the actions of the entire hardware, the state machine comprising 3 sets of states, 16 states in total: R5_L, R5_C1, R5_C2LR5, R5_SC1 and R5_C2LR2 are the base-5 calculation states, in which the four operations of a base-5 operation, namely reading input data, first-stage operation, second-stage operation and writing result data, are completed as a four-stage pipeline; in the last base-5 state, R5_C2LR2, the second-stage operation of the last base-5 operation is completed and the operands of the base-2 operation of the next cycle are read in advance; R2_SC1R2, R2_C2LR2, R2_SC1 and R2_C2LP are the base-2 calculation states, in which the same four operations of a base-2 operation are completed; R2_SC1R2 is the first base-2 state, in which the first-stage operation of the first base-2 operation is completed while the last group of base-5 results from the previous cycle is written to memory; in the last base-2 state, R2_C2LP, the second-stage operation of the last base-2 operation is completed and the operands of the pruning operation of the next cycle are read in advance; P_Sc1P, P_C L, P_Sc1, P_C2 and P_S are the pruning operation states, in which the same four operations of a pruning operation are completed; P_Sc1P is the first pruning state, in which the first-stage operation of the first pruning operation is completed while the last group of base-2 results from the previous cycle is written to memory; P_C2 and P_S correspond to the last pruning operation; and a calculation completion signal is issued in the FINISH state after the calculation is completed.
6. The reconfigurable hybrid-based FFT apparatus of claim 5, wherein the switching between states is determined by the counters; the three-stage counter counts the stage number, the group number and the butterfly operation number of the current operation; the count threshold of each level is determined by the configured N, MAX_N, A, stage_num and a_num, where MAX_N is the total number of points of the current operation, N = MAX_N/5, A is the number of output points required by the operation, and A = 2^a; this information is configured in advance: at run time, software configures the corresponding information registers through an AHB bus interface to complete initialization and then writes a start signal to start the operation, and the hardware completes the whole operation in sequence according to the configured point information.
7. The reconfigurable hybrid-base FFT apparatus of claim 1, wherein the address generation unit supports simultaneous generation of a bank number and an address of a 5-point or 4-point parallel access data memory, and supports simultaneous generation of a bank number and an address of a 4-way twiddle factor, thereby completing data preparation per operation.
8. The reconfigurable hybrid-base FFT apparatus of claim 7, wherein the address generation unit comprises two functional parts: a data address generation module and a twiddle factor address generation module; the data memory is divided into 5 banks, and bank numbers are generated from the three-level counter information in the control unit, three index values being used to access the lookup table; index_s comes from the last two bits of the stage counter (stage_cnt) and is the outermost index of the lookup table, selecting the range of the sub-table for the current stage; index_b comes from the group counter, represents the current group offset, and is the second-layer index of the lookup table, that is, the starting index of the sub-table; for the base-5 operation these two indexes suffice to generate the required bank numbers; the base-2 operation and the pruning operation additionally require index_o, for which the last entry of the group of sub-table contents fetched previously is the head index of the inner-layer table for the currently required bank numbers; by combining the three indexes, the required input data bank numbers are generated by lookup table access before the operation of each stage; the address generation rule within each bank is the same as that of the common base-2 algorithm: every 5 adjacent data form a group, and their addresses follow the binary bit-reversed order of N; 4 original address generation counters are used, each incrementing in natural order after every operation, with an increment step that starts at 1 and doubles with each stage; the initial values of the 4 original address generation counters correspond to the group counters in the bank generation unit, respectively, and reversing the bit order of each original address yields the actual data address; the output data address, namely the write data address, is identical to the input data address; the input data address is stored in an output address register according to the operation data flow, and two cycles later the result is written to the corresponding address of the corresponding bank according to the bank number and address in the address register.
9. The reconfigurable hybrid-base FFT apparatus of claim 8, wherein the main body of the twiddle factor address generation module is an address counter, which increments by 1 in each base-5 operation and saves the value it holds when the operands of the last base-5 operation are read as the initial twiddle factor address for the base-2 operations; in a base-2 operation, the base address of the twiddle factor address counter is the saved twiddle factor address value of the last base-5 operation, and two further offsets are required: the first is the same as index_b in bank number generation, comes from the group counter, records which group is currently being processed, and serves as the second-layer initial offset; the second is the per-operation offset, which increments after the operands of each operation are fetched, with an increment that starts at MAX_N/20 and is halved at each subsequent stage; the sum of the three values is the twiddle factor address in base-2 operation mode, the bank numbers are fixed at 0-3, and the corresponding address generation module is bypassed in pruning mode.
10. The reconfigurable hybrid-base FFT apparatus of claim 1, wherein the memory management unit supports 5-way or 4-way parallel read or write operations on a data memory and 4-way parallel read operations on a twiddle factor memory; the memory management unit is a module that converts the bank numbers and address values from the address generation unit into chip-select and read/write access operations on specific memories; the group of bank numbers to be accessed is selected according to the contents of a bank number lookup table; if the group contains the number of a given RAM bank, the chip-select signal of that bank is pulled low to select it, and the address value corresponding to the bank number is assigned to the read data address raddr of the RAM bank; otherwise the chip select is kept high; the read/write selection of the RAM bank is determined by the load/store signals from the control unit: the access is a read when load is high, and a write, with the we_n signal of the RAM bank at low level, when store is high; the chip select of the twiddle factor memory is asserted each time the load signal goes high, and the twiddle factor address values are assigned to the read data addresses of the corresponding banks.
CN202310921249.5A 2023-07-26 2023-07-26 Reconfigurable mixed-base FFT device supporting output pruning Pending CN117194861A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310921249.5A CN117194861A (en) 2023-07-26 2023-07-26 Reconfigurable mixed-base FFT device supporting output pruning

Publications (1)

Publication Number Publication Date
CN117194861A true CN117194861A (en) 2023-12-08

Family

ID=88993118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310921249.5A Pending CN117194861A (en) 2023-07-26 2023-07-26 Reconfigurable mixed-base FFT device supporting output pruning

Country Status (1)

Country Link
CN (1) CN117194861A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination