CN113190787A - FFT processor based on approximate complex multiplier

Info

Publication number
CN113190787A
Authority
CN
China
Prior art keywords
input
multiplier
unit
gate
output signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110452797.9A
Other languages
Chinese (zh)
Other versions
CN113190787B (en)
Inventor
刘伟强
杜锦鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-04-26
Filing date
2021-04-26
Publication date
2021-07-30
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202110452797.9A
Publication of CN113190787A
Application granted
Publication of CN113190787B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141 Discrete Fourier transforms
    • G06F17/142 Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G06F7/533 Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even
    • G06F7/5334 Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by using multiple bit scanning, i.e. by decoding groups of successive multiplier bits in order to select an appropriate precalculated multiple of the multiplicand as a partial product
    • G06F7/5336 Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by using multiple bit scanning, i.e. by decoding groups of successive multiplier bits in order to select an appropriate precalculated multiple of the multiplicand as a partial product overlapped, i.e. with successive bitgroups sharing one or more bits being recoded into signed digit representation, e.g. using the Modified Booth Algorithm
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Discrete Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an FFT processor based on an approximate complex multiplier. By approximating the Booth coding unit and the partial product compression unit of the multipliers in the FFT processor, the invention reduces resource consumption during operation and improves the operating speed and processor performance while keeping the precision at an acceptable level.

Description

FFT processor based on approximate complex multiplier
Technical Field
The invention belongs to the field of design of FFT (fast Fourier transform) processors, and particularly relates to an FFT processor based on an approximate complex multiplier.
Background
Conventional computing has pursued exact arithmetic above all else. This approach now faces challenges in power consumption, circuit reliability, and performance. Approximate computing has been proposed as a route to energy-efficient systems for emerging error-tolerant applications (e.g., speech recognition, image processing, data mining, and video processing) that do not require full accuracy. Digital signal processing (DSP) is likewise error tolerant, so applying approximate computing to DSP operations is an effective way to achieve low power consumption and high performance.
The fast Fourier transform (FFT) is a fast algorithm for computing the discrete Fourier transform (DFT). It is obtained by restructuring the DFT computation to exploit the odd, even, real, and imaginary symmetry properties of the transform. Because the FFT is used in a wide range of applications, its hardware architecture has been widely studied and optimized for different use cases. Hardware FFT implementations fall into two main classes: reconfigurable and fixed. Variable-length FFTs generally use a reconfigurable architecture, and a common way to support several transform lengths is a mixed-radix algorithm. Fixed architectures can be further divided into parallel and pipelined structures. The most typical pipelined architectures are multi-path delay commutator (MDC) and single-path delay feedback (SDF). Each structure has its own trade-offs: a parallel structure processes N inputs simultaneously, with low latency and simple control, but consumes more hardware resources; a pipelined structure is simpler, but processes data sequentially and therefore has higher latency.
Disclosure of Invention
In order to solve the technical problems mentioned in the background art, the present invention provides an FFT processor based on an approximate complex multiplier.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
An FFT processor based on an approximate complex multiplier comprises a plurality of basic units cascaded in sequence. Each basic unit comprises a butterfly operation unit and m feedback units, where m is a positive integer. Each butterfly operation unit comprises a signal input end, a signal output end, m feedback input ends, and m feedback output ends, and each feedback output end is connected to the corresponding feedback input end through the corresponding feedback unit. The signal output end of the butterfly operation unit in the previous basic unit is connected to the signal input end of the butterfly operation unit in the next basic unit through a complex multiplier: the signal output by the butterfly operation unit in the previous basic unit is complex-multiplied by a twiddle factor in the complex multiplier and then used as the input signal of the butterfly operation unit in the next basic unit. The complex multiplier comprises first and second subtractors, first to third adders, and first to third multipliers. The two input ends of the first subtractor receive the real part and the imaginary part of the output signal of the butterfly operation unit in the previous basic unit; the two input ends of the second subtractor receive the real part and the imaginary part of the twiddle factor; the two input ends of the first adder receive the real part and the imaginary part of the output signal of the butterfly operation unit in the previous basic unit; the two input ends of the first multiplier receive the output signal of the first subtractor and the imaginary part of the twiddle factor; the two input ends of the second multiplier receive the output signal of the second subtractor and the real part of the output signal of the butterfly operation unit in the previous basic unit; the two input ends of the third multiplier receive the output signal of the first adder and the real part of the twiddle factor; the two input ends of the second adder receive the output signal of the first multiplier and the output signal of the second multiplier; and the two input ends of the third adder receive the inverted output signal of the second multiplier and the output signal of the third multiplier. Each multiplier comprises a Booth coding unit, a partial product compression unit, and a fast summation unit: the Booth coding unit encodes the two operands to quickly generate partial products, the partial product compression unit compresses the generated partial products to quickly obtain two rows of partial products, and the fast summation unit adds the two rows of partial products with a fast adder to generate the final product.
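The wiring described above trades one of the four real multiplications of a textbook complex product for extra additions. The following is a minimal behavioral sketch in Python, not the patented circuit itself; the function and variable names (`complex_mul_3mult`, `xr`, `xi`, `wr`, `wi`) are illustrative assumptions, with `x` standing for the butterfly output and `w` for the twiddle factor.

```python
def complex_mul_3mult(xr, xi, wr, wi):
    """Multiply (xr + j*xi) by (wr + j*wi) using three real multiplications."""
    s1 = xr - xi      # first subtractor: real and imaginary part of the signal
    s2 = wr - wi      # second subtractor: real and imaginary part of the twiddle factor
    a1 = xr + xi      # first adder: real and imaginary part of the signal
    m1 = s1 * wi      # first multiplier
    m2 = s2 * xr      # second multiplier
    m3 = a1 * wr      # third multiplier
    re = m1 + m2      # second adder:  xr*wr - xi*wi
    im = -m2 + m3     # third adder (inverted m2 plus m3):  xr*wi + xi*wr
    return re, im


# Cross-check against Python's built-in complex arithmetic.
x, w = complex(3, 4), complex(5, -2)
re, im = complex_mul_3mult(x.real, x.imag, w.real, w.imag)
print(re, im, x * w)
```

Expanding the sums confirms the identity: `m1 + m2 = xr*wr - xi*wi` and `m3 - m2 = xr*wi + xi*wr`, which are the real and imaginary parts of the product.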
The Booth coding unit and the partial product compression unit are designed approximately, and the partial product expression of the approximated Booth coding unit is as follows:
Figure BDA0003039467180000031
where the generated partial products are arranged as a partial product array, $pp_{ij}$ is the partial product in the i-th row and j-th column of the array, $a_j$ is the j-th bit of one operand, $b_{2i+1}$ is the (2i+1)-th bit of the other operand, and $\oplus$ denotes the exclusive-OR operation;
An approximate 4-2 compressor is designed for the partial product compression unit. The approximate 4-2 compressor comprises first to third NOR gates, first and second XNOR gates, and an OR gate. For the 4 partial products in the same column of the partial product array, the two input ends of the first NOR gate receive the partial products of the first and second rows of the column, the two input ends of the second NOR gate receive the partial products of the third and fourth rows, and the two input ends of the third NOR gate receive the output signals of the first and second NOR gates; the two input ends of the first XNOR gate receive the partial products of the first and second rows, the two input ends of the second XNOR gate receive the partial products of the third and fourth rows, and the two input ends of the OR gate receive the output signals of the first and second XNOR gates.
Furthermore, after the Booth coding unit generates the partial product array, the sign compensation bit of the last row is deleted directly.
Further, an inexactness factor n is set, and only the n least significant bits of the multiplier are approximated, where n is a positive integer.
The above technical solution brings the following beneficial effects:
The invention approximates the Booth coding unit and the partial product compression unit of the multipliers in the FFT processor, which reduces resource consumption during operation and improves the operating speed and processor performance while keeping the precision at an acceptable level.
Drawings
FIG. 1 is a structural diagram of an N-point R4SDF;
FIG. 2 is a diagram of one stage of a Radix4 pipelined FFT;
FIG. 3 is a schematic diagram of a butterfly unit;
FIG. 4 is a logic gate diagram of the approximately optimized Booth encoding of the present invention;
FIG. 5 is a dot diagram of the partial product array of the present invention;
FIG. 6 is a logic gate diagram of the approximately optimized 4-2 compressor of the present invention;
FIG. 7 shows dot diagrams of the partial product array of a 16-bit multiplier with inexactness factors of 8 and 16 according to the present invention.
Detailed Description
The technical scheme of the invention is explained in detail in the following with the accompanying drawings.
FIG. 1 shows the N-point R4SDF structure, which uses shift registers to delay data and improves storage utilization by arranging the data paths appropriately. It differs from the MDC structure in that the number of data paths in an MDC structure is tied directly to the chosen Radix-α algorithm: overall resource utilization drops rapidly as α increases, and most of the time is spent on storage reads. In the SDF structure, only one data path is needed between stages regardless of whether Radix2, Radix4, or Radix8 is chosen. Although the SDF structure requires the same number of stages and butterfly units as the MDC structure, the resource utilization of each module is much higher.
The butterfly operation of the Radix4 algorithm is shown in the left half of FIG. 2. A Radix4 butterfly unit performs the work of two Radix2 stages, and the required operation steps are also simplified to some extent. One stage of the Radix4 pipelined FFT is shown in FIG. 2, where the butterfly module devotes a small share of its resources to controlling data storage and the butterfly operation. In a pipelined FFT processor the input is a time-continuous sequence, and the distance between the inputs of a butterfly differs from stage to stage, so the input sequence must be delayed and the stored data retrieved at the appropriate time. Taking a Radix4 butterfly operation unit as an example, the module interface is shown in FIG. 3: Bank1_in, Bank2_in, and Bank3_in are the earlier input sequences read from three storage units; Data_in is the valid input of the current stage, with its validity indicated by Data_in_valid; Clk and Rst_n are the global clock and reset signals; Dout_re and Dout_im are the real and imaginary parts of the output; and Dout_valid indicates the validity of the output data. The butterfly operation unit only has to perform the butterfly calculation, which consists of complex additions and subtractions and real/imaginary part swaps; storage read/write control is handled by the control unit of the stage in which the butterfly operation unit resides.
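To illustrate why a Radix4 butterfly needs only complex additions, subtractions, and real/imaginary swaps, here is a small Python sketch of a 4-point butterfly. It is a behavioral model under the standard 4-point DFT definition, not the interface of FIG. 3; the names `radix4_butterfly` and `mul_neg_j` are illustrative assumptions.

```python
import cmath


def radix4_butterfly(x0, x1, x2, x3):
    """4-point DFT built from complex add/subtract and one real/imag swap."""

    def mul_neg_j(z):
        # (a + jb) * (-j) = b - ja: swap the parts and negate the new imaginary part.
        return complex(z.imag, -z.real)

    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    d = mul_neg_j(x1 - x3)
    # Outputs X[0], X[1], X[2], X[3] of the 4-point DFT.
    return a + c, b + d, a - c, b - d


# Compare with a direct evaluation of the DFT sum.
xs = [complex(1, 2), complex(-3, 0.5), complex(0, -1), complex(2, 2)]
outputs = radix4_butterfly(*xs)
for k, value in enumerate(outputs):
    reference = sum(xs[n] * cmath.exp(-2j * cmath.pi * k * n / 4) for n in range(4))
    print(k, value, abs(value - reference) < 1e-9)
```

No general multiplier appears in the butterfly itself; the only true multiplications in each stage are the twiddle-factor products performed by the complex multiplier between stages.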
The complex multiplier is shown in the right half of FIG. 2. The invention uses an inexact radix-4 Booth multiplier consisting of three main parts: a radix-4 Booth coding unit, a partial product compression unit, and a fast summation unit. Booth encoding encodes the multiplier and multiplicand to generate partial products quickly and to reduce the number of partial product rows and partial products. The partial product compression unit compresses the partial products to quickly obtain the final two rows, effectively shortening the critical path of the multiplier. The fast summation unit adds the final two rows of partial products with a fast adder to produce the final product. Of the three modules, the Booth coding units and the compression units are the ones used most heavily in the multiplier. Taking a 16-bit multiplier as an example, 144 Booth encoders are needed to generate 144 regular partial products (excluding sign extension bits and sign compensation bits); two rows of partial products are then generated by 80 4-2 compressors and 32 carry-save adders, and finally a fast adder produces the final product. An inexact optimized design of the Booth coding module can therefore improve multiplier performance to the greatest extent.
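For reference, the sketch below models exact (not approximate) radix-4 Booth recoding in Python, to show how the encoding halves the number of partial product rows. It is the textbook algorithm, not the approximate encoder of this invention, and the function names `booth_radix4_digits` and `booth_multiply` are illustrative assumptions.

```python
def booth_radix4_digits(b, nbits):
    """Recode an nbits-wide two's-complement integer b into radix-4 Booth digits.

    Returns digits d_i in {-2, -1, 0, 1, 2}, least significant first, such that
    b == sum(d_i * 4**i). nbits is assumed to be even.
    """
    u = b & ((1 << nbits) - 1)                          # two's-complement bit pattern
    bits = [0] + [(u >> k) & 1 for k in range(nbits)]   # bits[0] is the implicit b_{-1} = 0
    digits = []
    for i in range(nbits // 2):
        b_lo, b_mid, b_hi = bits[2 * i], bits[2 * i + 1], bits[2 * i + 2]
        digits.append(b_lo + b_mid - 2 * b_hi)          # b_{2i-1} + b_{2i} - 2*b_{2i+1}
    return digits


def booth_multiply(a, b, nbits=16):
    """Exact product a*b formed from nbits/2 Booth partial product rows."""
    digits = booth_radix4_digits(b, nbits)
    rows = [(d * a) << (2 * i) for i, d in enumerate(digits)]
    return sum(rows), len(rows)


print(booth_radix4_digits(-7, 8))      # digits for -7 in 8 bits
print(booth_multiply(1234, -567, 16))  # product and number of partial product rows
```

A 16-bit operand yields 8 digit rows instead of 16 single-bit rows; each row is 0, ±a, or ±2a, which hardware produces with shifts and conditional inversion rather than a true multiplication.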
The inexact radix-4 Booth encoding designed by the present invention is shown in Table 1 below:
TABLE 1
Figure BDA0003039467180000051
The partial product expression is as follows:
Figure BDA0003039467180000061
The logic gate structure is shown in FIG. 4.
Exact radix-4 Booth encoding produces a partial product array with a sign offset bit in the last row, which makes the array irregular. To guarantee an exact final product, an exact multiplier needs a separate compression stage for this row alone, so the design requires more compressors and has a longer critical path. To obtain a more regular partial product array, the inexact radix-4 Booth multiplier of this design simply discards the sign offset bit of the last row. The dot diagram of the partial product array of an 8-bit multiplier is taken as an example, as shown in FIG. 5. FIG. 5 (a) shows the irregular partial product array of an exact 8-bit multiplier after radix-4 Booth encoding, with solid black boxes representing compressors; ● denotes a regular partial product; ○ represents a sign extension bit; circa represents a sign compensation bit; Δ represents the sign offset bit of row 5. FIG. 5 (b) shows the regular partial product array of the inexact 8-bit multiplier after radix-4 Booth encoding, which yields the two rows of partial products for the final addition after a single compression stage.
The regular partial product array of the inexact multiplier saves one stage of compression, at the cost of an error probability of 37.5% introduced by discarding one sign offset bit. Moreover, in the partial product array of a radix-4 Booth multiplier, the sign compensation bit sits in a low-weight position, so the error distance stays within the error tolerance range.
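The 37.5% figure is consistent with a simple count over the standard radix-4 Booth digit table. The Python sketch below makes two assumptions that the text does not state explicitly: the discarded sign offset bit is 1 exactly when the Booth digit of the last row is negative, and the three bits that select this digit are independent and uniformly distributed. The helper name `booth_digit` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def booth_digit(b_hi, b_mid, b_lo):
    """Standard radix-4 Booth digit for the bit triplet (b_{2i+1}, b_{2i}, b_{2i-1})."""
    return b_lo + b_mid - 2 * b_hi


# Triplets whose digit is negative (-1 or -2); assumed to be the cases in which
# the sign offset bit is 1 and discarding it changes the product.
negative = [bits for bits in product((0, 1), repeat=3) if booth_digit(*bits) < 0]
probability = Fraction(len(negative), 8)

print(negative)            # the triplets 100, 101, 110
print(float(probability))  # 0.375
```

Under these assumptions three of the eight triplets are affected, which matches the stated 3/8 = 37.5%; the triplet 111 encodes zero and contributes no error.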
The logic gate circuit of the approximate 4-2 compressor designed by the present invention is shown in FIG. 6; it comprises three NOR gates, two XNOR gates, and one OR gate, and its logic expressions are as follows:
Figure BDA0003039467180000062
Figure BDA0003039467180000063
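Because the logic expressions survive only as figure references, the following Python sketch reconstructs the compressor behavior from the gate-level description alone, and it rests on explicit assumptions: the two gates fed by rows (1, 2) and rows (3, 4) alongside the NOR gates are taken to be XNOR gates, the output of the third NOR gate is taken as the carry, and the output of the OR gate is taken as the sum. The function name `approx_compressor_4_2` is hypothetical.

```python
from itertools import product


def approx_compressor_4_2(x1, x2, x3, x4):
    """Gate-level model of the approximate 4-2 compressor (assumed reading)."""
    nor1 = 1 - (x1 | x2)          # first NOR gate: rows 1 and 2
    nor2 = 1 - (x3 | x4)          # second NOR gate: rows 3 and 4
    carry = 1 - (nor1 | nor2)     # third NOR gate: equals (x1 | x2) & (x3 | x4)
    xnor1 = 1 - (x1 ^ x2)         # first XNOR gate (assumption): rows 1 and 2
    xnor2 = 1 - (x3 ^ x4)         # second XNOR gate (assumption): rows 3 and 4
    total = xnor1 | xnor2         # OR gate, assumed to drive the sum output
    return carry, total


# Compare the encoded value 2*carry + sum with the exact count of ones.
errors = {}
for bits in product((0, 1), repeat=4):
    carry, total = approx_compressor_4_2(*bits)
    difference = 2 * carry + total - sum(bits)
    if difference:
        errors[bits] = difference

print(errors)
```

Under this reading the compressor is wrong for 4 of the 16 input patterns, each time by a distance of 1, and it needs no carry-in or carry-out chain between columns.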
The present invention may employ multipliers with different degrees of approximation. With the inexactness factor set to 8, the lowest 8 bits of the multiplier operation are computed approximately and the rest exactly; the dot diagram is shown in (1) of FIG. 7. Only the lowest 9 bits of the final result are approximate. When the intermediate bit width of the FFT is 16 bits, the product is truncated to its upper 16 bits, so the 9 inexact bits have little influence on the final result. With the inexactness factor set to 16, the lowest 16 bits of the multiplier operation are computed approximately and the rest exactly; the dot diagram is shown in (2) of FIG. 7. Only the lowest 17 bits of the final result are approximate. When the intermediate bit width of the FFT is 16 bits, the product is truncated to its upper 16 bits, so the influence of the 17 inexact bits on the final result is very small.
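The effect of truncation on the approximation error can be bounded with a short Python experiment. This is a sketch under assumptions not stated in the text: the approximate product is modeled as the exact product plus an error whose magnitude is below 2^k when the lowest k result bits are approximate, and truncation is modeled as an arithmetic right shift by 16. The function names and the sampling range are illustrative.

```python
def truncated_output_error(exact_product, error, keep_shift=16):
    """Change, in output LSBs, caused by `error` once the low bits are dropped."""
    return ((exact_product + error) >> keep_shift) - (exact_product >> keep_shift)


def worst_case(approx_bits, keep_shift=16, samples=range(0, 1 << 20, 4099)):
    """Largest output deviation over sampled products for errors up to 2**approx_bits - 1."""
    bound = (1 << approx_bits) - 1
    worst = 0
    for p in samples:
        for e in (-bound, bound):
            worst = max(worst, abs(truncated_output_error(p, e, keep_shift)))
    return worst


print(worst_case(9))    # lowest 9 result bits approximate (inexactness factor 8)
print(worst_case(17))   # lowest 17 result bits approximate (inexactness factor 16)
```

In this model the 16-bit output moves by at most 1 LSB in the first case and at most 2 LSBs in the second, which is in line with the statement that the inexact bits have only a small influence after truncation.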
The results of this embodiment were compared with those of the prior art, and the comparison is shown in Table 2 below:
TABLE 2
Figure BDA0003039467180000071
In the table, FFT-ex is the prior art; FFT-1 uses an approximate Booth multiplier with an inexactness factor of 8; FFT-2 uses an approximate Booth multiplier with an inexactness factor of 16.
As can be seen from the table, with an SNR reduction of no more than 1 dB, the present invention reduces LUT usage relative to the prior art by 5.9% for the 64-point FFT-1, 12.7% for the 64-point FFT-2, 11.6% for the 256-point FFT-1, 20.3% for the 256-point FFT-2, 7.8% for the 1024-point FFT-1, and 16.5% for the 1024-point FFT-2. The design also significantly reduces FF usage in the FFT and raises the operating frequency to 141.8% to 146.1% of the original.
The embodiments only illustrate the technical idea of the present invention and do not limit its scope of protection; any modification made to the technical solution in accordance with the technical idea of the present invention falls within the scope of protection of the present invention.

Claims (3)

1. An FFT processor based on an approximate complex multiplier, comprising a plurality of basic units cascaded in sequence, wherein each basic unit comprises a butterfly operation unit and m feedback units; each butterfly operation unit comprises a signal input end, a signal output end, m feedback input ends, and m feedback output ends, m being a positive integer; each feedback output end is connected to the corresponding feedback input end through the corresponding feedback unit; the signal output end of the butterfly operation unit in the previous basic unit is connected to the signal input end of the butterfly operation unit in the next basic unit through a complex multiplier, and the signal output by the butterfly operation unit in the previous basic unit is complex-multiplied by a twiddle factor in the complex multiplier and then used as the input signal of the butterfly operation unit in the next basic unit; the complex multiplier comprises first and second subtractors, first to third adders, and first to third multipliers; the two input ends of the first subtractor receive the real part and the imaginary part of the output signal of the butterfly operation unit in the previous basic unit; the two input ends of the second subtractor receive the real part and the imaginary part of the twiddle factor; the two input ends of the first adder receive the real part and the imaginary part of the output signal of the butterfly operation unit in the previous basic unit; the two input ends of the first multiplier receive the output signal of the first subtractor and the imaginary part of the twiddle factor; the two input ends of the second multiplier receive the output signal of the second subtractor and the real part of the output signal of the butterfly operation unit in the previous basic unit; the two input ends of the third multiplier receive the output signal of the first adder and the real part of the twiddle factor; the two input ends of the second adder receive the output signal of the first multiplier and the output signal of the second multiplier; the two input ends of the third adder receive the inverted output signal of the second multiplier and the output signal of the third multiplier; each multiplier comprises a Booth coding unit, a partial product compression unit, and a fast summation unit, wherein the Booth coding unit encodes the two operands to quickly generate partial products, the partial product compression unit compresses the generated partial products to quickly obtain two rows of partial products, and the fast summation unit adds the two rows of partial products with a fast adder to generate the final product;
characterized in that: the Booth coding unit and the partial product compression unit are designed approximately, and the partial product expression of the approximated Booth coding unit is as follows:
Figure FDA0003039467170000021
where the generated partial products are arranged as a partial product array, $pp_{ij}$ is the partial product in the i-th row and j-th column of the array, $a_j$ is the j-th bit of one operand, $b_{2i+1}$ is the (2i+1)-th bit of the other operand, and $\oplus$ denotes the exclusive-OR operation;
an approximate 4-2 compressor is designed for the partial product compression unit, the approximate 4-2 compressor comprising first to third NOR gates, first and second XNOR gates, and an OR gate, wherein, for the 4 partial products in the same column of the partial product array, the two input ends of the first NOR gate receive the partial products of the first and second rows of the column, the two input ends of the second NOR gate receive the partial products of the third and fourth rows, the two input ends of the third NOR gate receive the output signals of the first and second NOR gates, the two input ends of the first XNOR gate receive the partial products of the first and second rows, the two input ends of the second XNOR gate receive the partial products of the third and fourth rows, and the two input ends of the OR gate receive the output signals of the first and second XNOR gates.
2. The FFT processor based on an approximate complex multiplier of claim 1, wherein after the Booth coding unit generates the partial product array, the sign compensation bit of the last row is deleted directly.
3. The FFT processor based on an approximate complex multiplier of claim 1, wherein an inexactness factor n is set and only the n least significant bits of the multiplier are approximated, n being a positive integer.
CN202110452797.9A 2021-04-26 2021-04-26 FFT processor based on approximate complex multiplier Active CN113190787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110452797.9A CN113190787B (en) 2021-04-26 2021-04-26 FFT processor based on approximate complex multiplier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110452797.9A CN113190787B (en) 2021-04-26 2021-04-26 FFT processor based on approximate complex multiplier

Publications (2)

Publication Number Publication Date
CN113190787A (en) 2021-07-30
CN113190787B (en) 2024-01-30

Family

ID=76979291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110452797.9A Active CN113190787B (en) 2021-04-26 2021-04-26 FFT processor based on approximate complex multiplier

Country Status (1)

Country Link
CN (1) CN113190787B (en)

Also Published As

Publication number Publication date
CN113190787B (en) 2024-01-30

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant