CN113190787A - FFT processor based on approximate complex multiplier - Google Patents
- Publication number
- CN113190787A
- Authority
- CN
- China
- Prior art keywords
- input
- multiplier
- unit
- gate
- output signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/533—Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even
- G06F7/5334—Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by using multiple bit scanning, i.e. by decoding groups of successive multiplier bits in order to select an appropriate precalculated multiple of the multiplicand as a partial product
- G06F7/5336—Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by using multiple bit scanning, i.e. by decoding groups of successive multiplier bits in order to select an appropriate precalculated multiple of the multiplicand as a partial product overlapped, i.e. with successive bitgroups sharing one or more bits being recoded into signed digit representation, e.g. using the Modified Booth Algorithm
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Discrete Mathematics (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses an FFT processor based on an approximate complex multiplier. By approximating the Booth coding unit and the partial product compression unit of the multipliers in the FFT processor, it reduces resource consumption during operation and raises operating speed and processor performance while keeping precision at an acceptable level.
Description
Technical Field
The invention belongs to the field of fast Fourier transform (FFT) processor design, and particularly relates to an FFT processor based on an approximate complex multiplier.
Background
Conventional computing has pursued exact results alone, a trend that now faces technical challenges in power consumption, circuit reliability and performance. Approximate computing has been proposed to build energy-efficient systems for emerging error-tolerant applications (e.g., speech recognition, image processing, data mining and video processing) that do not need full accuracy. Digital signal processing (DSP) is likewise error-tolerant, so applying approximate computing to DSP is an effective way to achieve low power consumption and high performance.
The fast Fourier transform (FFT) is a fast algorithm for the discrete Fourier transform (DFT), obtained by exploiting the odd/even and real/imaginary properties of the DFT. Because the FFT is used in a wide range of applications, its hardware architecture has been studied and optimized extensively to suit different applications. Hardware FFT implementations fall into two classes: reconfigurable architectures and fixed architectures. Variable-length FFTs generally use a reconfigurable architecture, and a common way to support several transform lengths is the mixed-radix algorithm. Fixed architectures can be further divided into parallel and pipelined structures; the most typical pipelined structures are the multi-path delay commutator (MDC) and the single-path delay feedback (SDF). Each structure has its own trade-offs: a parallel structure processes N inputs simultaneously, with low latency and simple control, but consumes more hardware resources; a pipelined structure is simpler, but processes data sequentially and therefore has higher latency.
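As an illustration only, the following Python sketch shows how an FFT reorganizes the DFT: `dft` evaluates the definition directly in O(N^2) operations, while `fft_radix2` is a recursive radix-2 decimation-in-time FFT that splits even- and odd-indexed samples and reuses the symmetry W_N^(k+N/2) = -W_N^k. Both function names are hypothetical, the input length is assumed to be a power of two, and the processor described here uses radix-4 hardware stages rather than this software recursion.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*n*k/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n_pts = len(x)
    if n_pts == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n_pts
    for k in range(n_pts // 2):
        t = cmath.exp(-2j * cmath.pi * k / n_pts) * odd[k]  # twiddle factor W_N^k
        out[k] = even[k] + t
        out[k + n_pts // 2] = even[k] - t  # W_N^(k+N/2) = -W_N^k
    return out
```

Both functions return the same spectrum up to floating-point rounding; the FFT simply avoids recomputing products that the symmetry makes redundant.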
Disclosure of Invention
In order to solve the technical problems mentioned in the background art, the present invention provides an FFT processor based on an approximate complex multiplier.
To achieve the above technical purpose, the invention adopts the following technical scheme:
an FFT processor based on an approximate complex multiplier comprises a plurality of sequentially cascaded basic units. Each basic unit comprises a butterfly operation unit and m feedback units, where m is a positive integer; each butterfly operation unit has a signal input end, a signal output end, m feedback input ends and m feedback output ends, and each feedback output end is connected to the corresponding feedback input end through the corresponding feedback unit. The signal output end of the butterfly operation unit in the previous basic unit is connected to the signal input end of the butterfly operation unit in the next basic unit through a complex multiplier: the signal output by the butterfly operation unit in the previous basic unit is multiplied by a twiddle factor in the complex multiplier and then serves as the input signal of the butterfly operation unit in the next basic unit. The complex multiplier comprises first and second subtractors, first to third adders and first to third multipliers. The two input ends of the first subtractor receive the real part and the imaginary part of the output signal of the butterfly operation unit in the previous basic unit; the two input ends of the second subtractor receive the real part and the imaginary part of the twiddle factor; the two input ends of the first adder receive the real part and the imaginary part of the output signal of the butterfly operation unit in the previous basic unit; the two input ends of the first multiplier receive the output signal of the first subtractor and the imaginary part of the twiddle factor; the two input ends of the second multiplier receive the output signal of the second subtractor and the real part of the output signal of the butterfly operation unit in the previous basic unit; the two input ends of the third multiplier receive the output signal of the first adder and the real part of the twiddle factor; the two input ends of the second adder receive the output signals of the first and second multipliers; and the two input ends of the third adder receive the negated output signal of the second multiplier and the output signal of the third multiplier. Each multiplier comprises a Booth coding unit, a partial product compression unit and a fast summation unit: the Booth coding unit encodes the two operands to quickly generate partial products, the partial product compression unit compresses the generated partial products to quickly obtain two rows of partial products, and the fast summation unit adds the two rows of partial products with a fast adder to produce the final product;
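The wiring above computes (x_r + j x_i)(w_r + j w_i) with three real multiplications instead of four: the second adder yields (x_r - x_i) w_i + (w_r - w_i) x_r = x_r w_r - x_i w_i, and the third adder yields (x_r + x_i) w_r - (w_r - w_i) x_r = x_r w_i + x_i w_r. A minimal Python sketch, using exact integer arithmetic rather than the approximate multipliers and a hypothetical function name `cmul3`, traces each subtractor, adder and multiplier in the order described:

```python
def cmul3(xr, xi, wr, wi):
    """Complex product (xr + j*xi) * (wr + j*wi) using three real multiplications."""
    s1 = xr - xi   # first subtractor: Re(x) - Im(x)
    s2 = wr - wi   # second subtractor: Re(w) - Im(w)
    a1 = xr + xi   # first adder: Re(x) + Im(x)
    m1 = s1 * wi   # first multiplier
    m2 = s2 * xr   # second multiplier
    m3 = a1 * wr   # third multiplier
    re = m1 + m2   # second adder -> xr*wr - xi*wi
    im = m3 - m2   # third adder (negated m2) -> xr*wi + xi*wr
    return re, im


if __name__ == "__main__":
    print(cmul3(3, -5, 7, 2))  # (31, -29), i.e. (3 - 5j) * (7 + 2j)
```

Because twiddle factors are constants, w_r - w_i could in principle be precomputed alongside them; whether this processor does so is not stated in the text.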
the Booth coding unit and the partial product compression unit are approximately designed, and the partial product expression of the approximate Booth coding unit is as follows:
where the generated partial products are arranged as a partial product array, $pp_{ij}$ is the partial product in row $i$ and column $j$ of the array, $a_j$ is the $j$-th bit of one operand, $b_{2i+1}$ is the $(2i+1)$-th bit of the other operand, and $\oplus$ denotes the exclusive-OR operation;
an approximate 4-2 compressor is designed for the partial product compression unit, comprising one OR gate, first to third NOR gates, and first and second XNOR gates. For the four partial products in the same column of the partial product array, the two input ends of the first NOR gate receive the partial products of the first and second rows of that column; the two input ends of the second NOR gate receive the partial products of the third and fourth rows; the two input ends of the third NOR gate receive the output signals of the first and second NOR gates; the two input ends of the first XNOR gate receive the partial products of the first and second rows; the two input ends of the second XNOR gate receive the partial products of the third and fourth rows; and the two input ends of the OR gate receive the output signals of the first and second XNOR gates.
Further, after the Booth coding unit generates the partial product array, the sign compensation bit of the last row is deleted directly.
Further, a non-precision factor n is set, where n is a positive integer, and only the n least significant bits of the multiplier are approximated.
The above technical scheme brings the following beneficial effects:
by approximating the Booth coding unit and the partial product compression unit of the multipliers in the FFT processor, the invention reduces resource consumption during operation and improves operating speed and processor performance while keeping precision at an acceptable level.
Drawings
FIG. 1 shows the structure of an N-point radix-4 single-path delay feedback (R4SDF) FFT;
FIG. 2 shows one stage of a radix-4 pipelined FFT;
FIG. 3 is a schematic diagram of the butterfly operation unit;
FIG. 4 shows the logic-gate structure of the approximately optimized Booth encoding of the invention;
FIG. 5 is a dot diagram of the partial product array according to the invention;
FIG. 6 shows the logic-gate structure of the approximately optimized 4-2 compressor of the invention;
FIG. 7 shows dot diagrams of the partial product array of a 16-bit multiplier with non-precision factors of 8 and 16 according to the invention.
Detailed Description
The technical scheme of the invention is explained in detail in the following with the accompanying drawings.
Fig. 1 shows an N-point R4SDF structure, which delays data with shift registers and improves storage utilization by arranging the data paths appropriately. It differs from the MDC structure as follows: in MDC the number of data paths is tied directly to the chosen radix-α algorithm, so overall resource utilization drops quickly as α grows and much of the time is spent on memory reads, whereas SDF needs only one data path between stages whether radix-2, radix-4 or radix-8 is chosen. Although SDF requires the same number of stages and butterfly units as MDC, the resource utilization of each module is much higher.
The radix-4 butterfly operation is shown in the left half of Fig. 2; one radix-4 butterfly stage performs the work of two radix-2 stages, which also simplifies the required operations somewhat. Fig. 2 shows the first-stage operation of a radix-4 pipelined FFT, in which a small part of the butterfly module's resources is used to control data storage and the butterfly operation. In a pipelined FFT processor the input is a time-continuous sequence, and the distance between the input samples of a butterfly differs from stage to stage, so the input sequence must be delayed and the stored data read out at the appropriate time. Taking the radix-4 butterfly operation unit as an example, its module interface is shown in Fig. 3: Bank1_in, Bank2_in and Bank3_in are earlier input samples read from three memory cells; Data_in is the valid input of the current stage, whose validity is indicated by Data_in_valid; Clk and Rst_n are the global clock and reset signals; Dout_re and Dout_im are the real and imaginary parts of the output; and Dout_valid indicates that the output data are valid. The butterfly operation unit only has to perform the butterfly computation, which consists of complex additions and subtractions plus real/imaginary part swaps; read/write control of data storage is handled by the control unit of the stage in which the butterfly operation unit sits.
The complex multiplier part is shown in the right half of Fig. 2. The inexact radix-4 Booth multiplier used in the invention consists of three parts: a radix-4 Booth coding unit, a partial product compression unit and a fast summation unit. Booth coding encodes the multiplier and the multiplicand to generate partial products quickly while reducing the number of partial-product rows and bits. The partial product compression unit quickly compresses the partial products into the final two rows, which effectively shortens the critical path of the multiplier. Fast summation adds the final two rows with a fast adder to produce the final product. Of these three modules, the Booth coding unit and the compression unit are used most during multiplication. For a 16-bit multiplier, for example, 144 Booth encoders are needed to generate 144 regular partial products (excluding sign-extension and sign-compensation bits); these are then reduced to two rows by 80 4-2 compressors and 32 carry-save adders, and a fast adder produces the final product. Applying inexact optimization to these heavily used units therefore improves multiplier performance the most.
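The radix-4 butterfly needs no general multiplier because the only nontrivial twiddle inside a 4-point DFT is -j, and (r + j i)(-j) = i - j r is just a swap of the real and imaginary parts plus a negation. The Python sketch below is illustrative only: complex values are modeled as (re, im) tuples, the helper names are hypothetical, and the outputs are returned in natural order X0..X3 (the hardware's output ordering is not stated in the text).

```python
def mul_minus_j(z):
    """Multiply (re, im) by -j with a swap and one negation: (r + j*i)(-j) = i - j*r."""
    re, im = z
    return (im, -re)


def add(p, q):
    return (p[0] + q[0], p[1] + q[1])


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


def radix4_butterfly(x0, x1, x2, x3):
    """4-point DFT using only complex additions/subtractions and re/im swaps."""
    a = add(x0, x2)                # x0 + x2
    b = sub(x0, x2)                # x0 - x2
    c = add(x1, x3)                # x1 + x3
    d = mul_minus_j(sub(x1, x3))   # -j * (x1 - x3)
    X0 = add(a, c)                 # x0 + x1 + x2 + x3
    X1 = add(b, d)                 # x0 - j*x1 - x2 + j*x3
    X2 = sub(a, c)                 # x0 - x1 + x2 - x3
    X3 = sub(b, d)                 # x0 + j*x1 - x2 - j*x3
    return X0, X1, X2, X3
```

Compared with the general complex multiplier between stages, this unit consists only of adders, subtractors and wiring, which matches the description of the butterfly operation unit above.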
The inexact radix-4 Booth encoding designed in the invention is shown in Table 1 below:
TABLE 1
The partial product expression is as follows:
The corresponding logic-gate structure is shown in Fig. 4.
Exact radix-4 Booth encoding produces a partial product array with a sign offset bit in an extra last row, which makes the array irregular. To obtain an exact final product, an exact multiplier needs an additional compression stage just for this row, and therefore more compressors and a longer critical path. To obtain a more regular partial product array, the inexact radix-4 Booth multiplier in this design simply discards the sign offset bit of the last row. Fig. 5 illustrates this with the partial product dot diagram of an 8-bit multiplier. Fig. 5(a) shows the irregular partial product array of an exact 8-bit multiplier after radix-4 Booth encoding, where solid black boxes represent compressors, ● denotes a regular partial product bit, ○ a sign extension bit, ◎ a sign compensation bit, and Δ the sign offset bit in row 5. Fig. 5(b) shows the regular partial product array of the inexact 8-bit multiplier after radix-4 Booth encoding, which yields the final two rows to be added after only one compression stage.
By dropping the sign offset bit, the inexact regular partial product array saves one compression stage at the cost of an error probability of 37.5%. Moreover, in the partial product array of a radix-4 Booth multiplier the dropped bit lies in a relatively low-weight column, so the resulting error distance stays within the tolerable range.
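The 37.5% figure can be reproduced with a bit-level Python model of exact radix-4 Booth encoding under two assumptions that the text does not spell out: the dropped sign offset bit is the "+1" that completes the two's-complement negation of the last row, and the encoder treats the bit group 111 as +0 (no negation). Under those assumptions the last row's "+1" is present only for the groups 100, 101 and 110, i.e. 3 of 8 patterns, and dropping it lowers the product by exactly 2^(W-2). The function name `booth_r4_product` and its parameters are hypothetical, and rows are summed exactly instead of through the approximate compressors.

```python
def booth_r4_product(a: int, b: int, width: int = 8, drop_last_neg: bool = False) -> int:
    """Bit-level model of a signed radix-4 Booth multiplier with exact row summation.

    a, b: signed integers that fit in `width` bits (two's complement), width even.
    drop_last_neg: if True, omit the '+1' negation bit of the last row
    (assumed here to be the sign offset bit discarded by the inexact design).
    """
    assert width % 2 == 0

    def bit(x: int, k: int) -> int:
        # Two's-complement bit k of x (Python's >> sign-extends); bit -1 is 0.
        return 0 if k < 0 else (x >> k) & 1

    rows = width // 2
    total = 0
    for i in range(rows):
        b_hi, b_mid, b_lo = bit(b, 2 * i + 1), bit(b, 2 * i), bit(b, 2 * i - 1)
        one = b_mid ^ b_lo                    # select +/-1 x a
        two = (b_hi ^ b_mid) & (1 - one)      # select +/-2 x a
        neg = b_hi & (1 - (b_mid & b_lo))     # negate; group 111 is treated as +0
        row = 0
        for j in range(width + 1):            # width+1 bits are enough for 2*a
            pp = ((one & bit(a, j)) | (two & bit(a, j - 1))) ^ neg
            row += -(pp << j) if j == width else (pp << j)  # MSB has negative weight
        if not (drop_last_neg and i == rows - 1):
            row += neg                        # '+1' completing ~x + 1 = -x
        total += row << (2 * i)               # row i has weight 4**i
    return total


if __name__ == "__main__":
    W = 6  # small width so the exhaustive check runs quickly
    vals = range(-(1 << (W - 1)), 1 << (W - 1))
    assert all(booth_r4_product(a, b, W) == a * b for a in vals for b in vals)
    errs = [a * b - booth_r4_product(a, b, W, drop_last_neg=True) for a in vals for b in vals]
    print(sum(e != 0 for e in errs) / len(errs))  # 0.375
    print(sorted(set(errs)))                      # [0, 16], i.e. 2**(W - 2)
```

The error depends only on the top three bits of the Booth-recoded operand, so the same 37.5% holds for the 8-bit example in Fig. 5, where the dropped bit has weight 2^6.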
The logic-gate circuit of the approximate 4-2 compressor designed in the invention, shown in Fig. 6, consists of three NOR gates, two XNOR gates and one OR gate; its logic expressions are as follows:
The invention can use multipliers with different degrees of approximation. With the non-precision factor set to 8, the lowest 8 bits of the multiplier operation are approximate and the rest are exact; the dot diagram is shown in Fig. 7(1). Only the lowest 9 bits of the final result are then approximate, and since the FFT's intermediate bit width is 16 bits, only the upper 16 bits of the product are kept and the lower bits are truncated, so the 9 inexact bits have little influence on the final result. With the non-precision factor set to 16, the lowest 16 bits of the multiplier operation are approximate and the rest are exact; the dot diagram is shown in Fig. 7(2). Only the lowest 17 bits of the final result are then approximate, and after the same truncation to the upper 16 bits, the 17 inexact bits have very little influence on the final result.
The parameters of this embodiment were compared with those of the prior art; the comparison results are shown in Table 2 below:
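A simple bound suggests why truncation hides most of the approximation error. Assuming the approximation changes the 32-bit product $P$ by an error $e$ with $|e| < 2^k$ (with $k = 9$ for non-precision factor 8 and $k = 17$ for factor 16, as stated above), and using $\lfloor x \rfloor + \lfloor y \rfloor \le \lfloor x + y \rfloor \le \lfloor x \rfloor + \lfloor y \rfloor + 1$, keeping only the upper 16 bits gives

$$
\left| \left\lfloor \frac{P+e}{2^{16}} \right\rfloor - \left\lfloor \frac{P}{2^{16}} \right\rfloor \right|
\le \left\lceil \frac{|e|}{2^{16}} \right\rceil
\le \begin{cases} 1, & k = 9, \\ 2, & k = 17, \end{cases}
$$

so under this assumption the retained 16-bit word changes by at most one LSB in the first case and at most two LSBs in the second.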
TABLE 2
In the table, FFT-ex is the prior art, FFT-1 uses approximate Booth multipliers with a non-precision factor of 8, and FFT-2 uses approximate Booth multipliers with a non-precision factor of 16.
As the table shows, with an SNR loss of no more than 1 dB, the invention reduces LUT usage relative to the prior art by 5.9% for the 64-point FFT-1, 12.7% for the 64-point FFT-2, 11.6% for the 256-point FFT-1, 20.3% for the 256-point FFT-2, 7.8% for the 1024-point FFT-1 and 16.5% for the 1024-point FFT-2. The design also markedly reduces flip-flop (FF) usage in the FFT and raises the operating frequency to 141.8% to 146.1% of the original.
This embodiment only illustrates the technical idea of the invention and does not limit it; any modification made on the basis of the technical scheme according to the technical idea of the invention falls within the scope of the invention.
Claims (3)
1. An FFT processor based on an approximate complex multiplier, comprising a plurality of sequentially cascaded basic units. Each basic unit comprises a butterfly operation unit and m feedback units, where m is a positive integer; each butterfly operation unit has a signal input end, a signal output end, m feedback input ends and m feedback output ends, and each feedback output end is connected to the corresponding feedback input end through the corresponding feedback unit. The signal output end of the butterfly operation unit in the previous basic unit is connected to the signal input end of the butterfly operation unit in the next basic unit through a complex multiplier: the signal output by the butterfly operation unit in the previous basic unit is multiplied by a twiddle factor in the complex multiplier and then serves as the input signal of the butterfly operation unit in the next basic unit. The complex multiplier comprises first and second subtractors, first to third adders and first to third multipliers. The two input ends of the first subtractor receive the real part and the imaginary part of the output signal of the butterfly operation unit in the previous basic unit; the two input ends of the second subtractor receive the real part and the imaginary part of the twiddle factor; the two input ends of the first adder receive the real part and the imaginary part of the output signal of the butterfly operation unit in the previous basic unit; the two input ends of the first multiplier receive the output signal of the first subtractor and the imaginary part of the twiddle factor; the two input ends of the second multiplier receive the output signal of the second subtractor and the real part of the output signal of the butterfly operation unit in the previous basic unit; the two input ends of the third multiplier receive the output signal of the first adder and the real part of the twiddle factor; the two input ends of the second adder receive the output signals of the first and second multipliers; and the two input ends of the third adder receive the negated output signal of the second multiplier and the output signal of the third multiplier. Each multiplier comprises a Booth coding unit, a partial product compression unit and a fast summation unit: the Booth coding unit encodes the two operands to quickly generate partial products, the partial product compression unit compresses the generated partial products to quickly obtain two rows of partial products, and the fast summation unit adds the two rows of partial products with a fast adder to produce the final product;
characterized in that the Booth coding unit and the partial product compression unit are approximately designed, and the partial product expression of the approximate Booth coding unit is as follows:
where the generated partial products are arranged as a partial product array, $pp_{ij}$ is the partial product in row $i$ and column $j$ of the array, $a_j$ is the $j$-th bit of one operand, $b_{2i+1}$ is the $(2i+1)$-th bit of the other operand, and $\oplus$ denotes the exclusive-OR operation;
an approximate 4-2 compressor is designed for the partial product compression unit, comprising one OR gate, first to third NOR gates, and first and second XNOR gates. For the four partial products in the same column of the partial product array, the two input ends of the first NOR gate receive the partial products of the first and second rows of that column; the two input ends of the second NOR gate receive the partial products of the third and fourth rows; the two input ends of the third NOR gate receive the output signals of the first and second NOR gates; the two input ends of the first XNOR gate receive the partial products of the first and second rows; the two input ends of the second XNOR gate receive the partial products of the third and fourth rows; and the two input ends of the OR gate receive the output signals of the first and second XNOR gates.
2. The approximate complex multiplier based FFT processor of claim 1, wherein after the Booth coding unit generates the partial product array, the sign compensation bit of the last row is deleted directly.
3. The approximate complex multiplier based FFT processor of claim 1, wherein a non-precision factor n is set, n being a positive integer, and only the n least significant bits of the multiplier are approximated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110452797.9A CN113190787B (en) | 2021-04-26 | 2021-04-26 | FFT processor based on approximate complex multiplier |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110452797.9A CN113190787B (en) | 2021-04-26 | 2021-04-26 | FFT processor based on approximate complex multiplier |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113190787A true CN113190787A (en) | 2021-07-30 |
CN113190787B CN113190787B (en) | 2024-01-30 |
Family
ID=76979291
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110452797.9A Active CN113190787B (en) | 2021-04-26 | 2021-04-26 | FFT processor based on approximate complex multiplier |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113190787B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112256236A (en) * | 2020-10-30 | 2021-01-22 | Southeast University | FFT circuit based on approximate constant complex multiplier and implementation method |
2021
- 2021-04-26: application CN202110452797.9A filed in CN (CN113190787B), status Active
Non-Patent Citations (3)
Title |
---|
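JINHE DU et al.: "Design of An Approximate FFT Processor Based on Approximate Complex Multipliers", 2021 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pages 308-313 * |
TIAN CAO et al.: "Design of Approximate Booth Multipliers for Error-Tolerant Computing", Microelectronics & Computer, vol. 35, no. 07, pages 67-71 * |
HAILIANG GUAN et al.: "Design and Implementation of a High-Speed, Low-Power Fixed-Width Booth Multiplier for FFT Processors", Journal of Military Communications Technology, vol. 31, no. 04, pages 61-65 * |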
Also Published As
Publication number | Publication date |
---|---|
CN113190787B (en) | 2024-01-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Vahdat et al. | TOSAM: An energy-efficient truncation-and rounding-based scalable approximate multiplier | |
WO2020029018A1 (en) | Matrix processing method and apparatus, and logic circuit | |
CN111488133B (en) | High-radix approximate Booth coding method and mixed-radix Booth coding approximate multiplier | |
US11715456B2 (en) | Serial FFT-based low-power MFCC speech feature extraction circuit | |
Park et al. | Fixed-point error analysis of CORDIC processor based on the variance propagation formula | |
Lenart et al. | Architectures for dynamic data scaling in 2/4/8K pipeline FFT cores | |
Du et al. | Design of an approximate FFT processor based on approximate complex multipliers | |
US20050289207A1 (en) | Fast fourier transform processor, dynamic scaling method and fast Fourier transform with radix-8 algorithm | |
CN112669819B (en) | Ultra-low power consumption voice feature extraction circuit based on non-overlapping framing and serial FFT | |
Fan et al. | Fast center weighted Hadamard transform algorithms | |
Waris et al. | AxSA: On the design of high-performance and power-efficient approximate systolic arrays for matrix multiplication | |
Alam et al. | A new time distributed DCT architecture for MPEG-4 hardware reference model | |
CN113190787A (en) | FFT processor based on approximate complex multiplier | |
Takala et al. | Scalable FFT processors and pipelined butterfly units | |
CN114115803B (en) | Approximate floating-point multiplier based on partial product probability analysis | |
He et al. | A probabilistic prediction based fixed-width booth multiplier | |
CN110532510B (en) | Generator for generating twiddle factor and correction factor | |
CN115885249A (en) | System and method for accelerating training of deep learning networks | |
Azarmehr et al. | High-speed and low-power reconfigurable architectures of 2-digit two-dimensional logarithmic number system-based recursive multipliers | |
Ban et al. | Design, synthesis and application of a novel approximate adder | |
Pyrgas et al. | An FPGA design for the two-band fast discrete Hartley transform | |
TWI825935B (en) | System, computer-implemented process and decoder for computing-in-memory | |
CN114285711B (en) | Scaling information propagation method and application thereof in VLSI implementation of fixed-point FFT | |
Sultan et al. | A compact low-power Mitchell-based error tolerant multiplier | |
Ismail et al. | A fast discrete transform architecture for frequency domain motion estimation |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |