CN102495721A - Single instruction multiple data (SIMD) vector processor supporting fast Fourier transform (FFT) acceleration - Google Patents

Single instruction multiple data (SIMD) vector processor supporting fast Fourier transform (FFT) acceleration

Info

Publication number
CN102495721A
CN102495721A CN2011103937120A CN201110393712A
Authority
CN
China
Prior art keywords
vector
fft
address
storage
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103937120A
Other languages
Chinese (zh)
Inventor
李丽
孙敏敏
王佳文
潘红兵
郑维山
沙金
李伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN2011103937120A priority Critical patent/CN102495721A/en
Publication of CN102495721A publication Critical patent/CN102495721A/en
Pending legal-status Critical Current

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a single instruction multiple data (SIMD) vector processor supporting fast Fourier transform (FFT) acceleration, which comprises a control unit, a computation unit, a memory subsystem, a memory interleaving unit and an address generation unit. The computation unit supports fast processing of various vector operations. The memory subsystem comprises three memory groups, each containing four memory banks; the bit width of a single memory bank is one complex word, so the memory groups support 4-way parallel complex vector operations and 8-way parallel real vector operations. The computation unit, the address generation unit and the memory interleaving unit are all connected to the control unit. The address generation unit generates the required operand address sequence, coefficient address sequence and result address sequence. The memory interleaving unit is connected to the address generation unit and the computation unit and implements the address mapping of the memory banks. The acceleration efficiency of the SIMD vector processor for FFT/inverse fast Fourier transform (IFFT) operations is comparable to that of a dedicated hardware accelerator, while the large additional hardware overhead brought by a dedicated accelerator is avoided. The processor is suitable for use in real-time signal processing systems containing a large number of long vector operations.

Description

SIMD vector processor supporting FFT acceleration
Technical field
The present invention relates to a SIMD vector processor supporting FFT acceleration and to its design method, and specifically to a SIMD vector processor, and its design method, that supports a variable number of FFT points, achieves high acceleration efficiency for FFT/IFFT operations and keeps the overall hardware overhead low.
Background art
Fast Fourier transform (FFT) operations are generally performed either by a dedicated hardware accelerator (an FFT processor) or by a DSP processor. A dedicated hardware accelerator achieves high acceleration efficiency, but it consumes considerable extra resources, including both on-chip storage and on-chip computational logic; in particular, when the transform length is large, the extra resources occupied by the dedicated accelerator become unacceptable. Performing the FFT by software programming on a DSP processor occupies no extra hardware resources and offers great flexibility, but its processing speed is relatively slow and cannot meet the real-time requirements of some applications.
Some digital signal processing algorithms, such as the range-Doppler algorithm, involve a large amount of vector processing of various lengths, with lengths reaching 16K or even longer. This vector processing includes both regular vector operations (vector addition/subtraction, vector multiplication, etc.) and FFT/IFFT operations. A SIMD vector processor can be used to accelerate the regular vector operations, but no SIMD vector processor has yet appeared that can also directly accelerate FFT operations with an acceleration efficiency comparable to that of a dedicated accelerator. In that case, an FFT hardware accelerator must additionally be used to accelerate FFT/IFFT operations of various point counts, occupying extra on-chip resources.
Summary of the invention
In order to accelerate FFT operations with large point counts while avoiding the extra hardware overhead brought by a dedicated hardware accelerator, the purpose of the present invention is to provide a SIMD vector processor supporting FFT acceleration. This SIMD vector processor can directly accelerate FFT operations, providing an acceleration efficiency comparable to that of a dedicated hardware accelerator, and thus guarantees performance while avoiding the additional hardware overhead.
The objective of the invention is achieved through the following technical scheme:
A SIMD vector processor supporting FFT acceleration, characterized in that: the processor comprises a control unit, a computation unit, a memory subsystem, a memory interleaving unit and an address generation unit; said computation unit supports fast processing of various vector operations; said memory subsystem comprises a memory group A storing operands, a memory group B storing coefficients and a memory group C storing operation results, and the bit width of a single memory bank in memory groups A, B and C is one complex word, supporting 4-way parallel complex vector operations and 8-way parallel real vector operations; the computation unit, the address generation unit and the memory interleaving unit are all connected to the control unit; the address generation unit generates the required operand address sequence, coefficient address sequence and result address sequence according to the operation type, the data parallelism of the operation and the vector length; the memory interleaving unit is connected to the address generation unit and the computation unit, and implements the address mapping of the memory banks.
In the present invention, memory groups A, B and C each consist of 4 memory banks. The memory interleaving unit implements the address mapping of the 4 memory banks inside memory groups A, B and C, so that 4 operands read simultaneously are located in 4 different memory banks and 4 operation results written simultaneously are located in 4 different memory banks; through a programmable address mapping method, regular vector operations and FFT/IFFT operations on vectors of various lengths are supported.
In said programmable address mapping method, the vector length is set by software programming; for different vector lengths the address mapping changes accordingly, and under each vector length it guarantees conflict-free reads and writes for both regular vector operations and FFT/IFFT operations.
The computation unit comprises 2 complex multipliers and 4 complex adders; it supports 2-way parallel complex multiplication and convolution, 4-way parallel complex addition/subtraction and accumulation, 4-way parallel complex modulus-square operations, 4-way parallel FFT/IFFT operations, and 8-way parallel real multiplication, convolution, addition/subtraction and accumulation. For an n-way parallel vector operation, n vector elements are processed per clock cycle on average (not counting the pipeline fill time before processing each vector). The acceleration efficiency is comparable to that of a dedicated hardware accelerator, and a variable number of points is supported; therefore, while guaranteeing system computing efficiency, the large on-chip storage and logic overhead that a dedicated FFT hardware accelerator module would add to the design is saved.
The memory subsystem in the present invention comprises three memory groups, storing operands, coefficients and operation results respectively; each memory group is divided into 4 memory banks, and the bit width of a memory bank is one complex word, so as to support 4-way parallel complex vector operations and 8-way parallel real vector operations. The address generation unit can generate the required operand address sequence, coefficient address sequence (not needed for some operations, such as accumulation and complex modulus-square operations) and result address sequence according to the operation type (regular operation or FFT/IFFT operation), the data parallelism of the operation (2, 4 or 8) and the vector length.
 
The SIMD vector processor of the present invention can directly accelerate FFT operations: besides accelerating regular vector operations, it provides FFT acceleration with an efficiency comparable to that of a dedicated hardware accelerator, guaranteeing performance while avoiding the additional hardware overhead.
The beneficial effects of the invention are as follows: by adding FFT acceleration instructions to the SIMD vector processor, an acceleration efficiency comparable to that of a dedicated hardware accelerator is obtained, while the extra hardware overhead brought by a dedicated hardware accelerator is avoided. The present invention can be effectively applied to real-time signal processing systems with a large number of very long vector operations (including regular vector operations and FFT/IFFT).
Description of drawings
Fig. 1 is a schematic diagram of the overall architecture of the present invention;
Fig. 2 is the data flow graph of a traditional radix-2 DIT FFT;
Fig. 3 is the data flow graph of the radix-2 DIT FFT according to the present invention.
Embodiment
The SIMD vector processor supporting FFT acceleration according to the present invention is described in detail below with reference to the accompanying drawings.
A SIMD vector processor supporting FFT acceleration is shown in Fig. 1. The processor comprises a control unit, a computation unit, a memory subsystem, a memory interleaving unit and an address generation unit.
The computation unit supports fast processing of various vector operations; it comprises 2 complex multipliers and 4 complex adders, and supports 2-way parallel complex multiplication and convolution, 4-way parallel complex addition/subtraction and accumulation, 4-way parallel complex modulus-square operations, 4-way parallel FFT/IFFT operations, and 8-way parallel real multiplication, convolution, addition/subtraction and accumulation. For an n-way parallel vector operation, n vector elements are processed per clock cycle on average (not counting the pipeline fill time before processing each vector). The acceleration efficiency is comparable to that of a dedicated hardware accelerator, and a variable number of points is supported; therefore, while guaranteeing system computing efficiency, the large on-chip storage and logic overhead that a dedicated FFT hardware accelerator module would add to the design is saved.
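For concreteness, the arithmetic of a single radix-2 DIT butterfly is sketched below in Python. Each butterfly requires one complex multiplication and two complex additions/subtractions, which is why the 2 complex multipliers and 4 complex adders described above are sufficient to sustain two butterflies, i.e. 4 data paths, per clock cycle. The function name and the purely sequential formulation are illustrative only and are not part of the claimed hardware.

```python
# A minimal sketch of the radix-2 DIT butterfly arithmetic (illustrative only).
# One butterfly consumes one complex multiplication and two complex additions,
# so 2 complex multipliers + 4 complex adders sustain 2 butterflies per cycle.

def radix2_butterfly(a, b, w):
    """Return the two outputs of a radix-2 DIT butterfly.

    a, b : complex operands read from memory group A
    w    : complex twiddle factor read from memory group B
    """
    t = w * b                # the single complex multiplication
    return a + t, a - t      # the two complex additions/subtractions

# Two butterflies computed side by side model one clock cycle of the
# 4-way-parallel FFT mode described in the text.
if __name__ == "__main__":
    x0, x1 = radix2_butterfly(1 + 0j, 0 + 1j, 1 + 0j)
    y0, y1 = radix2_butterfly(2 + 0j, 0 - 1j, 0 - 1j)
    print(x0, x1, y0, y1)
```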
The memory subsystem comprises three memory groups: a memory group A storing operands, a memory group B storing coefficients and a memory group C storing operation results, with 4 memory banks in each memory group. The bit width of a single memory bank is one complex word, supporting 4-way parallel complex vector operations and 8-way parallel real vector operations; the 4 operands read simultaneously are located in 4 different memory banks, and the 4 operation results written simultaneously are located in 4 different memory banks; through a programmable address mapping method, regular vector operations and FFT/IFFT operations on vectors of various lengths are supported. The computation unit, the address generation unit and the memory interleaving unit are all connected to the control unit.
The address generation unit generates the required operand address sequence, coefficient address sequence and result address sequence according to the operation type, the data parallelism of the operation and the vector length; the memory interleaving unit is connected to the address generation unit and the computation unit, and implements the address mapping of the memory banks. Corresponding to the three memory groups, the memory interleaving unit comprises three parts: memory interleaving unit A, memory interleaving unit B and memory interleaving unit C.
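As an illustration of the role of the address generation unit, the hedged Python sketch below models only the operand addressing of a regular n-way-parallel vector operation: each cycle it emits n consecutive element addresses, which the interleaving unit then maps onto distinct banks. This is a behavioural model inferred from the description, not the actual hardware sequencer, and the coefficient and result sequencing details are not reproduced here.

```python
# Sketch of address generation for a regular n-way-parallel vector operation:
# each cycle consumes n consecutive element addresses.
# (Behavioural model of the description, not the actual hardware.)

def regular_op_address_groups(vector_length, parallelism):
    """Yield one tuple of operand addresses per clock cycle."""
    for base in range(0, vector_length, parallelism):
        yield tuple(range(base, min(base + parallelism, vector_length)))

# Example: a 4-way-parallel operation on a length-8 vector takes 2 cycles.
print(list(regular_op_address_groups(8, 4)))
# [(0, 1, 2, 3), (4, 5, 6, 7)]
```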
In the programmable address mapping method, the vector length is set by software programming; for different vector lengths the address mapping changes accordingly, and under each vector length it guarantees conflict-free reads and writes for both regular vector operations and FFT/IFFT operations.
As mentioned above, the biggest obstacle to making a SIMD processor that supports regular vector operations also directly accelerate the FFT is address conflict. The same problem is faced in the design of dedicated FFT hardware accelerators, where mature solutions exist; it is generally avoided by designing a flexible storage system and address mapping. Here, however, the problem is more complicated, because after the FFT acceleration instructions are added, acceleration of the other regular vector operations must still be supported.
The present invention uses a new radix-2 DIT FFT data flow graph and proposes an address mapping method that supports conflict-free memory access for both regular vector operations and FFT/IFFT; its programmability supports operations on vectors of various lengths.
Fig. 2 shows the data flow graph of a traditional radix-2 DIT FFT (the input data have already been bit-reverse reordered). When computing on the basis of this data flow graph, the operand address sequence is identical to the result address sequence, but the address sequences differ from stage to stage, as shown in Table 1.
Table 1: address sequence of each operand/result data channel (for a length-8 FFT)
The address mapping of the original SIMD vector processor is shown in Table 2.
Table 2: address mapping of the original SIMD vector processor
It can be seen that an address conflict occurs at stage 2: the addresses of the two operands of butterfly 2_0 are 0 and 4, both located in memory bank 0; the addresses of the two operands of butterfly 2_1 are 1 and 5, both in memory bank 1; the addresses of the two operands of butterfly 2_2 are 2 and 6, both in memory bank 2; and the addresses of the two operands of butterfly 2_3 are 3 and 7, both in memory bank 3.
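The conflict can be reproduced with a short Python check. It assumes that the original mapping of Table 2 is the straightforward bank = address mod 4 (an assumption, since Table 2 itself is only available as an image, although it is consistent with the bank assignments listed above): every stage-2 butterfly of the length-8 traditional data flow graph then reads both of its operands from the same bank.

```python
# Reproduce the stage-2 conflict of the traditional radix-2 DIT flow graph
# for N = 8, assuming the original mapping is simply bank = address % 4
# (an assumption: Table 2 is only available as an image in the publication).

def bank(addr):
    return addr % 4

stage2_butterflies = [(0, 4), (1, 5), (2, 6), (3, 7)]  # operand address pairs

for i, (a, b) in enumerate(stage2_butterflies):
    status = "CONFLICT" if bank(a) == bank(b) else "ok"
    print(f"butterfly 2_{i}: operands {a},{b} -> banks {bank(a)},{bank(b)}", status)
# All four butterflies read both operands from the same bank, so the four
# parallel reads of one cycle cannot be served by four single-port banks.
```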
The address conflict could be avoided simply by changing the address mapping. However, for FFTs of length greater than 8, every stage from stage 3 onward has address conflicts, and more critically, since the address sequences of the stages differ, the conflicting addresses also differ from stage to stage; moreover, changing the address mapping may cause address conflicts in the regular vector operations. Therefore the problem cannot be solved simply by changing the address mapping.
There is a new radix-2 DIT FFT data flow graph whose address sequence is identical for every stage of butterfly operations, as shown in Fig. 3. The new data flow graph is obtained by transforming the traditional radix-2 DIT FFT data flow graph. In the traditional radix-2 DIT FFT data flow graph, stage 0 has N/2 groups with 1 butterfly per group; stage 1 has N/4 groups with 2 butterflies per group; stage 2 has N/8 groups with 4 butterflies per group; and so on.
In each stage, the traditional computation order of the butterflies is: the groups are processed one after another from top to bottom, and within each group the butterflies are also computed from top to bottom. The computation order is adjusted as follows: first compute the first butterfly of each group from top to bottom, then compute the second butterfly of each group from top to bottom, and so on until all butterflies of the stage are completed. Taking the FFT with N=8 as an example, according to the traditional radix-2 DIT FFT data flow graph the butterfly order of stage 1 is 1_0-1_1-1_2-1_3, whereas the adjusted butterfly order is 1_0-1_2-1_1-1_3. By applying the adjusted butterfly order and correspondingly adjusting the positions at which the data are stored in memory according to the new computation order, the new radix-2 DIT FFT data flow graph shown in Fig. 3 is obtained.
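The reordering described above can be written down directly. The sketch below lists, for one stage of a length-N traditional radix-2 DIT flow graph, the butterflies in the group-by-group order and in the adjusted first-of-each-group order; for N = 8 and stage 1 it reproduces the sequences 1_0-1_1-1_2-1_3 and 1_0-1_2-1_1-1_3 mentioned in the text. The labels follow the figures; everything else is an illustrative model.

```python
# Sketch of the butterfly-order adjustment used to derive the new data flow
# graph (Fig. 3) from the traditional one (Fig. 2).  Butterfly s_k is the
# k-th butterfly of stage s in the traditional top-to-bottom numbering.

def stage_orders(n, stage):
    groups = n >> (stage + 1)        # stage 0: N/2 groups, stage 1: N/4, ...
    per_group = 1 << stage           # 1, 2, 4, ... butterflies per group
    traditional = [f"{stage}_{g * per_group + j}"
                   for g in range(groups) for j in range(per_group)]
    adjusted = [f"{stage}_{g * per_group + j}"
                for j in range(per_group) for g in range(groups)]
    return traditional, adjusted

trad, adj = stage_orders(8, 1)
print("traditional:", "-".join(trad))   # 1_0-1_1-1_2-1_3
print("adjusted:   ", "-".join(adj))    # 1_0-1_2-1_1-1_3
```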
When computing on the basis of the new radix-2 DIT FFT data flow graph, the operand address sequence differs from the result address sequence, but each address sequence is identical for every stage, as shown in Table 3 and Table 4.
 
Table 3: address sequence of each operand data channel (based on the new radix-2 DIT FFT data flow graph)
Table 4: address sequence of each result data channel (based on the new radix-2 DIT FFT data flow graph)
From Table 3 it can be seen that the operand address sequence is identical to the address sequence of a regular vector operation, so there is no address conflict. From Table 4 it can be seen that the result address sequence always has address conflicts; for example, the addresses of the two results of butterfly 0_0 are 0 and 4, both located in memory bank 0. This problem can be solved by changing the address mapping, as long as the changed address mapping does not cause address conflicts in the address sequences of regular vector operations.
For a vector of N=8, the address mapping can be changed to that shown in Table 5.
Table 5: new address mapping (for a vector of N=8)
With the address mapping of Table 5, the address sequences of Table 3 and Table 4 are both conflict-free, so the parallel memory accesses of both regular vector operations and the N=8 FFT are conflict-free, and the SIMD vector processor can support the acceleration of both kinds of operations.
Generalizing to an arbitrary vector length N, the address mapping is shown in Table 6.
Table 6: address mapping for an arbitrary vector length N
In this way, for a vector of arbitrary length N, the SIMD vector processor can support direct acceleration of both regular vector operations and FFT/IFFT operations. As can be seen from Table 6, the address mapping depends on the vector length N. In the SIMD vector processor designed here, the address mapping is implemented by the memory interleaving unit; therefore, before a vector is loaded from off-chip memory into on-chip memory, the vector length must first be set in the memory interleaving unit by software programming, after which the vector can be loaded into on-chip memory and a series of acceleration operations, including regular vector operations and FFT/IFFT operations, can be performed on it. For this reason, the address mapping method is called programmable address mapping.
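Because Tables 3 to 6 are only available as images in the published document, the exact address sequences and the exact programmable mapping are not reproduced here. The hedged Python sketch below shows only the kind of check such a mapping must pass: every set of addresses accessed in the same cycle must fall into pairwise-distinct banks. The access groups used are just the ones stated explicitly in the text (the consecutive addresses of a regular 4-way operation and the result pair 0/4 of butterfly 0_0); the function names are illustrative.

```python
# Checker for a candidate address mapping (illustrative only: the actual
# programmable mapping of Tables 5/6 is published as an image and is not
# reproduced here).  A mapping is admissible if every set of addresses
# accessed in the same cycle maps to pairwise-distinct banks.

def conflict_free(bank_of, access_groups):
    """True if no access group maps two of its addresses to the same bank."""
    return all(len({bank_of(a) for a in group}) == len(group)
               for group in access_groups)

def simple_mod4(addr):
    return addr % 4   # the mapping that leaves Table 4's result sequence conflicting

# Access patterns stated explicitly in the text for N = 8:
regular_groups = [(0, 1, 2, 3), (4, 5, 6, 7)]  # 4-way regular vector operation
known_fft_result_pair = [(0, 4)]               # both results of butterfly 0_0

print(conflict_free(simple_mod4, regular_groups))         # True
print(conflict_free(simple_mod4, known_fft_result_pair))  # False -> remapping needed
```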
The FFT acceleration in the present embodiment averages two butterfly operations per clock cycle, and the acceleration efficiency (number of butterflies per cycle divided by the number of complex multipliers) reaches its maximum value of 1, which is comparable to the maximum acceleration efficiency of a dedicated hardware accelerator.
In addition, it should be noted that the design method of the present invention is highly scalable: the degree of parallelism can be selected according to performance requirements, and the number of butterflies computed in parallel can be chosen as 1, 2, 4, 8, and so on. A typical radix-2 FFT hardware accelerator has a degree of parallelism of 1 or log2(N) and offers no such flexibility of choice, so this scalability is significant.
Under the premise of guaranteeing system computing efficiency, the present invention enhances system flexibility while reducing the large hardware overhead brought by a dedicated FFT hardware unit; it therefore has excellent application value in signal processing systems.

Claims (5)

1. A SIMD vector processor supporting FFT acceleration, characterized in that: the processor comprises a control unit, a computation unit, a memory subsystem, a memory interleaving unit and an address generation unit; said computation unit supports fast processing of various vector operations; said memory subsystem comprises a memory group A storing operands, a memory group B storing coefficients and a memory group C storing operation results, and the bit width of a single memory bank in memory groups A, B and C is one complex word, supporting 4-way parallel complex vector operations and 8-way parallel real vector operations; the computation unit, the address generation unit and the memory interleaving unit are all connected to the control unit; the address generation unit generates the required operand address sequence, coefficient address sequence and result address sequence according to the operation type, the data parallelism of the operation and the vector length; the memory interleaving unit is connected to the address generation unit and the computation unit, and implements the address mapping of the memory banks.
2. The SIMD vector processor supporting FFT acceleration according to claim 1, characterized in that memory groups A, B and C each consist of 4 memory banks.
3. The SIMD vector processor supporting FFT acceleration according to claim 2, characterized in that the memory interleaving unit implements the address mapping of the 4 memory banks inside memory groups A, B and C, so that 4 operands read simultaneously are located in 4 different memory banks and 4 operation results written simultaneously are located in 4 different memory banks; through a programmable address mapping method, regular vector operations and FFT/IFFT operations on vectors of various lengths are supported.
4. The SIMD vector processor supporting FFT acceleration according to claim 3, characterized in that in said programmable address mapping method the vector length is set by software programming; for different vector lengths the address mapping changes accordingly, and under each vector length it guarantees conflict-free reads and writes for both regular vector operations and FFT/IFFT operations.
5. The SIMD vector processor supporting FFT acceleration according to claim 1, characterized in that the computation unit comprises 2 complex multipliers and 4 complex adders, and supports 2-way parallel complex multiplication and convolution, 4-way parallel complex addition/subtraction and accumulation, 4-way parallel complex modulus-square operations, 4-way parallel FFT/IFFT operations, and 8-way parallel real multiplication, convolution, addition/subtraction and accumulation.
CN2011103937120A 2011-12-02 2011-12-02 Single instruction multiple data (SIMD) vector processor supporting fast Fourier transform (FFT) acceleration Pending CN102495721A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103937120A CN102495721A (en) 2011-12-02 2011-12-02 Single instruction multiple data (SIMD) vector processor supporting fast Fourier transform (FFT) acceleration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011103937120A CN102495721A (en) 2011-12-02 2011-12-02 Single instruction multiple data (SIMD) vector processor supporting fast Fourier transform (FFT) acceleration

Publications (1)

Publication Number Publication Date
CN102495721A true CN102495721A (en) 2012-06-13

Family

ID=46187550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103937120A Pending CN102495721A (en) 2011-12-02 2011-12-02 Single instruction multiple data (SIMD) vector processor supporting fast Fourier transform (FFT) acceleration

Country Status (1)

Country Link
CN (1) CN102495721A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103023519A (en) * 2012-10-26 2013-04-03 中国兵器科学研究院 Method and device for transforming Fermat number
CN103838704A (en) * 2014-03-20 2014-06-04 南京大学 FFT accelerator with high throughput rate
US9355061B2 (en) 2014-01-28 2016-05-31 Arm Limited Data processing apparatus and method for performing scan operations
WO2017124648A1 (en) * 2016-01-20 2017-07-27 北京中科寒武纪科技有限公司 Vector computing device
CN108710943A (en) * 2018-05-21 2018-10-26 南京大学 A kind of multilayer feedforward neural network Parallel Accelerator
CN109900491A (en) * 2017-12-11 2019-06-18 通用汽车环球科技运作有限责任公司 System, the method and apparatus of troubleshooting detection are carried out by supplemental characteristic using redundant processor framework
CN111213125A (en) * 2017-09-08 2020-05-29 甲骨文国际公司 Efficient direct convolution using SIMD instructions
CN115718724A (en) * 2023-01-09 2023-02-28 阿里巴巴(中国)有限公司 GPU (graphics processing Unit), data selection method and chip
US11734383B2 (en) 2016-01-20 2023-08-22 Cambricon Technologies Corporation Limited Vector and matrix computing device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7219212B1 (en) * 2002-05-13 2007-05-15 Tensilica, Inc. Load/store operation of memory misaligned vector data using alignment register storing realigned data portion for combining with remaining portion
CN101630308A (en) * 2008-07-16 2010-01-20 财团法人交大思源基金会 Design and addressing method for any point number quick Fourier transformer based on memory

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7219212B1 (en) * 2002-05-13 2007-05-15 Tensilica, Inc. Load/store operation of memory misaligned vector data using alignment register storing realigned data portion for combining with remaining portion
CN101630308A (en) * 2008-07-16 2010-01-20 财团法人交大思源基金会 Design and addressing method for any point number quick Fourier transformer based on memory

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吴云峰 et al.: "Three-dimensional vector-radix fast Fourier transform algorithm", 《计算机应用》 (Journal of Computer Applications), vol. 29, no. 2, 28 February 2009 (2009-02-28) *
徐妮妮 et al.: "Two-dimensional vector-radix decimation-in-frequency fast Fourier transform", 《天津工业大学学报》 (Journal of Tianjin Polytechnic University), vol. 27, no. 6, 31 December 2008 (2008-12-31) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103023519B (en) * 2012-10-26 2016-12-21 中国兵器科学研究院 A kind of method and apparatus of Fermat number transform
CN103023519A (en) * 2012-10-26 2013-04-03 中国兵器科学研究院 Method and device for transforming Fermat number
US9355061B2 (en) 2014-01-28 2016-05-31 Arm Limited Data processing apparatus and method for performing scan operations
CN103838704A (en) * 2014-03-20 2014-06-04 南京大学 FFT accelerator with high throughput rate
US11734383B2 (en) 2016-01-20 2023-08-22 Cambricon Technologies Corporation Limited Vector and matrix computing device
WO2017124648A1 (en) * 2016-01-20 2017-07-27 北京中科寒武纪科技有限公司 Vector computing device
CN111213125B (en) * 2017-09-08 2023-11-07 甲骨文国际公司 Efficient direct convolution using SIMD instructions
CN111213125A (en) * 2017-09-08 2020-05-29 甲骨文国际公司 Efficient direct convolution using SIMD instructions
CN109900491A (en) * 2017-12-11 2019-06-18 通用汽车环球科技运作有限责任公司 System, the method and apparatus of troubleshooting detection are carried out by supplemental characteristic using redundant processor framework
CN109900491B (en) * 2017-12-11 2021-05-11 通用汽车环球科技运作有限责任公司 System, method and apparatus for diagnostic fault detection using redundant processor architecture with parametric data
CN108710943B (en) * 2018-05-21 2021-11-16 南京大学 Multilayer feedforward neural network parallel accelerator
CN108710943A (en) * 2018-05-21 2018-10-26 南京大学 A kind of multilayer feedforward neural network Parallel Accelerator
CN115718724A (en) * 2023-01-09 2023-02-28 阿里巴巴(中国)有限公司 GPU (graphics processing Unit), data selection method and chip

Similar Documents

Publication Publication Date Title
CN102495721A (en) Single instruction multiple data (SIMD) vector processor supporting fast Fourier transform (FFT) acceleration
CN109992743B (en) Matrix multiplier
CN106940815B (en) Programmable convolutional neural network coprocessor IP core
CN103970720B (en) Based on extensive coarseness imbedded reconfigurable system and its processing method
CN103984560B (en) Based on extensive coarseness imbedded reconfigurable system and its processing method
CN103955447B (en) FFT accelerator based on DSP chip
CN108805266A (en) A kind of restructural CNN high concurrents convolution accelerator
CN101847137B (en) FFT processor for realizing 2FFT-based calculation
US20230385233A1 (en) Multiple accumulate busses in a systolic array
CN111723336B (en) Cholesky decomposition-based arbitrary-order matrix inversion hardware acceleration system adopting loop iteration mode
CN102945224A (en) High-speed variable point FFT (Fast Fourier Transform) processor based on FPGA (Field-Programmable Gate Array) and processing method of high-speed variable point FFT processor
CN103543984A (en) Modification type balance throughput data path architecture for special corresponding applications
CN110705702A (en) Dynamic extensible convolutional neural network accelerator
Yang et al. Molecular dynamics range-limited force evaluation optimized for FPGAs
CN101894096A (en) FFT computing circuit structure applied to CMMB and DVB-H/T
US10949493B2 (en) Multi-functional computing apparatus and fast fourier transform computing apparatus
CN116710912A (en) Matrix multiplier and control method thereof
CN102567282B (en) In general dsp processor, FFT calculates implement device and method
CN115310037A (en) Matrix multiplication computing unit, acceleration unit, computing system and related method
CN112559954B (en) FFT algorithm processing method and device based on software-defined reconfigurable processor
CN103034621A (en) Address mapping method and system of radix-2*K parallel FFT (fast Fourier transform) architecture
CN104679670A (en) Shared data caching structure and management method for FFT (fast Fourier transform) and FIR (finite impulse response) algorithms
CN111610963B (en) Chip structure and multiply-add calculation engine thereof
Shafiq et al. Exploiting memory customization in FPGA for 3D stencil computations
CN102541813B (en) Method and corresponding device for multi-granularity parallel FFT (Fast Fourier Transform) butterfly computation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120613