CN104699624B - Conflict-free memory access method for parallel FFT computation - Google Patents

Conflict-free memory access method for parallel FFT computation Download PDF

Info

Publication number
CN104699624B
CN104699624B
Authority
CN
China
Prior art keywords
address
operational data
conflict-free
fft
memory access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510137874.6A
Other languages
Chinese (zh)
Other versions
CN104699624A (en)
Inventor
陈海燕
刘胜
陈书明
郭阳
燕世林
刘仲
万江华
陈胜刚
杨超
梁停雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201510137874.6A priority Critical patent/CN104699624B/en
Publication of CN104699624A publication Critical patent/CN104699624A/en
Application granted granted Critical
Publication of CN104699624B publication Critical patent/CN104699624B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Complex Calculations (AREA)

Abstract

The present invention discloses a conflict-free memory access method for parallel FFT computation. The steps of the method include: 1) determining the architecture of the current processor; if it is a SIMD architecture, performing step 3), otherwise performing step 2); 2) configuring one memory group to store the operand data, the memory group comprising multiple parallel single-port memory banks; when performing the FFT computation, mapping the address of each operand to a two-dimensional conflict-free access address consisting of the target memory bank and the address within that bank; 3) configuring multiple parallel memory groups to store the operand data, each memory group comprising multiple parallel single-port memory banks; when performing the FFT computation, mapping the address of each operand to a three-dimensional conflict-free access address consisting of the target memory group, the target memory bank, and the address within that bank. The present invention achieves conflict-free access for parallel FFT computation, with the advantages of high memory-access efficiency and small hardware overhead.

Description

Conflict-free memory access method for parallel FFT computation
Technical field
The present invention relates to the field of FFT computation in microprocessors, and more particularly to a conflict-free memory access method for parallel FFT computation.
Background technology
The FFT (Fast Fourier Transform) algorithm, proposed in 1965 by J.W. Cooley and J.W. Tukey, is a fast algorithm for computing the Discrete Fourier Transform (DFT). It is a core algorithm in many embedded applications such as wireless communication and image processing, and its performance often determines the real-time processing capability of the whole digital signal processing system. The continuous growth of application demands places ever higher requirements on FFT performance, and the development of digital signal processor technology has made efficient, programmable parallel FFT algorithms feasible.
Current implementations of the FFT algorithm fall into two categories. The first is a dedicated FFT hardware accelerator, for example an FPGA-based design or an on-chip FFT hardware co-processor in a microprocessor, used only to accelerate the FFT. The second is a software implementation programmed on the instruction set architecture of a general-purpose microprocessor or digital signal processor. The first approach has a narrow application range, cannot keep up with the development and change of requirements, is expensive to implement in hardware, and lacks flexibility. The second approach, being based on instruction set programming, offers flexibility and generality, and with the development of high-performance microprocessor technology it can achieve performance comparable to that of a dedicated FFT hardware accelerator.
The DFT of an N-point sequence x(n) is computed as
X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}, 0 ≤ k < N,
where the twiddle factor is W_N = e^{-j 2\pi / N} and the sequence length N is assumed to be an integer power of 2.
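For concreteness, the definition can be evaluated directly; the following naive O(N^2) C sketch (illustrative only, not part of the patent) computes exactly the sum above:

```c
#include <complex.h>
#include <math.h>

/* Naive O(N^2) DFT: X[k] = sum_{n=0}^{N-1} x[n] * W_N^(n*k), W_N = e^(-j*2*pi/N). */
static void dft(const double complex *x, double complex *X, int N)
{
    const double pi = acos(-1.0);
    for (int k = 0; k < N; k++) {
        double complex acc = 0;
        for (int n = 0; n < N; n++)
            acc += x[n] * cexp(-I * 2.0 * pi * (double)n * (double)k / (double)N);
        X[k] = acc;
    }
}
```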
The radix-2 decimation-in-frequency FFT algorithm exploits the symmetry, periodicity, and reducibility of the twiddle factor W_N. The N-point sequence x(n) is split into its first and second halves by index, and the N-point DFT X(k), k = 0, 1, ..., N-1, is divided by the parity of the frequency-domain index into two N/2-point DFTs. Letting k = 2r and k = 2r+1 with r = 0, 1, 2, ..., N/2-1, and separating X(k) by even and odd index, we have:
X(2r) = \sum_{n=0}^{N/2-1} [x(n) + x(n+N/2)] W_{N/2}^{nr}
X(2r+1) = \sum_{n=0}^{N/2-1} [x(n) - x(n+N/2)] W_N^{n} W_{N/2}^{nr}
If N/2 is still even, the decomposition continues in the same manner until only 2-point DFTs remain.
The radix-2 butterfly computation flow of a sequence of N = 16 points is shown in Fig. 1; the 16-point radix-2 FFT is successively decomposed into 8-point, 4-point, and 2-point DFTs. A radix-2 FFT of a length-N sequence requires log2 N stages, each containing N/2 butterfly operations. Within each stage the two operands of every butterfly unit are equally spaced and undergo the same butterfly structure; the data spacing of the butterfly unit is N/2^j, where j is the stage index, j = 1, 2, ..., log2 N. The sum of the two operands is stored back to the original position of the first operand, while the difference of the two operands multiplied by the butterfly coefficient is stored back to the original position of the second operand. This characteristic of FFT computation is particularly suited to data-parallel processing and vectorized operation on a SIMD extension architecture.
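The stage structure just described can be summarized in a short C sketch (a minimal illustration of the description above, not code from the patent; bit-reversal reordering of the output is omitted):

```c
#include <complex.h>
#include <math.h>

/* In-place radix-2 DIF FFT skeleton: at stage j the two operands of each
 * butterfly are N/2^j apart; the sum goes back to the first operand's slot,
 * the difference times the twiddle factor goes back to the second slot. */
static void fft_dif_radix2(double complex *x, int N)
{
    const double pi = acos(-1.0);
    for (int span = N / 2; span >= 1; span /= 2) {          /* span = N/2^j */
        for (int block = 0; block < N; block += 2 * span) {
            for (int k = 0; k < span; k++) {
                double complex a = x[block + k];
                double complex b = x[block + k + span];
                double complex w = cexp(-I * 2.0 * pi * k / (2.0 * span));
                x[block + k]        = a + b;                 /* sum            */
                x[block + k + span] = (a - b) * w;           /* diff * twiddle */
            }
        }
    }
    /* The result is left in bit-reversed order; a reordering pass would follow. */
}
```

Each stage touches every element once and issues N/2 independent butterflies, which is what makes the per-stage accesses amenable to dual-issue and SIMD vectorization.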
With the development of integrated circuit technology and performance requirements, single instruction multiple data (SIMD) has become an important extension of high-performance microprocessors, and a single chip can integrate more and more functional units. Using a superscalar or very long instruction word (VLIW) structure, multiple functional units can operate on data in SIMD fashion, exploiting more instruction-level and data-level parallelism and thereby obtaining higher performance. To make full use of the multipliers and adders in the microprocessor's arithmetic units and improve computational efficiency, high-performance microprocessors generally support parallel access operations with dual (or greater) access bandwidth. Besides the coefficient constant, each FFT butterfly also needs two operands to be supplied in one cycle, so the FFT computation needs the microprocessor's dual access bandwidth to supply its operands.
Because a dual-port memory bank of the same capacity usually has twice the area and power consumption of a single-port memory bank, and on-chip bulk storage is strictly limited in area and power, the on-chip storage structure is typically organized as a power-of-2 number of single-port memory banks interleaved on the low-order address bits, which provides dual access bandwidth at relatively low area and power cost. However, owing to the discontinuity and symmetry of the operand addresses of the FFT butterflies, every parallel butterfly computation incurs memory access conflicts; in a SIMD extension architecture in particular, such conflicts reduce the utilization of the vector memory bandwidth, so the actual FFT computational efficiency falls significantly below the theoretical peak.
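The conflict can be made concrete with a small check (illustrative only, not part of the patent): with a low-order-interleaved bank index computed as element index mod bank count, a power-of-2 bank count always collides for power-of-2 butterfly distances, whereas an odd bank count such as 3 never does.

```c
#include <stdio.h>

/* Compare a power-of-2 bank count against an odd one for a butterfly
 * distance that is a power of 2 (here 4). bank = element index mod banks. */
int main(void)
{
    const int stride = 4;                          /* butterfly distance N/2^j */
    for (int banks = 3; banks <= 4; banks++) {
        int conflicts = 0;
        for (int addr = 0; addr < 16; addr++)
            if (addr % banks == (addr + stride) % banks)
                conflicts++;
        printf("%d banks: %d of 16 operand pairs hit the same bank\n",
               banks, conflicts);
    }
    return 0;
}
```

With 4 banks every pair collides (the stride is a multiple of the bank count), while with 3 banks no pair collides, since a power-of-2 stride is never a multiple of an odd bank count greater than 1 — which is precisely why the method below interleaves over an odd number of banks.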
Summary of the invention
The technical problem to be solved by the present invention is: in view of the technical problems of the prior art, the present invention provides a conflict-free memory access method for parallel FFT computation that is simple to implement, eliminates memory access conflicts in parallel FFT computation, has high memory-access efficiency, and incurs small hardware overhead.
In order to solve the above technical problems, the technical scheme proposed by the present invention is:
A conflict-free memory access method for parallel FFT computation, the steps of which include:
1) Determine the architecture of the current processor; if it is a SIMD architecture, go to step 3); otherwise go to step 2);
2) Configure one memory group to store the operand data, the memory group comprising multiple parallel single-port memory banks; when performing the FFT computation, map the linear address of each operand to a two-dimensional conflict-free access address, which corresponds to the target memory bank holding the operand and the address within that bank, and access the data according to this two-dimensional conflict-free access address;
3) Configure multiple memory groups to store the operand data, each memory group comprising multiple parallel single-port memory banks; when performing the FFT computation, map the linear address of each operand to a three-dimensional conflict-free access address, which corresponds to the target memory group, the target memory bank, and the address within that bank, and access the data according to this three-dimensional conflict-free access address.
As a further improvement of the present invention: in step 2), P of the multiple parallel single-port memory banks are addressed in low-order interleaved fashion, where P is an odd number not less than 3; in step 3), P memory banks of each memory group are addressed in low-order interleaved fashion, where P is an odd number not less than 3.
As a further improvement on the present invention:The linear address for treating operational data is mapped according to the following formula in the step 2) For two-dimentional Lothrus apterus memory access address (X, Y);
Wherein, Y is the target memory bank holding the operand, X is the row address of the operand within the target memory bank, Addr is the linear address of the operand, W is the operand granularity, P is the number of memory banks addressed by low-order interleaving, mod denotes the modulo operation, and N is the sequence length of the FFT computation.
As a further improvement of the present invention: in step 3), the linear address of the operand is mapped to the three-dimensional conflict-free access address (X, Y, Z) according to the following formula;
Wherein, Y is the target memory group holding the operand, Z is the position of the target memory bank within the target memory group, X is the row address of the operand within the target memory bank; Addr is the linear address of the operand, G is the SIMD width and is a positive integer power of 2, P is the number of memory banks addressed by low-order interleaving within each memory group, mod denotes the modulo operation, and N is the sequence length of the FFT computation.
Compared with the prior art, the advantages of the present invention are:
1) For dual-access microprocessors of non-SIMD and SIMD architectures respectively, the present invention organizes the single-port memory banks as a one-dimensional memory group and as a two-dimensional array of memory groups. When performing the FFT computation, the linear address of each operand is mapped to a two-dimensional or three-dimensional conflict-free access address, which effectively eliminates access conflicts in FFT computation, realizes conflict-free parallel memory access for the FFT, and improves FFT efficiency.
2) For microprocessors with a SIMD extension architecture, the present invention organizes the single-port memory banks into a two-dimensional storage array operated in SIMD fashion, which supports conflict-free access for the vectorized extension of parallel FFT algorithms and thereby greatly improves FFT efficiency.
3) The present invention maps the linear address of each operand to the corresponding memory bank and the address within the bank when no SIMD architecture is used, and to the corresponding memory group, memory bank, and address within the bank when a SIMD architecture is used. Only the calculation of the access address is changed, so the required hardware overhead is very small.
Brief description of the drawings
Fig. 1 is a schematic diagram of the radix-2 FFT butterfly computation of length 16.
Fig. 2 is a schematic flow diagram of the conflict-free memory access method for parallel FFT computation of the present embodiment.
Fig. 3 is a schematic diagram of the memory bank organization under a non-SIMD architecture in the present embodiment.
Fig. 4 is a schematic diagram of the memory bank organization under a SIMD architecture in the present embodiment.
Fig. 5 is a schematic diagram of the memory bank organization under a non-SIMD architecture in a specific embodiment of the present invention.
Fig. 6 is a schematic diagram of the memory bank organization under a SIMD architecture in a specific embodiment of the present invention.
Embodiment
The invention is further described below with reference to the drawings and specific preferred embodiments, without thereby limiting the scope of protection of the invention.
As shown in Fig. 2, the conflict-free memory access method for parallel FFT computation of the present embodiment comprises the following steps:
1) Determine the architecture of the current processor; if it is a SIMD architecture, go to step 3); otherwise go to step 2);
2) Configure one memory group to store the operand data, the memory group comprising multiple parallel single-port memory banks; when performing the FFT computation, map the linear address of each operand to a two-dimensional conflict-free access address, which corresponds to the target memory bank holding the operand and the address within that bank, and access the data according to this two-dimensional conflict-free access address;
3) Configure multiple memory groups to store the operand data, each memory group comprising multiple parallel single-port memory banks; when performing the FFT computation, map the linear address of each operand to a three-dimensional conflict-free access address, which corresponds to the target memory group, the target memory bank, and the address within that bank, and access the data according to this three-dimensional conflict-free access address.
In the present embodiment, to meet the dual-access memory bandwidth demand of the microprocessor, the on-chip memory is built from multiple single-port SRAM (static RAM) memory banks that support parallel access, so as to reduce area and power consumption.
In the present embodiment, P of the multiple parallel single-port memory banks in step 2) are addressed in low-order interleaved fashion, where P is an odd number not less than 3; in step 3), P memory banks of each memory group are addressed in low-order interleaved fashion, where P is an odd number not less than 3.
In the present embodiment, in step 2) the linear address of the operand is mapped to the two-dimensional conflict-free access address (X, Y) according to formula (4);
Wherein, Y is the target memory bank holding the operand, X is the row address of the operand within the target memory bank, Addr is the linear address of the operand, W is the operand granularity, P is the number of memory banks addressed by low-order interleaving, mod denotes the modulo operation, and N is the sequence length of the FFT computation. ⌊Addr/W⌋ denotes the largest integer less than or equal to Addr/W.
In a processor that does not use a SIMD architecture, the present embodiment assumes a memory capacity of 2^H bytes, where H is a positive integer, an operand granularity of W bytes, with W a positive integer power of 2, and that all or some of the memory banks use low-order interleaved addressing, the number of low-order interleaved banks being P (P an odd number not less than 3). As shown in Fig. 3, the byte address of the whole memory is H bits wide, written Addr[H-1:0]; the data address in units of the operand granularity is Data_Addr = Addr/W. The actual address of a datum in the memory banks can be represented by the two-dimensional coordinate (X, Y), where Y denotes the index of the SRAM bank actually accessed and X denotes the row address within the selected bank. The mapping between the linear address Addr of a memory bank and the actual address (X, Y) is given by formula (4); the actual address (X, Y) is the mapped two-dimensional conflict-free access address.
When the above memory bank organization is used in a processor without a SIMD architecture to perform an FFT of sequence length N (N a positive integer power of 2), the butterfly computations of the FFT implemented in parallel through dual accesses can be executed entirely without access conflicts.
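Formula (4) itself appears only as an image in the original publication and is not reproduced here. The C sketch below therefore assumes the simplest low-order interleaving over an odd bank count, Y = ⌊Addr/W⌋ mod P and X = ⌊Addr/W⌋ / P; this is a hypothetical stand-in rather than the patented formula (which also involves N). Under this assumption the two operands of every butterfly, being a power-of-2 distance apart, always fall in different banks:

```c
#include <stdio.h>

/* Hypothetical 2-D mapping for the non-SIMD, dual-access case:
 * P single-port banks (P odd, >= 3), operand granularity W bytes. */
struct addr2d { int bank /* Y */, row /* X */; };

static struct addr2d map2d(unsigned byte_addr, unsigned W, unsigned P)
{
    unsigned d = byte_addr / W;                     /* Data_Addr = Addr / W */
    struct addr2d a = { (int)(d % P), (int)(d / P) };
    return a;
}

int main(void)
{
    const unsigned W = 4, P = 3, N = 16;            /* values of the embodiment */
    for (unsigned dist = N / 2; dist >= 1; dist /= 2)        /* dist = N/2^j */
        for (unsigned i = 0; i + dist < N; i++) {
            struct addr2d a = map2d(i * W, W, P);
            struct addr2d b = map2d((i + dist) * W, W, P);
            if (a.bank == b.bank)
                printf("conflict at distance %u, element %u\n", dist, i);
        }
    printf("done (no conflict reported: the two butterfly operands always map to different banks)\n");
    return 0;
}
```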
In the present embodiment, in step 3) the linear address of the operand is mapped to the three-dimensional conflict-free access address (X, Y, Z) according to the following formula;
Wherein, Y is the target memory group holding the operand, Z is the position of the target memory bank within the target memory group, X is the row address of the operand within the target memory bank; Addr is the linear address of the operand, G is the SIMD width and is a positive integer power of 2, P is the number of memory banks addressed by low-order interleaving within each memory group, mod denotes the modulo operation, and N is the sequence length of the FFT computation.
In a processor using a SIMD architecture, the present embodiment assumes a SIMD width of G, where G is a positive integer power of 2. The non-SIMD memory bank structure is replicated G times as a SIMD extension to obtain the SIMD memory bank structure; within each single structure, all or some of the banks still use low-order interleaved addressing, the number of low-order interleaved banks being P (P an odd number not less than 3), and each bank is W bytes wide. Assume the memory capacity is 2^H bytes, the operand width is W bytes with W an integer power of 2, and the sequence length is N. The byte address of the whole memory is H bits wide, written Addr[H-1:0]; the data address in units of the operand granularity is Data_Addr = Addr/W. The actual address of a datum in the memory banks can be represented by the three-dimensional coordinate (X, Y, Z), where Y denotes the position of the access among the G regions, Z denotes the position among the P banks of that region, and X denotes the corresponding row address. The mapping between the linear address Addr of all or some of the memory banks and the actual address (X, Y, Z) is given by formula (5); the actual address (X, Y, Z) is the mapped three-dimensional conflict-free access address.
When the above memory bank organization is used in a processor with a SIMD architecture, vectorized FFT concurrent operations carried out through dual accesses can be executed entirely without conflicts.
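As with formula (4), formula (5) appears only as an image in the original publication. The sketch below uses a hypothetical three-level split in which the lowest-order part of the data address selects the region (Y), the next level interleaves over the P banks of that region (Z), and the remainder is the row (X); this is a stand-in consistent with the description above, not the patented expression. It checks that a G-wide vector access and its partner vector at a power-of-2 butterfly distance never contend for the same bank of the same region:

```c
#include <stdio.h>

/* Hypothetical 3-D mapping for a SIMD processor of width G:
 * Y = region/group (0..G-1), Z = bank inside the region (0..P-1), X = row. */
struct addr3d { int group /* Y */, bank /* Z */, row /* X */; };

static struct addr3d map3d(unsigned byte_addr, unsigned W,
                           unsigned G, unsigned P)
{
    unsigned d = byte_addr / W;                     /* Data_Addr = Addr / W */
    struct addr3d a;
    a.group = (int)(d % G);                         /* SIMD lane / region   */
    a.bank  = (int)((d / G) % P);                   /* odd-count interleave */
    a.row   = (int)(d / G / P);
    return a;
}

int main(void)
{
    const unsigned W = 4, G = 16, P = 3;            /* embodiment values        */
    const unsigned N = 64;                          /* demo sequence length     */
    /* One vector access reads G consecutive elements from 'base'; its butterfly
     * partner vector starts at 'base + dist'.  Stages with dist < G keep both
     * operands inside a single vector and are not modeled here. */
    for (unsigned dist = N / 2; dist >= G; dist /= 2)
        for (unsigned base = 0; base + dist + G <= N; base += G)
            for (unsigned lane = 0; lane < G; lane++) {
                struct addr3d a = map3d((base + lane) * W, W, G, P);
                struct addr3d b = map3d((base + dist + lane) * W, W, G, P);
                if (a.group == b.group && a.bank == b.bank)
                    printf("conflict: base %u dist %u lane %u\n", base, dist, lane);
            }
    printf("done (no conflict reported: the two vector operands never share a bank)\n");
    return 0;
}
```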
The present invention is further illustrated below with the operand width W taken as 4 and the sequence length taken as N.
As shown in Fig. 5, under the non-SIMD architecture of the present embodiment, P = 3 memory banks are addressed with low-order interleaving, the operand width W is 4, and the sequence length is N. The two-dimensional conflict-free access address is denoted by the coordinate (X, Y), where Y denotes the index of the target SRAM bank and X denotes the row address of the operand within the target bank. The linear address Addr of the operand is mapped to the two-dimensional conflict-free access address (X, Y) by formula (6):
Wherein, Y denotes the position of the operand among the 3 memory banks, and X denotes the corresponding row address of the operand within the bank.
As shown in Fig. 6, under the SIMD architecture of the present embodiment, P = 3 memory banks within each memory group are addressed with low-order interleaving, the operand width W is 4, the sequence length is N, and the SIMD width is taken as 16, so the whole set of memory banks is divided into 16 regions. The three-dimensional conflict-free access address is denoted by the coordinate (X, Y, Z), and the linear address Addr of the operand is mapped to the three-dimensional conflict-free access address (X, Y, Z) by formula (7):
Wherein, Y denotes the position of the operand among the 16 regions, Z denotes the position of the operand among the 3 memory banks of the region, and X denotes the corresponding row address of the operand within the bank.
The above are merely preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above through preferred embodiments, it is not limited thereto. Any simple modification, equivalent change, or variation made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical scheme of the present invention, shall fall within the scope of protection of the technical scheme of the present invention.

Claims (1)

1. A conflict-free memory access method for parallel FFT computation, characterized in that the steps include:
1) determining the architecture of the current processor; if it is a SIMD architecture, going to step 3); otherwise going to step 2);
2) configuring one memory group to store the operand data, the memory group comprising multiple parallel single-port memory banks; when performing the FFT computation, mapping the linear address of each operand to a two-dimensional conflict-free access address, which corresponds to the target memory bank holding the operand and the address within that bank, and accessing the data according to the two-dimensional conflict-free access address;
3) configuring multiple memory groups to store the operand data, each memory group comprising multiple parallel single-port memory banks; when performing the FFT computation, mapping the linear address of each operand to a three-dimensional conflict-free access address, which corresponds to the target memory group, the target memory bank, and the address within that bank, and accessing the data according to the three-dimensional conflict-free access address;
wherein in step 2), P of the multiple parallel single-port memory banks are addressed in low-order interleaved fashion, and P is an odd number not less than 3; in step 3), P memory banks of each memory group are addressed in low-order interleaved fashion, and P is an odd number not less than 3;
in step 2), the linear address of the operand is mapped to the two-dimensional conflict-free access address (X, Y) according to the following formula;
wherein, Y is the target memory bank holding the operand, X is the row address of the operand within the target memory bank, Addr is the linear address of the operand, W is the operand granularity, P is the number of memory banks addressed by low-order interleaving, mod denotes the modulo operation, and N is the sequence length of the FFT computation;
in step 3), the linear address of the operand is mapped to the three-dimensional conflict-free access address (X, Y, Z) according to the following formula;
wherein, Y is the target memory group holding the operand, Z is the position of the target memory bank within the target memory group, X is the row address of the operand within the target memory bank; Addr is the linear address of the operand, G is the SIMD width and is a positive integer power of 2, P is the number of memory banks addressed by low-order interleaving within each memory group, mod denotes the modulo operation, and N is the sequence length of the FFT computation.
CN201510137874.6A 2015-03-26 2015-03-26 Conflict-free memory access method for parallel FFT computation Active CN104699624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510137874.6A CN104699624B (en) 2015-03-26 2015-03-26 Conflict-free memory access method for parallel FFT computation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510137874.6A CN104699624B (en) 2015-03-26 2015-03-26 Conflict-free memory access method for parallel FFT computation

Publications (2)

Publication Number Publication Date
CN104699624A CN104699624A (en) 2015-06-10
CN104699624B true CN104699624B (en) 2018-01-23

Family

ID=53346775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510137874.6A Active CN104699624B (en) 2015-03-26 2015-03-26 Conflict-free memory access method for parallel FFT computation

Country Status (1)

Country Link
CN (1) CN104699624B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748723B (en) * 2017-09-28 2020-03-20 中国人民解放军国防科技大学 Storage method and access device supporting conflict-free stepping block-by-block access
CN109635235B (en) * 2018-11-06 2020-09-25 海南大学 Triangular part storage device of self-conjugate matrix and parallel reading method
CN111158757B (en) * 2019-12-31 2021-11-30 中昊芯英(杭州)科技有限公司 Parallel access device and method and chip
CN112163187B (en) * 2020-11-18 2023-07-07 无锡江南计算技术研究所 Ultra-long point high-performance FFT (fast Fourier transform) computing device
CN112822139B (en) * 2021-02-04 2023-01-31 展讯半导体(成都)有限公司 Data input and data conversion method and device
CN113094639B (en) * 2021-03-15 2022-12-30 Oppo广东移动通信有限公司 DFT parallel processing method, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080172529A1 (en) * 2007-01-17 2008-07-17 Tushar Prakash Ringe Novel context instruction cache architecture for a digital signal processor

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101290613A (en) * 2007-04-16 2008-10-22 卓胜微电子(上海)有限公司 FFT processor data storage system and method
CN101339546A (en) * 2008-08-07 2009-01-07 那微微电子科技(上海)有限公司 Address mappings method and operand parallel FFT processing system
CN102508802A (en) * 2011-11-16 2012-06-20 刘大可 Data writing method based on parallel random storages, data reading method based on same, data writing device based on same, data reading device based on same and system
CN103116555A (en) * 2013-03-05 2013-05-22 中国人民解放军国防科学技术大学 Data access method based on multi-body parallel cache structure

Also Published As

Publication number Publication date
CN104699624A (en) 2015-06-10

Similar Documents

Publication Publication Date Title
CN104699624B (en) Conflict-free memory access method for parallel FFT computation
CN106875013B (en) System and method for multi-core optimized recurrent neural networks
CN102375805B (en) Vector processor-oriented FFT (Fast Fourier Transform) parallel computation method based on SIMD (Single Instruction Multiple Data)
CN104572295B (en) It is matched with the structured grid data management process of high-performance calculation machine architecture
US20220360428A1 (en) Method and Apparatus for Configuring a Reduced Instruction Set Computer Processor Architecture to Execute a Fully Homomorphic Encryption Algorithm
CN103049241A (en) Method for improving computation performance of CPU (Central Processing Unit) +GPU (Graphics Processing Unit) heterogeneous device
CN103970718A (en) Quick Fourier transformation implementation device and method
CN101847137B (en) FFT processor for realizing 2FFT-based calculation
CN102495721A (en) Single instruction multiple data (SIMD) vector processor supporting fast Fourier transform (FFT) acceleration
Xiao et al. Reduced memory architecture for CORDIC-based FFT
US20180373677A1 (en) Apparatus and Methods of Providing Efficient Data Parallelization for Multi-Dimensional FFTs
CN107391439B (en) Processing method capable of configuring fast Fourier transform
CN103034621B (en) The address mapping method of base 2 × K parallel FFT framework and system
Kasagi et al. An optimal offline permutation algorithm on the hierarchical memory machine, with the GPU implementation
CN102567283B (en) Method for small matrix inversion by using GPU (graphic processing unit)
KR101696987B1 (en) Fft/dft reverse arrangement system and method and computing system thereof
CN104504205A (en) Parallelizing two-dimensional division method of symmetrical FIR (Finite Impulse Response) algorithm and hardware structure of parallelizing two-dimensional division method
CN104050148A (en) FFT accelerator
Nakano Asynchronous memory machine models with barrier synchronization
Sorokin et al. Conflict-free parallel access scheme for mixed-radix FFT supporting I/O permutations
CN107305486A (en) A kind of neutral net maxout layers of computing device
EP4011030A1 (en) Configuring a reduced instruction set computer processor architecture to execute a fully homomorphic encryption algorithm
CN115544438A (en) Twiddle factor generation method and device in digital communication system and computer equipment
Wu et al. Optimizing dynamic programming on graphics processing units via data reuse and data prefetch with inter-block barrier synchronization
Shao et al. Processing grid-format real-world graphs on DRAM-based FPGA accelerators with application-specific caching mechanisms

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant