CN104699624A - FFT (fast Fourier transform) parallel computing-oriented conflict-free storage access method - Google Patents

FFT (fast Fourier transform) parallel computing-oriented conflict-free storage access method

Info

Publication number
CN104699624A
Authority
CN
China
Prior art keywords
operational data
address
fft
memory access
conflict-free
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510137874.6A
Other languages
Chinese (zh)
Other versions
CN104699624B (en)
Inventor
陈海燕
刘胜
陈书明
郭阳
燕世林
刘仲
万江华
陈胜刚
杨超
梁停雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201510137874.6A priority Critical patent/CN104699624B/en
Publication of CN104699624A publication Critical patent/CN104699624A/en
Application granted granted Critical
Publication of CN104699624B publication Critical patent/CN104699624B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses an FFT (fast Fourier transform) parallel computing-oriented conflict-free storage access method. The method includes the steps of: 1, judging the structure of the current processor; if it is an SIMD (single instruction multiple data) structure, executing step 3; otherwise executing step 2; 2, configuring one storage group to store the computing data, the storage group comprising a plurality of parallel single-port memories; when executing the FFT computation, mapping the addresses of the data to be computed to two-dimensional conflict-free memory access addresses that identify the target memory and the address inside the target memory; 3, configuring a plurality of parallel storage groups to store the computing data, each storage group comprising a plurality of parallel single-port memories; when executing the FFT computation, mapping the addresses of the data to be computed to three-dimensional conflict-free memory access addresses that identify the target storage group, the target memory and the address inside the target memory. The method achieves conflict-free access for FFT parallel computing with high memory access efficiency and low hardware cost.

Description

Conflict-free memory access method for FFT parallel computation
Technical field
The present invention relates to the field of FFT computation in microprocessors, and in particular to a conflict-free memory access method for FFT parallel computation.
Background technology
FFT (Fast Fourier Transform) is a fast algorithm for computing the discrete Fourier transform (DFT), proposed by J. W. Cooley and J. W. Tukey in 1965. It is the core algorithm of many embedded applications such as wireless communication and image processing, and its performance often determines the real-time processing capability of the entire digital processing system. Evolving application demands place ever higher requirements on FFT performance, and the development of digital signal processor technology has made efficient programmable parallel FFT algorithms feasible.
At present there are two common ways to implement the FFT algorithm. The first is a dedicated FFT hardware accelerator, for example an FPGA-based design or an on-chip FFT hardware co-processor of a microprocessor, which accelerates only the FFT algorithm. The second is a software implementation programmed on the instruction set architecture of a general-purpose microprocessor or digital signal processor. The first approach has a limited application range, cannot adapt to changing requirements, is costly to realize in hardware and lacks flexibility. The second approach, being based on instruction-set programming, offers flexibility and generality, and with the progress of high-performance microprocessor technology it can now reach performance comparable to that of dedicated FFT hardware accelerators.
The DFT of an N-point sequence x(n) is defined as:
X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}        (1)
where 0 ≤ k < N, W_N = e^{-j2\pi/N} is the twiddle factor, and the sequence length N is assumed to be an integer power of 2.
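For reference, formula (1) can be evaluated directly in C as below. This is only an illustrative O(N^2) evaluation of the definition, not the FFT itself and not code from the patent:

```c
#include <complex.h>

/* Direct evaluation of formula (1): X(k) = sum_{n=0}^{N-1} x(n) * W_N^{n*k},
 * with W_N = exp(-j*2*pi/N).  Illustrative O(N^2) reference only. */
static void dft(const double complex *x, double complex *X, int N)
{
    const double PI = 3.14159265358979323846;
    for (int k = 0; k < N; k++) {
        double complex acc = 0;
        for (int n = 0; n < N; n++)
            acc += x[n] * cexp(-I * 2.0 * PI * n * k / N);
        X[k] = acc;
    }
}
```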
The radix-2 FFT algorithm exploits the symmetry, periodicity and reducibility of the twiddle factors to decompose the N-point DFT X(k), k = 0, 1, ..., N-1, into two N/2-point DFTs:
X(k) = \sum_{n=0}^{N/2-1} x(2n) W_{N/2}^{nk} + W_N^{k} \sum_{n=0}^{N/2-1} x(2n+1) W_{N/2}^{nk}        (2)
Alternatively, separating X(k) by the parity of the frequency index, i.e. setting k = 2r and k = 2r + 1 with r = 0, 1, 2, ..., N/2-1, gives the decimation-in-frequency form:
X(2r) = \sum_{n=0}^{N/2-1} [x(n) + x(n+N/2)] W_{N/2}^{rn}
X(2r+1) = \sum_{n=0}^{N/2-1} [x(n) - x(n+N/2)] W_N^{n} W_{N/2}^{rn}        (3)
As long as N/2 is still even, the decomposition is continued in the same way until only 2-point DFTs remain.
Fig. 1 shows the radix-2 butterfly computation flow of a sequence X of N = 16 points: the 16-point radix-2 FFT is decomposed successively into 8-point, 4-point and 2-point DFTs. A radix-2 FFT of a length-N sequence requires log2 N stages, each of which processes all N data points in butterflies of identical structure. Within a stage the two operands of each butterfly are equally spaced, the spacing being N/2^j, where j = 1, 2, ..., log2 N is the stage index. The sum of the two operands is written back to the position of the first operand, and the product of their difference with the butterfly coefficient is written back to the position of the second operand. This property makes FFT computation very well suited to data-parallel processing and to vectorized computation on SIMD extension structures.
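The stage structure just described can be sketched in C as follows. This is a generic in-place radix-2 decimation-in-frequency kernel matching formula (3), with the output left in bit-reversed order; it is given only as an illustration, not as the patent's implementation:

```c
#include <complex.h>

/* In-place radix-2 DIF FFT skeleton: at stage j the operand spacing is
 * half = N/2^j, the sum goes back to the first operand's slot and the
 * twiddled difference back to the second, as in formula (3).
 * Output is in bit-reversed order (reordering omitted for brevity). */
static void fft_radix2_dif(double complex *x, int N)
{
    const double PI = 3.14159265358979323846;
    for (int half = N / 2; half >= 1; half /= 2) {            /* spacing N/2^j  */
        for (int start = 0; start < N; start += 2 * half) {   /* each sub-block */
            for (int n = 0; n < half; n++) {                  /* butterflies    */
                double complex a = x[start + n];
                double complex b = x[start + n + half];
                double complex w = cexp(-I * PI * n / (double)half); /* W_{2*half}^n */
                x[start + n]        = a + b;        /* sum -> first operand's slot  */
                x[start + n + half] = (a - b) * w;  /* twiddled diff -> second slot */
            }
        }
    }
}
```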
With the development of integrated circuit technology and rising performance requirements, the single instruction multiple data (SIMD) structure has become an important extension of high-performance microprocessors, and a single chip can integrate an ever larger number of functional units. Using a superscalar or very long instruction word (VLIW) structure, multiple functional units can operate on data in SIMD fashion and exploit more instruction-level and data-level parallelism, thereby achieving higher performance. To make full use of the multipliers and adders in the microprocessor's arithmetic units and to improve computational efficiency, high-performance microprocessors usually support parallel access operations with dual (or higher) access bandwidth. Apart from the coefficient constant, one FFT butterfly needs two operands to be delivered in a single cycle, so FFT computation has to use the microprocessor's dual access bandwidth to supply its operands.
Because a dual-port memory bank generally costs about twice the area and power of a single-port bank of the same capacity, and on-chip memory area and power are strictly constrained, on-chip storage is usually organized as a number of single-port banks equal to an integer power of 2 with low-order interleaved addressing, which provides dual access bandwidth at low area and power cost. However, because of the non-contiguous addresses of the operands and the symmetry of the FFT butterfly, every butterfly then suffers a parallel memory access conflict. In a SIMD extension structure in particular, memory access conflicts lower the utilization of the vector memory bandwidth, so the actual FFT efficiency falls significantly below the theoretical peak.
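The conflict can be seen with a small self-contained check in C (the bank count of 4 and the 16-point length are assumptions chosen only for the example): with a power-of-two number of low-order interleaved banks, any butterfly spacing that is a multiple of the bank count puts both operands into the same single-port bank, so the two accesses must be serialized.

```c
#include <assert.h>
#include <stdio.h>

/* With B = 4 low-order interleaved banks (bank = address mod B) and butterfly
 * spacings s = 8, 4 (powers of two >= B) in a 16-point FFT, both operands of
 * every butterfly select the same bank -- the parallel access conflict the
 * method is designed to remove. */
int main(void)
{
    const int N = 16, B = 4;
    for (int s = N / 2; s >= B; s /= 2)
        for (int n = 0; n + s < N; n++)
            assert(n % B == (n + s) % B);   /* same bank for both operands */
    puts("every butterfly pair at spacing >= B hits one bank");
    return 0;
}
```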
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a conflict-free memory access method for FFT parallel computation that is simple to implement, eliminates memory access conflicts in FFT parallel computation, achieves high memory access efficiency and incurs little hardware cost.
To solve the above technical problems, the technical solution proposed by the present invention is:
A conflict-free memory access method for FFT parallel computation, comprising the steps of:
1) determining the structure of the current processor; if it is a SIMD structure, proceeding to step 3); otherwise proceeding to step 2);
2) configuring one storage group to store the operand data, the storage group comprising a plurality of parallel single-port memory banks; when FFT computation is performed, mapping the linear address of the data to be operated on to a two-dimensional conflict-free memory access address, the two-dimensional conflict-free memory access address corresponding to the target memory bank holding the data and the address inside the target bank, and accessing the data according to the two-dimensional conflict-free memory access address;
3) configuring a plurality of storage groups to store the operand data, each storage group comprising a plurality of parallel single-port memory banks; when FFT computation is performed, mapping the linear address of the data to be operated on to a three-dimensional conflict-free memory access address, the three-dimensional conflict-free memory access address corresponding to the target storage group holding the data, the target memory bank and the address inside the target bank, and accessing the data according to the three-dimensional conflict-free memory access address.
As a further improvement of the present invention: in step 2), P of the plurality of parallel single-port memory banks are addressed in a low-order interleaved manner, and P is an odd number not less than 3; in step 3), P memory banks in each storage group are addressed in a low-order interleaved manner, and P is an odd number not less than 3.
As a further improvement of the present invention: in step 2), the linear address of the data to be operated on is mapped to the two-dimensional conflict-free memory access address (X, Y) according to the following formula;
where Y is the position of the target memory bank holding the data to be operated on, X is the row address of the data inside the target bank, Addr is the linear address of the data, W is the operand granularity, P is the number of memory banks using low-order interleaved addressing, mod denotes the modulo operation, and N is the FFT sequence length.
As a further improvement of the present invention: in step 3), the linear address of the data to be operated on is mapped to the three-dimensional conflict-free memory access address (X, Y, Z) according to the following formula;
where Y is the position of the target storage group holding the data to be operated on, Z is the position of the target memory bank within the target storage group, and X is the row address of the data inside the target bank; Addr is the linear address of the data, G is the SIMD width and is a positive integer power of 2, P is the number of low-order interleaved memory banks in each storage group, mod denotes the modulo operation, and N is the FFT sequence length.
Compared with the prior art, the present invention has the following advantages:
1) For dual-access microprocessors of non-SIMD and SIMD structure respectively, the present invention organizes the single-port memory banks into a multi-bank one-dimensional storage group or a two-dimensional storage group; when FFT computation is performed, the linear address of the data to be operated on is mapped to a two-dimensional or a three-dimensional conflict-free memory access address. This effectively eliminates memory access conflicts in FFT computation, realizes conflict-free parallel memory access for the FFT, and at the same time improves FFT efficiency.
2) For microprocessors with a SIMD extension structure, the present invention organizes the single-port memory banks into a two-dimensional storage array operated in SIMD fashion, which supports conflict-free memory access for the vectorized FFT parallel algorithm and thus significantly improves FFT efficiency.
3) In a non-SIMD structure the present invention maps the linear address of the data to be operated on to the corresponding memory bank and address inside the bank, and in a SIMD structure it maps the linear address to the corresponding storage group, memory bank and address inside the bank; only the way the memory access address is computed is changed, so the required hardware overhead is very small.
Accompanying drawing explanation
Fig. 1 is a schematic diagram of the radix-2 FFT butterfly computation for a sequence of length 16.
Fig. 2 is a schematic flow chart of the conflict-free memory access method for FFT parallel computation of this embodiment.
Fig. 3 is a schematic diagram of the memory bank organization under a non-SIMD structure in this embodiment.
Fig. 4 is a schematic diagram of the memory bank organization under a SIMD structure in this embodiment.
Fig. 5 is a schematic diagram of the memory bank organization under a non-SIMD structure in a specific embodiment of the present invention.
Fig. 6 is a schematic diagram of the memory bank organization under a SIMD structure in a specific embodiment of the present invention.
Embodiment
The present invention is further described below with reference to the accompanying drawings and specific preferred embodiments, which, however, do not limit the scope of protection of the invention.
As shown in Fig. 2, the conflict-free memory access method for FFT parallel computation of this embodiment comprises the steps of:
1) determining the structure of the current processor; if it is a SIMD structure, proceeding to step 3); otherwise proceeding to step 2);
2) configuring one storage group to store the operand data, the storage group comprising a plurality of parallel single-port memory banks; when FFT computation is performed, mapping the linear address of the data to be operated on to a two-dimensional conflict-free memory access address, which corresponds to the target memory bank holding the data and the address inside the target bank, and accessing the data according to the two-dimensional conflict-free memory access address;
3) configuring a plurality of storage groups to store the operand data, each storage group comprising a plurality of parallel single-port memory banks; when FFT computation is performed, mapping the linear address of the data to be operated on to a three-dimensional conflict-free memory access address, which corresponds to the target storage group holding the data, the target memory bank and the address inside the target bank, and accessing the data according to the three-dimensional conflict-free memory access address.
In this embodiment, to meet the microprocessor's dual-access memory bandwidth requirement, the on-chip memory is built from multiple single-port SRAM (static RAM) banks that support parallel access, so as to reduce area and power consumption.
In this embodiment, in step 2) P of the plurality of parallel single-port memory banks are addressed in a low-order interleaved manner, and P is an odd number not less than 3; in step 3), P memory banks in each storage group are addressed in a low-order interleaved manner, and P is an odd number not less than 3.
In this embodiment, in step 2) the linear address of the data to be operated on is mapped to the two-dimensional conflict-free memory access address (X, Y) according to formula (4);
where Y is the position of the target memory bank holding the data to be operated on, X is the row address of the data inside the target bank, Addr is the linear address of the data, W is the operand granularity, P is the number of memory banks using low-order interleaved addressing, mod denotes the modulo operation, N is the FFT sequence length, and ⌊Addr/W⌋ denotes the largest integer not greater than Addr/W.
In this embodiment, for a processor that does not use a SIMD structure, assume the memory capacity is 2^H bytes with H a positive integer, the operand granularity is W bytes with W a positive integer power of 2, and all or some of the memory banks use low-order interleaved addressing, the number of low-order interleaved banks being P (an odd number not less than 3). As shown in Fig. 3, the byte address of the whole memory is H bits wide and is written Addr[H-1:0], and the data address in units of the operand granularity is Data_Addr = Addr/W. The actual location of a datum in the memory banks can be expressed by the two-dimensional coordinate (X, Y), where Y is the index of the memory bank actually accessed and X is the row address of the data inside the selected bank. The mapping between the linear address Addr and the actual location (X, Y) is given by formula (4); the actual location (X, Y) obtained by this mapping is the two-dimensional conflict-free memory access address.
With this memory bank organization, a processor that does not use a SIMD structure can compute an FFT of sequence length N (N a positive integer power of 2) with completely conflict-free access throughout the dual-access parallel implementation of the FFT butterflies.
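Since formula (4) is not reproduced above, the C sketch below shows one mapping that is consistent with the description (Data_Addr = ⌊Addr/W⌋, bank index Y = Data_Addr mod P, row X = ⌊Data_Addr/P⌋); it is an assumption for illustration, not necessarily the exact formula used by the patent. Under this assumed mapping, because P is odd and every butterfly spacing N/2^j is a power of two, the two operands of a butterfly always fall in different banks.

```c
/* Illustrative 2-D mapping for the non-SIMD case, consistent with the text
 * but NOT necessarily identical to formula (4):
 *   Data_Addr = Addr / W, Y = Data_Addr mod P (target bank), X = Data_Addr / P (row).
 * P odd guarantees gcd(P, N/2^j) = 1, so a butterfly's two operands differ in Y. */
typedef struct { unsigned X; unsigned Y; } Addr2D;

static Addr2D map_2d(unsigned Addr, unsigned W, unsigned P)
{
    unsigned data_addr = Addr / W;      /* address in operand-granularity units */
    Addr2D a;
    a.Y = data_addr % P;                /* which of the P interleaved banks     */
    a.X = data_addr / P;                /* row address inside that bank         */
    return a;
}
```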
In this embodiment, in step 3) the linear address of the data to be operated on is mapped to the three-dimensional conflict-free memory access address (X, Y, Z) according to the following formula;
where Y is the position of the target storage group holding the data to be operated on, Z is the position of the target memory bank within the target storage group, and X is the row address of the data inside the target bank; Addr is the linear address of the data, G is the SIMD width and is a positive integer power of 2, P is the number of low-order interleaved memory banks in each storage group, mod denotes the modulo operation, and N is the FFT sequence length.
In this embodiment, for a processor using a SIMD structure, assume the SIMD width is G, with G a positive integer power of 2. The bank structure of the non-SIMD case is replicated G times through SIMD extension to obtain the SIMD memory bank structure, in which each group, in whole or in part, still uses low-order interleaved addressing over P banks (P an odd number not less than 3), each bank being W bytes wide. As shown in Fig. 4, assume the memory capacity is 2^H bytes, the operand width is W bytes with W an integer power of 2, and the sequence length is N. The byte address of the whole memory is H bits wide and is written Addr[H-1:0], and the data address in units of the operand granularity is Data_Addr = Addr/W. The actual location of a datum in the memory banks can be expressed by the three-dimensional coordinate (X, Y, Z), where Y is the position among the G groups actually accessed, Z is the position among the P banks of that group, and X is the corresponding row address. The mapping between the linear address Addr and the actual location (X, Y, Z) for all or some of the memory banks is given by formula (5); the actual location (X, Y, Z) obtained by this mapping is the three-dimensional conflict-free memory access address.
With this memory bank organization, a processor using a SIMD structure can perform the vectorized parallel FFT computation through dual access in a completely conflict-free manner.
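Formula (5) is likewise not reproduced above; the following C sketch is one plausible three-level decomposition consistent with the description (G groups for the SIMD lanes, P odd low-order interleaved banks per group, row address inside the bank). The names and the exact ordering of the mod/div steps are assumptions of this illustration, not the patent's formula.

```c
/* Illustrative 3-D mapping for the SIMD case, an assumption rather than formula (5):
 *   Data_Addr = Addr / W,
 *   Y = Data_Addr mod G        (which of the G storage groups / SIMD lanes),
 *   Z = (Data_Addr / G) mod P  (which of the P banks inside that group),
 *   X = (Data_Addr / G) / P    (row address inside the bank). */
typedef struct { unsigned X; unsigned Y; unsigned Z; } Addr3D;

static Addr3D map_3d(unsigned Addr, unsigned W, unsigned P, unsigned G)
{
    unsigned data_addr = Addr / W;      /* operand-granularity address      */
    unsigned in_group  = data_addr / G; /* address as seen inside one group */
    Addr3D a;
    a.Y = data_addr % G;
    a.Z = in_group % P;
    a.X = in_group / P;
    return a;
}
```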
The present invention is further illustrated below by an example with operand width W of 4 and sequence length N.
As shown in Fig. 5, under the non-SIMD structure of this embodiment, P = 3 memory banks are used with low-order interleaved addressing, the operand width W is 4 and the sequence length is N. The two-dimensional conflict-free memory access address is expressed by the coordinate (X, Y), where Y is the index of the target memory bank and X is the row address of the data inside the target bank; the linear address Addr of the data to be operated on is mapped to the two-dimensional conflict-free memory access address (X, Y) by formula (6):
where Y is the position of the data among the 3 memory banks and X is the corresponding row address of the data inside the bank.
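A quick check of this P = 3 example, taking N = 16 as in Fig. 1 and using the assumed bank index Y = (Addr/W) mod P from the sketch above rather than formula (6) itself: every butterfly spacing 8, 4, 2, 1 is a power of two and therefore not a multiple of 3, so the two operands of a butterfly never share a bank.

```c
#include <assert.h>

/* Assumed check for the P = 3 example (N = 16 chosen as in Fig. 1):
 * Data_Addr = Addr / W runs over 0..N-1, and for every butterfly spacing
 * s in {8, 4, 2, 1} the operand pair (n, n+s) maps to two different banks. */
static void check_p3_example(void)
{
    const unsigned P = 3, N = 16;
    for (unsigned s = N / 2; s >= 1; s /= 2)
        for (unsigned n = 0; n + s < N; n++)
            assert(n % P != (n + s) % P);   /* never the same single-port bank */
}
```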
As shown in Fig. 6, under the SIMD structure of this embodiment, each storage group uses P = 3 memory banks with low-order interleaved addressing, the operand width W is 4, the sequence length is N, and the SIMD width is 16, so the whole set of memory banks is divided into 16 groups. The three-dimensional conflict-free memory access address is expressed by the coordinate (X, Y, Z); the linear address Addr of the data to be operated on is mapped to the three-dimensional conflict-free memory access address (X, Y, Z) by formula (7):
where Y is the position of the data among the 16 groups, Z is the position of the data among the 3 memory banks of that group, and X is the corresponding row address of the data inside the bank.
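A corresponding check for the SIMD example, again under the assumed group and bank indices Y = Data_Addr mod G and Z = (Data_Addr/G) mod P rather than formula (7); N = 256 is an arbitrary power of two chosen only so that butterfly spacings both above and below G = 16 occur.

```c
#include <assert.h>

/* Assumed check for the SIMD example (G = 16 groups, P = 3 banks per group):
 * a spacing below G separates the two operands into different groups, and a
 * spacing that is a multiple of G keeps them in one group but, because P is
 * odd, in different banks -- the operand pair never shares a single-port bank. */
static void check_simd_example(void)
{
    const unsigned G = 16, P = 3, N = 256;
    for (unsigned s = N / 2; s >= 1; s /= 2)
        for (unsigned n = 0; n + s < N; n++) {
            unsigned y1 = n % G,       y2 = (n + s) % G;        /* group indices */
            unsigned z1 = (n / G) % P, z2 = ((n + s) / G) % P;  /* bank indices  */
            assert(y1 != y2 || z1 != z2);   /* never same group AND same bank    */
        }
}
```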
The above are merely preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any simple modification, equivalent variation or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, shall fall within the scope of protection of the technical solution of the present invention.

Claims (4)

1. A conflict-free memory access method for FFT parallel computation, characterized in that the method comprises the steps of:
1) determining the structure of the current processor; if it is a SIMD structure, proceeding to step 3); otherwise proceeding to step 2);
2) configuring one storage group to store the operand data, the storage group comprising a plurality of parallel single-port memory banks; when FFT computation is performed, mapping the linear address of the data to be operated on to a two-dimensional conflict-free memory access address, the two-dimensional conflict-free memory access address corresponding to the target memory bank holding the data and the address inside the target bank, and performing data access according to the two-dimensional conflict-free memory access address;
3) configuring a plurality of storage groups to store the operand data, each storage group comprising a plurality of parallel single-port memory banks; when FFT computation is performed, mapping the linear address of the data to be operated on to a three-dimensional conflict-free memory access address, the three-dimensional conflict-free memory access address corresponding to the target storage group holding the data, the target memory bank and the address inside the target bank, and performing data access according to the three-dimensional conflict-free memory access address.
2. The conflict-free memory access method for FFT parallel computation according to claim 1, characterized in that: in step 2), P of the plurality of parallel single-port memory banks are addressed in a low-order interleaved manner, and P is an odd number not less than 3; in step 3), P memory banks in each storage group are addressed in a low-order interleaved manner, and P is an odd number not less than 3.
3. The conflict-free memory access method for FFT parallel computation according to claim 2, characterized in that: in step 2) the linear address of the data to be operated on is mapped to the two-dimensional conflict-free memory access address (X, Y) according to the following formula;
where Y is the position of the target memory bank holding the data to be operated on, X is the row address of the data inside the target bank, Addr is the linear address of the data, W is the operand granularity, P is the number of memory banks using low-order interleaved addressing, mod denotes the modulo operation, and N is the FFT sequence length.
4. The conflict-free memory access method for FFT parallel computation according to claim 2 or 3, characterized in that: in step 3) the linear address of the data to be operated on is mapped to the three-dimensional conflict-free memory access address (X, Y, Z) according to the following formula;
where Y is the position of the target storage group holding the data to be operated on, Z is the position of the target memory bank within the target storage group, and X is the row address of the data inside the target bank; Addr is the linear address of the data, G is the SIMD width and is a positive integer power of 2, P is the number of low-order interleaved memory banks in each storage group, mod denotes the modulo operation, and N is the FFT sequence length.
CN201510137874.6A 2015-03-26 2015-03-26 Conflict-free storage access method for FFT parallel computation Active CN104699624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510137874.6A CN104699624B (en) 2015-03-26 2015-03-26 Conflict-free storage access method for FFT parallel computation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510137874.6A CN104699624B (en) 2015-03-26 2015-03-26 Conflict-free storage access method for FFT parallel computation

Publications (2)

Publication Number Publication Date
CN104699624A true CN104699624A (en) 2015-06-10
CN104699624B CN104699624B (en) 2018-01-23

Family

ID=53346775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510137874.6A Active CN104699624B (en) 2015-03-26 2015-03-26 Conflict-free storage access method for FFT parallel computation

Country Status (1)

Country Link
CN (1) CN104699624B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748723A (en) * 2017-09-28 2018-03-02 中国人民解放军国防科技大学 Storage method and access device supporting conflict-free stepping block-by-block access
CN109635235A (en) * 2018-11-06 2019-04-16 海南大学 Triangular-part storage device and parallel reading method for a self-adjoint matrix
CN111158757A (en) * 2019-12-31 2020-05-15 深圳芯英科技有限公司 Parallel access device and method and chip
CN112163187A (en) * 2020-11-18 2021-01-01 无锡江南计算技术研究所 Ultra-long point high-performance FFT (fast Fourier transform) computing device
CN112822139A (en) * 2021-02-04 2021-05-18 展讯半导体(成都)有限公司 Data input and data conversion method and device
CN113094639A (en) * 2021-03-15 2021-07-09 Oppo广东移动通信有限公司 DFT parallel processing method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080172529A1 (en) * 2007-01-17 2008-07-17 Tushar Prakash Ringe Novel context instruction cache architecture for a digital signal processor
CN101290613A (en) * 2007-04-16 2008-10-22 卓胜微电子(上海)有限公司 FFT processor data storage system and method
CN101339546A (en) * 2008-08-07 2009-01-07 那微微电子科技(上海)有限公司 Address mappings method and operand parallel FFT processing system
CN102508802A (en) * 2011-11-16 2012-06-20 刘大可 Data writing method based on parallel random storages, data reading method based on same, data writing device based on same, data reading device based on same and system
CN103116555A (en) * 2013-03-05 2013-05-22 中国人民解放军国防科学技术大学 Data access method based on multi-body parallel cache structure

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080172529A1 (en) * 2007-01-17 2008-07-17 Tushar Prakash Ringe Novel context instruction cache architecture for a digital signal processor
CN101290613A (en) * 2007-04-16 2008-10-22 卓胜微电子(上海)有限公司 FFT processor data storage system and method
CN101339546A (en) * 2008-08-07 2009-01-07 那微微电子科技(上海)有限公司 Address mappings method and operand parallel FFT processing system
CN102508802A (en) * 2011-11-16 2012-06-20 刘大可 Data writing method based on parallel random storages, data reading method based on same, data writing device based on same, data reading device based on same and system
CN103116555A (en) * 2013-03-05 2013-05-22 中国人民解放军国防科学技术大学 Data access method based on multi-body parallel cache structure

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748723A (en) * 2017-09-28 2018-03-02 中国人民解放军国防科技大学 Storage method and access device supporting conflict-free stepping block-by-block access
CN107748723B (en) * 2017-09-28 2020-03-20 中国人民解放军国防科技大学 Storage method and access device supporting conflict-free stepping block-by-block access
CN109635235A (en) * 2018-11-06 2019-04-16 海南大学 Triangular-part storage device and parallel reading method for a self-adjoint matrix
CN111158757A (en) * 2019-12-31 2020-05-15 深圳芯英科技有限公司 Parallel access device and method and chip
CN111158757B (en) * 2019-12-31 2021-11-30 中昊芯英(杭州)科技有限公司 Parallel access device and method and chip
CN112163187A (en) * 2020-11-18 2021-01-01 无锡江南计算技术研究所 Ultra-long point high-performance FFT (fast Fourier transform) computing device
CN112163187B (en) * 2020-11-18 2023-07-07 无锡江南计算技术研究所 Ultra-long point high-performance FFT (fast Fourier transform) computing device
CN112822139A (en) * 2021-02-04 2021-05-18 展讯半导体(成都)有限公司 Data input and data conversion method and device
CN113094639A (en) * 2021-03-15 2021-07-09 Oppo广东移动通信有限公司 DFT parallel processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN104699624B (en) 2018-01-23

Similar Documents

Publication Publication Date Title
CN104699624A (en) FFT (fast Fourier transform) parallel computing-oriented conflict-free storage access method
KR20220129107A (en) Matrix multiplier
US8364736B2 (en) Memory-based FFT/IFFT processor and design method for general sized memory-based FFT processor
CN102375805B (en) Vector processor-oriented FFT (Fast Fourier Transform) parallel computation method based on SIMD (Single Instruction Multiple Data)
US20220360428A1 (en) Method and Apparatus for Configuring a Reduced Instruction Set Computer Processor Architecture to Execute a Fully Homomorphic Encryption Algorithm
CN103970720B (en) Based on extensive coarseness imbedded reconfigurable system and its processing method
CN105335331B (en) A kind of SHA256 realization method and systems based on extensive coarseness reconfigurable processor
CN103970718A (en) Quick Fourier transformation implementation device and method
CN101847137B (en) FFT processor for realizing 2FFT-based calculation
CN101571849A (en) Fast Foourier transform processor and method thereof
CN103699515A (en) FFT (fast Fourier transform) parallel processing device and FFT parallel processing method
CN105224505A (en) Based on the FFT accelerator installation of matrix transpose operation
CN102495721A (en) Single instruction multiple data (SIMD) vector processor supporting fast Fourier transform (FFT) acceleration
CN103777896A (en) 3D memory based address generator
CN112446471B (en) Convolution acceleration method based on heterogeneous many-core processor
KR101696987B1 (en) Fft/dft reverse arrangement system and method and computing system thereof
CN103544111B (en) A kind of hybrid base FFT method based on real-time process
CN105718424B (en) A kind of parallel Fast Fourier Transform processing method
CN105183701A (en) 1536-point FFT processing mode and related equipment
CN104504205A (en) Parallelizing two-dimensional division method of symmetrical FIR (Finite Impulse Response) algorithm and hardware structure of parallelizing two-dimensional division method
CN102567283B (en) Method for small matrix inversion by using GPU (graphic processing unit)
CN104050148A (en) FFT accelerator
CN103034621A (en) Address mapping method and system of radix-2*K parallel FFT (fast Fourier transform) architecture
Hussain et al. Evaluation of Radix-2 and Radix-4 FFT processing on a reconfigurable platform
WO2021026196A1 (en) Configuring a reduced instruction set computer processor architecture to execute a fully homomorphic encryption algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant