SE539721C2 - Device and method for performing a Fourier transform on a three dimensional data set - Google Patents
- Publication number
- SE539721C2 (SE1450880A)
- Authority
- SE
- Sweden
- Prior art keywords
- unit
- dimensional data
- memory
- data set
- dimensional
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/76—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
- G06F7/78—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data for changing the order of data flow, e.g. matrix transposition or LIFO buffers; Overflow or underflow handling therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/76—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
- G06F7/78—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data for changing the order of data flow, e.g. matrix transposition or LIFO buffers; Overflow or underflow handling therefor
- G06F7/785—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data for changing the order of data flow, e.g. matrix transposition or LIFO buffers; Overflow or underflow handling therefor having a sequence of storage locations each being individually accessible for both enqueue and dequeue operations, e.g. using a RAM
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Discrete Mathematics (AREA)
- Complex Calculations (AREA)
Abstract
The present invention relates to a device (100) and method for performing a Fourier transform on a three dimensional data set. The device comprises a first one dimensional Fourier transform unit (110) arranged to receive the three dimensional data set as one dimensional data blocks related to a first spatial dimension and to perform a Fourier transform, a first data permutation unit (120) arranged to provide one dimensional data blocks related to a second spatial dimension, a second one dimensional Fourier transform unit (130) arranged to perform a Fourier transform, a second data permutation unit (140) arranged to provide the three dimensional data set as one dimensional data blocks related to a third spatial dimension, and a third one dimensional Fourier transform unit (150) arranged to perform a Fourier transform. The first or second data permutation unit (120, 140) is arranged to associate a delay to each received sample determined so as to reorder read-out of the samples to provide the one dimensional data blocks related to the second or third spatial dimension. For publication: figure 1
Description
Device and method for performing a Fourier transform on a three dimensional data set
TECHNICAL FIELD
The present invention relates to a device and method for performing a Fourier transform on a three dimensional data set.
BACKGROUND
The three dimensional Fast Fourier Transform (3D FFT) is often calculated one dimension at a time. This is realized by applying three 1D FFTs, one in each dimension.
The problem when performing a 3D FFT in an environment which provides and demands a fixed number of samples per time unit at the input and output respectively is the data management involved in performing the data permutations required to constantly provide the correct order of input samples to the different 1D FFT blocks.
Mario Garrido. "Efficient Hardware Architectures for the Computation of the FFT and other related Signal Processing Algorithms in Real Time." PhD thesis, Universidad Politecnica de Madrid, Spain, 2009 describes an existing design in which a 2D FFT is performed and the permutation which takes place between the first FFT and the second FFT is a simple matrix transposition, that is, the rows and columns of a two dimensional matrix change places. This is a well-studied problem that often occurs in mathematics, and hardware that performs this permutation efficiently has been designed.
Also S. Langemeyer, P. Pirsch, and H. Blume. "Using SDRAMs for two-dimensional accesses of long 2^n x 2^m-point FFTs and transposing.", In Proc. Int. Embedded Computer Systems (SAMOS) Conf, pages 242-248, 2011 describes an existing design in which a 2D FFT is performed and the permutation which takes place between the first FFT and the second FFT is a simple matrix transposition, that is, the rows and columns of a two dimensional matrix change places.
Existing 3D FFT designs store the three dimensional data in a memory and perform the required operations on data fetched iteratively from the memory, i.e. the algorithm is calculated by reading some data from the memory, processing these data and writing the result back to the memory. This is done iteratively until all calculations of the algorithm have been carried out. This means that all or almost all data have to be available before the calculation can start and that no or almost no results are provided before it has finished. This implies that there will be a delay while the memory is filled with input samples. Further, it will be necessary to read from and write to the same memory locations several times, which increases the number of memory accesses. Further, there will also be a delay while the memory is emptied before the next calculation can be started. The memory bottleneck of loading and unloading data to and from the same memory can be reduced by double buffering, which means that two memories are used instead of one in order to increase the memory throughput. It will then be possible to access the two memories simultaneously, so that the calculations can be performed using one memory while the previous result is unloaded from the other, which is then filled with new data to be calculated. However, this solution almost doubles the amount of resources, since these memories form a major part of the design. In addition, this iterative approach is suitable for processing data that arrives in bursts, but is not suitable for processing a continuous flow of data.
Chi-Li Yu, K. Irick, C. Chakrabarti and V. Narayanan. "Multidimensional DFT IP generator for FPGA platforms." 58(4):755-764, 2011 describes an existing 3D FFT design which stores three dimensional data in a memory and carries out the required operations on data fetched iteratively from the memory as described above.
U. Nidhi, Kolin Paul, Ahmed Hemani and Ansul Kumar. "High performance 3D-FFT implementation." 2013 describes a way of partially tackling this problem when computing on a supercomputer or a network of processing elements. The solution is to load the data onto local memories of the different processing elements and to carry out the permutation between the first and second FFTs locally. Then, the permutation between the second and third FFTs is performed with the aid of an interconnection network, by sending data back and forth between the different processing elements. This solution is suitable when all data are available at the start of the calculation. If it is to be applied to real time processing, the input data are first buffered and loaded onto the processing elements, and then the next input data are buffered while all calculations are performed. This approach requires one full size memory and a number of smaller memories, since all processing elements need to store a portion of the data locally. Further, the throughput (samples per time unit) is very limited, since all processing elements have to finish before the next FFT calculation can be started and the result also has to be unloaded from all the processing elements, which puts tough constraints for high throughput on the interconnection network.
Further, in handling large amounts of data, an external memory would typically be used. In an application as described herein, the memory would in many cases have a size of hundreds of megabytes. To enable high throughput, a fast memory would be desirable. These two requirements of a memory that is both large and fast lead, in currently available architectures, to the use of dynamic memories. However, this class of memories has to be managed in a correct way in order not to lose data, and they typically present constraints on how to access the data stored in them. Existing designs propose solutions for handling these constraints for matrix transposition, but this problem has not yet been addressed for general permutations and permutations of more than two dimensions.
SUMMARY
One object of the present invention is to improve the way of determining a Fourier transform.
This has in one embodiment been solved by means of a device for performing Fourier transform on a three dimensional data set. The device comprises a first one dimensional Fourier transform unit, a first data permutation unit, a second one dimensional Fourier transform unit, a second data permutation unit, and a third one dimensional Fourier transform unit.
The first one dimensional Fourier transform unit is arranged to receive the three dimensional data set as one dimensional data blocks related to a first spatial dimension and to perform a Fourier transform for the first spatial dimension based on the samples of the respective data blocks related to the first spatial dimension. The first data permutation unit is arranged to receive an output from the first one dimensional Fourier transform unit and to provide the three dimensional data set as one dimensional data blocks related to a second spatial dimension. The second one dimensional Fourier transform unit is arranged to receive the one dimensional data blocks related to the second spatial dimension and to perform a Fourier transform for the second spatial dimension based on the samples of the respective data blocks related to the second spatial dimension. The second data permutation unit is arranged to receive an output from the second one dimensional Fourier transform unit and to provide the three dimensional data set as one dimensional data blocks related to a third spatial dimension. The third one dimensional Fourier transform unit is arranged to receive the one dimensional data blocks related to the third spatial dimension and to perform a Fourier transform for the third spatial dimension based on the samples of the respective data blocks related to the third spatial dimension.
The first or second data permutation unit is arranged to associate a delay to each received sample determined so as to reorder read-out of the samples to provide the one dimensional data blocks related to the second or third spatial dimension.
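The idea of associating a delay with each sample can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the claimed circuit: the function name `sample_delays`, the 2x2x2 data set and the chosen stream orders are hypothetical, and the common latency is simply taken as the smallest value that keeps every delay non-negative.

```python
import numpy as np

def sample_delays(perm):
    """perm[t] is the arrival position of the sample that must leave in output slot t.

    Returns (delays, latency): the sample arriving at clock cycle i leaves
    at cycle i + delays[i]. All names here are illustrative assumptions.
    """
    perm = np.asarray(perm)
    n = perm.size
    out_slot = np.empty(n, dtype=int)
    out_slot[perm] = np.arange(n)              # inverse permutation: arrival -> output slot
    arrival = np.arange(n)
    latency = int(np.max(arrival - out_slot))  # smallest shift giving non-negative delays
    delays = out_slot + latency - arrival
    return delays, latency

# Hypothetical 2x2x2 data set: samples arrive with y fastest, then x, then z (Y,X,Z)
# and must leave with z fastest, then y, then x (Z,Y,X).
n = 2
perm = np.array([y + n * x + n * n * z
                 for x in range(n) for y in range(n) for z in range(n)])
delays, latency = sample_delays(perm)
leave = np.arange(perm.size) + delays          # clock cycle at which each sample leaves
```

Sorting the samples by `leave` reproduces the requested output order, which is the sense in which a per-sample delay reorders the read-out.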
The device is in one example used for calculating a 3D FFT. Each of the spatial dimensions of the three dimensional data set corresponds to one of the dimensions of the 3D FFT.
The device is in one example used for calculating a 2D FFT. In accordance with this example, two of the spatial dimensions of the three dimensional data set correspond to one of the dimensions of the 2D FFT and the other spatial dimension of the three dimensional data set corresponds to the other dimension of the 2D FFT. Therefore, the two dimensions of the 2D FFT are interpreted as a three dimensional space.
The device is in one example used for calculating a 1D FFT. In accordance with this example, all three spatial dimensions of the three dimensional data set correspond to the one dimension of the 1D FFT. Therefore, the one dimension of the 1D FFT is interpreted as a three dimensional space.
The Fourier transform is for example a Fast Fourier Transform, FFT.
This permutation of the first or second permutation unit can in one example be visualized as rotating the three axes, so that if the order of the axes before the permutation was Y,X,Z, after the permutation the order will be Z,Y,X. This problem has been solved for a continuous flow of samples.
The presented solution is based on pipelined calculation which continuously calculates the result. This makes it suitable for real time calculation, since it can support continuous flow and the results do not need to be loaded to a memory before a new calculation can be carried out.
The provided solution is very resource effective. The design can easily be fitted together with other algorithms on the same FPGA or integrated circuit.
As a continuous flow is maintained through all the processing elements in the circuit, there are always resources to calculate all three dimensions at all times. This is in contrast to known prior art systems which perform the 3D FFT on a distributed system where calculations are performed in parallel on different processing elements, and where all processing elements are involved in each dimension, preventing continuous flow. This difference is fundamental and present in all designs based on parallelization of the calculation of each dimension onto all processing elements in the design, which is the case for most (if not all) designs aimed at such networks of processing elements, including supercomputers.
In one option, the first or second data permutation unit is arranged to reorder the samples so as to perform a three dimensional rotation.
In one option, each sample of the three dimensional data set is associated with an index indicating its spatial position in the three dimensional data set, and the delay for the respective sample is determined in accordance with said index.
In one option, the first and/or second data permutation unit comprises at least one memory 870. The memory is for example a random access memory or a set of them, a dynamic memory or a set of them, a set of registers, etc.
In one option, the first or second permutation unit further comprises a controller arranged to control read-out from and writing to the at least one memory so that read-out of a sample from the memory and writing of a received sample is performed to the same memory position in the memory.
The memory bottleneck of loading and unloading data to and from the same memory has been avoided by this solution by always reading and writing to the same location in the memory.
In one option, the controller further comprises a counter arranged to count a predetermined number of clock cycles determined based on the number of spatial positions in the three dimensional data set and wherein the controller is arranged to control writing to and read-out from the memory based on a counter signal from the counter. In a case wherein several inputs are handled in parallel, the number of clock cycles is also determined based on the number of samples handled in parallel.
In one option, the controller further comprises an address mapper unit arranged to receive the counter signal from the counter and to map the received counter signal to an associated memory position in the memory for reading and writing a sample.
In one option, the address mapper unit comprises a plurality of selectable mapping schemes and wherein the controller further comprises a mapping selector unit arranged to select one mapping scheme from the plurality of mapping schemes based on the counter signal and to provide a select signal indicating the selected mapping scheme.
In one option, the controller further comprises a multiplexer arranged to receive the select signal from the mapping selector unit and to control output from the address mapper unit in accordance with the received select signal.
In one option, the first or second permutation unit further comprises an auxiliary permutation unit arranged to re-order samples read out from the at least one memory so as to permute locked bits in the memory.
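The counter, address mapper and same-position read/write options described above can be mimicked in software. The following Python sketch is a simplified model under stated assumptions, not the circuit of the disclosure: it uses one memory holding a full data set (so the latency is one whole data set), the function names `rotation_perm` and `stream_through_memory` are hypothetical, and the rule for deriving the next mapping scheme is one straightforward choice.

```python
import numpy as np

def rotation_perm(n):
    """perm[t] = arrival position of the sample output in slot t for an n*n*n cube.
    Assumed orders: input Y,X,Z (y fastest), output Z,Y,X (z fastest)."""
    return np.array([y + n * x + n * n * z
                     for x in range(n) for y in range(n) for z in range(n)])

def stream_through_memory(frames, perm):
    """Stream data sets through ONE memory, reading and writing the same address each cycle."""
    size = perm.size
    memory = [None] * size
    address_map = np.arange(size)            # mapping scheme used for the current data set
    schemes = [address_map.copy()]
    outputs = []
    for frame in frames:
        out = []
        for counter in range(size):          # counter signal, one step per clock cycle
            addr = address_map[counter]      # address mapper
            out.append(memory[addr])         # read out the old sample ...
            memory[addr] = frame[counter]    # ... and write the new one to the same position
        outputs.append(out)
        address_map = address_map[perm]      # next scheme: A_next(t) = A(perm[t])
        schemes.append(address_map.copy())
    return outputs, schemes

n = 2
perm = rotation_perm(n)
frames = [np.arange(n ** 3) + 100 * f for f in range(4)]
outputs, schemes = stream_through_memory(frames, perm)
```

In this model each output data set is the previous input data set in rotated order, and for a cube the mapping schemes repeat after three data sets, so a small set of selectable schemes is enough. No memory is ever loaded or unloaded separately from the processing.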
One advantage with this solution is that the device can be used together with dynamic memories, which typically present constraints on how to access the data stored in them, as the solution performs the required three dimensional permutation between the second 1D FFT and the third 1D FFT (Fig. 1a) or between the first 1D FFT and the second 1D FFT (Fig. 1b) in such a way that both the permutation is correct and the memory timings and regulations are taken into account and fulfilled.
In one option, the device is arranged to handle parallel inputs.
In one option, the device is arranged to perform three dimensional Fourier transform on the three dimensional data set.
In one option, the device is arranged to perform Fourier transform on a cubic three dimensional data set.
In one option, the device is arranged to perform three dimensional Fourier transform on the cubic three dimensional data set.
One embodiment of the disclosure relates to a method for performing Fourier transform on a three dimensional data set. The method comprises the steps of receiving the three dimensional data set as one dimensional data blocks related to a first spatial dimension, performing a Fourier transform for the first spatial dimension based on the respective data blocks so as to provide the three dimensional data set Fourier transformed in the first dimension, providing the three dimensional data set as one dimensional data blocks related to a second spatial dimension, performing a Fourier transform for the second spatial dimension based on the respective data blocks so as to provide the three dimensional data set Fourier transformed in the first and second dimensions, providing the three dimensional data set as one dimensional data blocks related to a third spatial dimension, and performing a Fourier transform for the third spatial dimension based on the respective data blocks so as to provide the three dimensional data set Fourier transformed in the first, second and third spatial dimensions.
The step of providing the three dimensional data set as one dimensional data blocks related to a second or third spatial dimension comprises associating a delay to each received sample according to a permutation scheme, wherein the permutation scheme is determined so as to reorder the samples to provide the one dimensional data blocks related to the second or third spatial dimension.
In one option, the step of providing the three dimensional data set as one dimensional data blocks related to a second or third spatial dimension comprises a step of controlling read-out from and writing to at least one memory so that read-out of a sample from the memory and writing of a received sample is performed to the same memory position in the memory.
In one option, the step of providing the three dimensional data set as one dimensional data blocks related to a second or third spatial dimension comprises a step of providing a counter signal counting a predetermined number of clock cycles determined based on the number of spatial positions in the three dimensional data set, wherein writing to and read-out from the memory is controlled based on the counter signal.
In one option, the step of providing the three dimensional data set as one dimensional data blocks related to a second or third spatial dimension comprises a step of mapping the received counter signal to an associated memory position in the memory for reading and writing a sample.
One embodiment of the disclosure relates to software for executing the steps of the method for performing Fourier transform on a three dimensional data set in accordance with the above.
BRIEF DESCRIPTION OF THE DRAWINGS
Figs 1a, 1b, 1c and 1d show an overview of devices for performing a Fast Fourier Transform.
Fig 2 illustrates schematically an example of a three dimensional data set for transformation in device of Fig 1.
Fig 3 shows an example of a first FFT unit in a device of Fig 1.
Fig 4 shows an example of a first permutation unit in a device of Fig 1.
Fig 5 shows an example of a second FFT unit in a device of Fig 1.
Fig 6 shows an example of a second permutation unit in a device of Fig 1.
Fig 7 shows an example of a third FFT unit in a device of Fig 1.
Fig 8 shows a detailed example of a second permutation unit in the form of a 3D rotation unit.
Fig 9 illustrates schematically the operation of a second permutation unit in form of a 3D rotation unit without memory constraints.
Fig 10 illustrates schematically an example of an operation on the respective samples of a three dimensional data set in a second permutation unit in form of a 3D rotation unit without memory constraints.
Fig 11 illustrates schematically an example of an operation on the respective samples of a three dimensional data set in a second permutation unit in form of a 3D rotation unit with memory constraints.
Fig 12 is a flow chart illustrating a method for performing Fourier transform on a three dimensional data set.
Fig 13 is a flow chart illustrating one example of a step in the method for performing a permutation with a memory.
Figs 14a, 14b and 14c illustrate schematically examples of different formats for the data set for input to a Fourier transform device.
DETAILED DESCRIPTION
Figures 1a and 1b relate to three dimensional Fast Fourier Transform, 3D FFT, devices 100a, 100b arranged to calculate the three dimensional Fast Fourier Transform, 3D FFT, on a three dimensional data set, one dimension at a time. This is realized by applying three 1D FFTs, one in each dimension, and providing the correct sequence of input data to the respective 1D FFT so that the calculations are performed in the correct order.
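That three 1D FFTs, one per dimension, give the 3D FFT can be checked numerically. The short Python sketch below uses NumPy as a reference; the array sizes and variable names are arbitrary illustrative choices, not values from the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (4, 8, 2)                              # arbitrary small data set
data = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# One 1D FFT per dimension, applied one dimension at a time.
step1 = np.fft.fft(data, axis=0)
step2 = np.fft.fft(step1, axis=1)
step3 = np.fft.fft(step2, axis=2)

# Reference: the library's multidimensional FFT over all three axes.
reference = np.fft.fftn(data)
matches = np.allclose(step3, reference)
```

The comparison holds for any order of the three axes, which is why the devices of Figs 1a and 1b may process the dimensions in different sequences as long as each 1D FFT receives its samples in the right order.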
The exemplified system of Figs 1a and 1b aims for continuous flow applications.
In the 3D FFT of Fig 1a, all rows are translated to columns in a 2D permutation between the first and the second FFT. In one example, the 2D permutation translates X,Y,Z to Y,X,Z. Further, all three dimensions are rotated between the second FFT and the third FFT. This permutation can be visualized as rotating the three axes, so that if the order of the axes before the permutation was Y,X,Z, after the permutation the order will be Z,Y,X.
In the 3D FFT of Fig 1b, all three dimensions are rotated between the first FFT and the second FFT. This permutation can be visualized as rotating the three axes, so that if the order of the axes before the permutation was X,Y,Z, after the permutation the order will be Y,Z,X, which is a rotation in the opposite direction compared to Fig 1a. Further, all rows are translated to columns in a 2D permutation between the second and the third FFT. In one example, the 2D permutation translates Y,Z,X to Z,Y,X.
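The two orderings of Figs 1a and 1b can be modelled on a flat sample stream. The Python sketch below is a functional model only (it reorders whole data sets at once rather than sample by sample), and it rests on an assumed reading of the axis notation: an order such as "XYZ" lists the fastest-varying axis first. The helper names `reorder`, `fft_blocks` and `fft3d_stream` are hypothetical.

```python
import numpy as np

def reorder(stream, sizes, src, dst):
    """Reorder a flat stream from axis order src to dst (fastest-varying axis listed first)."""
    shape = [sizes[a] for a in reversed(src)]            # C order: last array axis is fastest
    cube = np.asarray(stream).reshape(shape)
    axes = [src[::-1].index(a) for a in reversed(dst)]
    return cube.transpose(axes).reshape(-1)

def fft_blocks(stream, block_len):
    """1D FFT on consecutive blocks of block_len samples, as a streaming FFT unit sees them."""
    return np.fft.fft(np.asarray(stream).reshape(-1, block_len), axis=1).reshape(-1)

def fft3d_stream(stream, sizes, orders):
    """orders = stream order before the first, second and third 1D FFT."""
    o1, o2, o3 = orders
    s = fft_blocks(stream, sizes[o1[0]])
    s = reorder(s, sizes, o1, o2)                        # first permutation unit
    s = fft_blocks(s, sizes[o2[0]])
    s = reorder(s, sizes, o2, o3)                        # second permutation unit
    return fft_blocks(s, sizes[o3[0]])

sizes = {"X": 4, "Y": 2, "Z": 8}                         # arbitrary example sizes
rng = np.random.default_rng(1)
cube = rng.standard_normal((sizes["Z"], sizes["Y"], sizes["X"]))   # indexed [z, y, x]
stream = cube.reshape(-1)                                # input order X,Y,Z

out_a = fft3d_stream(stream, sizes, ("XYZ", "YXZ", "ZYX"))   # Fig 1a: transpose, then rotate
out_b = fft3d_stream(stream, sizes, ("XYZ", "YZX", "ZYX"))   # Fig 1b: rotate, then transpose
expected = np.fft.fftn(cube).transpose(2, 1, 0).reshape(-1)  # same result in order Z,Y,X
```

Under these assumptions both orderings deliver the same transformed data set in the order Z,Y,X; they differ only in which permutation unit carries the three dimensional rotation.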
The input to the system may be continuous flow serial input data from real time applications such as e.g. motion detection in sequences of images or Synthetic Aperture Radar (SAR). The input to the system may also be data related to fluid dynamics, astrophysics, gene sequencing and molecular dynamics. The system can also be used as an accelerator for ordinary PCs that are performing 3D FFT. The dataset may be very large. The input may be configured in many ways. For example, it may have a single input where data come in series or several parallel inputs. In both cases data arrives in a continuous flow, i.e., the system receives one sample per clock cycle at each of the inputs.
In the example of Figs 1a and 1b, the three dimensional fast Fourier transform system 100a and 100b comprises a first 1D Fourier transform determination unit 110, a first permutation unit 120, 140, a second 1D Fourier transform unit 130, a second permutation unit 140, 120 and a third 1D Fourier transform determination unit 150.
In one example, the first permutation unit 120 is a 2D translation unit and the second permutation unit is a 3D rotation unit 140. In one alternative example, the first permutation unit 140 is a 3D rotation unit and the second permutation unit is a 2D translation unit 120.
In the following description, reference is made to the example wherein the first permutation unit 120 is a 2D translation unit and the second permutation unit is a 3D rotation unit 140.
The first one dimensional Fourier transform (determination) unit 110 is arranged to receive the three dimensional data set as one dimensional data blocks related to a first spatial dimension. The first 1D Fourier transform (determination) unit 110 is arranged to perform Fourier transform for the first spatial dimension based on the respective data blocks and to provide the three dimensional data set Fourier transformed in the first spatial dimension. The output from the first 1D Fourier transform unit 110 is fed to the first permutation unit 120.
The first permutation unit 120 is arranged to receive the three dimensional data set Fourier transformed in the first spatial dimension as one dimensional data blocks related to the first spatial dimension and to provide the received three dimensional data set as one dimensional data blocks related to a second spatial dimension.
The second one dimensional Fourier transform (determination) unit 130 is arranged to receive the three dimensional data set as one dimensional data blocks related to the second spatial dimension. The second 1D Fourier transform (determination) unit 130 is arranged to perform Fourier transform for the second spatial dimension based on the respective data blocks and to provide the three dimensional data set Fourier transformed in the second and first spatial dimension. The output from the second 1D Fourier transform unit 130 is fed to the second permutation unit 140.
The two dimensional and three dimensional permutations performed by the permutation units 120, 140 of the system are adapted to the configuration of the input data. Further, the first and second permutation units 120, 140 may be connected differently in the system if the 1D FFT units are adapted accordingly. Each block may then have a single input where data come in series or several parallel inputs. In both cases data arrives in a continuous flow, i.e., the system receives one sample per clock cycle at each of the inputs.
The first permutation unit 120 is arranged to operate in cooperation with one or a plurality of memories 160.
The second data permutation unit 140 is arranged to receive the three dimensional data set Fourier transformed in the second spatial dimension as one dimensional data blocks related to the second spatial dimension and provide the three dimensional data set as one dimensional data blocks related to a third spatial dimension. This will be discussed in detail later. The second data permutation unit is arranged to operate in cooperation with one or a plurality of memories 170.
The third one dimensional Fourier transform (determination) unit 150 is arranged to receive the three dimensional data set as one dimensional data blocks related to the third spatial dimension. The third 1D Fourier transform unit 150 is arranged to perform Fourier transform for the third spatial dimension based on the respective data blocks and to provide the three dimensional data set Fourier transformed in the third, second and first spatial dimension.
Thus, a 3D FFT is provided by the first 1D Fourier transform unit 110, the first permutation unit 120, the second 1D Fourier transform unit 130, the second permutation unit 140 and the third 1D Fourier transform unit 150.
The operation for computation of the three dimensional Fourier transform is in one example as follows. The formula of a 3D FFT for an input signal $x[n_1, n_2, n_3]$ is:
$$X[k_1,k_2,k_3] = \sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1}\sum_{n_3=0}^{N_3-1} x[n_1,n_2,n_3]\, e^{-j2\pi\frac{n_1 k_1}{N_1}}\, e^{-j2\pi\frac{n_2 k_2}{N_2}}\, e^{-j2\pi\frac{n_3 k_3}{N_3}}$$
where $N_1$ is the number of rows of the matrix, $N_2$ is the number of columns, and $N_3$ is the number of heights. The equation is defined for $k_1 = 0, 1, ..., N_1-1$, $k_2 = 0, 1, ..., N_2-1$ and $k_3 = 0, 1, ..., N_3-1$.
The 3D FFT can be performed as an FFT of each row of the matrix followed by an FFT of each column and followed by an FFT of each height, which can be observed by rewriting the equation above as:
$$X[k_1,k_2,k_3] = \sum_{n_3=0}^{N_3-1}\left[\sum_{n_2=0}^{N_2-1}\left[\sum_{n_1=0}^{N_1-1} x[n_1,n_2,n_3]\, e^{-j2\pi\frac{n_1 k_1}{N_1}}\right] e^{-j2\pi\frac{n_2 k_2}{N_2}}\right] e^{-j2\pi\frac{n_3 k_3}{N_3}}$$
Therefore, it is possible to use three pipelined 1D FFT units for the computation of the 3D FFT in real time. One of them calculates the FFT of the rows, another obtains the FFT of the columns and the third calculates the FFT of the heights. Common FFTs can be used as the 1D FFTs for the computations of the respective blocks. In addition to this, two permutation modules are required. The first receives in one example the data row by row from the first FFT and provides them column by column, as required for the second FFT. This is equivalent to transposing the data matrix after the first FFT. The second permutation module receives data column by column and provides data height by height by performing the rotation as discussed herein.
In Fig 2 an example of the format of an input data set is illustrated. In the illustrated example each input data set comprises a 3D structure. The input data set forms a three dimensional matrix, wherein each position in the matrix is associated with a value. The number of samples in each spatial direction is in one example a power of two.
In one example, the samples of data are input row-wise (herein shown as the x-dimension) for each layer in the third dimension (herein shown as the z-dimension). The index information related to the position of the respective sample in the data set is not required as an input. In one example, starting information to start the computations can be provided before the start of input of the data set or it can be provided in relation to each block.
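Because the samples arrive in a fixed order and each dimension holds a power-of-two number of samples, the position of a sample can be derived from a running sample counter instead of being supplied with the data. The Python sketch below illustrates this under assumptions that are not taken from the disclosure: x is assumed to vary fastest, then y, then z, and the function name `index_from_counter` and its bit-width parameters are hypothetical.

```python
def index_from_counter(counter, bits_x, bits_y, bits_z):
    """Split a sample counter into (x, y, z).

    Assumes x varies fastest, then y, then z, and that dimension d holds
    2**bits_d samples. The counter wraps at the end of each data set.
    """
    total_bits = bits_x + bits_y + bits_z
    counter &= (1 << total_bits) - 1                    # wrap around per data set
    x = counter & ((1 << bits_x) - 1)                   # lowest bits select x
    y = (counter >> bits_x) & ((1 << bits_y) - 1)       # middle bits select y
    z = (counter >> (bits_x + bits_y)) & ((1 << bits_z) - 1)  # highest bits select z
    return x, y, z

# Example: a 4 x 2 x 8 data set (2, 1 and 3 index bits respectively).
first = index_from_counter(0, 2, 1, 3)
sixth = index_from_counter(5, 2, 1, 3)
last = index_from_counter(63, 2, 1, 3)
```

With power-of-two sizes the three indices are plain bit fields of the counter, so reordering the samples amounts to rearranging counter bits.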
Figs 14a, 14b and 14c illustrate different examples of data sets for input to a device for performing Fourier transform on the data set. In Fig 14a, the data set is a three dimensional data set having a first N1, a second N2 and a third N3 spatial dimension. In Fig 14b, the three dimensional data set has a first N1, a second N2 and a third N3 spatial dimension, wherein the first and the second spatial dimensions correspond to one of the dimensions of a 2D FFT. In Fig 14c, the three dimensional data set has a first N1, a second N2 and a third N3 spatial dimension, wherein the first, second and third spatial dimensions all correspond to the same dimension of a 1D FFT.
Figures 1c and 1d relate to Fast Fourier Transform, FFT, devices 100c and 100d arranged to calculate the Fast Fourier Transform on a three dimensional data set, one dimension at a time. The calculation of the Fast Fourier Transform is, as described in relation to Figures 1a and 1b, realized by applying three 1D FFTs, one in each dimension, and providing the correct sequence of input data to the respective 1D FFT so that the calculations are performed in the correct order. In the same manner as the exemplified system of Figs 1a and 1b, the exemplified system of Figs 1c and 1d aims for continuous flow applications.
The Fast Fourier Transform device 100c of Fig 1c has the same parts as the Fast Fourier Transform device of Fig 1a or Fig 1b. In order to perform a 1D FFT and/or a 2D FFT, the Fast Fourier Transform device 100c of Fig 1c is complemented with one or two rotation units 180a, 180b. The respective rotation unit 180a, 180b is in one example arranged to calculate the twiddle factors of the FFT. In a case wherein the Fast Fourier Transform device is arranged to calculate a 2D FFT, the Fast Fourier Transform device has one of a first rotation unit 180a and a second rotation unit 180b. The first rotation unit 180a, if present, is arranged between the first FFT unit 110 and the second FFT unit 130. The second rotation unit 180b, if present, is arranged between the second FFT unit 130 and the third FFT unit 150. In a case wherein the Fast Fourier Transform device is arranged to calculate a 1D FFT, the Fast Fourier Transform device has both the first rotation unit 180a and the second rotation unit 180b. The first rotation unit 180a is arranged between the first FFT unit 110 and the second FFT unit 130. The second rotation unit 180b is arranged between the second FFT unit 130 and the third FFT unit 150. In the illustrated example of Fig 1c, the first rotation unit 180a is arranged in the flow directly after the first FFT unit 110 and fed with the data from the first FFT unit 110. In the illustrated example of Fig 1c, the second rotation unit 180b is arranged in the flow directly after the second FFT unit 130 and fed with the data from the second FFT unit 130.
The Fast Fourier Transform device 100d of Fig 1d has the same parts as the Fast Fourier Transform device of Fig 1c. However, in the illustrated example of Fig 1d, the first rotation unit 180a, if present, is arranged in the flow directly before the second FFT unit 130. In the illustrated example of Fig 1d, the second rotation unit 180b, if present, is arranged in the flow directly before the third FFT unit 150.
In a not illustrated example, the first rotation unit 180a is arranged in the flow directly after the first FFT unit 110 and fed with the data from the first FFT unit 110 while the second rotation unit is arranged in the flow directly before the third FFT unit 150. In another not illustrated example, the first rotation unit 180a is arranged in the flow directly before the second FFT unit 130 while the second rotation unit 180b is arranged in the flow directly after the second FFT unit 130 and fed with the data from the second FFT unit 130.
In Fig 3, a first 1D Fourier transform determination unit 310 is arranged to determine a first one dimensional Fourier transform unit arranged to receive the cubic three dimensional data set as one dimensional data blocks related to a first spatial dimension and to perform a Fourier transform for the first spatial dimension based on the respective data blocks. The first Fourier transform unit 310 is in one example arranged to receive blocks row-wise (first spatial dimension), to perform the 1D Fourier transform on the respective received block and to output the results as row-wise blocks.
In Fig 4, a first data permutation unit 420 in the form of a 2D transposition unit is arranged to provide the cubic three dimensional data set as one dimensional data blocks related to a second spatial dimension. In the illustrated example, the three dimensional data set is received row-wise and outputted column-wise. The first data permutation unit is arranged to cooperate with a memory 460.
In one example, the first permutation unit is arranged to manage all the samples of each 2D layer of the 3D data set. In one example it comprises 2D data sets formed by dimensions X and Y. Its complexity is of order N1 x N2, where N1 is the number of samples in the first dimension and N2 the number of samples in the second dimension. Therefore, the memory 460 will take up a large area in general. External memories (not shown) may be used for storing all the samples. Due to real-time constraints, the system may receive a series of 2D data sets one after the other. Accordingly, the first permutation unit 120 may be capable of reading a 2D data set from the memory 160 and, at the same time, store samples of the following 2D data set.
The access to the memory 460 may be limited. In general, in small memories it is possible to read or write in any memory address. On the contrary, large memories are not so easily addressable. For instance, the access to external SDRAMs is performed by selecting a row and reading or writing samples in the columns of this row. A change in the row leads to an important overhead due to the fact that several commands need to be executed on the SDRAM in order to change the active row, which is needed before new data can be read or written. Therefore, it may not be advisable to write rows of the 2D data set in rows of the memory, since data would eventually have to be read column by column. Thus, this case demands a procedure for reading and writing that can efficiently use the memory 460. In an alternative example, additional hardware is included.
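The cost of reading column by column from a memory written row by row can be made concrete with a simplified counting model. The sketch below is an illustration under stated assumptions, not a model of any particular SDRAM: it assumes a single bank, one open row at a time, a linear address split into row and column, and it only counts how often the active row changes. The names `count_row_activations`, `row_wise_order` and `column_wise_order` are hypothetical.

```python
def count_row_activations(addresses, columns_per_row):
    """Count how often the active memory row changes for an access sequence."""
    activations = 0
    open_row = None
    for address in addresses:
        row = address // columns_per_row
        if row != open_row:
            # A different row must be activated before this access.
            activations += 1
            open_row = row
    return activations


def row_wise_order(n_rows, n_cols):
    """Addresses visited when a 2D data set is accessed row by row."""
    return [r * n_cols + c for r in range(n_rows) for c in range(n_cols)]


def column_wise_order(n_rows, n_cols):
    """Addresses visited when the same layout is accessed column by column."""
    return [r * n_cols + c for c in range(n_cols) for r in range(n_rows)]
```

With each row of a 4 x 4 data set stored in its own memory row, `count_row_activations(row_wise_order(4, 4), 4)` gives 4, whereas `count_row_activations(column_wise_order(4, 4), 4)` gives 16, that is, one row change per sample. This is the overhead that motivates a different read and write procedure.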
Thirdly, under certain circumstances several memories 460 are used in parallel instead of a single one. Considering that a memory characteristically achieves a throughput of one sample per clock cycle, a group of memories is used when several data are received in parallel. Indeed, the throughput of a memory can be lower than one sample per clock cycle. This is for example the case of the external memories discussed above, which include overheads due to refresh or to activation of the rows of the memory. In this case, several memories in parallel can be used in order to meet the throughput of the system. Apart from the throughput adjustment, several memories are also used when the size of the 2D data set is larger than that of a single memory.
Fourthly, the 2D data set to be transposed can be square but also non-square. Further, a design problem can include any combination of the cited difficulties. For example, it could be necessary to design a circuit for the real-time transposition of a non-square 2D data set using several memories with access limitations and with several inputs in parallel.
In Fig 5, a second 1D Fourier transform determination unit 530 is arranged to determine a second one dimensional Fourier transform unit arranged to receive the three dimensional data set as one dimensional data blocks related to the second spatial dimension and to perform a Fourier transform for the second spatial dimension based on the respective data blocks. The second Fourier transform unit 530 is in one example arranged to receive blocks column-wise (second spatial dimension), to perform the 1D Fourier transform on the respective received block and to output the results as column-wise blocks.
In Fig 6, a second permutation unit 640 in the form of a 3D rotation unit is arranged to provide the three dimensional data set as one dimensional data blocks related to a third spatial dimension. In the illustrated example, the three dimensional data set is received column-wise and outputted height-wise. The second data permutation unit 640 is arranged to cooperate with a memory 670.
The second permutation unit 640 is arranged to perform a 3D rotation. It is arranged to perform the 3D rotation by storing values in the memory 670 using one mapping and reading them out using another. This is performed by associating a delay to each received sample, wherein each delay is determined so as to reorder the samples to provide the one dimensional data blocks related to the third spatial dimension.
The second permutation unit 640 is further arranged to write new values into the locations that have just been read. In this way it saves memory and thereby both area and power consumption. Further, double buffering is avoided.
The memory 670 can comprise several memories for the same reasons as presented in the description of Fig 4 for memory 460.
In Fig 7, a third 1D Fourier transform determination unit 750 is arranged to determine a third one dimensional Fourier transform unit 750 arranged to receive the three dimensional data set as one dimensional data blocks related to the third spatial dimension and to perform a Fourier transform for the third spatial dimension based on the respective data blocks. The third Fourier transform unit 750 is in one example arranged to receive blocks height-wise (third spatial dimension), to perform the 1D Fourier transform on the respective received block and to output the results as height-wise blocks.
In Fig 8, a second permutation unit 840 in the form of a 3D rotation unit is arranged to receive the 3D data set as one dimensional data blocks related to the second or first spatial dimension, and to associate a delay to each received sample determined so as to reorder the samples to provide the one dimensional data blocks related to the third spatial dimension. Accordingly, a three dimensional rotation of the data set is performed.
The second permutation unit 840 in the form of a 3D rotation unit comprises a memory 870. The memory is arranged to receive input data in the form of a three dimensional data set. A controller 845 is arranged to control read-out from and writing to the memory 870. In one example, the controller 845 is arranged to control read-out from and writing to the memory 870 so that read-out of a sample from the memory and writing of a received sample is performed to the same memory position in the memory. This will be described in detail below.
The controller 845 comprises a counter 841 arranged to provide a count signal defining the pace of operation of the second permutation unit. In one example, the counter is arranged to provide the count signal at the same pace as samples arrive at the input. In one example (not illustrated) a buffer is arranged to receive and buffer the received samples, or the samples to be sent to subsequent blocks, and to guarantee reading or writing of data into the memory at the pace given by the counter 841. The controller 845 is arranged to control writing to and read-out from the memory (170; 870) based on a counter signal from the counter 841. The counter 841 is arranged to count a predetermined number of clock cycles determined based on the number of spatial positions in the three dimensional data set. If the data set is handled in parallel, the predetermined number of clock cycles is also determined based on the number of parallel inputs. The three dimensional data set can then be associated to an index defined by the counter signal, said counter signal indicating the spatial position in the three dimensional data set of the respective sample. The delay for the respective sample is determined in accordance with said index.
The controller 845 further comprises an address mapper unit 842 arranged to receive the counter signal from the counter 841 and to map the received counter signal to an associated memory position in the memory 870 for reading and writing a sample. In one example, the address mapper unit comprises one or a plurality of mapping schemes. Each mapping scheme defines a memory address position associated to each count value (i.e. index in the three dimensional data set). In one example, the mapping schemes comprise logical mapping schemes mapping each bit position in the count signal to a predetermined bit position in the memory address. Thus, the mapping scheme determines the read/write pattern in the memory.
In one example, the controller 845 further comprises a mapping selector unit 844 arranged to select one mapping scheme from the plurality of mapping schemes based on the counter signal. The controller 845 further comprises a multiplexer 843 arranged to receive the select signal from the mapping selector unit 844 and to control output from the address mapper unit 842 in accordance with the received selection signal.
In one example, the second permutation unit 840 further comprises an auxiliary permutation unit 846 arranged to re-order samples read out from the memory 870 so as to permute locked index bits in the memory. This may be necessary if the used memory is subjected to constraints in the flexibility in writing/reading data and the needed permutation therefore cannot be fully performed directly on the memory.
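A logical mapping scheme of the kind described above, where each bit of the count signal is routed to a predetermined bit of the memory address and a select signal picks one scheme, can be sketched in a few lines. This is an illustration only: the function names, the 6-bit counter (4 x 4 x 4 samples), the three example schemes and the choice of selecting a scheme per data set number are assumptions made for this sketch and are not taken from the figures.

```python
def map_address(count, scheme):
    """Build a memory address by routing counter bits to address bits.

    scheme[i] is the counter bit position that drives address bit i
    (a hypothetical encoding chosen for this sketch).
    """
    address = 0
    for address_bit, count_bit in enumerate(scheme):
        address |= ((count >> count_bit) & 1) << address_bit
    return address


# Hypothetical schemes for a 6-bit counter, 2 bits per spatial dimension.
SCHEMES = [
    [0, 1, 2, 3, 4, 5],  # identity: address equals count
    [2, 3, 4, 5, 0, 1],  # 2-bit groups rotated by one position
    [4, 5, 0, 1, 2, 3],  # 2-bit groups rotated by two positions
]


def select_scheme(data_set_number):
    """Mapping selector role: circulate among the available schemes."""
    return SCHEMES[data_set_number % len(SCHEMES)]


def address_for(count, data_set_number):
    """Multiplexer role: output the address given by the selected scheme."""
    return map_address(count, select_scheme(data_set_number))
```

Because every scheme is a permutation of bit positions, each one maps the 64 count values onto the 64 addresses exactly once, so no sample is overwritten before it has been read.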
The second permutation unit 840 in the form of a 3D rotation unit is arranged to operate as follows. The address mapper unit receives the bits from the counter and permutes them into address mappings. The number of mappings and their design depend on the constraints of the selected memory. The current mapping is determined by the mapping selector, which keeps track of the number of frames or sets of 3D data that have been rotated. The mapping selector is arranged to keep track of the same number of frames as there are different address mappings. The mapping selector is arranged to circulate among these mappings in accordance with a predetermined selection scheme.
Fig 9a presents an example input to a second permutation unit in the form of a 3D rotation unit, which is arranged to receive the three dimensional data set as one dimensional data blocks related to the second spatial dimension. Fig 9b presents the output of the second permutation unit as one dimensional data blocks related to the third spatial dimension, given the input in Fig 9a. Accordingly, a three dimensional rotation of the data set is performed by this example of the second permutation unit.
The examples illustrated in Figs 9a, 9b are simplified as the set of data comprises 4 samples in each dimension. Thus, the position of each sample in the 3D data set is described by two bits in each dimension. Thus, in order to index each sample of the data set, six bits are required.
In Fig 10, reading out from and writing to a memory is illustrated when performed on an unconstrained memory. In the illustrated memory, the size of each dimension is 256 and hence the number of bits needed to index all samples is 8x3=24. The memory permutation is executed by writing data in a specific order and then reading it in a different order so that the resulting output order is the wanted order. In order to maintain high memory bandwidth and sample throughput, new data is written to the same location as the permuted values are read. Thereby, use of double buffers can be avoided and area and power consumption can be reduced. In the illustrated example, the memory receives one dimensional data blocks sorted as described in relation to Figure 9a and outputted as described in relation to Figure 9b.
In Fig 11, a memory permutation is illustrated performed on a memory with three bits locked due to access constraints (indexes 10-8). In the illustrated memory the size of each dimension is 256 and hence the number of bits needed to index all samples is 8x3=24. The memory permutation is executed by writing data in a specific order and then reading it in a different order so that the resulting output is in the wanted order, or as close to the wanted order as possible. If it is not possible to achieve the full permutation on the memory, an auxiliary permutation 846 or correction circuit can be fed with the output from the memory. The address mapper is then arranged to provide as good a mapping as possible given the constraints of the memory. Thus, some bits are locked in the memory and cannot be permuted by the address mapper. Those bits are permuted by the auxiliary permutation unit.
The auxiliary permutation unit is then arranged to permute the output samples which are not permuted by the address mapper.
Figure 12 illustrates an example of a method 200 for performing a three dimensional, 3D, Fourier transform on a three dimensional data set. The method comprises the steps of receiving S10 the three dimensional data set as one dimensional data blocks related to a first spatial dimension, performing S20 a Fourier transform for the first spatial dimension based on the respective data blocks, providing S30 the three dimensional data set as one dimensional data blocks related to a second spatial dimension, performing S40 a Fourier transform for the second spatial dimension based on the respective data blocks, providing S50 the three dimensional data set as one dimensional data blocks related to a third spatial dimension, and performing S60 a Fourier transform for the third spatial dimension based on the respective data blocks. The step of providing S50 the three dimensional data set as one dimensional data blocks related to a third spatial dimension comprises in the illustrated example associating a delay to each received sample according to a permutation scheme, wherein the permutation scheme is determined so as to reorder the samples to provide the one dimensional data blocks related to the third spatial dimension. In a not illustrated example, instead, the step of providing S30 the three dimensional data set as one dimensional data blocks related to a second spatial dimension comprises associating a delay to each received sample according to a permutation scheme, wherein the permutation scheme is determined so as to reorder the samples to provide the one dimensional data blocks related to the second spatial dimension.
In Figure 13, a step of providing S50 the three dimensional data set as one dimensional data blocks related to a third spatial dimension in a method for performing a three dimensional, 3D, Fourier transform on a three dimensional data set comprises the following steps:
A step of providing S51 a counter signal that counts a number of clock cycles determined by the number of spatial positions in the three dimensional data set, wherein writing to and read-out from the memory (170; 870) is controlled based on the counter signal bits. Here Cmax is the number of total memory accesses in one 3D data set and Nmax is the total number of mappings.
A step of mapping S52 the received counter signal to an associated memory position in the memory for reading and writing a sample.
A step of controlling S53 read-out from and writing to at least one memory so that read-out of a stored sample and writing of a received sample is performed to the same position in the memory.
Those steps are repeated until all the input data has been processed.
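The loop formed by steps S51 to S53 can be simulated in software to see why a single memory suffices. The following is a minimal sketch under stated assumptions, not the claimed implementation: it assumes a cubic data set with a power-of-two size, an unconstrained memory, and a rotation that moves the lowest group of index bits to the top. The names `rotate_groups` and `simulate_rotation` are hypothetical.

```python
def rotate_groups(index, bits_per_dim):
    """Rotate the three bit groups of an index right by one group."""
    low = index & ((1 << bits_per_dim) - 1)
    return (index >> bits_per_dim) | (low << (2 * bits_per_dim))


def simulate_rotation(frames, bits_per_dim):
    """Stream 3D data sets through one memory, reading before writing.

    frames is a list of data sets, each a list of 2**(3*bits_per_dim)
    samples in arrival order. Returns, per data set, the samples read out
    while that data set was being written.
    """
    size = 1 << (3 * bits_per_dim)
    memory = [None] * size
    outputs = []
    for frame_number, frame in enumerate(frames):
        read_out = []
        for count in range(size):              # S51: counter signal
            address = count
            for _ in range(frame_number % 3):  # S52: mapping for this data set
                address = rotate_groups(address, bits_per_dim)
            read_out.append(memory[address])   # S53: read the stored sample...
            memory[address] = frame[count]     # ...then write to the same position
        outputs.append(read_out)
    return outputs
```

In this sketch the mapping for each data set is the previous mapping composed with the rotation, and three rotations of the bit groups give the identity, so three mappings are circulated. While data set f is written, the read-out delivers data set f - 1 in rotated order: the sample read at count c is the one that arrived at index `rotate_groups(c, bits_per_dim)`. The very first read-out contains only the empty initial memory content.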
Claims (1)
CLAIMS
1. Device (100) for performing Fourier transform on a three dimensional data set comprising
a first one dimensional Fourier transform unit (110) arranged to receive the three dimensional data set as a one dimensional data blocks related to a first spatial dimension and to perform a Fourier transform for the first spatial dimension based on the samples of the respective data blocks related to the first spatial dimension,
a first data permutation unit (120) arranged to receive an output from the first one dimensional Fourier transform unit (110) and to provide the three dimensional data set as one dimensional data blocks related to a second spatial dimension,
a second one dimension Fourier transform unit (130) arranged to receive the one dimensional data blocks related to the second spatial dimension and to perform a Fourier transform for the second spatial dimension based on the samples of the respective data blocks related to the second spatial dimension,
a second data permutation unit (140) arranged to receive an output from the second one dimensional Fourier transform unit (130) and to provide the three dimensional data set as one dimensional data blocks related to a third spatial dimension, and
a third one dimension Fourier transform unit (150) arranged to receive the one dimensional data blocks related to the third spatial dimension and to perform a Fourier transform for the third spatial dimension based on the samples of the respective data blocks related to the third spatial dimension,
wherein the first or second data permutation unit (140) is arranged to associate a delay to each received sample determined according to a permutation scheme, wherein the permutation scheme is determined so as to reorder read-out of the samples to provide the one dimensional data blocks related to the second or third spatial dimension,
wherein the first and/or second data permutation unit comprises at least one memory (160, 170; 870), and
wherein the first or second permutation unit (140; 840) further comprises a controller (845) arranged to control read-out from and writing to the at least one memory (170; 870) so that read-out of a sample from the memory and writing of a received sample is performed to the same memory position in the memory.
2. Device according to claim 1, wherein the first or second data permutation unit (140) is arranged to reorder the samples so as to perform a three dimensional rotation.
3. Device according to any of the preceding claims, wherein each sample of the three dimensional data set is associated to an index indicating its spatial position in the three dimensional data set and that the delay for the respective sample is determined in accordance with said index.
4. Device according to any of the preceding claims, wherein the controller (845) further comprises a counter (841) arranged to count a predetermined number of clock cycles determined based on the number of spatial positions in the three dimensional data set and wherein the controller (845) is arranged to control writing to and read-out from the memory (170; 870) based on a counter signal from the counter (841).
5. Device according to any of the preceding claims, wherein the controller (845) further comprises an address mapper unit (842) arranged to receive the counter signal from the counter (841) and to map the received counter signal to an associated memory position in the memory (170; 870) for reading and writing a sample.
6. Device according to claim 5, wherein the address mapping unit (842) comprises a plurality of selectable mapping schemes and wherein the controller (845) further comprises a mapping selector unit (844) arranged to select one mapping scheme from the plurality of mapping schemes based on the counter signal and to provide a select signal indicating the selected mapping scheme.
7. Device according to claim 6, wherein the controller (845) further comprises a multiplexer (843) arranged to receive the select signal from the mapping selector unit (844) and to control output from the address mapper unit (842) in accordance with the received selection signal.
8. Device according to any of the preceding claims, wherein the at least one memory comprises a dynamic memory presenting constraints on how to access data stored on it and wherein the first or second permutation unit (120; 140; 840) further comprises an auxiliary permutation unit (846) arranged to re-order samples read out from the at least one memory (170; 870) so as to permute locked bits in the memory.
9. Device according to any of the preceding claims, wherein the device is arranged to handle parallel inputs.
10. A software implemented method (200) for performing Fourier transform on a three dimensional data set comprising
receiving (S10) the three dimensional data set as one dimensional data blocks related to a first spatial dimension,
performing (S20) a Fourier transform for the first spatial dimension based on the respective data blocks so as to provide the three dimensional data set Fourier transformed in the first dimension,
providing (S30) the three dimensional data set as one dimensional data blocks related to a second spatial dimension,
performing (S40) a Fourier transform for the second spatial dimension based on the respective data blocks so as to provide the three dimensional data set Fourier transformed in the first and second dimensions,
providing (S50) the three dimensional data set as one dimensional data blocks related to a third spatial dimension, and
performing (S60) a Fourier transform for the third spatial dimension based on the respective data blocks so as to provide the three dimensional data set Fourier transformed in the first, second and third dimensions,
wherein the step of providing the three dimensional data set as one dimensional data blocks related to a second or third spatial dimension comprises associating a delay to each received sample according to a permutation scheme, wherein the permutation scheme is determined so as to reorder the samples to provide the one dimensional data blocks related to the second or third spatial dimension and wherein the step of providing (S30, S50) the three dimensional data set as one dimensional data blocks related to a second or third spatial dimension comprises a step of controlling (S53) read-out from and writing to at least one memory (170; 870) so that read-out of a sample from the memory and writing of a received sample is performed to the same memory position in the memory.
11. Method according to claim 10, wherein the step of providing (S30, S50) the three dimensional data set as one dimensional data blocks related to a second or third spatial dimension comprises a step of providing (S51) a counter signal counting a predetermined number of clock cycles determined based on the number of spatial positions in the three dimensional data set, wherein writing to and read-out from the memory (170; 870) is controlled based on the counter signal.
12. Method according to claim 11, wherein the step of providing (S50) three dimensional data set as one dimensional data blocks related to a second or third spatial dimension comprises a step of mapping (S52) the received counter signal to an associated memory position in the memory for reading and writing a sample.
13. Software for executing the steps of the method for performing Fourier transform on a three dimensional data set in accordance with any of the claims 10-12.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SE1450880A SE539721C2 (en) | 2014-07-09 | 2014-07-09 | Device and method for performing a Fourier transform on a three dimensional data set |
PCT/SE2015/050689 WO2016007069A1 (en) | 2014-07-09 | 2015-06-15 | Device and method for performing a fourier transform on a three dimensional data set |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SE1450880A SE539721C2 (en) | 2014-07-09 | 2014-07-09 | Device and method for performing a Fourier transform on a three dimensional data set |
Publications (2)
Publication Number | Publication Date |
---|---|
SE1450880A1 SE1450880A1 (en) | 2016-01-10 |
SE539721C2 true SE539721C2 (en) | 2017-11-07 |
Family
ID=55064563
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
SE1450880A SE539721C2 (en) | 2014-07-09 | 2014-07-09 | Device and method for performing a Fourier transform on a three dimensional data set |
Country Status (2)
Country | Link |
---|---|
SE (1) | SE539721C2 (en) |
WO (1) | WO2016007069A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018213438A1 (en) * | 2017-05-16 | 2018-11-22 | Jaber Technology Holdings Us Inc. | Apparatus and methods of providing efficient data parallelization for multi-dimensional ffts |
GB2620473B (en) * | 2022-04-22 | 2024-10-16 | Advanced Risc Mach Ltd | A Method for permuting dimensions of a multi-dimensional tensor |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5339265A (en) * | 1992-08-31 | 1994-08-16 | University Of Maryland At College Park | Optimal unified architectures for the real-time computation of time-recursive discrete sinusoidal transforms |
US7392275B2 (en) * | 1998-03-31 | 2008-06-24 | Intel Corporation | Method and apparatus for performing efficient transformations with horizontal addition and subtraction |
US6073154A (en) * | 1998-06-26 | 2000-06-06 | Xilinx, Inc. | Computing multidimensional DFTs in FPGA |
ES2283236B2 (en) * | 2007-04-12 | 2008-03-16 | Universidad Politecnica De Madrid | PROCEDURE AND ARCHITECTURE WITHOUT MEMORY FOR THE CALCULATION OF FFT ROTATIONS. |
US8539201B2 (en) * | 2009-11-04 | 2013-09-17 | International Business Machines Corporation | Transposing array data on SIMD multi-core processor architectures |
US9203671B2 (en) * | 2012-10-10 | 2015-12-01 | Altera Corporation | 3D memory based address generator for computationally efficient architectures |
- 2014-07-09 SE SE1450880A patent/SE539721C2/en unknown
- 2015-06-15 WO PCT/SE2015/050689 patent/WO2016007069A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2016007069A1 (en) | 2016-01-14 |
SE1450880A1 (en) | 2016-01-10 |