WO2014108718A1 - Continuous-flow conflict-free mixed-radix fast Fourier transform in multi-bank memory - Google Patents

Publication number
WO2014108718A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
data
radix
butterfly
processor
Prior art date
Application number
PCT/IB2013/000446
Other languages
English (en)
Inventor
Sergey I. SALISHEV
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation
Priority to PCT/IB2013/000446 priority Critical patent/WO2014108718A1/fr
Publication of WO2014108718A1 publication Critical patent/WO2014108718A1/fr

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/14: Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141: Discrete Fourier transforms
    • G06F17/142: Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm

Definitions

  • the present disclosure relates to continuous-flow conflict-free mixed-radix fast Fourier transform (FFT) in multi-bank memory, and in particular to methods of performing FFT by launching multiple butterfly stage operations simultaneously using multiple memory banks to maximize use of memory space during mixed-radix FFT, in order to reduce circuit space, clock, and power requirements.
  • Digital signal processing tasks may be performed by a Digital Signal Processor (DSP) in various types of applications, such as communications, video and audio processing, financial analysis, biological data analysis, and environmental sciences.
  • a DSP may be a specialized microprocessor.
  • FFT operations may be used to process signals in time or frequency domains in such applications. FFT operations may include Decimation in Time (DIT) and Decimation in Frequency (DIF) decomposition operations.
  • FFT operations may be performed on data entries stored in a memory.
  • the DSP may perform multiple stages of multiply-accumulate operations and data transposition operations on the data entries. These stages are sometimes called “butterflies”.
  • Each butterfly may have a base size (radix).
  • An FFT using butterflies of base 2 may be a radix-2 FFT.
  • An FFT having butterflies of two different base sizes may be called a "mixed-radix" FFT.
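As a small illustration of what "mixed radix" means for indexing, the sketch below splits a data index into one digit per butterfly stage and rebuilds it. The helper names `to_digits` and `from_digits`, the least-significant-digit-first convention, and the example radices are assumptions made for this illustration; they do not appear in the disclosure.

```python
def to_digits(index, radices):
    """Split an index into mixed-radix digits, least significant digit first.

    radices[i] is the (assumed) radix of digit i, so the data size is the
    product of all radices.
    """
    digits = []
    for radix in radices:
        digits.append(index % radix)
        index //= radix
    return digits


def from_digits(digits, radices):
    """Rebuild the index from its mixed-radix digits (inverse of to_digits)."""
    value, weight = 0, 1
    for digit, radix in zip(digits, radices):
        value += digit * weight
        weight *= radix
    return value


# Example: N = 4 * 4 * 2 = 32 points, i.e. two radix-4 stages and one radix-2 stage.
example_radices = [4, 4, 2]
example_digits = to_digits(27, example_radices)
```

With these assumed radices, index 27 has the digits 3, 2 and 1, because 27 = 3 + 2·4 + 1·16.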
  • FFT operations may be implemented on a software level in the DSP, or using specialized hardware architecture in the DSP. Performance of the DSP in the various applications depends on the performance of the FFT operations, which may depend on various factors. For example, data processed through the FFT operations may typically be stored in memory during processing. Thus, the memory space required and the timing of memory read and write operations may impact the overall performance and cost of the DSP.
  • FIG. 1 illustrates an exemplary method of processing data according to an embodiment of the present disclosure.
  • FIG. 2 illustrates an exemplary processing device according to an embodiment of the present disclosure.
  • FIG. 3 illustrates an exemplary processing device according to an embodiment of the present disclosure.
  • FIG. 4 illustrates an exemplary processing device according to an embodiment of the present disclosure.
  • FIG. 5 illustrates an exemplary processing device according to an embodiment of the present disclosure.
  • FIG. 6 illustrates an exemplary processing device according to an embodiment of the present disclosure.
  • a method and a processor to perform continuous-flow conflict-free mixed-radix FFT for data in a memory are provided. Multiple butterfly calculations of small radix are launched generally in parallel in mixed-radix FFT using conflict-free address generation with a memory. The multiple butterfly calculations of data entries may be staged in a processor, such that the memory read and write operations may be executed continuously without access conflicts.
  • the mixed-radix FFT operation may be carried out with maximum memory data throughput, minimum wait time, and lower cost in memory circuit space and power.
  • a dual-port memory architecture may be used.
  • a single-port memory architecture may be provided with an in-place strategy, further reducing port routing circuit space requirement.
  • a common approach to FFT processor architecture may be an "in-place" memory-based FFT. This approach guarantees that, for each butterfly or group of butterflies, both inputs and results are stored in the same memory locations. For example, an FFT of N sampled data points may use a memory with a capacity of N complex words.
  • One butterfly calculation may be initiated every clock to maximize memory throughput for a given butterfly size. Each wing of the butterfly may be read and written at different memory banks and addresses, using conflict-free bank assignment.
  • the FFT calculation may use a radix-R butterfly operation to calculate multiple radix-r butterflies simultaneously.
  • the FFT operation may result in input data and output data having different digit orders.
  • a digit reverse operation may be performed near the beginning or the end of the FFT operation.
  • Self-sorting may eliminate the need for a separate digit reverse operation in the FFT, and may increase the speed of the FFT.
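For readers unfamiliar with the digit reverse operation mentioned above, the following sketch shows one common way to compute a mixed-radix digit-reversed index. The function name `digit_reverse` and the convention that `radices[0]` belongs to the least significant digit are assumptions for illustration only, not taken from the disclosure.

```python
def digit_reverse(index, radices):
    """Return the index whose mixed-radix digits are those of `index` reversed.

    radices[0] is assumed to be the radix of the least significant digit of
    `index`; in the result that digit becomes the most significant one.
    """
    reversed_index = 0
    for radix in radices:
        reversed_index = reversed_index * radix + index % radix
        index //= radix
    return reversed_index


# For a uniform radix of 2 this reduces to the familiar bit reversal.
bit_reversed = [digit_reverse(i, [2, 2, 2]) for i in range(8)]
```

For eight points with radix 2 throughout, the resulting order is 0, 4, 2, 6, 1, 5, 3, 7. A self-sorting FFT avoids running such a permutation as a separate pass.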
  • FIG. 1 illustrates an exemplary method 100 of processing data according to an embodiment of the present disclosure.
  • the method 100 begins at block 110, by generating memory addresses and traversal order for data according to mixed-radix settings.
  • the method proceeds to block 120, reading the data from the memory according to the generated memory addresses in the traversal order.
  • the method proceeds to block 130, processing the data of more than one butterfly stage operation of the FFT.
  • the method proceeds to block 140, where, if self-sorting is needed, self-sorting is performed on the butterfly stages that need sorting, and any delays needed to avoid memory conflicts are applied.
  • the method proceeds to block 150, writing the processed data of more than one butterfly stage operation into memory.
  • the method proceeds to block 160, determining if all butterfly stages completed processing. If yes, the method ends at block 170. If no, the method returns to block 120 to read additional data as needed for additional butterfly stage operations.
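The read, process, write and repeat structure of method 100 can be sketched in software as a plain in-place mixed-radix decimation-in-frequency FFT. This is only a sequential model under stated assumptions: it runs one butterfly at a time, uses a single flat list instead of banked memory, omits the self-sorting step of block 140, and the names `fft_mixed_radix_dif` and `digit_reverse` are hypothetical. It is not the parallel, conflict-free scheme of the disclosure.

```python
import cmath


def digit_reverse(index, radices):
    """Mixed-radix digit reversal; radices[0] is the least significant digit."""
    reversed_index = 0
    for radix in radices:
        reversed_index = reversed_index * radix + index % radix
        index //= radix
    return reversed_index


def fft_mixed_radix_dif(samples, radices):
    """In-place mixed-radix DIF FFT; output is left in digit-reversed order."""
    n = len(samples)
    product = 1
    for radix in radices:
        product *= radix
    if product != n:
        raise ValueError("product of radices must equal the number of samples")

    memory = list(samples)
    span = n  # size of the sub-transform handled at the current stage
    for radix in radices:  # block 160: repeat until all stages are processed
        stride = span // radix
        for block in range(0, n, span):
            for j in range(stride):
                # block 110: addresses of the butterfly wings
                addresses = [block + j + q * stride for q in range(radix)]
                # block 120: read all wings before anything is overwritten
                wings = [memory[a] for a in addresses]
                # block 130: radix-r DFT, then multiplication by twiddle factors
                for p, address in enumerate(addresses):
                    acc = sum(w * cmath.exp(-2j * cmath.pi * p * q / radix)
                              for q, w in enumerate(wings))
                    acc *= cmath.exp(-2j * cmath.pi * p * j / span)
                    # block 150: write the result back to the same location
                    memory[address] = acc
        span = stride
    return memory
```

In this model the spectrum value for frequency index k ends up at position `digit_reverse(k, radices)`, which is the digit-reversed output order that a DIF decomposition produces.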
  • FIG. 2 illustrates an exemplary processing device 200 according to an embodiment of the present disclosure.
  • a processing device 200 is described.
  • the processing device 200 may be connected to a memory 220 storing N entries of complex data points for processing.
  • the processing device 200 may include an address generator unit (AGU) 210 that generates the memory address assignments and a traverse order for the data according to the mixed-radix settings.
  • the AGU 210 is connected to an interface 280, which reads the data from the memory 220 according to the memory address assignments in the traversal order generated by the AGU 210 and writes the processed data back into the memory 220.
  • the AGU 210 is connected to a processor (PU) 240, which processes the data of more than one butterfly stage operations of the FFT, prior to the interface 280 writing the processed data back to the memory 220.
  • a FFT operation may be implemented as two nested loops, with an outer loop iterating over stages c, and an inner loop iterating over butterflies (or butterfly groups for stages) with multiple butterflies executing simultaneously within one stage.
  • $\mathrm{FFT}(k_{n-1}, \ldots, k_0)$ may represent the result of an FFT operation on input indexed
  • a radix-r butterfly operation may be represented as,
  • $F_{c+1}([d]_{0..n-c-2}, k_c, [k]_{c-1..0})$ may represent the stage $c$ output at index $[d_0, \ldots, d_{n-c-2}, k_c, k_{c-1}, \ldots, k_0]$, where $k_i$ may represent already processed digits and $d_i$ are digits that are yet to be processed (the bounds on $k_i$ and $d_i$ are illegible in the source text). $F_0(d_0, \ldots, d_{n-1})$ are the input sampled data points. Then
  • the FFT stage formula may be represented as: [formula not recoverable from the source text]
  • DIT decomposition may lead to digit-reversed order of the input data points.
  • DIF decomposition may lead to digit-reversed order of the output data points.
  • DIF and DIT may differ in whether multiplication by twiddle factors is performed before or after the butterfly operation.
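The difference in twiddle placement can be made concrete with the textbook radix-2 butterflies. The function names are chosen here for illustration, and the code is the standard radix-2 form rather than anything specific to the disclosure.

```python
def butterfly_dit(a, b, twiddle):
    """Radix-2 DIT butterfly: the twiddle multiplies an input before the add/subtract."""
    t = b * twiddle
    return a + t, a - t


def butterfly_dif(a, b, twiddle):
    """Radix-2 DIF butterfly: the twiddle multiplies an output after the add/subtract."""
    return a + b, (a - b) * twiddle


dit_result = butterfly_dit(1 + 0j, 2 + 0j, 1j)
dif_result = butterfly_dif(1 + 0j, 2 + 0j, 1j)
```

With the same inputs and twiddle factor `1j`, the DIT form yields `(1+2j, 1-2j)` while the DIF form yields `(3, -1j)`, showing that the two are not interchangeable within a stage.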
  • a radix-$r_c$ butterfly in stage $c$ utilizes inputs with index numbers $[k_{n-1}, \ldots, k_{c+1}, k_c, k_{c-1}, \ldots, k_0]$, where $k_c$ varies from $0$ to $r_c - 1$.
  • the radix-$r_c$ butterfly in stage $c$ may be represented as $[k_{n-1}, \ldots, k_{c+1}, k_{c-1}, \ldots, k_0]$.
  • memory 220 may be a random access memory (RAM).
  • memory 220 may be a multi-bank memory with $r_{n-1}$ banks to allow pipelining butterfly execution.
  • a memory having multiple memory banks may have independent I/O ports and buses for each memory bank, such that multiple memory banks may be accessed (for example, in read and write operations) concurrently.
  • each of the multiple memory banks may be a group of memory locations, and memory 220 may allow generally simultaneous access of multiple memory banks, by encoding, aggregating, staggering, or interleaving accesses on shared memory I/O ports and buses.
  • PU 240 may include a processor with general processing capabilities, or specialized hardware. PU 240 may process the data sequentially, in parallel, staggered, interleaved, or by various processes that prioritize between multiple butterfly stage operations, to maximize data throughput and minimize the waiting period for the memory or the processor, without having to increase overall circuit size, power, or clock speed.
  • the memory banks in the embodiments of the present disclosure may include any of the above and other possible grouping of memory locations.
  • Memory bank assignments in the embodiments of the present disclosure may include any memory group identification, indexes, addresses, or labels, that may be used for controlling access to a group of memory locations.
  • Each radix-$r$ butterfly operation may include $r$ memory reads and $r$ memory writes.
  • Memory bank and address assignments may be generated depending on the number of sampled data points, and adjusted in run-time.
  • $m(k_{n-1}, \ldots, k_0)$ may represent the bank assignment and $a(k_{n-1}, \ldots, k_0)$ may represent the address assignment within the bank for butterfly index number $[k_{n-1}, \ldots, k_0]$.
  • stage $n-1$ has only one butterfly running at a time, because only $r_{n-1}$ memory banks are available.
  • the inner loop may iterate over butterfly
  • an index (its expression is illegible in the source text) may represent the $k''_{c+1}$-th butterfly executed in the $[k_{n-1}, \ldots, k_{c+2}, k'_{c+1}, k_{c-1}, \ldots, k_0]$-th iteration of the loop iterating over butterfly groups in stage $c$.
  • $k_{c+1}$ may be represented as being split into $[k'_{c+1}, k''_{c+1}]$, where $k'_{c+1}$ is used as a part of the butterfly group index number, while $k''_{c+1}$ is used to enumerate butterflies within the group.
  • the traverse order (the sequence order of the N data sample entries in the memory to process) for all stages may be represented as:
  • $W_c([k_{n-1}, \ldots, k_0])$ may represent the memory bank assignment for use in an iteration of the butterfly loop in stage $c$.
  • $r_{n-1}/r_c$ radix-$r_c$ butterflies may run in parallel, using the multiple memory banks, which may be represented as: [formula missing in the source text]
  • a conflict-free bank assignment that allows multiple butterflies of small-radix stages to run generally simultaneously in a mixed-radix FFT operation with this traverse order may be represented as: $m(k_{n-1}, \ldots, k_0) = \left(\sum_{i} Q_i k_i\right) \bmod r_{n-1}$
  • Qi may represent constants that depend on radixes chosen for the stages of the FFT operation.
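A bank assignment of the weighted-digit-sum form can be checked by brute force. The sketch below assumes the bank is computed as the sum of `coeffs[i] * k_i` modulo the number of banks, and it tests only the basic property that the wings of any single butterfly fall in distinct banks. The function names are illustrative, and the coefficient values used in the examples (all ones) are an assumption for the demonstration, not the constants $Q_i$ of the disclosure.

```python
from itertools import product


def bank_of(digits, coeffs, num_banks):
    """Bank index as a weighted digit sum modulo the number of banks."""
    return sum(q * k for q, k in zip(coeffs, digits)) % num_banks


def butterflies(stage, radices):
    """Yield, for each butterfly of a stage, the digit tuples of its wings."""
    others = [range(r) for i, r in enumerate(radices) if i != stage]
    for fixed in product(*others):
        wings = []
        for k in range(radices[stage]):
            digits = list(fixed)
            digits.insert(stage, k)  # only the stage digit varies within a butterfly
            wings.append(tuple(digits))
        yield wings


def is_conflict_free(radices, coeffs, num_banks):
    """True if no butterfly in any stage touches the same bank twice."""
    for stage in range(len(radices)):
        for wings in butterflies(stage, radices):
            banks = [bank_of(w, coeffs, num_banks) for w in wings]
            if len(set(banks)) != len(banks):
                return False
    return True
```

For example, with radices 4, 4 and 2 and four banks, unit coefficients pass the check, whereas setting the middle coefficient to 0 or 2 makes wings of a radix-4 butterfly collide in one bank.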
  • FIG. 3 illustrates an exemplary processing device 300 according to an embodiment of the present disclosure.
  • a processing device 300 that performs a mixed-radix FFT using a dual-port multi-bank memory is described.
  • the processing device 300 may be connected to a memory 320 with (R number of) multiple memory banks 320.0 to 320.R-1, containing memory capacity for storing N entries of complex data points for processing.
  • the processing device 300 may include an address generator (AGU) 310 that generates the memory address assignments and a traverse order for the data according to the mixed-radix settings.
  • the AGU 310 is connected to an interface 380, which reads the data from the memory 320 according to the memory address assignments in the traversal order generated by the AGU 310 and writes the processed data back into the memory 320.
  • the AGU 310 is connected to a processor (PU) 340, which processes the data of more than one butterfly stage operations of the FFT, prior to the interface 380 writing the processed data back to the memory 320.
  • Memory 320 here may be a dual-port memory, with one set of input (write) ports and another set of output (read) ports, which may allow memory 320 to perform one read operation and one write operation concurrently or generally simultaneously (for example, in a single clock period).
  • the FFT operation may be performed using the processing device 300, by implementing an addressing strategy that allows execution of multiple butterflies simultaneously in a radix-r stage. Executing multiple butterflies simultaneously allows the FFT operation to access multiple memory banks generally simultaneously, and to stage the parallel calculations in PU 340 generally simultaneously, to reduce the waiting time associated with sequential processing of butterflies in the FFT. This makes the radix-r calculation R/r times faster.
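To see how several small-radix butterflies might share one clock, the sketch below packs R/r radix-r butterflies into groups whose wings cover all R banks exactly once. It assumes a digit-sum bank assignment and a particular grouping rule (stepping the neighbouring digit by r); both are illustrative choices, and the traverse order of the disclosure may differ. It also assumes the speedup factor reads R/r, which is garbled in the source.

```python
from itertools import product


def bank(digits, num_banks):
    """Assumed bank assignment: sum of the digits modulo the number of banks."""
    return sum(digits) % num_banks


def parallel_groups(radices, num_banks):
    """Group stage-0 butterflies so that each group uses every bank exactly once.

    Assumes radices[0] is the small radix r, radices[1] equals the number of
    banks R, and r divides R. Each group holds R/r butterflies (R wings).
    """
    small = radices[0]
    if radices[1] != num_banks or num_banks % small:
        raise ValueError("expected radices[1] == num_banks and r dividing R")
    per_group = num_banks // small
    groups = []
    for a, *rest in product(range(small), *[range(r) for r in radices[2:]]):
        group = [(k0, a + small * t, *rest)
                 for t in range(per_group)   # butterflies sharing the clock
                 for k0 in range(small)]     # wings of one butterfly
        groups.append(group)
    return groups


example_groups = parallel_groups([2, 4, 4], 4)
```

With radices 2, 4, 4 and four banks, the 16 radix-2 butterflies of stage 0 fit into 8 conflict-free groups, so the stage needs 8 memory cycles instead of 16, a factor of R/r = 2.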
  • the AGU 310 may use traverse order Tc , which may be represented as:
  • bank assignment may be represented as: to provide conflict-free memory access.
  • values of n and r above may be adjusted at run-time to use one FFT processing device to calculate transforms (and reverse transforms) of different sizes. For example, depending on the size of the data sample N, available memory banks, available memory I/O ports or I/O bandwidth, processor speed, or other factors, values of n and r may be adjusted at run-time to maximize the data throughput in the processing device, and to minimize the waiting period for the memory or the processor, without having to increase overall circuit size, power, or clock speed.
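One simple way to picture run-time selection of the stage radices is a greedy factorization of the transform size. The helper `choose_radices` and its largest-radix-first policy are purely hypothetical; the disclosure does not state how the values are chosen.

```python
def choose_radices(n_points, max_radix):
    """Greedily factor n_points into stage radices no larger than max_radix.

    Hypothetical policy: always take the largest usable radix first.
    Raises ValueError if n_points has a prime factor above max_radix.
    """
    radices = []
    remaining = n_points
    while remaining > 1:
        for radix in range(max_radix, 1, -1):
            if remaining % radix == 0:
                radices.append(radix)
                remaining //= radix
                break
        else:
            raise ValueError("size cannot be factored with the allowed radices")
    return radices
```

Under this policy a 32-point transform with a maximum radix of 4 would use two radix-4 stages followed by one radix-2 stage.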
  • PU 340 may include a processor with general processing capabilities, or specialized hardware. PU 340 may process the data sequentially, in parallel, staggered, interleaved, or by various processes that prioritize between multiple butterfly stage operations, to maximize data throughput and minimize the waiting period for the memory or the processor, without having to increase overall circuit size, power, or clock speed.
  • Table 1 illustrates simulated performance gain in FFT operation using the above method and processing device.
  • FIG. 4 illustrates an exemplary processing device 400 according to an embodiment of the present disclosure.
  • a processing device 400 that performs a mixed-radix FFT with self-sorting using a dual-port multi-bank memory is described.
  • the processing device 400 may be connected to a memory 420 with (R number of) multiple memory banks 420.0 to 420.R-1, containing memory capacity for storing N entries of complex data points for processing.
  • the processing device 400 may include an address generator (AGU) 410 that generates the memory address assignments and a traverse order for the data according to the mixed-radix settings.
  • the AGU 410 is connected to an interface 480, which reads the data from the memory 420 according to the memory address assignments in the traversal order generated by the AGU 410 and writes the processed data back into the memory 420.
  • the AGU 410 is connected to a processor (PU) 440, which processes the data of more than one butterfly stage operations of the FFT, prior to the interface 480 writing the processed data back to the memory 420. Additionally, a pipeline 450 connects the input interface 480 to the PU 440.
  • Memory 420 here may be a dual-port memory, with one set of input (write) ports and another set of output (read) ports, which may allow memory 420 to perform one read operation and one write operation concurrently or generally simultaneously (for example, in a single clock period).
  • the FFT operation may be performed using the processing device 400, by
  • DIT and DIF may lead to input or output data having reversed digit order. In order to obtain proper result, a digit reverse operation may need to be performed.
  • a digit reverse operation may be incorporated into the operation of the processing device such that a separate digit reverse operation may not be required.
  • the processing device of the embodiment may launch multiple butterflies in a radix-r stage generally simultaneously.
  • the AGU 410 may use bank assignment, which may be represented as:
  • $T_c$, which may be represented as:
  • stage c where c ⁇ n - 1 > butterfly with input of data may have outputs stored in memory addresses calculated for data indexed
  • the second 2 stages may use traverse order $T_c$, which may be represented as:
  • the output transposition may be accomplished by delaying the write operations in the butterfly stages that perform the digit reverse operations above.
  • stages performing digit reverse operations are not in-place. Thus, it may need to be ensured that, during such stages, a memory location is written only after it is read by a butterfly.
  • the correct order of read and write operations may be guaranteed by reordering butterflies within the stage, so that all butterflies with overlapping data index values of $k_{n-1}, \ldots, k_{c+1}, k_{c-1}, \ldots, k_{n-c}, k_{n-c-2}, k_{n-c-3}, \ldots, k_0$ are executed sequentially in one batch, and by adding pipeline 450 with delays to postpone write operations for R - V clocks, where V is the pipeline delay length. Since write operations of butterflies from one batch can only change data values already read in the same batch and the butterfly loop is pipelined, the correct read and write order may be ensured.
  • butterfly stage operations may need to have write operations delayed
  • parallel execution of multiple butterfly stage operations may increase the overall FFT operation speed.
  • pipeline 450 may include any hardware and/or software component to postpone write operations for a predetermined number of clock periods.
  • pipeline 450 may include software loop delays, or hardware components, such as flip-flops, buffers, etc., capable of postponing transfer of data.
  • Pipeline 450 may also be located anywhere along the read or write paths between memory 420, interface 480, and PU 440.
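The effect of postponing writes can be illustrated with a toy single-memory model. The sketch assumes that in every clock the read happens before any pending write is committed, and it uses a simple index reversal as a stand-in for a digit reverse stage; the function name `run_stage_with_write_delay` and these timing rules are assumptions, not the behavior of pipeline 450 as specified in the disclosure.

```python
from collections import deque


def run_stage_with_write_delay(memory, read_addr, write_addr, delay, fn):
    """Process one item per clock, committing each result `delay` clocks later.

    At clock t the value at read_addr[t] is read and transformed by fn; the
    result is written to write_addr[t] at clock t + delay, after that clock's read.
    """
    mem = list(memory)
    pending = deque()  # entries: (commit_clock, address, value)
    steps = len(read_addr)
    for clock in range(steps + delay):
        if clock < steps:
            value = fn(mem[read_addr[clock]])
            pending.append((clock + delay, write_addr[clock], value))
        # Commit every write whose delay has elapsed.
        while pending and pending[0][0] <= clock:
            _, address, result = pending.popleft()
            mem[address] = result
    return mem


data = [1, 2, 3, 4]
delayed = run_stage_with_write_delay(data, [0, 1, 2, 3], [3, 2, 1, 0], 3, lambda v: v * 10)
immediate = run_stage_with_write_delay(data, [0, 1, 2, 3], [3, 2, 1, 0], 0, lambda v: v * 10)
```

With a delay of three clocks every location is read before it is overwritten and the result is `[40, 30, 20, 10]`; with no delay, later reads pick up already overwritten values and the result is corrupted.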
  • values of n and r above may be adjusted at run-time to use one FFT processing device to calculate transforms (and reverse transforms) of different sizes. For example, depending on the size of the data sample N, available memory banks, available memory I/O ports or I/O bandwidth, processor speed, or other factors, values of n and r may be adjusted at run-time to maximize the data throughput in the processing device, and to minimize the waiting period for the memory or the processor, without having to increase overall circuit size, power, or clock speed.
  • PU 440 may include a processor with general processing capabilities, or specialized hardware. PU 440 may process the data sequentially, in parallel, staggered, interleaved, or in various processes to prioritize between multiple butterfly stage operations, to maximize data throughput and minimize waiting period for the memory or the processor, without having to increase overall circuit or power or clocking speed.
  • FIG. 5 illustrates an exemplary processing device 500 according to an embodiment of the present disclosure.
  • a processing device 500 that performs a mixed-radix FFT using a single-port multi-bank memory is described.
  • the processing device 500 may be connected to a memory 520 with (2R number of) multiple memory banks 520.0 to 520.2R-1, containing memory capacity for storing N entries of complex data points for processing.
  • the processing device 500 may include an address generator unit (AGU) 510 that generates the memory address assignments and a traverse order for the data according to the mixed-radix settings.
  • the AGU 510 is connected to an interface 580, which reads the data from the memory 520 according to the memory address assignments in the traversal order generated by the AGU 510 and writes the processed data back into the memory 520.
  • the AGU 510 is connected to a processor (PU) 540, which processes the data of more than one butterfly stage operations of the FFT, prior to the interface 580 writing the processed data back to the memory 520.
  • Memory 520 here may be a single-port memory, with one set of ports for both input (write) and output (read) operations. Single-port memory may require less circuitry space.
  • the FFT operation may be performed using the processing device 500, by
  • the AGU 510 may be modified in order to allow use of 2R number of single-port memory banks without increase of overall memory words count.
  • the AGU 510 may generate memory bank assignments, represented as:
  • traverse order for stage 0, represented as:
  • traverse order $T_c$ for other stages, represented as: [traverse-order formulas not recoverable from the source text]
  • the bank assignment $m$ above, used with the traversal orders above, may ensure no memory access conflicts for FFT operations in the above configuration using a single-port memory.
  • since every butterfly stage operation of the radix-r stage may utilize all values of [symbol missing in the source text], and absence of conflicts in the radix-R stage may be ensured by interleaving $d_0 \bmod 2$ values for subsequent butterflies, the processing device may need to wait for the radix-r stage operations to complete before launching the first radix-R stage.
  • read/write conflicts within one butterfly on wings ⁇ 'c k e may be represented as:
  • Data points in butterfly operation stages from one group may have overlapping index values of A- ' n-u— ' fc 3» fc i , and may differ in & ⁇ & ⁇ .
  • index values of i o may overlap for overlapping index values of bi> fro- Thus, conflicts within one butterfly group may be prevented.
  • Index values of $k_i$ interleave for subsequent butterfly groups. With a pipeline having odd length, this guarantees that any 2 butterfly groups that have read and write operations within the same clock have different parity of $k_i$, and therefore use banks with a different second bit in the radix-2 representation of the bank number. Hence there are no conflicts on wings of butterflies from different butterfly groups in the radix-r stage.
  • values of n and r above may be adjusted at run-time to use one FFT processing device to calculate transforms (and reverse transforms) of different sizes. For example, depending on the size of the data sample N, available memory banks, available memory I/O ports or I/O bandwidth, processor speed, or other factors, values of n and r may be adjusted at run-time to maximize the data throughput in the processing device, and to minimize the waiting period for the memory or the processor, without having to increase overall circuit size, power, or clock speed.
  • PU 540 may include a processor with general processing capabilities, or specialized hardware. PU 540 may process the data sequentially, in parallel, staggered, interleaved, or by various processes that prioritize between multiple butterfly stage operations, to maximize data throughput and minimize the waiting period for the memory or the processor, without having to increase overall circuit size, power, or clock speed.
  • FIG. 6 illustrates an exemplary processing device 600 according to an embodiment of the present disclosure.
  • a processing device 600 that performs a mixed-radix FFT with self-sorting using a single-port multi-bank memory is described.
  • the processing device 600 may be connected to a memory 620 with (2R number of) multiple memory banks 620.0 to 620.2R-1, containing memory capacity for storing N entries of complex data points for processing.
  • the processing device 600 may include an address generator (AGU) 610 that generates the memory address assignments and a traverse order for the data according to the mixed-radix settings.
  • the AGU 610 is connected to an interface 680, which reads the data from the memory 620 according to the memory address assignments in the traversal order generated by the AGU 610 and writes the processed data back into the memory 620.
  • the AGU 610 is connected to a processor (PU) 640, which processes the data of more than one butterfly stage operations of the FFT, prior to the interface 680 writing the processed data back to the memory 620. Additionally, a pipeline 650 connects the input interface 680 to the PU 640.
  • Memory 620 here may be a single-port memory, with one set of ports for both input (write) and output (read) operations. Single-port memory may require less circuitry space.
  • the FFT operation may be performed using the processing device 600, by
  • DIT and DIF may lead to input or output data having reversed digit order. In order to obtain proper result, a digit reverse operation may need to be performed.
  • a digit reverse operation may be incorporated into the operation of the processing device such that a separate digit reverse operation may not be required.
  • the processing device of the embodiment may launch multiple butterflies in a radix-r stage generally simultaneously.
  • the bank assignment may need to be invariant with respect to switching of the last digit $k_{n-1}$ and the first digit $k_0$.
  • the AGU 610 may generate bank assignment, which may be represented as:
  • the traverse orders generated by AGU 610 may be represented as:
  • stage c Starting from stage 2 , for stage c , where c ⁇ n— 1 ⁇ butterfly with input of data
  • the first 2 ⁇ stages, the outputs of butterflies may be transposed.
  • the output transposition may be accomplished by delaying the write operations in the butterfly stages that perform the digit reverse operations above.
  • stages performing digit reverse operations are not in-place. Thus, it may need to be ensured that, during such stages, a memory location is written only after it is read by a butterfly.
  • butterfly stage operations may be grouped into batches of size 2R.
  • Read/write conflicts may be prevented by interleaving some index values.
  • one size 2R batch may be formed from two size R batches covering all index values of $k_c$, $k_{n-c-1}$, such that index values of $k_i$ interleave between the two size R batches.
  • batch 1 having R butterflies (Butterfly 1.0, Butterfly 1.1, ..., Butterfly 1.R-1) and batch 2 having R butterflies (Butterfly 2.0, Butterfly 2.1, ..., Butterfly 2.R-1) may be interleaved to form a size 2R batch (Butterfly 1.0, Butterfly 2.0, Butterfly 1.1, Butterfly 2.1, ..., Butterfly 1.R-2, Butterfly 2.R-2, Butterfly 1.R-1, Butterfly 2.R-1).
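The batch interleaving just described amounts to alternating the elements of two equally sized lists. A minimal sketch, with the function name `interleave_batches` and the string labels chosen only for illustration:

```python
def interleave_batches(batch1, batch2):
    """Merge two size-R batches into one size-2R batch by alternating elements."""
    if len(batch1) != len(batch2):
        raise ValueError("both batches must have the same size")
    merged = []
    for first, second in zip(batch1, batch2):
        merged.extend([first, second])
    return merged


# Example with R = 3 butterflies per batch.
batch_1 = ["Butterfly 1.0", "Butterfly 1.1", "Butterfly 1.2"]
batch_2 = ["Butterfly 2.0", "Butterfly 2.1", "Butterfly 2.2"]
batch_2r = interleave_batches(batch_1, batch_2)
```

For R = 3 the merged order is Butterfly 1.0, 2.0, 1.1, 2.1, 1.2, 2.2, matching the pattern listed above.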
  • postponing write operations for 2R - 1 - V clocks may prevent read/write conflicts in self-sorting.
  • butterfly stage operations may need to have write operations delayed
  • parallel execution of multiple butterfly stage operations may increase the overall FFT operation speed.
  • pipeline 650 may include any hardware and/or software component to postpone write operations for a predetermined number of clock periods.
  • pipeline 650 may include software loop delays, or hardware components, such as flip-flops, buffers, etc., capable of postponing transfer of data.
  • Pipeline 650 may also be located anywhere along the read or write paths between memory 620, interface 680, and PU 640.
  • values of n and r above may be adjusted at run-time to use one FFT processing device to calculate transforms (and reverse transforms) of different sizes. For example, depending on the size of the data sample N, available memory banks, available memory I/O ports or I/O bandwidth, processor speed, or other factors, values of n and r may be adjusted at run-time to maximize the data throughput in the processing device, and to minimize the waiting period for the memory or the processor, without having to increase overall circuit size, power, or clock speed.
  • PU 640 may include a processor with general processing capabilities, or specialized hardware. PU 640 may process the data sequentially, in parallel, staggered, interleaved, or by various processes that prioritize between multiple butterfly stage operations, to maximize data throughput and minimize the waiting period for the memory or the processor, without having to increase overall circuit size, power, or clock speed.

Landscapes

  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Discrete Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

A method and a processor to perform continuous-flow conflict-free mixed-radix FFT for data in a memory are provided. Multiple butterfly calculations of small radix are launched generally in parallel in mixed-radix FFT using conflict-free address generation with a memory. The multiple butterfly calculations of data entries may be staged in a processor, such that the memory read and write operations may be executed continuously without access conflicts.
PCT/IB2013/000446 2013-01-09 2013-01-09 Continuous-flow conflict-free mixed-radix fast Fourier transform in multi-bank memory WO2014108718A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2013/000446 WO2014108718A1 (fr) 2013-01-09 2013-01-09 Continuous-flow conflict-free mixed-radix fast Fourier transform in multi-bank memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2013/000446 WO2014108718A1 (fr) 2013-01-09 2013-01-09 Continuous-flow conflict-free mixed-radix fast Fourier transform in multi-bank memory

Publications (1)

Publication Number Publication Date
WO2014108718A1 (fr) 2014-07-17

Family

ID=51166557

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2013/000446 WO2014108718A1 (fr) 2013-01-09 2013-01-09 Continuous-flow conflict-free mixed-radix fast Fourier transform in multi-bank memory

Country Status (1)

Country Link
WO (1) WO2014108718A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106201999A (zh) * 2016-07-26 2016-12-07 中国科学院自动化研究所 Mixed-radix DFT/IDFT parallel reading and computing method and device
WO2017019095A1 (fr) * 2015-07-30 2017-02-02 Hewlett Packard Enterprise Development Lp Interleaved access of memory
CN106469134A (zh) * 2016-08-29 2017-03-01 北京理工大学 Conflict-free data access method for an FFT processor
CN112800386A (zh) * 2021-01-26 2021-05-14 Oppo广东移动通信有限公司 Fourier transform processing method and processor, terminal, chip, and storage medium
WO2023244453A1 (fr) * 2022-06-17 2023-12-21 Achronix Semiconductor Corporation Conflict-free parallel radix sorting

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1269346B1 (fr) * 2000-03-10 2007-10-31 Jaber Associates, L.L.C. Parallel multiprocessing for a fast Fourier transform with pipeline architecture
US20070288542A1 (en) * 2006-04-28 2007-12-13 Qualcomm Incorporated Multi-port mixed-radix fft

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017019095A1 (fr) * 2015-07-30 2017-02-02 Hewlett Packard Enterprise Development Lp Interleaved access of memory
US10579519B2 (en) 2015-07-30 2020-03-03 Hewlett Packard Enterprise Development Lp Interleaved access of memory
CN106201999A (zh) * 2016-07-26 2016-12-07 中国科学院自动化研究所 Mixed-radix DFT/IDFT parallel reading and computing method and device
CN106469134A (zh) * 2016-08-29 2017-03-01 北京理工大学 Conflict-free data access method for an FFT processor
CN106469134B (zh) * 2016-08-29 2019-02-15 北京理工大学 Conflict-free data access method for an FFT processor
CN112800386A (zh) * 2021-01-26 2021-05-14 Oppo广东移动通信有限公司 Fourier transform processing method and processor, terminal, chip, and storage medium
WO2023244453A1 (fr) * 2022-06-17 2023-12-21 Achronix Semiconductor Corporation Conflict-free parallel radix sorting

Similar Documents

Publication Publication Date Title
EP3039570B1 (fr) Sparse matrix data structure
EP2013772B1 (fr) Multi-port mixed-radix fast Fourier transform
US20080208944A1 (en) Digital signal processor structure for performing length-scalable fast Fourier transformation
WO2014108718A1 (fr) Continuous-flow conflict-free mixed-radix fast Fourier transform in multi-bank memory
Garrido et al. A 4096-point radix-4 memory-based FFT using DSP slices
Chen et al. Continuous-flow parallel bit-reversal circuit for MDF and MDC FFT architectures
US20150331634A1 (en) Continuous-flow conflict-free mixed-radix fast fourier transform in multi-bank memory
WO2005057423A2 (fr) FFT architecture and associated method
US9317481B2 (en) Data access method and device for parallel FFT computation
US20060253514A1 (en) Memory-based Fast Fourier Transform device
US9176929B2 (en) Multi-granularity parallel FFT computation device
US8825729B1 (en) Power and bandwidth efficient FFT for DDR memory
EP2778948A2 (fr) FFT accelerator
Richardson et al. Building conflict-free FFT schedules
US20100179978A1 (en) Fft-based parallel system with memory reuse scheme
Sorokin et al. Conflict-free parallel access scheme for mixed-radix FFT supporting I/O permutations
US11614945B2 (en) Apparatus and method of a scalable and reconfigurable fast fourier transform
US9268744B2 (en) Parallel bit reversal devices and methods
US9459812B2 (en) System and method for zero contention memory bank access in a reorder stage in mixed radix discrete fourier transform
EP3066583B1 (fr) FFT device and method for performing a fast Fourier transform
WO2014089830A1 (fr) Methods and apparatus for decoding
Hassan et al. Implementation of a reconfigurable ASIP for high throughput low power DFT/DCT/FIR engine
RU2717950C1 (ru) High-speed fast Fourier transform device with conflict-free linear memory access
Kaya et al. A novel addressing algorithm of radix-2 FFT using single-bank dual-port memory
Ma et al. A novel conflict-free parallel memory access scheme for FFT constant geometry architectures

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13871241

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13871241

Country of ref document: EP

Kind code of ref document: A1