WO2018129930A1 - Fast Fourier transform processing method, apparatus, and computer storage medium - Google Patents

Fast Fourier transform processing method, apparatus, and computer storage medium


Publication number
WO2018129930A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
data
read
butterfly operation
counter
Application number
PCT/CN2017/099342
Inventor
余成
Original Assignee
深圳市中兴微电子技术有限公司
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2018129930A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141 Discrete Fourier transforms
    • G06F17/142 Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/06 Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
    • G06F5/16 Multiplexed systems, i.e. using two or more similar devices which are alternately accessed for enqueue and dequeue operations, e.g. ping-pong buffers

Definitions

  • The present invention relates to digital signal processing technologies, and in particular to a fast Fourier transform (FFT) processing method, apparatus, and computer storage medium.
  • The FFT is an important building block in digital signal processing and is widely used across modern digital signal processing, for example in wired communication, radar, satellite communication, and multimedia signal processing.
  • According to its implementation structure, an FFT can be realized either as a parallel structure or as a serial structure.
  • Parallel pipeline structures are often chosen to implement FFT operations; however, they have the disadvantage of high hardware resource overhead.
  • Taking an N-point FFT as an example, and assuming a data bit width of W, the resources that a parallel pipeline implementation requires include, at a minimum, registers and complex multipliers. Therefore, serial implementation structures are often chosen in resource-sensitive applications.
  • A serial implementation structure reuses storage resources and a butterfly operation unit in order to economize on resources.
  • A common practice is to use two full-function dual-port random access memories (RAMs) as a ping-pong buffer for the butterfly operation.
  • However, full-function dual-port RAM consumes more circuit resources and more power.
  • Embodiments of the present invention are expected to provide an FFT processing method and apparatus that at least partially solve the problem of large circuit resource consumption and/or large power consumption in FFT processing.
  • An embodiment of the present invention provides an FFT processing method, where the method includes:
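  • writing each discrete digital signal in a discrete digital signal sequence into an upper memory and a lower memory according to an input address rule;
  • according to an operation read address rule, sequentially reading a set of discrete digital signals from the upper memory and the lower memory to perform a butterfly operation, and writing the butterfly operation result into the upper memory and the lower memory according to an operation write address rule, until all stages of the butterfly operation are completed;
  • according to an output address rule, reading the data of the completed butterfly operation from the upper memory and the lower memory, and determining the read data as the FFT result of the discrete digital signal sequence.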
  • An embodiment of the present invention further provides an FFT processing apparatus, where the apparatus includes: an input controller, an operator, an output controller, an upper memory, and a lower memory; wherein
  • the input controller is configured to write each discrete digital signal in the discrete digital signal sequence into the upper memory and the lower memory according to an input address rule;
  • the operator is configured to sequentially read a set of discrete digital signals from the upper memory and the lower memory according to an operation read address rule to perform a butterfly operation, and to write the butterfly operation result into the upper memory and the lower memory according to an operation write address rule, until all stages of the butterfly operation are completed;
  • the output controller is configured to read the data of the completed butterfly operation from the upper memory and the lower memory according to an output address rule, and to determine the read data as the FFT result of the discrete digital signal sequence.
  • An embodiment of the present invention further provides a computer storage medium that stores computer-executable instructions, the computer-executable instructions being used to execute the foregoing FFT processing method.
  • In the FFT processing method, apparatus, and computer storage medium provided by the embodiments of the present invention, each discrete digital signal in the discrete digital signal sequence is written into an upper memory and a lower memory according to an input address rule; according to an operation read address rule, a set of discrete digital signals is sequentially read from the upper memory and the lower memory for a butterfly operation, and the butterfly operation result is written into the upper memory and the lower memory according to an operation write address rule, until all stages of the butterfly operation are completed; the data of the completed butterfly operation is read from the upper memory and the lower memory according to an output address rule, and the read data is determined as the FFT result of the discrete digital signal sequence. The memories may be simple dual-port RAMs. In this way, the FFT is implemented with simple dual-port RAM, which saves chip resources, reduces chip power consumption, and improves chip competitiveness compared with the existing full-function dual-port RAM scheme.
  • FIG. 1 is a schematic flow chart of an FFT processing method according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of the ports of a full-function dual-port RAM according to an embodiment of the present invention.
  • FIG. 3 is a block diagram of a technical solution for performing FFT processing using full-function dual-port RAM according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a port of a simple dual port RAM according to an embodiment of the present invention.
  • FIG. 5 is a block diagram of a technical solution for performing FFT processing using simple dual-port RAM according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a 16-point FFT operation flow according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of processing of an address controller in a discrete digital signal sequence input process according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of the write address values during input of a discrete digital signal sequence according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of address controller read/write address control in a butterfly operation process according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of values of read and write addresses in a butterfly operation according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of intermittent data exchange according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of processing of an address controller of a processing result output process according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of the value of an output read address according to an embodiment of the present invention.
  • FIG. 14 is a schematic structural diagram of an FFT processing apparatus according to an embodiment of the present invention.
  • In the embodiments of the present invention, each discrete digital signal in the discrete digital signal sequence is written into the upper memory and the lower memory according to the input address rule; according to the operation read address rule, a set of discrete digital signals is sequentially read from the upper memory and the lower memory and subjected to a butterfly operation, and the butterfly operation result is written into the upper memory and the lower memory according to the operation write address rule, until all stages of the butterfly operation are completed; according to the output address rule, the data of the completed butterfly operation is read from the upper memory and the lower memory, and the read data is determined as the FFT result of the discrete digital signal sequence.
  • The FFT processing method provided by the embodiment of the present invention is shown in FIG. 1, and the method includes:
  • Step 101: Write each discrete digital signal in the discrete digital signal sequence into an upper memory and a lower memory according to an input address rule;
  • Here, the memory may be a simple dual-port RAM, which consumes fewer resources and less power than a full-function dual-port RAM;
  • FIG. 2 is a port diagram of the full-function dual-port RAM used in existing FFT processing solutions; the number of FFT points usually determines the depth of the full-function dual-port RAM. In the block diagram of the full-function dual-port RAM FFT solution shown in FIG. 3, two full-function dual-port RAMs of depth N are used as ping-pong buffers. When the discrete digital signal sequence to be processed is input, it is stored in the ping buffer; the two corresponding data items are then read simultaneously from the full-function dual-port RAM and subjected to a butterfly operation, and the two results are simultaneously written into the pong buffer, until all butterfly operations of the first stage end. In the second stage, the two corresponding data items are read from the pong buffer and subjected to a butterfly operation, and the two results are simultaneously written into the ping buffer, until the second-stage butterfly operation ends; subsequent stages proceed in the same way.
  • At the same depth, a simple dual-port RAM requires fewer circuit resources than a full-function dual-port RAM, saving both resources and power.
  • The FFT processing method proposed by the embodiment of the present invention is implemented based on simple dual-port RAM, whose port diagram is shown in FIG. 4; FFT processing based on simple dual-port RAM can be implemented according to the block diagram shown in FIG. 5. Further, two simple dual-port RAMs of depth N/2, the upper memory (RAM1) and the lower memory (RAM2), can replace the two full-function dual-port RAMs of depth N in the above design, saving at least half of the storage units and providing a more practical solution for resource-sensitive circuit designs.
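  • As a rough check of the "at least half" figure, the storage of the two schemes can be compared word for word. The symbol B (bits per stored complex sample) and the names S_full and S_simple are introduced here only for illustration; any difference in per-cell area between the two RAM types is ignored.

```latex
S_{\text{full}} = 2 \cdot N \cdot B,
\qquad
S_{\text{simple}} = 2 \cdot \frac{N}{2} \cdot B = N \cdot B = \frac{1}{2}\, S_{\text{full}}
```

  • For the 16-point example used in this description, this corresponds to 32 stored words for the ping-pong scheme against 16 stored words for the RAM1/RAM2 scheme.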
  • The technical solution shown in FIG. 5 supports serial input of the discrete digital signal sequence and serial output after the operation; the write and read addresses can be controlled by hardware logic inside the chip;
  • Further, a binary number whose bit width equals the number of butterfly operation stages can be used as a data input counter that counts upward; the inverted value of the highest bit of the data input counter and the highest bit itself are used as the write enables of the upper memory and the lower memory, respectively; the remaining bits of the data input counter, after the highest bit is removed, are taken in reversed order as the write address; as the data input counter counts, the discrete digital signals are sequentially written into the upper memory and the lower memory at the write address corresponding to the counter value;
  • Specifically, N, the number of FFT points, is usually an integer power of 2; the input data can be counted using the address counter A[M-1:0], where M is the number of butterfly operation stages.
  • The inverted highest bit of the counter (!A[M-1]) is used as the write enable of the upper memory, and the highest bit A[M-1] is used as the write enable of the lower memory.
  • The other bits of the counter, in reversed order (A[0], A[1], ..., A[M-2]), are used as the write address of both RAMs. Both simple dual-port RAMs have a depth of N/2;
  • Taking a 16-point FFT as an example, the operation flow is shown in FIG. 6.
  • The processing block diagram of the controller is shown in FIG. 7; the input data is counted using the address counter A[3:0].
  • The inverted highest bit of the counter (!A[3]) is used as the write enable of simple dual-port RAM1, and the highest bit A[3] is used as the write enable of simple dual-port RAM2.
  • The other bits of the counter, in reversed order (A[0], A[1], A[2]), are used as the write address of both RAMs.
  • Both simple dual-port RAMs have a depth of 8; the specific write address table is shown in FIG. 8. For each count of the counter A[3:0], a discrete digital signal is written to the write address corresponding to the counter value, until the count ends.
  • In practice, write control of the upper memory and the lower memory can be implemented by an address controller in the chip together with a data selector (MUX) and similar logic.
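  • The input address rule of Step 101 can be sketched in software as follows. This is only an illustrative model, not the hardware controller of FIG. 7: the function names `bit_reverse` and `input_write_target` are hypothetical, and bank 0 / bank 1 stand for the upper memory (RAM1) and the lower memory (RAM2).

```python
def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def input_write_target(count, stages):
    """Map the input counter A[M-1:0] to (bank, write address).

    bank 0 = upper memory (RAM1), write-enabled when A[M-1] == 0
    bank 1 = lower memory (RAM2), write-enabled when A[M-1] == 1
    address = the remaining bits A[M-2:0] taken in reversed order
    """
    bank = (count >> (stages - 1)) & 1
    low_bits = count & ((1 << (stages - 1)) - 1)
    return bank, bit_reverse(low_bits, stages - 1)


if __name__ == "__main__":
    # 16-point case: M = 4 stages, two RAMs of depth 8.
    for a in range(16):
        bank, address = input_write_target(a, stages=4)
        print(f"A={a:04b} -> RAM{bank + 1} address {address:03b}")
```

  • With this rule, samples n and n + N/2 land at the same address in RAM1 and RAM2, so the first-stage butterfly can fetch both operands with a single read from each memory.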
  • Step 102: According to the operation read address rule, sequentially read a set of discrete digital signals from the upper memory and the lower memory to perform a butterfly operation, and write the butterfly operation result into the two memories according to the operation write address rule, until all stages of the butterfly operation are completed;
  • Once the discrete digital signal sequence has been written, the butterfly operation can be performed; the discrete digital signals stored in the upper memory and the lower memory are read according to the operation read address rule and subjected to the butterfly operation, so as to follow the operation flow shown in FIG. 6.
  • Further, a binary number whose bit width equals the number of butterfly operation stages minus one can be used as an operation counter that counts upward; in the first-stage butterfly operation, the binary value of the operation counter is used directly as the read address of each butterfly operation, and in each subsequent stage the read address of the previous stage, rotated left by one bit, is used as the read address; as the operation counter counts, the discrete digital signals at the read address corresponding to the counter value are sequentially read from the upper memory and the lower memory, respectively, and subjected to the butterfly operation;
  • Specifically, data may be fetched from RAM1 and RAM2 in sequence according to the read address to perform the butterfly operation.
  • The butterfly operations of each stage shown in FIG. 6 are counted from top to bottom, and the count is denoted I[M-2:0]; in the first-stage operation, the counter value is also used as the RAM read address, and the data read is subjected to the butterfly operation in sequence.
  • In the second-stage operation, the first-stage read address is rotated left by one bit, so the read address is {I[M-3:0], I[M-2]}, and so on for the later stages.
  • Taking 16-point FFT processing as an example, the processing block diagram of the address controller in the chip is shown in FIG. 9, and data is fetched from RAM1 and RAM2 in sequence according to the read address to perform the butterfly operation.
  • The butterfly units of each stage are counted from top to bottom in their order of use, and the count is denoted I[2:0].
  • In the first-stage operation, the counter value is also used as the RAM read address, and the data read is subjected to the butterfly operation in sequence.
  • In the second-stage operation, the read address is {I[1:0], I[2]}. The read address changes in the same way from stage to stage; the specific read address values are shown in FIG. 10.
  • For example, in the first stage, when the counter is 001, data is taken from address 001 of RAM1 and RAM2 at the same time for the butterfly operation.
  • In the second stage, when the counter is 001, the corresponding first-stage read address (the second read address of that stage, 001) is rotated left by one bit to give 010, and data is taken from address 010 of RAM1 and RAM2 at the same time for the butterfly operation.
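  • The operation read address rule can be sketched as a bit rotation. The function names `rotate_left` and `stage_read_address` are hypothetical, stages are numbered from 1, and the sketch assumes that "and so on" means one further left rotation per stage, so a stage whose rotation count is a multiple of M-1 uses the counter value unchanged.

```python
def rotate_left(value, width, amount):
    """Rotate a `width`-bit value left by `amount` bits."""
    amount %= width
    mask = (1 << width) - 1
    return ((value << amount) | (value >> (width - amount))) & mask


def stage_read_address(count, stage, stages):
    """Read address used for both RAMs by butterfly `count` in `stage`.

    Stage 1 uses the operation counter I[M-2:0] directly; every later stage
    rotates the previous stage's read address left by one more bit.
    """
    return rotate_left(count, stages - 1, stage - 1)


if __name__ == "__main__":
    # 16-point case: M = 4, so the operation counter is I[2:0].
    for stage in range(1, 5):
        addresses = [f"{stage_read_address(i, stage, 4):03b}" for i in range(8)]
        print(f"stage {stage}: {' '.join(addresses)}")
```

  • For the example in the text, counter value 001 gives read address 001 in the first stage and 010 in the second stage.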
  • The butterfly operation result can be written into the upper memory and the lower memory according to the operation write address rule, until the butterfly operations of all stages are completed.
  • Further, the results of each butterfly operation may be stored in the upper memory and the lower memory according to the read addresses of the butterfly operation data and the order of those read addresses; the two results of the same butterfly operation are stored sequentially in the same memory;
  • Specifically, denote the two data paths fetched from RAM1 and RAM2 for the butterfly operation as the X path and the Y path, and let the read address be 000.
  • The X-path data and the Y-path data are output from the butterfly operation unit simultaneously.
  • The Y-path data after the butterfly operation is delayed by one cycle.
  • The X and Y data then undergo interval data exchange through the data selector, and the two results of the butterfly operation are written sequentially into RAM1 according to the X-path data read address, that is, into addresses 000 and 001 of RAM1;
  • The second butterfly operation of the first stage is handled similarly to the first: its two results are written into addresses 000 and 001 of RAM2, and subsequent butterfly operations continue to alternate in the same way.
  • The result of the interval data exchange for the first-stage butterfly operation is shown in FIG. 11; taking 16-point FFT processing as an example, the write addresses after the butterfly operation are shown in FIG. 10;
  • In other words, the write address for the Y-path data lags the write address for the X-path data by one clock cycle, so both data items produced by the first butterfly operation are stored in RAM1, both data items produced by the second butterfly operation are stored in RAM2, and so on.
  • The purpose of this processing is to place the two data items that will take part in the same butterfly operation in the next stage in two different RAMs; if both resided in the same RAM, the next stage's data fetch would be problematic.
  • In the subsequent stages, the same processing method is used to write the data back to RAM1 and RAM2 after each butterfly operation;
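  • One reading of the operation write address rule is sketched below. It reproduces the first-stage example from the text (first butterfly to RAM1 addresses 000 and 001, second butterfly to RAM2 addresses 000 and 001); extending it to later stages by reusing each stage's own pair of read addresses is an assumption, since the table of FIG. 10 is not reproduced here. The names `write_targets` and `rotate_left` are hypothetical, and the one-cycle Y-path delay is not modelled.

```python
def rotate_left(value, width, amount):
    """Rotate a `width`-bit value left by `amount` bits."""
    amount %= width
    mask = (1 << width) - 1
    return ((value << amount) | (value >> (width - amount))) & mask


def write_targets(count, stage, stages):
    """Return (bank, x_address, y_address) for butterfly `count` in `stage`.

    Both results of one butterfly go to the same bank: even-numbered
    butterflies write to bank 0 (RAM1), odd-numbered ones to bank 1 (RAM2).
    The X result uses the read address of the even butterfly of the pair,
    the Y result the read address of the odd butterfly of the pair.
    """
    width = stages - 1
    bank = count & 1
    x_address = rotate_left(count & ~1, width, stage - 1)
    y_address = rotate_left(count | 1, width, stage - 1)
    return bank, x_address, y_address


if __name__ == "__main__":
    for i in range(8):
        bank, xa, ya = write_targets(i, stage=1, stages=4)
        print(f"butterfly {i}: RAM{bank + 1} addresses {xa:03b} and {ya:03b}")
```

  • Under this reading, each stage writes every one of the N slots exactly once, so the two RAMs are reused in place from stage to stage.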
  • The data for the second-stage operation is then read according to the operation read address rule and subjected to the butterfly operation, and the results are written into the upper memory and the lower memory according to the operation write address rule; this is repeated until the butterfly operations of all stages are completed.
  • Step 103: Read the data of the completed butterfly operation from the upper memory and the lower memory according to the output address rule, and determine the read data as the FFT result of the discrete digital signal sequence;
  • Further, a binary number whose bit width equals the number of butterfly operation stages can be used as a data output counter that counts upward; after the lowest bit of the data output counter is removed, the highest bit of the remaining bits is moved to the lowest bit position, and the resulting binary number is used as the ordered output read address; the data of the completed butterfly operation is read sequentially from the upper memory and the lower memory at each output read address in that order; the lowest bit of the data output counter can serve as the read enable of the upper memory and the lower memory, so that the two memories are enabled alternately as the counter counts and the data of the completed butterfly operation is read from the same output read address in each;
  • Specifically, the counter K[M-1:0] can be used to count the N cycles of the counter enable signal; because the data is stored in RAM1 and RAM2 separately while the output is a single stream, the same output read address is read from both RAM1 and RAM2: each output read address lasts for two cycles, RAM1 and RAM2 are enabled in turn during these two cycles, and the output data is selected and output. The output read address may be the binary number obtained by removing the lowest bit of the counter K[M-1:0] and then moving the highest bit to the lowest bit position;
  • Taking 16-point FFT processing as an example, the processing block diagram of the address controller in the chip is shown in FIG. 12.
  • The counter K[3:0] is used to count the 16 cycles of the counter enable signal; because the data is stored in RAM1 and RAM2 separately while the output is a single stream, the same read address is applied to both RAM1 and RAM2 and held for two cycles, during which RAM1 and RAM2 are enabled in turn.
  • The output read address may be the binary number obtained by removing the lowest bit of the counter K[3:0] and then moving the highest bit to the lowest bit position; the specific read address values are shown in FIG. 13.
  • The data at the same output read address is read from RAM1 and RAM2 in two successive cycles and output serially, until the counter finishes counting, that is, until all butterfly operation results have been read;
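  • The output address rule can be sketched as follows. The function name `output_read_target` is hypothetical, and the sketch takes the reading in which the lowest counter bit K[0] selects the memory and the address is formed from K[M-1:1] with its highest bit moved to the lowest position.

```python
def output_read_target(count, stages):
    """Map the output counter K[M-1:0] to (bank, read address).

    bank 0 = upper memory (RAM1), bank 1 = lower memory (RAM2), chosen by K[0].
    address = K[M-1:1] rotated left by one bit, so that the highest bit
    ends up in the lowest bit position.
    """
    width = stages - 1
    bank = count & 1
    upper_bits = count >> 1
    msb = (upper_bits >> (width - 1)) & 1
    address = ((upper_bits << 1) & ((1 << width) - 1)) | msb
    return bank, address


if __name__ == "__main__":
    # 16-point case: each address is held for two cycles, RAM1 then RAM2.
    for k in range(16):
        bank, address = output_read_target(k, stages=4)
        print(f"K={k:04b} -> RAM{bank + 1} address {address:03b}")
```

  • Consecutive even and odd counter values share one address, which matches the statement that every output read address lasts for two cycles while RAM1 and RAM2 are enabled in turn.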
  • In practice, read control of the upper memory and the lower memory can be implemented by an address controller in the chip together with a MUX and similar logic.
  • In this way, FFT processing of the discrete digital signal sequence is accomplished by a technical solution built from simple dual-port RAM.
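  • Steps 101 to 103 can be combined into a small behavioural model and compared with a direct DFT. This is a sketch under stated assumptions, not the circuit of FIG. 5: the text does not give the butterfly arithmetic or the twiddle factors, so a radix-2 decimation-in-frequency butterfly is assumed; the write rule follows one reading of the text (both results of a butterfly go to the same RAM, at that stage's pair of read addresses); cycle timing and the Y-path delay are not modelled. All function names are hypothetical.

```python
import cmath


def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def rotate_left(value, width, amount):
    """Rotate a `width`-bit value left by `amount` bits."""
    amount %= width
    mask = (1 << width) - 1
    return ((value << amount) | (value >> (width - amount))) & mask


def fft_two_bank(samples):
    """Behavioural model of Steps 101-103 for N = 2**M points, N >= 4."""
    n = len(samples)
    if n < 4 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 4")
    stages = n.bit_length() - 1                 # M
    width = stages - 1                          # address bits of each RAM
    mask = (1 << width) - 1
    ram = [[0j] * (n // 2), [0j] * (n // 2)]    # ram[0] = RAM1, ram[1] = RAM2

    # Step 101: highest counter bit selects the RAM, other bits are reversed.
    for count, sample in enumerate(samples):
        ram[count >> width][bit_reverse(count & mask, width)] = sample

    def butterfly(address, stage):
        upper, lower = ram[0][address], ram[1][address]
        # Assumed DIF twiddle, derived from the address bits not yet rewritten.
        position = bit_reverse(address, width) & ((1 << (stages - stage)) - 1)
        twiddle = cmath.exp(-2j * cmath.pi * (position << (stage - 1)) / n)
        return upper + lower, (upper - lower) * twiddle

    # Step 102: read addresses rotate left by one more bit in each stage.
    for stage in range(1, stages + 1):
        for count in range(0, n // 2, 2):
            first = rotate_left(count, width, stage - 1)
            second = rotate_left(count + 1, width, stage - 1)
            result_a = butterfly(first, stage)
            result_b = butterfly(second, stage)
            ram[0][first], ram[0][second] = result_a    # 1st butterfly -> RAM1
            ram[1][first], ram[1][second] = result_b    # 2nd butterfly -> RAM2

    # Step 103: K[0] selects the RAM, rotated K[M-1:1] is the address.
    output = []
    for count in range(n):
        upper_bits = count >> 1
        address = ((upper_bits << 1) & mask) | (upper_bits >> (width - 1))
        output.append(ram[count & 1][address])
    return output
```

  • In this model, the two operands of every butterfly are always found at the same address in different RAMs, and the results come out in natural frequency order without a separate reordering pass.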
  • An embodiment of the present invention further provides an FFT processing apparatus; the apparatus includes: an input controller 1401, an operator 1402, an output controller 1403, an upper memory 1404, and a lower memory 1405;
  • The input controller 1401 is configured to write each discrete digital signal in the discrete digital signal sequence into the upper memory 1404 and the lower memory 1405 according to the input address rule;
  • Here, the memory may be a simple dual-port RAM, which consumes fewer resources and less power than a full-function dual-port RAM;
  • FIG. 2 is a port diagram of the full-function dual-port RAM used in existing FFT processing solutions; the number of FFT points usually determines the depth of the full-function dual-port RAM. In the block diagram of the full-function dual-port RAM FFT solution shown in FIG. 3, two full-function dual-port RAMs of depth N are used as ping-pong buffers. When the discrete digital signal sequence to be processed is input, it is stored in the ping buffer; the two corresponding data items are then read simultaneously from the full-function dual-port RAM and subjected to a butterfly operation, and the two results are simultaneously written into the pong buffer, until all butterfly operations of the first stage end. In the second stage, the two corresponding data items are read from the pong buffer and subjected to a butterfly operation, and the two results are simultaneously written into the ping buffer, until the second-stage butterfly operation ends; subsequent stages proceed in the same way.
  • At the same depth, a simple dual-port RAM requires fewer circuit resources than a full-function dual-port RAM, saving both resources and power.
  • The FFT processing method proposed by the embodiment of the present invention is implemented based on simple dual-port RAM, whose port diagram is shown in FIG. 4; FFT processing based on simple dual-port RAM can be implemented according to the block diagram shown in FIG. 5. Further, two simple dual-port RAMs of depth N/2, the upper memory 1404 (RAM1) and the lower memory 1405 (RAM2), can replace the two full-function dual-port RAMs of depth N in the above design, saving at least half of the memory cells and providing a more practical solution for resource-sensitive circuit designs.
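  • The port restriction that the scheme has to live with can be illustrated with a toy model. It assumes the usual meaning of "simple dual-port RAM", namely one dedicated write port and one dedicated read port, each usable once per clock cycle; the class `SimpleDualPortRam` and its methods are hypothetical and do not describe the port list of FIG. 4.

```python
class SimpleDualPortRam:
    """Toy model: one read port and one write port, each usable once per cycle."""

    def __init__(self, depth):
        self.cells = [0] * depth
        self._read_used = False
        self._write_used = False

    def tick(self):
        """Start a new clock cycle and free both ports."""
        self._read_used = False
        self._write_used = False

    def read(self, address):
        """Use the read port; a second read in the same cycle is a conflict."""
        if self._read_used:
            raise RuntimeError("read port already used in this cycle")
        self._read_used = True
        return self.cells[address]

    def write(self, address, value):
        """Use the write port; a second write in the same cycle is a conflict."""
        if self._write_used:
            raise RuntimeError("write port already used in this cycle")
        self._write_used = True
        self.cells[address] = value


if __name__ == "__main__":
    upper, lower = SimpleDualPortRam(8), SimpleDualPortRam(8)
    upper.write(0b001, 3 + 0j)
    lower.write(0b001, 1 + 0j)
    upper.tick()
    lower.tick()
    # One read from each memory per cycle supplies both butterfly operands.
    print(upper.read(0b001) + lower.read(0b001))
```

  • Because a butterfly needs two operands per cycle, such a memory can supply only one of them, which is why the two operands of each butterfly are kept in different memories.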
  • The technical solution shown in FIG. 5 supports serial input of the discrete digital signal sequence and serial output after the operation; the write and read addresses can be controlled by hardware logic inside the chip;
  • Further, a binary number whose bit width equals the number of butterfly operation stages can be used as a data input counter that counts upward; the inverted value of the highest bit of the data input counter and the highest bit itself are used as the write enables of the upper memory 1404 and the lower memory 1405, respectively; the remaining bits of the data input counter, after the highest bit is removed, are taken in reversed order as the write address; as the data input counter counts, the discrete digital signals are sequentially written into the upper memory 1404 and the lower memory 1405 at the write address corresponding to the counter value;
  • Specifically, N, the number of FFT points, is usually an integer power of 2; the input data can be counted using the address counter A[M-1:0], where M is the number of butterfly operation stages.
  • The inverted highest bit of the counter (!A[M-1]) is used as the write enable of the upper memory 1404, and the highest bit A[M-1] is used as the write enable of the lower memory 1405.
  • The other bits of the counter, in reversed order (A[0], A[1], ..., A[M-2]), are used as the write address of both RAMs. Both simple dual-port RAMs have a depth of N/2;
  • Taking a 16-point FFT as an example, the operation flow is shown in FIG. 6.
  • The processing block diagram of the controller is shown in FIG. 7; the input data is counted using the address counter A[3:0].
  • The inverted highest bit of the counter (!A[3]) is used as the write enable of simple dual-port RAM1, and the highest bit A[3] is used as the write enable of simple dual-port RAM2.
  • The other bits of the counter, in reversed order (A[0], A[1], A[2]), are used as the write address of both RAMs.
  • Both simple dual-port RAMs have a depth of 8; the specific write address table is shown in FIG. 8. For each count of the counter A[3:0], a discrete digital signal is written to the write address corresponding to the counter value, until the count ends.
  • In practice, the input controller 1401 can implement write control of the upper memory 1404 and the lower memory 1405 by means of an address controller in the chip together with a MUX and similar logic.
  • The operator 1402 is configured to sequentially read a set of discrete digital signals from the upper memory 1404 and the lower memory 1405 according to the operation read address rule to perform a butterfly operation, and to write the butterfly operation result into the upper memory 1404 and the lower memory 1405 according to the operation write address rule, until all stages of the butterfly operation are completed;
  • Once the discrete digital signal sequence has been written, the butterfly operation can be performed; the operator 1402 reads the discrete digital signals stored in the upper memory 1404 and the lower memory 1405 according to the operation read address rule and performs the butterfly operation, so as to follow the operation flow shown in FIG. 6.
  • Further, a binary number whose bit width equals the number of butterfly operation stages minus one can be used as an operation counter that counts upward; in the first-stage butterfly operation, the binary value of the operation counter is used directly as the read address of each butterfly operation, and in each subsequent stage the read address of the previous stage, rotated left by one bit, is used as the read address; as the operation counter counts, the discrete digital signals at the read address corresponding to the counter value are sequentially read from the upper memory 1404 and the lower memory 1405, respectively, and subjected to the butterfly operation;
  • Specifically, data may be fetched from RAM1 and RAM2 in sequence according to the read address to perform the butterfly operation.
  • The butterfly operations of each stage shown in FIG. 6 are counted from top to bottom, and the count is denoted I[M-2:0]; in the first-stage operation, the counter value is also used as the RAM read address, and the data read is subjected to the butterfly operation in sequence.
  • In the second-stage operation, the first-stage read address is rotated left by one bit, so the read address is {I[M-3:0], I[M-2]}, and so on for the later stages.
  • Taking 16-point FFT processing as an example, the processing block diagram of the address controller in the chip is shown in FIG. 9, and data is fetched from RAM1 and RAM2 in sequence according to the read address to perform the butterfly operation.
  • The butterfly units of each stage are counted from top to bottom in their order of use, and the count is denoted I[2:0].
  • In the first-stage operation, the counter value is also used as the RAM read address, and the data read is subjected to the butterfly operation in sequence.
  • In the second-stage operation, the read address is {I[1:0], I[2]}. The read address changes in the same way from stage to stage; the specific read address values are shown in FIG. 10.
  • For example, in the first stage, when the counter is 001, data is taken from address 001 of RAM1 and RAM2 at the same time for the butterfly operation.
  • In the second stage, when the counter is 001, the corresponding first-stage read address (the second read address of that stage, 001) is rotated left by one bit to give 010, and data is taken from address 010 of RAM1 and RAM2 at the same time for the butterfly operation.
  • The butterfly operation result can be written into the upper memory 1404 and the lower memory 1405 according to the operation write address rule, until the butterfly operations of all stages are completed.
  • Further, the results of each butterfly operation may be stored in the upper memory 1404 and the lower memory 1405 according to the read addresses of the butterfly operation data and the order of those read addresses; the two results of the same butterfly operation are stored sequentially in the same memory;
  • Optionally, suppose that in the first butterfly operation of the first stage the two data paths fetched from RAM1 and RAM2 are the X path and the Y path respectively, and the read address is 000.
  • After the butterfly operation, the X-path data and the Y-path data are output from the butterfly unit simultaneously.
  • The Y-path data are then delayed by one cycle.
  • The X-path and Y-path data then undergo interval data exchange through the data selector, and the two results of the butterfly operation are written into RAM1 one after the other according to the X-path read address, that is, into addresses 000 and 001 of RAM1;
  • Similarly to the first butterfly operation of the first stage, the results of the second butterfly operation of the first stage are written into addresses 000 and 001 of RAM2, and later results are written alternately in the same way.
  • The result of the interval data exchange for the first butterfly stage is shown in FIG. 11; taking 16-point FFT processing as an example, the write addresses after the butterfly operation are shown in FIG. 10;
  • In this way, the write address for the Y-path data lags the write address for the X-path data by one clock cycle, so the two data produced by the first butterfly operation are both stored in RAM1, the two data produced by the second butterfly operation are both stored in RAM2, and so on.
  • The purpose is to store the two data that take part in the same butterfly operation of the next stage in two different RAMs; if both were in the same RAM, the next data fetch would run into problems. The same write-back to RAM1 and RAM2 is used in the following M-1 stages;
  • After the first butterfly stage is completed, the data for the second stage are read according to the operation read address rule, the butterfly operation is performed, and the results are written into the upper memory 1404 and the lower memory 1405 according to the operation write address rule; this is repeated until the butterfly operations of all stages are completed.
  • The output controller 1403 is configured to read the data that have completed the butterfly operation from the upper memory 1404 and the lower memory 1405 according to the output address rule, and to determine the read data as the FFT result of the discrete digital signal sequence;
  • Here, a binary number whose bit width equals the number of butterfly stages may be used as a data output counter, counting upward; with the lowest bit of the data output counter removed, the highest bit is moved to the lowest position, and each binary number so shifted is used, in order, as an output read address; the data that have completed the butterfly operation are read in turn from the upper memory 1404 and the lower memory 1405 at each output read address;
  • The lowest bit of the data output counter may serve as the read enable of the upper memory 1404 and the lower memory 1405: as the counter counts, the two memories are enabled alternately and the data that have completed the butterfly operation are read from the same output read address;
  • Optionally, the counter K[M-1:0] may be used to count a read enable signal lasting N cycles. Because the data are stored in RAM1 and RAM2 while the output is a single path, the same output read address may be applied to RAM1 and RAM2, with each output read address held for two cycles; RAM1 and RAM2 are enabled in turn during those two cycles, and the output data are selected and output. The output read address may be the binary number obtained by removing the lowest bit of the counter K[M-1:0] and moving the highest bit to the lowest position, and each binary number so shifted may be used, in order, as an output read address;
  • Taking 16-point FFT processing as an example, the processing block diagram of the address controller in the chip is shown in FIG. 12.
  • The counter K[3:0] counts a read enable signal lasting 16 cycles. Because the data are stored in RAM1 and RAM2 while the output is a single path, the same read address signal is applied to RAM1 and RAM2, with each read address signal held for two cycles; RAM1 and RAM2 are enabled in turn during those two cycles, and the output data are selected and output.
  • The output read address may be the binary number obtained by removing the lowest bit of the counter K[3:0] and moving the highest bit to the lowest position, and each binary number so shifted may be used, in order, as an output read address; the specific read address values are shown in FIG. 13. As FIG. 13 shows, within each two-cycle period the data at the same output read address are read from RAM1 and RAM2 and output serially, until the counter finishes counting, that is, until all butterfly results have been read;
  • In practice, the output controller 1403 may implement read control of the upper memory 1404 and the lower memory 1405 through the address controller in the chip in combination with a MUX and the like.
  • In this way, a technical solution built from simple dual-port RAMs completes the FFT processing of the discrete digital signal sequence.
  • In practice, the input controller 1401, the arithmetic unit 1402 and the output controller 1403 may each be implemented by hardware logic, software logic or the like inside the chip.
  • An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the FFT processing method provided by one or more of the foregoing technical solutions, for example the FFT processing method shown in FIG. 1.
  • The computer storage medium may be any of various types of storage medium, such as an optical disc, a removable hard disk, a flash drive or a magnetic tape, and is optionally a non-transitory storage medium.
  • In the FFT processing method, apparatus and computer storage medium provided by the embodiments of the present invention, a simple upper memory and lower memory replace the full-function dual-port RAM, which imposes a heavy structural load and consumes substantial computing resources and energy, to perform the FFT operation; this reduces the computing resources and energy consumed during FFT calculation and has a positive industrial effect.
  • Moreover, simple dual-port RAM can replace the full-function dual-port RAM in an FFT processing apparatus; this is simple to implement and easy to adopt in industry, so the solution is highly practicable industrially.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Discrete Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

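A fast Fourier transform (FFT) processing method, apparatus and computer storage medium. The method comprises: writing each discrete digital signal of a discrete digital signal sequence into an upper memory and a lower memory according to an input address rule (101); according to an operation read address rule, reading a group of discrete digital signals in turn from each of the upper memory and the lower memory and performing a butterfly operation, and writing the butterfly results into the upper memory and the lower memory according to an operation write address rule, until all stages of the butterfly operation are completed (102); and, according to an output address rule, reading the data that have completed the butterfly operation from the upper memory and the lower memory, and determining the read data as the fast Fourier transform result of the discrete digital signal sequence (103).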

Description

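Fast Fourier transform processing method, apparatus and computer storage medium
This application is based on, and claims priority to, Chinese patent application No. 201710023363.0 filed on January 12, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to digital signal processing technology, and in particular to a fast Fourier transform (FFT) processing method, apparatus and computer storage medium.
Background
The FFT is an important building block in digital signal processing and is widely used across modern digital signal processing, for example in wired communication, radar, satellite communication and multimedia signal processing. By implementation structure, FFTs can be divided into parallel and serial structures. In applications that demand high throughput and processing capability, a parallel pipelined structure is usually chosen to implement the FFT; its drawback, however, is a large hardware resource overhead. Taking an N-point FFT as an example, and assuming a data width of W, a parallel pipelined implementation requires at least: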
Figure PCTCN2017099342-appb-000001
registers, and
Figure PCTCN2017099342-appb-000002
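complex multiplications. Therefore, in resource-sensitive applications a serial structure is usually chosen.
The basic idea of a serial structure is to reuse the storage resources and the butterfly unit so as to economize on resources. In the prior art, the common way to implement an N-point serial FFT is to use two full-function dual-port random access memories (RAMs) of depth N as ping-pong buffers for the butterfly operations. In practical circuit design, however, full-function dual-port RAM consumes more resources, and its power consumption is also relatively high.
Therefore, how to save resources and reduce power consumption in FFT processing, and thereby make the chip more competitive, is a problem that urgently needs to be solved.
Summary
In view of this, embodiments of the present invention are intended to provide an FFT processing method and apparatus that at least partly solve the problem of high circuit resource consumption and/or high power consumption in the FFT.
The technical solution of the present invention is implemented as follows:
An embodiment of the present invention provides an FFT processing method, the method comprising:
writing each discrete digital signal of a discrete digital signal sequence into an upper memory and a lower memory according to an input address rule;
according to an operation read address rule, reading a group of discrete digital signals in turn from each of the upper memory and the lower memory and performing a butterfly operation, and writing the butterfly results into the upper memory and the lower memory according to an operation write address rule, until all stages of the butterfly operation are completed;
according to an output address rule, reading the data that have completed the butterfly operation from the upper memory and the lower memory, and determining the read data as the FFT result of the discrete digital signal sequence.
An embodiment of the present invention further provides an FFT processing apparatus, the apparatus comprising: an input controller, an arithmetic unit, an output controller, an upper memory and a lower memory; wherein
the input controller is configured to write each discrete digital signal of a discrete digital signal sequence into the upper memory and the lower memory according to an input address rule;
the arithmetic unit is configured to, according to an operation read address rule, read a group of discrete digital signals in turn from each of the upper memory and the lower memory and perform a butterfly operation, and to write the butterfly results into the upper memory and the lower memory according to an operation write address rule, until all stages of the butterfly operation are completed;
the output controller is configured to, according to an output address rule, read the data that have completed the butterfly operation from the upper memory and the lower memory, and to determine the read data as the FFT result of the discrete digital signal sequence.
An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the foregoing FFT processing method.
In the FFT processing method, apparatus and computer storage medium provided by the embodiments of the present invention, each discrete digital signal of a discrete digital signal sequence is written into an upper memory and a lower memory according to an input address rule; according to an operation read address rule, a group of discrete digital signals is read in turn from each of the upper memory and the lower memory and a butterfly operation is performed, and the butterfly results are written into the upper memory and the lower memory according to an operation write address rule, until all stages of the butterfly operation are completed; according to an output address rule, the data that have completed the butterfly operation are read from the upper memory and the lower memory, and the read data are determined as the FFT result of the discrete digital signal sequence; the memories may be simple dual-port RAMs. The FFT is thus implemented with simple dual-port RAM, which, compared with the existing full-function dual-port RAM solution, saves chip resources, reduces chip power consumption and makes the chip more competitive.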
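Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the FFT processing method according to an embodiment of the present invention;
FIG. 2 is a port diagram of a full-function dual-port RAM according to an embodiment of the present invention;
FIG. 3 is a block diagram of a technical solution that performs FFT processing with full-function dual-port RAM according to an embodiment of the present invention;
FIG. 4 is a port diagram of a simple dual-port RAM according to an embodiment of the present invention;
FIG. 5 is a block diagram of a technical solution that performs FFT processing with simple dual-port RAM according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the 16-point FFT flow graph according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the processing by the address controller during input of the discrete digital signal sequence according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the write address values during input of the discrete digital signal sequence according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the read and write address control by the address controller during the butterfly operations according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the read and write address values during the butterfly operations according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of the interval data exchange according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of the processing by the address controller during output of the processing result according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of the output read address values according to an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of the FFT processing apparatus according to an embodiment of the present invention.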
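Detailed Description
In the embodiments of the present invention, each discrete digital signal of a discrete digital signal sequence is written into an upper memory and a lower memory according to an input address rule; according to an operation read address rule, a group of discrete digital signals is read in turn from each of the upper memory and the lower memory and a butterfly operation is performed, and the butterfly results are written into the upper memory and the lower memory according to an operation write address rule, until all stages of the butterfly operation are completed; according to an output address rule, the data that have completed the butterfly operation are read from the upper memory and the lower memory, and the read data are determined as the FFT result of the discrete digital signal sequence.
The present invention is described in further detail below with reference to embodiments. It should be understood that the preferred embodiments described below are intended only to illustrate and explain the present invention, not to limit it.
The FFT processing method provided by an embodiment of the present invention is shown in FIG. 1; the method comprises:
Step 101: write each discrete digital signal of the discrete digital signal sequence into the upper memory and the lower memory according to the input address rule;
Here, the memories may be simple dual-port RAMs, which consume fewer resources and less power than full-function dual-port RAMs. FIG. 2 is the port diagram of the full-function dual-port RAM used by existing FFT processing solutions; the number of FFT points usually determines the depth of the full-function dual-port RAM. In the block diagram of FIG. 3, which performs FFT processing with full-function dual-port RAM, an N-point FFT requires two full-function dual-port RAMs of depth N as ping-pong buffers. The input discrete digital signal sequence is stored in the ping buffer; the two corresponding data are then read simultaneously from the full-function dual-port RAM for a butterfly operation, and the two results are written simultaneously into the pong buffer, until all first-stage butterfly operations are finished. In the second stage, the two corresponding data are read from the pong buffer for the butterfly operation and the two results are written simultaneously into the ping buffer, until the second stage is finished; later stages proceed likewise. A simple dual-port RAM of the same depth uses fewer circuit resources than a full-function dual-port RAM and is therefore better for saving resources and power.
The FFT processing method proposed by the embodiments of the present invention is implemented on simple dual-port RAM, whose port diagram is shown in FIG. 4; FFT processing based on simple dual-port RAM can be implemented with the block diagram shown in FIG. 5. Optionally, two simple dual-port RAMs of depth N/2, an upper memory (RAM1) and a lower memory (RAM2), can replace the two full-function dual-port RAMs of depth N in the design above, saving at least half of the storage cells and offering a more practical solution for resource-sensitive circuit designs. The solution shown in FIG. 5 supports serial input of the discrete digital signal sequence and serial output after the operation; the controller for the write and read addresses can be implemented with hardware logic inside the chip;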
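As a rough illustration of the storage saving described above, the following sketch counts storage words only. The function name `storage_words` is an illustrative assumption, and the per-cell cost difference between the two RAM types is not modelled.

```python
def storage_words(n_points):
    """Return (ping-pong words, two-RAM words) for an N-point FFT."""
    ping_pong = 2 * n_points        # two full-function dual-port RAMs of depth N
    proposed = 2 * (n_points // 2)  # two simple dual-port RAMs of depth N/2
    return ping_pong, proposed


baseline, proposed = storage_words(16)
print(f"16-point FFT: {baseline} words (ping-pong) vs {proposed} words (RAM1 + RAM2)")
```

Under this word count the proposed arrangement needs half the storage of the ping-pong design, which matches the "at least half" stated in the text.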
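In the block diagram of FIG. 5, when the discrete digital signal sequence is input, a binary number whose bit width equals the number of butterfly stages may be used as a data input counter, counting upward; the inverted value of the counter's highest bit and the highest bit itself serve as the write enables of the upper memory and the lower memory respectively; the remaining bits of the counter after the highest bit is removed, taken in reverse order, form the write address; following the count of the data input counter, the discrete digital signals are written in turn into the upper memory and the lower memory at the write address corresponding to the counter value;
When the data of an N-point FFT are input serially, N is usually an integer power of 2; an address counter A[M-1:0] may be used to count the input data, where M is the number of butterfly stages. From the basics of FFT butterfly operations,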
Figure PCTCN2017099342-appb-000003
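The inverted highest bit of the counter (!A[M-1]) serves as the write enable of the upper memory, and the highest bit A[M-1] serves as the write enable of the lower memory. The other bits of the counter in reverse order (A[0], A[1] … A[M-2]) serve as the write address of both RAMs. Both simple dual-port RAMs have depth N/2;
Taking 16-point FFT processing as an example, the flow graph of the 16-point FFT is shown in FIG. 6, from which N = 16 and the number of butterfly stages M is 4. During input of the discrete digital signal sequence, the processing block diagram of the address controller in the chip is as shown in FIG. 7: the address counter A[3:0] counts the input data. The inverted highest bit (!A[3]) serves as the write enable of simple dual-port RAM1, and the highest bit A[3] serves as the write enable of simple dual-port RAM2. The other bits in reverse order (A[0], A[1], A[2]) serve as the write address of both RAMs. Both simple dual-port RAMs have depth 8; the write address table is shown in FIG. 8. Each time the counter A[3:0] counts, a group of discrete digital signals can be written to the write address corresponding to the counter, until counting ends.
In practice, write control of the upper memory and the lower memory may be implemented by the address controller in the chip in combination with a data selector (MUX) and the like.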
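The input address rule of step 101 can be sketched in a few lines. This is a minimal software model, not the hardware of FIG. 7; the function names `bit_reverse` and `input_write_target` are illustrative assumptions, and the printed table is derived from the rule in the text rather than copied from FIG. 8.

```python
def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def input_write_target(count, stages):
    """Map the input counter A[M-1:0] to (RAM number, write address).

    The highest bit selects the RAM (0 -> RAM1, 1 -> RAM2); the remaining
    M-1 bits, taken in reverse order, form the write address.
    """
    width = stages - 1
    msb = (count >> width) & 1
    ram = 2 if msb else 1
    address = bit_reverse(count & ((1 << width) - 1), width)
    return ram, address


# 16-point example: M = 4 stages, two RAMs of depth 8.
for count in range(16):
    ram, address = input_write_target(count, stages=4)
    print(f"A={count:04b} -> RAM{ram}, write address {address:03b}")
```

The first eight samples land in RAM1 and the last eight in RAM2, each half in bit-reversed address order.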
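Step 102: according to the operation read address rule, read a group of discrete digital signals in turn from each of the upper memory and the lower memory and perform a butterfly operation, and write the butterfly results into the two memories according to the operation write address rule, until all stages of the butterfly operation are completed;
After the discrete digital signals of the sequence have been written into the upper memory and the lower memory, the butterfly operations can be performed; the discrete digital signals stored in the upper memory and the lower memory may be read according to the operation read address rule so that the computation follows the flow graph shown in FIG. 6. A binary number whose bit width equals the number of butterfly stages minus one may be used as an operation counter, counting upward. In the first butterfly stage, each butterfly operation uses the operation counter value itself as the read address; in each subsequent stage, the read address is the circular left shift of the read address used by the same-numbered butterfly operation in the previous stage. Following the count of the operation counter, discrete digital signals are read in turn from the upper memory and the lower memory at the read address corresponding to the counter value, and the butterfly operation is performed on them;
Optionally, in the FFT operation phase, data may be fetched from RAM1 and RAM2 simultaneously, in read-address order, for the butterfly operation. The butterfly operations of each stage shown in FIG. 6 are counted from top to bottom in order of use, and the count is denoted I[M-2:0]. In the first stage, the counter value can also be regarded as the RAM read address, and the data read out undergo the butterfly operation in turn. In the second stage, the first-stage read address may be circularly shifted left, so that the read address becomes {I[M-3:0], I[M-2]}; the address changes stage by stage in the same manner;
Taking 16-point FFT processing as an example, in the FFT operation phase the processing block diagram of the address controller in the chip is as shown in FIG. 9; data are fetched from RAM1 and RAM2 simultaneously, in read-address order, for the butterfly operation. The butterfly units of each stage are counted from top to bottom in order of use, and the count is denoted I[2:0]. In the first stage, the counter value can also be regarded as the RAM read address, and the data read out undergo the butterfly operation in turn. In the second stage, the read address is {I[1:0], I[2]}, and so on, changing stage by stage; the specific read address values are shown in FIG. 10. For example, in the second operation of the first stage the counter is 001, and data can be fetched simultaneously from address 001 of RAM1 and RAM2 for the butterfly operation; in the second operation of the second stage the counter is 001, and the read address is the first stage's second read address circularly shifted left by one bit, that is, 010, so data can be fetched simultaneously from address 010 of RAM1 and RAM2 for the butterfly operation.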
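The operation read address rule amounts to rotating the (M-1)-bit operation counter left by one bit per stage. The sketch below assumes stages are numbered from 1; the helper names `rotate_left` and `read_address` are illustrative and do not appear in the source.

```python
def rotate_left(value, shift, width):
    """Circularly shift a `width`-bit value left by `shift` bits."""
    shift %= width
    mask = (1 << width) - 1
    return ((value << shift) | (value >> (width - shift))) & mask


def read_address(counter, stage, stages):
    """Read address shared by RAM1 and RAM2 for operation counter I[M-2:0].

    Stage 1 uses the counter itself; each later stage rotates the previous
    stage's address left by one bit, so stage s rotates by s-1 bits in total.
    """
    return rotate_left(counter, stage - 1, stages - 1)


# 16-point example (M = 4): the second butterfly of each stage, counter 001.
for stage in range(1, 5):
    print(f"stage {stage}: read address {read_address(0b001, stage, 4):03b}")
```

For the second stage this reproduces the address 010 given in the text. Note that rotating an (M-1)-bit value by M-1 bits returns the value itself, so under this rule the last stage reads in the same order as the first.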
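The results of the butterfly operations may be written into the upper memory and the lower memory according to the operation write address rule until the butterfly operations of all stages are completed. Following the read addresses of the butterfly operands and the order of those read addresses, the results of successive butterfly operations may be stored alternately in the upper memory and the lower memory; the two results of any one butterfly operation are stored one after the other in the same memory;
Optionally, suppose that in the first butterfly operation of the first stage the two data paths fetched from RAM1 and RAM2 are the X path and the Y path respectively, and the read address is 000. After the butterfly operation, the X-path data and the Y-path data are output from the butterfly unit simultaneously. The Y-path data are then delayed by one cycle. The X-path and Y-path data then undergo interval data exchange through the data selector, and the two results of the butterfly operation are written into RAM1 one after the other according to the X-path read address, that is, into addresses 000 and 001 of RAM1. Similarly, the results of the second butterfly operation of the first stage are written into addresses 000 and 001 of RAM2, and later results are written alternately in the same way. The result of the interval data exchange for the first butterfly stage is shown in FIG. 11; taking 16-point FFT processing as an example, the write addresses after the butterfly operation are shown in FIG. 10;
In this way, the write address for the Y-path data lags the write address for the X-path data by one clock cycle, so the two data produced by the first butterfly operation are both stored in RAM1, the two data produced by the second butterfly operation are both stored in RAM2, and so on. The purpose is to store the two data that take part in the same butterfly operation of the next stage in two different RAMs; if both were in the same RAM, the next data fetch would run into problems. The same write-back to RAM1 and RAM2 is used in the following M-1 stages;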
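The delay-and-swap step can be sketched at cycle level. This is a minimal model under stated assumptions: the labels X0, Y0, ... are placeholders for butterfly outputs, the function name `interleave_results` is illustrative, and the choice that even cycles route the X path to RAM1 is inferred from the first-stage example in the text (FIG. 11 is not reproduced here).

```python
def interleave_results(num_butterflies):
    """Model the one-cycle Y delay followed by the alternating MUX swap.

    Returns the sequence of values written to RAM1 and to RAM2.
    """
    ram1_writes, ram2_writes = [], []
    # One extra cycle flushes the delayed Y output of the last butterfly.
    for cycle in range(num_butterflies + 1):
        x_now = f"X{cycle}" if cycle < num_butterflies else None
        y_delayed = f"Y{cycle - 1}" if cycle >= 1 else None
        if cycle % 2 == 0:
            to_ram1, to_ram2 = x_now, y_delayed   # straight through
        else:
            to_ram1, to_ram2 = y_delayed, x_now   # swapped
        if to_ram1 is not None:
            ram1_writes.append(to_ram1)
        if to_ram2 is not None:
            ram2_writes.append(to_ram2)
    return ram1_writes, ram2_writes


ram1_writes, ram2_writes = interleave_results(4)
print("RAM1 receives:", ram1_writes)
print("RAM2 receives:", ram2_writes)
```

Both outputs of butterflies 0 and 2 end up in RAM1 and both outputs of butterflies 1 and 3 in RAM2, which is the alternating placement the text describes.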
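After the first butterfly stage is completed, the data for the second stage are read according to the operation read address rule, the butterfly operation is performed, and the results are written into the upper memory and the lower memory according to the operation write address rule; this is repeated until the butterfly operations of all stages are completed.
Step 103: according to the output address rule, read the data that have completed the butterfly operation from the upper memory and the lower memory, and determine the read data as the FFT result of the discrete digital signal sequence;
Here, a binary number whose bit width equals the number of butterfly stages may be used as a data output counter, counting upward; with the lowest bit of the data output counter removed, the highest bit is moved to the lowest position, and each binary number so shifted is used, in order, as an output read address; the data that have completed the butterfly operation are read in turn from the upper memory and the lower memory at each output read address. The lowest bit of the data output counter may serve as the read enable of the upper memory and the lower memory: as the counter counts, the two memories are enabled alternately and the data that have completed the butterfly operation are read from the same output read address;
Optionally, the counter K[M-1:0] may be used to count a read enable signal lasting N cycles. Because the data are stored in RAM1 and RAM2 while the output is a single path, the same output read address may be applied to RAM1 and RAM2, with each output read address held for two cycles; RAM1 and RAM2 are enabled in turn during those two cycles, and the output data are selected and output. The output read address may be the binary number obtained by removing the lowest bit of the counter K[M-1:0] and moving the highest bit to the lowest position, and each binary number so shifted may be used, in order, as an output read address;
Taking 16-point FFT processing as an example, the processing block diagram of the address controller in the chip is shown in FIG. 12. The counter K[3:0] counts a read enable signal lasting 16 cycles. Because the data are stored in RAM1 and RAM2 while the output is a single path, the same read address signal is applied to RAM1 and RAM2, with each read address signal held for two cycles; RAM1 and RAM2 are enabled in turn during those two cycles, and the output data are selected and output. The output read address may be the binary number obtained by removing the lowest bit of the counter K[3:0] and moving the highest bit to the lowest position, and each binary number so shifted may be used, in order, as an output read address; the specific read address values are shown in FIG. 13. As FIG. 13 shows, within each two-cycle period the data at the same output read address are read from RAM1 and RAM2 and output serially, until the counter finishes counting, that is, until all butterfly results have been read;
In practice, read control of the upper memory and the lower memory may be implemented by the address controller in the chip in combination with a MUX and the like.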
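The output address rule of step 103 can be sketched as follows. The function name `output_read_target` is illustrative, and one detail is an assumption: the text says the lowest counter bit acts as the read enable of both memories but does not say which value selects which, so the sketch takes 0 to enable RAM1 and 1 to enable RAM2.

```python
def output_read_target(count, stages):
    """Map the output counter K[M-1:0] to (RAM number, read address).

    The lowest bit is dropped from the address and used to pick the RAM
    (assumption: 0 -> RAM1, 1 -> RAM2). In the remaining M-1 bits the
    highest bit is moved to the lowest position, i.e. a rotate left by one.
    """
    width = stages - 1
    upper = count >> 1
    address = ((upper << 1) | (upper >> (width - 1))) & ((1 << width) - 1)
    ram = 2 if count & 1 else 1
    return ram, address


# 16-point example: every read address is held for two consecutive counts.
for count in range(16):
    ram, address = output_read_target(count, stages=4)
    print(f"K={count:04b} -> RAM{ram}, read address {address:03b}")
```

Each address appears for two consecutive counter values, once per RAM, which matches the two-cycle hold described in the text.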
In this way, a technical solution built from simple dual-port RAMs completes the FFT processing of the discrete digital signal sequence.
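Putting the three address rules together, the following is a minimal software sketch of steps 101 to 103, checked against a direct DFT. Several points are assumptions that the text does not spell out, because FIG. 6 and FIG. 10 are not reproduced here: the butterflies are taken to be radix-2 decimation-in-frequency, the twiddle exponent is derived from the read address as shown in the comments, the write-back addresses in later stages are taken to follow that stage's read addresses, and an output counter lowest bit of 0 is taken to select RAM1. All function names are illustrative.

```python
import cmath


def bit_reverse(value, width):
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def rotate_left(value, shift, width):
    shift %= width
    return ((value << shift) | (value >> (width - shift))) & ((1 << width) - 1)


def two_ram_fft(samples):
    n = len(samples)
    m = n.bit_length() - 1              # number of butterfly stages
    assert n >= 4 and (1 << m) == n, "length must be a power of two, at least 4"
    width, half = m - 1, n // 2
    ram1, ram2 = [0j] * half, [0j] * half

    # Step 101: highest counter bit picks the RAM, other bits reversed.
    for count, sample in enumerate(samples):
        address = bit_reverse(count & (half - 1), width)
        (ram2 if count >> width else ram1)[address] = sample

    # Step 102: two consecutive butterflies share one in-place write-back.
    for stage in range(1, m + 1):
        for count in range(0, half, 2):
            addr_a = rotate_left(count, stage - 1, width)
            addr_b = rotate_left(count + 1, stage - 1, width)
            results = []
            for address in (addr_a, addr_b):
                # Assumed DIF twiddle: low (m - stage) bits of the reversed address.
                exponent = bit_reverse(address, width) & ((1 << (m - stage)) - 1)
                twiddle = cmath.exp(-2j * cmath.pi * exponent / (1 << (m - stage + 1)))
                upper, lower = ram1[address], ram2[address]
                results.append((upper + lower, (upper - lower) * twiddle))
            ram1[addr_a], ram1[addr_b] = results[0]   # first butterfly -> RAM1
            ram2[addr_a], ram2[addr_b] = results[1]   # second butterfly -> RAM2

    # Step 103: lowest counter bit picks the RAM, remaining bits rotated left.
    output = []
    for count in range(n):
        address = rotate_left(count >> 1, 1, width)
        output.append(ram2[address] if count & 1 else ram1[address])
    return output


def direct_dft(samples):
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


signal = [complex(i % 5, (3 * i) % 7) for i in range(16)]
error = max(abs(a - b) for a, b in zip(two_ram_fft(signal), direct_dft(signal)))
print(f"largest deviation from the direct DFT: {error:.2e}")
```

Under these assumptions the results leave the two memories in natural frequency order, so no separate reordering pass is needed after the last stage.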
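The FFT processing apparatus provided by an embodiment of the present invention is shown in FIG. 14; the apparatus comprises: an input controller 1401, an arithmetic unit 1402, an output controller 1403, an upper memory 1404 and a lower memory 1405; wherein
the input controller 1401 is configured to write each discrete digital signal of the discrete digital signal sequence into the upper memory 1404 and the lower memory 1405 according to the input address rule;
Here, the memories may be simple dual-port RAMs, which consume fewer resources and less power than full-function dual-port RAMs. FIG. 2 is the port diagram of the full-function dual-port RAM used by existing FFT processing solutions; the number of FFT points usually determines the depth of the full-function dual-port RAM. In the block diagram of FIG. 3, which performs FFT processing with full-function dual-port RAM, an N-point FFT requires two full-function dual-port RAMs of depth N as ping-pong buffers. The input discrete digital signal sequence is stored in the ping buffer; the two corresponding data are then read simultaneously from the full-function dual-port RAM for a butterfly operation, and the two results are written simultaneously into the pong buffer, until all first-stage butterfly operations are finished. In the second stage, the two corresponding data are read from the pong buffer for the butterfly operation and the two results are written simultaneously into the ping buffer, until the second stage is finished; later stages proceed likewise. A simple dual-port RAM of the same depth uses fewer circuit resources than a full-function dual-port RAM and is therefore better for saving resources and power.
The FFT processing method proposed by the embodiments of the present invention is implemented on simple dual-port RAM, whose port diagram is shown in FIG. 4; FFT processing based on simple dual-port RAM can be implemented with the block diagram shown in FIG. 5. Further, two simple dual-port RAMs of depth N/2, the upper memory 1404 (RAM1) and the lower memory 1405 (RAM2), can replace the two full-function dual-port RAMs of depth N in the design above, saving at least half of the storage cells and offering a more practical solution for resource-sensitive circuit designs. The solution shown in FIG. 5 supports serial input of the discrete digital signal sequence and serial output after the operation; the controller for the write and read addresses can be implemented with hardware logic inside the chip;
In the block diagram of FIG. 5, when the discrete digital signal sequence is input, a binary number whose bit width equals the number of butterfly stages may be used as a data input counter, counting upward; the inverted value of the counter's highest bit and the highest bit itself serve as the write enables of the upper memory 1404 and the lower memory 1405 respectively; the remaining bits of the counter after the highest bit is removed, taken in reverse order, form the write address; following the count of the data input counter, the discrete digital signals are written in turn into the upper memory 1404 and the lower memory 1405 at the write address corresponding to the counter value;
When the data of an N-point FFT are input serially, N is usually an integer power of 2; an address counter A[M-1:0] may be used to count the input data, where M is the number of butterfly stages. From the basics of FFT butterfly operations,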
Figure PCTCN2017099342-appb-000004
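The inverted highest bit of the counter (!A[M-1]) serves as the write enable of the upper memory 1404, and the highest bit A[M-1] serves as the write enable of the lower memory 1405. The other bits of the counter in reverse order (A[0], A[1] … A[M-2]) serve as the write address of both RAMs. Both simple dual-port RAMs have depth N/2;
Taking 16-point FFT processing as an example, the flow graph of the 16-point FFT is shown in FIG. 6, from which N = 16 and the number of butterfly stages M is 4. During input of the discrete digital signal sequence, the processing block diagram of the address controller in the chip is as shown in FIG. 7: the address counter A[3:0] counts the input data. The inverted highest bit (!A[3]) serves as the write enable of simple dual-port RAM1, and the highest bit A[3] serves as the write enable of simple dual-port RAM2. The other bits in reverse order (A[0], A[1], A[2]) serve as the write address of both RAMs. Both simple dual-port RAMs have depth 8; the write address table is shown in FIG. 8. Each time the counter A[3:0] counts, a group of discrete digital signals can be written to the write address corresponding to the counter, until counting ends.
In practice, the input controller 1401 may implement write control of the upper memory 1404 and the lower memory 1405 through the address controller in the chip in combination with a MUX and the like.
The arithmetic unit 1402 is configured to, according to the operation read address rule, read a group of discrete digital signals in turn from each of the upper memory 1404 and the lower memory 1405 and perform a butterfly operation, and to write the butterfly results into the upper memory 1404 and the lower memory 1405 according to the operation write address rule, until all stages of the butterfly operation are completed;
After the discrete digital signals of the sequence have been written into the upper memory 1404 and the lower memory 1405, the butterfly operations can be performed; the discrete digital signals stored in the upper memory 1404 and the lower memory 1405 may be read according to the operation read address rule so that the computation follows the flow graph shown in FIG. 6. A binary number whose bit width equals the number of butterfly stages minus one may be used as an operation counter, counting upward. In the first butterfly stage, each butterfly operation uses the operation counter value itself as the read address; in each subsequent stage, the read address is the circular left shift of the read address used by the same-numbered butterfly operation in the previous stage. Following the count of the operation counter, discrete digital signals are read in turn from the upper memory 1404 and the lower memory 1405 at the read address corresponding to the counter value, and the butterfly operation is performed on them;
Optionally, in the FFT operation phase, data may be fetched from RAM1 and RAM2 simultaneously, in read-address order, for the butterfly operation. The butterfly operations of each stage shown in FIG. 6 are counted from top to bottom in order of use, and the count is denoted I[M-2:0]. In the first stage, the counter value can also be regarded as the RAM read address, and the data read out undergo the butterfly operation in turn. In the second stage, the first-stage read address may be circularly shifted left, so that the read address becomes {I[M-3:0], I[M-2]}; the address changes stage by stage in the same manner;
Taking 16-point FFT processing as an example, in the FFT operation phase the processing block diagram of the address controller in the chip is as shown in FIG. 9; data are fetched from RAM1 and RAM2 simultaneously, in read-address order, for the butterfly operation. The butterfly units of each stage are counted from top to bottom in order of use, and the count is denoted I[2:0]. In the first stage, the counter value can also be regarded as the RAM read address, and the data read out undergo the butterfly operation in turn. In the second stage, the read address is {I[1:0], I[2]}, and so on, changing stage by stage; the specific read address values are shown in FIG. 10. For example, in the second operation of the first stage the counter is 001, and data can be fetched simultaneously from address 001 of RAM1 and RAM2 for the butterfly operation; in the second operation of the second stage the counter is 001, and the read address is the first stage's second read address circularly shifted left by one bit, that is, 010, so data can be fetched simultaneously from address 010 of RAM1 and RAM2 for the butterfly operation.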
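The results of the butterfly operations may be written into the upper memory 1404 and the lower memory 1405 according to the operation write address rule until the butterfly operations of all stages are completed. Following the read addresses of the butterfly operands and the order of those read addresses, the results of successive butterfly operations may be stored alternately in the upper memory 1404 and the lower memory 1405; the two results of any one butterfly operation are stored one after the other in the same memory;
Optionally, suppose that in the first butterfly operation of the first stage the two data paths fetched from RAM1 and RAM2 are the X path and the Y path respectively, and the read address is 000. After the butterfly operation, the X-path data and the Y-path data are output from the butterfly unit simultaneously. The Y-path data are then delayed by one cycle. The X-path and Y-path data then undergo interval data exchange through the data selector, and the two results of the butterfly operation are written into RAM1 one after the other according to the X-path read address, that is, into addresses 000 and 001 of RAM1. Similarly, the results of the second butterfly operation of the first stage are written into addresses 000 and 001 of RAM2, and later results are written alternately in the same way. The result of the interval data exchange for the first butterfly stage is shown in FIG. 11; taking 16-point FFT processing as an example, the write addresses after the butterfly operation are shown in FIG. 10;
In this way, the write address for the Y-path data lags the write address for the X-path data by one clock cycle, so the two data produced by the first butterfly operation are both stored in RAM1, the two data produced by the second butterfly operation are both stored in RAM2, and so on. The purpose is to store the two data that take part in the same butterfly operation of the next stage in two different RAMs; if both were in the same RAM, the next data fetch would run into problems. The same write-back to RAM1 and RAM2 is used in the following M-1 stages;
After the first butterfly stage is completed, the data for the second stage are read according to the operation read address rule, the butterfly operation is performed, and the results are written into the upper memory 1404 and the lower memory 1405 according to the operation write address rule; this is repeated until the butterfly operations of all stages are completed.
The output controller 1403 is used to read the data that have completed the butterfly operation from the upper memory 1404 and the lower memory 1405 according to the output address rule, and to determine the read data as the FFT result of the discrete digital signal sequence;
Here, a binary number whose bit width equals the number of butterfly stages may be used as a data output counter, counting upward; with the lowest bit of the data output counter removed, the highest bit is moved to the lowest position, and each binary number so shifted is used, in order, as an output read address; the data that have completed the butterfly operation are read in turn from the upper memory 1404 and the lower memory 1405 at each output read address. The lowest bit of the data output counter may serve as the read enable of the upper memory 1404 and the lower memory 1405: as the counter counts, the two memories are enabled alternately and the data that have completed the butterfly operation are read from the same output read address;
Optionally, the counter K[M-1:0] may be used to count a read enable signal lasting N cycles. Because the data are stored in RAM1 and RAM2 while the output is a single path, the same output read address may be applied to RAM1 and RAM2, with each output read address held for two cycles; RAM1 and RAM2 are enabled in turn during those two cycles, and the output data are selected and output. The output read address may be the binary number obtained by removing the lowest bit of the counter K[M-1:0] and moving the highest bit to the lowest position, and each binary number so shifted may be used, in order, as an output read address;
Taking 16-point FFT processing as an example, the processing block diagram of the address controller in the chip is shown in FIG. 12. The counter K[3:0] counts a read enable signal lasting 16 cycles. Because the data are stored in RAM1 and RAM2 while the output is a single path, the same read address signal is applied to RAM1 and RAM2, with each read address signal held for two cycles; RAM1 and RAM2 are enabled in turn during those two cycles, and the output data are selected and output. The output read address may be the binary number obtained by removing the lowest bit of the counter K[3:0] and moving the highest bit to the lowest position, and each binary number so shifted may be used, in order, as an output read address; the specific read address values are shown in FIG. 13. As FIG. 13 shows, within each two-cycle period the data at the same output read address are read from RAM1 and RAM2 and output serially, until the counter finishes counting, that is, until all butterfly results have been read;
In practice, the output controller 1403 may implement read control of the upper memory 1404 and the lower memory 1405 through the address controller in the chip in combination with a MUX and the like.
In this way, a technical solution built from simple dual-port RAMs completes the FFT processing of the discrete digital signal sequence.
In practice, the input controller 1401, the arithmetic unit 1402 and the output controller 1403 may each be implemented by hardware logic, software logic or the like inside the chip.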
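An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the FFT processing method provided by one or more of the foregoing technical solutions, for example the FFT processing method shown in FIG. 1.
The computer storage medium may be any of various types of storage medium, such as an optical disc, a removable hard disk, a flash drive or a magnetic tape, and is optionally a non-transitory storage medium.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection; any modification made in accordance with the principles of the present invention shall be understood as falling within the scope of protection of the present invention.
Industrial Applicability
In the FFT processing method, apparatus and computer storage medium provided by the embodiments of the present invention, a simple upper memory and lower memory replace the full-function dual-port RAM, which imposes a heavy structural load and consumes substantial computing resources and energy, to perform the FFT operation; this reduces the computing resources and energy consumed during FFT calculation and has a positive industrial effect. Moreover, simple dual-port RAM can replace the full-function dual-port RAM in an FFT processing apparatus; this is simple to implement and easy to adopt in industry, so the solution is highly practicable industrially.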

Claims (15)

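  1. A fast Fourier transform (FFT) processing method, the method comprising:
    writing each discrete digital signal of a discrete digital signal sequence into an upper memory and a lower memory according to an input address rule;
    according to an operation read address rule, reading a group of discrete digital signals in turn from each of the upper memory and the lower memory and performing a butterfly operation, and writing the butterfly results into the upper memory and the lower memory according to an operation write address rule, until all stages of the butterfly operation are completed;
    according to an output address rule, reading the data that have completed the butterfly operation from the upper memory and the lower memory, and determining the read data as the FFT result of the discrete digital signal sequence.
  2. The method according to claim 1, wherein the writing into the upper memory and the lower memory according to the input address rule comprises:
    using a binary number whose bit width equals the number of butterfly stages as a data input counter, and counting upward;
    using the inverted value of the highest bit of the data input counter and the highest bit of the data input counter as the write enables of the upper memory and the lower memory respectively;
    using the remaining bits of the data input counter after the highest bit is removed, taken in reverse order, as the write address;
    following the count of the data input counter, writing the discrete digital signals in turn into the upper memory and the lower memory at the write address corresponding to the data input counter.
  3. The method according to claim 1, wherein the reading, according to the operation read address rule, of a group of discrete digital signals in turn from each of the upper memory and the lower memory and performing a butterfly operation comprises:
    using a binary number whose bit width equals the number of butterfly stages minus one as an operation counter, and counting upward;
    in the first butterfly stage, using for each butterfly operation the operation counter value itself as the read address, and in each subsequent stage using as the read address the circular left shift of the read address of the same-numbered butterfly operation in the previous stage;
    following the count of the operation counter, reading discrete digital signals in turn from the upper memory and the lower memory at the read address corresponding to the operation counter and performing the butterfly operation.
  4. The method according to claim 3, wherein the writing of the butterfly results into the upper memory and the lower memory according to the operation write address rule comprises:
    following the read addresses of the butterfly operands and the order of those read addresses, storing the results of successive butterfly operations alternately in the upper memory and the lower memory;
    storing the two results of any one butterfly operation one after the other in the same memory.
  5. The method according to claim 1, wherein the reading, according to the output address rule, of the data that have completed the butterfly operation from the upper memory and the lower memory comprises:
    using a binary number whose bit width equals the number of butterfly stages as a data output counter, and counting upward;
    with the lowest bit of the data output counter removed, moving the highest bit to the lowest position, and using each binary number so shifted, in order, as an output read address;
    in the order of the output read addresses, reading the data that have completed the butterfly operation in turn from the upper memory and the lower memory at each output read address.
  6. The method according to claim 5, wherein the reading of the data that have completed the butterfly operation from the upper memory and the lower memory comprises:
    using the lowest bit of the data output counter as the read enable of the upper memory and the lower memory, enabling the upper memory and the lower memory alternately as the data output counter counts, and reading the data that have completed the butterfly operation from the same output read address.
  7. The method according to any one of claims 1 to 6, wherein the memories are simple dual-port random access memories (RAMs).
  8. The method according to any one of claims 1 to 6, wherein the depth of each memory is half the number of FFT points.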
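  9. An FFT processing apparatus, the apparatus comprising: an input controller, an arithmetic unit, an output controller, an upper memory and a lower memory; wherein
    the input controller is configured to write each discrete digital signal of a discrete digital signal sequence into the upper memory and the lower memory according to an input address rule;
    the arithmetic unit is configured to, according to an operation read address rule, read a group of discrete digital signals in turn from each of the upper memory and the lower memory and perform a butterfly operation, and to write the butterfly results into the upper memory and the lower memory according to an operation write address rule, until all stages of the butterfly operation are completed;
    the output controller is configured to, according to an output address rule, read the data that have completed the butterfly operation from the upper memory and the lower memory, and to determine the read data as the FFT result of the discrete digital signal sequence.
  10. The apparatus according to claim 9, wherein the input controller is configured to:
    use a binary number whose bit width equals the number of butterfly stages as a data input counter, and count upward;
    use the inverted value of the highest bit of the data input counter and the highest bit of the data input counter as the write enables of the upper memory and the lower memory respectively;
    use the remaining bits of the data input counter after the highest bit is removed, taken in reverse order, as the write address;
    following the count of the data input counter, write the discrete digital signals in turn into the upper memory and the lower memory at the write address corresponding to the data input counter.
  11. The apparatus according to claim 9, wherein the arithmetic unit is configured to:
    use a binary number whose bit width equals the number of butterfly stages minus one as an operation counter, and count upward;
    in the first butterfly stage, use for each butterfly operation the operation counter value itself as the read address, and in each subsequent stage use as the read address the circular left shift of the read address of the same-numbered butterfly operation in the previous stage;
    following the count of the operation counter, read discrete digital signals in turn from the upper memory and the lower memory at the read address corresponding to the operation counter and perform the butterfly operation.
  12. The apparatus according to claim 11, wherein the arithmetic unit is configured to:
    following the read addresses of the butterfly operands and the order of those read addresses, store the results of successive butterfly operations alternately in the upper memory and the lower memory;
    store the two results of any one butterfly operation one after the other in the same memory.
  13. The apparatus according to claim 9, wherein the output controller is configured to:
    use a binary number whose bit width equals the number of butterfly stages as a data output counter, and count upward;
    with the lowest bit of the data output counter removed, move the highest bit to the lowest position, and use each binary number so shifted, in order, as an output read address;
    in the order of the output read addresses, read the data that have completed the butterfly operation in turn from the upper memory and the lower memory at each output read address.
  14. The apparatus according to claim 13, wherein the output controller is configured to:
    use the lowest bit of the data output counter as the read enable of the upper memory and the lower memory, enable the upper memory and the lower memory alternately as the data output counter counts, and read the data that have completed the butterfly operation from the same output read address.
  15. A computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the fast Fourier transform (FFT) processing method provided by any one of claims 1 to 8.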
PCT/CN2017/099342 2017-01-12 2017-08-28 快速傅里叶变换处理方法、装置和计算机存储介质 WO2018129930A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710023363.0 2017-01-12
CN201710023363.0A CN108304347A (zh) 2017-01-12 2017-01-12 一种快速傅里叶变换处理方法和装置

Publications (1)

Publication Number Publication Date
WO2018129930A1 true WO2018129930A1 (zh) 2018-07-19

Family

ID=62839246

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/099342 WO2018129930A1 (zh) 2017-01-12 2017-08-28 快速傅里叶变换处理方法、装置和计算机存储介质

Country Status (2)

Country Link
CN (1) CN108304347A (zh)
WO (1) WO2018129930A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487352A (zh) * 2020-12-18 2021-03-12 清华大学 可重构处理器上快速傅里叶变换运算方法及可重构处理器
CN113918875A (zh) * 2021-09-23 2022-01-11 同致电子科技(厦门)有限公司 一种二维fft的快速处理方法
CN114201725A (zh) * 2021-12-14 2022-03-18 电子科技大学 基于多模可重构fft的窄带通信信号处理方法
CN116136602A (zh) * 2023-04-14 2023-05-19 福建福大北斗通信科技有限公司 北斗抗干扰通道带内频谱幅度和时延一致性装置及方法
CN117389946A (zh) * 2023-11-09 2024-01-12 合肥灿芯科技有限公司 一种可动态扩展点数的fft实现结构

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111435383B (zh) * 2020-01-14 2023-06-20 珠海市杰理科技股份有限公司 数据处理方法、数据处理芯片及电子设备
CN111737638A (zh) * 2020-06-11 2020-10-02 Oppo广东移动通信有限公司 基于傅里叶变换的数据处理方法及相关装置
CN118626411A (zh) * 2024-08-13 2024-09-10 北京大有半导体有限责任公司 基于滑动fft的数据读写方法、装置、系统及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020178195A1 (en) * 2001-05-23 2002-11-28 Lg Electronics Inc. Memory address generating apparatus and method
CN101650706A (zh) * 2009-06-30 2010-02-17 重庆重邮信科通信技术有限公司 Fft分支计算方法及装置
CN103176949A (zh) * 2011-12-20 2013-06-26 中国科学院深圳先进技术研究院 实现fft/ifft变换的电路及方法
CN103970718A (zh) * 2014-05-26 2014-08-06 苏州威士达信息科技有限公司 一种快速傅里叶变换实现装置及方法
US9244886B1 (en) * 2008-07-14 2016-01-26 The Mathworks, Inc. Minimum resource fast fourier transform

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100825771B1 (ko) * 2004-02-11 2008-04-28 삼성전자주식회사 메모리를 반감하는 고속 푸리에 변환 프로세서 및 그 방법

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020178195A1 (en) * 2001-05-23 2002-11-28 Lg Electronics Inc. Memory address generating apparatus and method
US9244886B1 (en) * 2008-07-14 2016-01-26 The Mathworks, Inc. Minimum resource fast fourier transform
CN101650706A (zh) * 2009-06-30 2010-02-17 重庆重邮信科通信技术有限公司 Fft分支计算方法及装置
CN103176949A (zh) * 2011-12-20 2013-06-26 中国科学院深圳先进技术研究院 实现fft/ifft变换的电路及方法
CN103970718A (zh) * 2014-05-26 2014-08-06 苏州威士达信息科技有限公司 一种快速傅里叶变换实现装置及方法

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487352A (zh) * 2020-12-18 2021-03-12 清华大学 可重构处理器上快速傅里叶变换运算方法及可重构处理器
CN112487352B (zh) * 2020-12-18 2022-06-10 清华大学 可重构处理器上快速傅里叶变换运算方法及可重构处理器
CN113918875A (zh) * 2021-09-23 2022-01-11 同致电子科技(厦门)有限公司 一种二维fft的快速处理方法
CN113918875B (zh) * 2021-09-23 2024-05-03 同致电子科技(厦门)有限公司 一种二维fft的快速处理方法
CN114201725A (zh) * 2021-12-14 2022-03-18 电子科技大学 基于多模可重构fft的窄带通信信号处理方法
CN114201725B (zh) * 2021-12-14 2023-04-07 电子科技大学 基于多模可重构fft的窄带通信信号处理方法
CN116136602A (zh) * 2023-04-14 2023-05-19 福建福大北斗通信科技有限公司 北斗抗干扰通道带内频谱幅度和时延一致性装置及方法
CN117389946A (zh) * 2023-11-09 2024-01-12 合肥灿芯科技有限公司 一种可动态扩展点数的fft实现结构
CN117389946B (zh) * 2023-11-09 2024-05-28 合肥灿芯科技有限公司 一种可动态扩展点数的fft实现结构

Also Published As

Publication number Publication date
CN108304347A (zh) 2018-07-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17891980

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17891980

Country of ref document: EP

Kind code of ref document: A1