CN103699517A - 1-D/2-D hybrid architecture FFT (Fast Fourier Transform) processor - Google Patents

Info

Publication number
CN103699517A
Authority
CN
China
Prior art keywords
fft
data
groups
data transmission
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410023273.8A
Other languages
Chinese (zh)
Other versions
CN103699517B (en
Inventor
张多利
黄路
杜高明
宋宇鲲
贾靖华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN201410023273.8A
Publication of CN103699517A
Application granted
Publication of CN103699517B
Legal status: Active
Anticipated expiration

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a 1-D/2-D hybrid architecture FFT processor, characterized in that the processor performs FFT or IFFT (Inverse Fast Fourier Transform) operations on 32-bit single-precision floating-point data of any $2^n$ points from 32 points to 8K points in a one-dimensional FFT operation mode, and performs 16K-point FFT or IFFT operations on 32-bit single-precision floating-point data in a two-dimensional operation mode, thereby supporting FFT/IFFT operations on single-precision floating-point data of any $2^n$ points (n = 5, 6, ..., 14). The FFT processor provided by the invention performs FFT or IFFT operations on data in a hybrid one-dimensional/two-dimensional mode. It takes FFT operation speed, storage resource consumption, and flexibility into account together, combines the high speed of the one-dimensional FFT processor with the low storage cost of the two-dimensional FFT processor, and achieves a balance between speed and resources.

Description

A 1-D/2-D hybrid architecture FFT processor
Technical field
The invention discloses a one-dimensional/two-dimensional (1-D/2-D) hybrid architecture FFT processor and belongs to the field of digital processing.
Background art
At present, the fast Fourier transform (FFT) is widely used in fields such as digital communication, image processing, speech recognition, and radar processing. Hardware implementations of the FFT have a speed advantage that software cannot match, and implementation methods based on field-programmable gate arrays (FPGAs) have important research value. The radix-2 FFT algorithm is usually adopted to compute the FFT of a signal of length $2^n$. Because of its wide range of uses and its own particular characteristics, a hardware implementation of the N-point radix-2 FFT must be fast while also taking hardware area into account, so a good balance between the two is required.
In the prior art, both the 1-D FFT processor and the 2-D ultra-long FFT processor can perform FFT/IFFT operations on single-precision floating-point data of any $2^n$ points (n = 5, 6, ..., 14), but each has shortcomings in its specific implementation, which are discussed in turn below:
1. 1-D FFT processor
As shown in Fig. 1, the 1-D FFT processor internally integrates an FFT controller, a memory control unit, an FFT arithmetic unit, and 2 groups of 4*4K on-chip data memories. The FFT arithmetic unit internally integrates a read/write address generation unit, 2 butterfly units, and a twiddle factor generator. For an N-point FFT, the operation requires 2N data storage locations of RAM, so the storage resource cost is too large.
The 1-D FFT processor adopts the radix-2 algorithm and a fixed addressing structure and can perform 2 butterfly operations in parallel in each clock cycle: in each clock cycle it reads 4 operands and two coefficients in parallel from on-chip memory and produces 4 results that are written back to on-chip memory.
2. 2-D FFT processor
As shown in Fig. 2, the 2-D FFT processor internally integrates a controller, a memory control unit, an FFT arithmetic unit, a data transfer unit, 3 groups of 4*256 temporary memories (Memory0/1/2_x), and one 16K on-chip data memory. The FFT arithmetic unit internally integrates a read/write address generation unit, 2 butterfly units, and a twiddle factor generator. The 2-D FFT processor adds a transfer module that moves the raw row or column data for each one-dimensional FFT. The maximum number of points the FFT can process is 16K, which requires the 16K on-chip data memory as the two-dimensional FFT buffer.
The data storage resources of the 1-D FFT processor are very large, whereas for an N-point FFT the 2-D ultra-long FFT processor needs only N data storage locations of RAM, which greatly reduces internal storage resources. However, because the data transfer delays, butterfly unit pipeline delays, and memory access delays accumulate, the 2-D ultra-long FFT processor needs far more clock cycles per FFT.
In summary, the one-dimensional (1-D) FFT processor computes quickly but needs more data storage. The two-dimensional (2-D) FFT processor can compute one-dimensional ultra-long FFTs with greatly reduced internal storage, but it is slower and is limited to ultra-long FFT operations.
Summary of the invention
To avoid the above shortcomings of the prior art, the present invention takes FFT operation speed, storage resource consumption, and flexibility into account together and provides a 1-D/2-D hybrid architecture FFT processor, so as to achieve a balance between speed and resources.
To solve the technical problem, the present invention adopts the following technical solution:
The 1-D/2-D hybrid architecture FFT processor of the present invention is characterized in that the processor comprises a controller, a storage system, a data switching network, a data transfer unit, and an FFT arithmetic unit;
The controller is used to control the data transfer unit and the FFT arithmetic unit;
The storage system comprises 2 groups of 4*2K on-chip data memories (Memory0/1_x) and 3 groups of 4*32 temporary memories (Memory2/3/4_x), all of which use simple dual-port RAM. Since 16K = 128*128, the largest one-dimensional FFT that the two-dimensional FFT processing mode must handle is 128 points; to save storage resources, the FFT processor completes the row and column one-dimensional FFT operations by ping-pong operation of the 3 groups of 4*32 temporary memories.
The data switching network comprises an internal memory control unit, an external memory control unit, and a multiplexer (MUX);
The internal memory control unit performs address allocation and management for the 2 groups of 4*2K on-chip data memories and the 3 groups of 4*32 temporary memories, so as to meet the requirement for flexible parallel data reading and storing during FFT/IFFT computation and to avoid address conflicts. It also provides a unified interface through which the FFT arithmetic unit and the data transfer unit access the 5 groups of data memories, so that each data port (operand ports, transfer data ports, and result ports) can access any storage location in any on-chip memory block without needing to know which block that location belongs to;
The external memory control unit provides a unified interface through which the data transfer unit accesses the 2 groups of 4*2K on-chip data memories;
The multiplexer selects whether the internal memory control unit or the external memory control unit accesses the 2 groups of 4*2K on-chip data memories;
The data transfer unit handles data transfer between the 2 groups of 4*2K on-chip data memories and the 3 groups of 4*32 temporary memories. It moves row or column data in real time, supplies the FFT arithmetic unit with the operands of the row or column FFT to be performed, and writes the results computed by the FFT arithmetic unit back to the 2 groups of 4*2K on-chip data memories;
The FFT arithmetic unit performs the FFT or IFFT operation on the data. It internally integrates a read/write address generation unit, 2 butterfly units, and a twiddle factor generator.
The 1-D/2-D hybrid architecture FFT processor of the present invention is further characterized in that the processor adopts the radix-2 algorithm and a fixed addressing structure. The processor uses two methods to hide the time spent on the data path: pre-read operations on the simple dual-port RAM hide the butterfly unit pipeline delay and the memory access delay, and ping-pong operation of the 3 groups of RAM hides the time spent moving data. In addition, it uses a twiddle factor compression algorithm to reduce the storage required by the external twiddle factor ROM.
The processor integrates hybrid one-dimensional and two-dimensional FFT processing modes: it uses the one-dimensional FFT operation for FFT or IFFT operations on 32-bit single-precision floating-point data of any $2^n$ points from 32 points to 8K points, and the two-dimensional FFT operation for 16K-point FFT or IFFT operations on 32-bit single-precision floating-point data.
The processor performs FFT or IFFT operations on 32-bit single-precision floating-point data of any $2^n$ points from 32 points to 8K points in the following one-dimensional FFT operation mode: the 2 groups of 4*2K on-chip data memories operate in ping-pong fashion and are configured by the controller alternately as the operand memory group and the result memory group. The FFT arithmetic unit accesses the operand memory group and the result memory group simultaneously through the internal memory control unit and the multiplexer; in each clock cycle it reads 4 operands in parallel from the operand memory group, performs the FFT or IFFT operation, and produces 4 intermediate results that are written to the result memory group. Each time one stage of the FFT is completed, the 2 groups of 4*2K on-chip data memories perform one ping-pong operation, so that the operand memory group and the result memory group swap roles. After n stages of ping-pong operation, the final result data are obtained;
The processor performs 16K-point FFT or IFFT operations on 32-bit single-precision floating-point data in the following two-dimensional FFT operation mode: the two groups of 4*2K on-chip data memories serve as a 16K buffer organized as a 128*128 matrix; the 3 groups of 4*32 temporary memories perform inner and outer ping-pong operations and are configured by the controller in turn as the operand memory group, the result memory group, and the data transfer memory group;
Through the external memory control unit and the multiplexer, the data transfer unit reads 2 raw data items of the first column or row of the 16K buffer matrix in parallel in each clock cycle and writes them to the data transfer memory group through the internal memory control unit, until the transfer of the 128 data items of the first column or row is complete;
After the data of the first column or row has been transferred, the operand memory group, the result memory group, and the data transfer memory group perform one outer ping-pong operation: the data transfer memory group becomes the operand memory group, the operand memory group becomes the result memory group, and the result memory group becomes the data transfer memory group. The data transfer unit then reads the raw data of the second column or row and writes it to the data transfer memory group. While this transfer is in progress, the FFT arithmetic unit reads 4 operands in parallel from the operand memory group in each clock cycle and produces 4 results that are written to the result memory group; the operand memory group and the result memory group perform inner ping-pong operations to complete the FFT or IFFT of the first column or row;
When the FFT or IFFT of the previous column or row is complete, the 3 groups of 4*32 temporary memories perform another outer ping-pong operation and switch roles. The data transfer unit reads 2 FFT results of the previous column or row in parallel in each clock cycle through the external memory control unit, and writes them to the corresponding column or row of the 16K buffer matrix through the external memory control unit and the multiplexer. Then, through the external memory control unit and the multiplexer, the data transfer unit reads 2 raw data items of the next column or row in parallel in each clock cycle and writes them to the data transfer memory group through the internal memory control unit. While the data is being transferred, the FFT arithmetic unit performs inner ping-pong operations on the operand memory group and the result memory group to complete the FFT or IFFT of the current column or row. Next, the operand memory group, the result memory group, and the data transfer memory group perform another outer ping-pong operation, and so on until all one-dimensional column or row FFT or IFFT operations are complete;
The controller coordinates the data transfer unit and the FFT arithmetic unit so that they work in order, completing the FFT or IFFT operations of all columns and rows in turn and finally completing the 16K-point FFT or IFFT operation on 32-bit single-precision floating-point data.
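The outer ping-pong rotation of the three temporary memory groups can be sketched in a few lines of Python. This is only an illustration of the role rotation stated above, not the patented control logic: the function name `outer_ping_pong` and the starting assignment of Memory2/3/4 to roles are assumptions made for the example.

```python
def outer_ping_pong(roles):
    """Rotate the three temporary-memory roles once.

    transfer -> operand, operand -> result, result -> transfer,
    following the rotation described for the two-dimensional mode.
    """
    return {
        "operand": roles["transfer"],
        "result": roles["operand"],
        "transfer": roles["result"],
    }


# Hypothetical starting assignment; the source does not say which memory starts in which role.
initial = {"transfer": "Memory2", "operand": "Memory3", "result": "Memory4"}

after_one = outer_ping_pong(initial)
after_three = outer_ping_pong(outer_ping_pong(after_one))
print(after_one)
print(after_three == initial)
```

With three groups, every group cycles through all three roles, and the assignment returns to its starting point after three rotations.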
The one-dimensional FFT operation mode must satisfy formula (1):
$$M_d + C_d < N/8 \tag{1}$$
The two-dimensional FFT operation mode must satisfy formula (2):
$$M_d + C_d < \min\{L, C\}/8 \tag{2}$$
where $M_d$ is the memory access delay, $C_d$ is the butterfly unit pipeline delay, $N$ is the number of FFT points, $L$ is the number of rows, and $C$ is the number of columns.
Compared with the prior art, the beneficial effects of the present invention are as follows:
1. The FFT processor of the present invention performs FFT or IFFT operations on data in a hybrid one-dimensional/two-dimensional mode. It takes FFT operation speed, storage resource consumption, and flexibility into account together, combines the high speed of the one-dimensional FFT processor with the low storage cost of the two-dimensional FFT processor, and achieves a balance between speed and resources;
2. The processor of the present invention uses two methods to hide the time spent on the data path: pre-read operations on the simple dual-port RAM hide the butterfly unit pipeline delay and the memory access delay, and ping-pong operation of the 3 groups of 4*32 temporary memories hides the time spent moving data, which improves speed more effectively. In addition, it uses a twiddle factor compression algorithm to reduce the storage required by the external twiddle factor ROM;
3. The processor of the present invention uses a fixed configuration in which the one-dimensional FFT operation mode handles FFT or IFFT operations on 32-bit single-precision floating-point data of any $2^n$ points from 32 points to 8K points, and the two-dimensional FFT operation mode handles 16K-point FFT or IFFT operations on 32-bit single-precision floating-point data, which minimizes hardware resources while maintaining a high operation speed;
4. The FFT arithmetic unit of the processor of the present invention uses 2 butterfly units operating in parallel to perform FFT or IFFT operations on 32-bit single-precision floating-point data of any $2^n$ points from 32 points to 16K points, which effectively increases operation speed.
Brief description of the drawings
Fig. 1 is a schematic diagram of the existing one-dimensional FFT structure;
Fig. 2 is a schematic diagram of the existing two-dimensional FFT structure;
Fig. 3 is a schematic diagram of the structure of the 1-D/2-D hybrid architecture FFT processor of the present invention;
Fig. 4 is a flow chart of the one-dimensional FFT operation mode of the 1-D/2-D hybrid architecture FFT processor of the present invention;
Fig. 5 is a flow chart of the two-dimensional FFT operation mode of the 1-D/2-D hybrid architecture FFT processor of the present invention;
Fig. 6 is a schematic diagram of the pre-read operation pipeline;
Fig. 7 is the FFT operation topology diagram;
Fig. 8 shows the relative speedup ratio curves of the FFT;
Fig. 9 shows the relative speedup ratio curves per unit of storage resource of the FFT.
Specific embodiment
As shown in Fig. 3, the 1-D/2-D hybrid architecture FFT processor of this embodiment comprises a controller, a storage system, a data switching network, a data transfer unit, and an FFT arithmetic unit;
The controller is used to control the data transfer unit and the FFT arithmetic unit;
The storage system comprises 2 groups of 4*2K on-chip data memories (Memory0/1_x) and 3 groups of 4*32 temporary memories (Memory2/3/4_x), all of which use simple dual-port RAM. Since 16K = 128*128, the largest one-dimensional FFT that the two-dimensional FFT processing mode must handle is 128 points; to save storage resources, the FFT processor completes the row and column one-dimensional FFT operations by ping-pong operation of the 3 groups of 4*32 temporary memories.
The data switching network comprises an internal memory control unit, an external memory control unit, and a multiplexer (MUX);
The internal memory control unit performs address allocation and management for the 2 groups of 4*2K on-chip data memories and the 3 groups of 4*32 temporary memories, so as to meet the requirement for flexible parallel data reading and storing during FFT/IFFT computation and to avoid address conflicts. It also provides a unified interface through which the FFT arithmetic unit and the data transfer unit access the 5 groups of data memories, so that each data port (operand ports, transfer data ports, and result ports) can access any storage location in any on-chip memory block without needing to know which block that location belongs to;
The external memory control unit provides a unified interface through which the data transfer unit accesses the 2 groups of 4*2K on-chip data memories;
The multiplexer selects whether the internal memory control unit or the external memory control unit accesses the 2 groups of 4*2K on-chip data memories;
The data transfer unit handles data transfer between the 2 groups of 4*2K on-chip data memories and the 3 groups of 4*32 temporary memories. It moves row or column data in real time, supplies the FFT arithmetic unit with the operands of the row or column FFT to be performed, and writes the results computed by the FFT arithmetic unit back to the 2 groups of 4*2K on-chip data memories;
The FFT arithmetic unit performs the FFT or IFFT operation on the data. It internally integrates a read/write address generation unit, 2 butterfly units, and a twiddle factor generator.
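To make the storage trade-off concrete, the following sketch counts data storage locations for the three organizations named in this description: the prior-art 1-D processor (2 groups of 4*4K), the prior-art 2-D processor (one 16K memory plus 3 groups of 4*256), and the hybrid processor (2 groups of 4*2K plus 3 groups of 4*32). The helper name `locations` is an assumption, and the counts are storage locations only; the source does not give the word width per location here.

```python
K = 1024


def locations(groups, banks, depth):
    """Number of storage locations in `groups` groups of `banks` memories, each `depth` deep."""
    return groups * banks * depth


prior_1d = locations(2, 4, 4 * K)                      # 2 groups of 4*4K
prior_2d = 16 * K + locations(3, 4, 256)               # 16K buffer + 3 groups of 4*256
hybrid = locations(2, 4, 2 * K) + locations(3, 4, 32)  # 2 groups of 4*2K + 3 groups of 4*32

for name, count in [("1-D", prior_1d), ("2-D", prior_2d), ("hybrid", hybrid)]:
    print(f"{name:>6}: {count} locations")
```

Under these counts the hybrid organization needs 16768 locations, against 32768 for the 1-D organization and 19456 for the 2-D organization.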
The processor adopts the radix-2 algorithm and a fixed addressing structure. The processor uses two methods to hide the time spent on the data path: pre-read operations on the simple dual-port RAM hide the butterfly unit pipeline delay and the memory access delay, and ping-pong operation of the 3 groups of RAM hides the time spent moving data. In addition, it uses a twiddle factor compression algorithm to reduce the storage required by the external twiddle factor ROM.
The processor integrates hybrid one-dimensional and two-dimensional FFT processing modes: it uses the one-dimensional FFT operation for FFT or IFFT operations on 32-bit single-precision floating-point data of any $2^n$ points from 32 points to 8K points, and the two-dimensional FFT operation for 16K-point FFT or IFFT operations on 32-bit single-precision floating-point data.
As shown in Fig. 4, the processor performs FFT or IFFT operations on 32-bit single-precision floating-point data of any $2^n$ points from 32 points to 8K points in the following one-dimensional FFT operation mode: the 2 groups of 4*2K on-chip data memories operate in ping-pong fashion and are configured by the controller alternately as the operand memory group and the result memory group. The FFT arithmetic unit accesses the operand memory group and the result memory group simultaneously through the internal memory control unit and the multiplexer; in each clock cycle it reads 4 operands in parallel from the operand memory group, performs the FFT or IFFT operation, and produces 4 intermediate results that are written to the result memory group. Each time one stage of the FFT is completed, the 2 groups of 4*2K on-chip data memories perform one ping-pong operation, so that the operand memory group and the result memory group swap roles. After n stages of ping-pong operation, the final result data are obtained;
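The stage-by-stage ping-pong between an operand buffer and a result buffer can be illustrated with a small software model. The sketch below is a plain radix-2 DIT FFT in Python that reads each stage from one list and writes to another, then swaps them. It is an assumption-laden illustration: the names `fft_ping_pong`, `bit_reverse`, and `naive_dft` are hypothetical, the butterfly indexing is the textbook one rather than the fixed addressing scheme of the patent, and no attempt is made to model the 4 memories per group or the two parallel butterfly units.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    reversed_index = 0
    for _ in range(bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def fft_ping_pong(samples):
    """Radix-2 DIT FFT using two buffers that swap roles after every stage."""
    n = len(samples)
    stages = n.bit_length() - 1
    if n < 2 or n != 1 << stages:
        raise ValueError("length must be a power of two")
    # Bit-reversed input order yields natural-order output for DIT.
    operand = [complex(samples[bit_reverse(i, stages)]) for i in range(n)]
    result = [0j] * n
    for stage in range(stages):
        half = 1 << stage
        for k in range(n // 2):              # n/2 butterflies per stage
            block, j = divmod(k, half)
            top = block * 2 * half + j
            bottom = top + half
            twiddle = cmath.exp(-2j * cmath.pi * j / (2 * half))
            t = twiddle * operand[bottom]
            result[top] = operand[top] + t
            result[bottom] = operand[top] - t
        operand, result = result, operand    # ping-pong: swap buffer roles
    return operand


def naive_dft(samples):
    """Direct O(n^2) DFT used only as a reference."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


data = [complex(i % 5, (i * 3) % 7) for i in range(32)]
max_error = max(abs(a - b) for a, b in zip(fft_ping_pong(data), naive_dft(data)))
print(max_error < 1e-9)
```

Because every stage writes all n outputs into the other buffer, no stage ever overwrites an operand it still needs, which is the property the two memory groups provide in hardware.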
As shown in Fig. 5, the processor performs 16K-point FFT or IFFT operations on 32-bit single-precision floating-point data in the following two-dimensional FFT operation mode: the two groups of 4*2K on-chip data memories serve as a 16K buffer organized as a 128*128 matrix; the 3 groups of 4*32 temporary memories perform inner and outer ping-pong operations and are configured by the controller in turn as the operand memory group, the result memory group, and the data transfer memory group;
Through the external memory control unit and the multiplexer, the data transfer unit reads 2 raw data items of the first column or row of the 16K buffer matrix in parallel in each clock cycle and writes them to the data transfer memory group through the internal memory control unit, until the transfer of the 128 data items of the first column or row is complete;
After the data of the first column or row has been transferred, the operand memory group, the result memory group, and the data transfer memory group perform one outer ping-pong operation: the data transfer memory group becomes the operand memory group, the operand memory group becomes the result memory group, and the result memory group becomes the data transfer memory group. The data transfer unit then reads the raw data of the second column or row and writes it to the data transfer memory group. While this transfer is in progress, the FFT arithmetic unit reads 4 operands in parallel from the operand memory group in each clock cycle and produces 4 results that are written to the result memory group; the operand memory group and the result memory group perform inner ping-pong operations to complete the FFT or IFFT of the first column or row;
When the FFT or IFFT of the previous column or row is complete, the 3 groups of 4*32 temporary memories perform another outer ping-pong operation and switch roles. The data transfer unit reads 2 FFT results of the previous column or row in parallel in each clock cycle through the external memory control unit, and writes them to the corresponding column or row of the 16K buffer matrix through the external memory control unit and the multiplexer. Then, through the external memory control unit and the multiplexer, the data transfer unit reads 2 raw data items of the next column or row in parallel in each clock cycle and writes them to the data transfer memory group through the internal memory control unit. While the data is being transferred, the FFT arithmetic unit performs inner ping-pong operations on the operand memory group and the result memory group to complete the FFT or IFFT of the current column or row. Next, the operand memory group, the result memory group, and the data transfer memory group perform another outer ping-pong operation, and so on until all one-dimensional column or row FFT or IFFT operations are complete;
The controller coordinates the data transfer unit and the FFT arithmetic unit so that they work in order, completing the FFT or IFFT operations of all columns and rows in turn and finally completing the 16K-point FFT or IFFT operation on 32-bit single-precision floating-point data.
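The idea of computing one long FFT as column FFTs followed by row FFTs over a matrix can be checked numerically on a small case. The sketch below uses the standard Cooley-Tukey row/column decomposition, including the twiddle multiplication between the two passes; that intermediate multiplication and the output ordering are part of the textbook method and are an assumption here, since this description does not spell them out. The names `fft_via_matrix` and `naive_dft` are hypothetical, and the short transforms are done with a direct DFT for brevity.

```python
import cmath


def naive_dft(samples):
    """Direct O(n^2) DFT, used for the short transforms and as a reference."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def fft_via_matrix(samples, rows, cols):
    """Compute a (rows*cols)-point DFT as column DFTs, twiddles, then row DFTs."""
    n = rows * cols
    if len(samples) != n:
        raise ValueError("len(samples) must equal rows * cols")
    # Row-major layout: matrix[r][c] = samples[r * cols + c].
    matrix = [[complex(samples[r * cols + c]) for c in range(cols)] for r in range(rows)]
    # Pass 1: one `rows`-point transform per column, then the inter-pass twiddle.
    for c in range(cols):
        column = naive_dft([matrix[r][c] for r in range(rows)])
        for k1 in range(rows):
            matrix[k1][c] = column[k1] * cmath.exp(-2j * cmath.pi * c * k1 / n)
    # Pass 2: one `cols`-point transform per row.
    for k1 in range(rows):
        matrix[k1] = naive_dft(matrix[k1])
    # Output index k = k1 + rows * k2 (column-major readout).
    output = [0j] * n
    for k1 in range(rows):
        for k2 in range(cols):
            output[k1 + rows * k2] = matrix[k1][k2]
    return output


data = [complex((i * 7) % 11, i % 3) for i in range(32)]
reference = naive_dft(data)
error_4x8 = max(abs(a - b) for a, b in zip(fft_via_matrix(data, 4, 8), reference))
error_8x4 = max(abs(a - b) for a, b in zip(fft_via_matrix(data, 8, 4), reference))
print(error_4x8 < 1e-9, error_8x4 < 1e-9)
```

In the 16K-point case of this description the matrix is 128*128, so both passes consist of 128 transforms of 128 points each, which is why the temporary memories only need to hold a 128-point column or row.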
For an N-point FFT, the FFT arithmetic unit must run $\log_2 N$ stages, and each stage incurs the butterfly unit pipeline delay ($C_d$) and the memory access delay ($M_d$). The total number of delay cycles for an N-point FFT in the one-dimensional FFT operation mode is therefore:
$$\mathrm{Total\_delay} = \log_2 N \,(M_d + C_d) \tag{3}$$
The total number of delay cycles for an N-point FFT in the two-dimensional FFT operation mode is:
$$\mathrm{Total\_delay} = (C \log_2 L + L \log_2 C)(M_d + C_d) \tag{4}$$
where $M_d$ is the memory access delay and $C_d$ is the butterfly unit pipeline delay. We use pre-read operations to reduce the extra clock cycles caused by these delays.
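As a quick worked example of formulas (3) and (4), the sketch below evaluates the unhidden delay for the largest one-dimensional case (8K points) and for the 16K-point case as a 128*128 matrix. It uses the design values $M_d = 3$ and $C_d = 12$ that this description states further on; the function names are assumptions made for the example.

```python
M_D = 3    # memory access delay in cycles (design value stated later in this description)
C_D = 12   # butterfly unit pipeline delay in cycles (same source)


def log2_exact(n):
    """Integer log2 of a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("expected a power of two")
    return n.bit_length() - 1


def total_delay_1d(n_points):
    """Formula (3): log2(N) * (M_d + C_d)."""
    return log2_exact(n_points) * (M_D + C_D)


def total_delay_2d(rows, cols):
    """Formula (4): (C*log2(L) + L*log2(C)) * (M_d + C_d), with L rows and C columns."""
    return (cols * log2_exact(rows) + rows * log2_exact(cols)) * (M_D + C_D)


print(total_delay_1d(8192))       # 13 stages * 15 cycles = 195
print(total_delay_2d(128, 128))   # (128*7 + 128*7) * 15 = 26880
```

The two-dimensional figure is far larger because the per-stage delay is paid once per stage of every short row or column transform, which is the cost the pre-read operation is meant to hide.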
As shown in Fig. 6, this FFT processor uses pre-read operations on the simple dual-port RAM to hide the butterfly unit pipeline delay and the memory access delay. After the FFT has finished reading the operands of one stage, it must still wait $(M_d + C_d)$ clock cycles before all the results of the current stage have been written to the destination memory. The pre-read operation starts reading the next stage's operands $(M_d + C_d)$ clock cycles early, reading them from the other port of the current stage's destination memory and sending them to the butterfly units. It must be guaranteed that the next-stage operands being read have already been written to the destination memory; otherwise the FFT results will be wrong.
The FFT processor adopts the radix-2 DIT-FFT algorithm with a fixed addressing architecture; this structure takes bit-reversed input and produces natural-order output, and its topology is shown in Fig. 7.
The original operand data is stored, according to a fixed rule, in one group of 4 Memories in the storage system. During a butterfly operation, one operand is read from each Memory and sent to the arithmetic unit, and the 4 results produced are written to the 4 Memories of the other group. For an N-point FFT or IFFT, the addressing scheme is shown in the following table:
Table 1. Conflict-free addressing
Figure BDA0000458458540000071
From the FFT operation topology and the conflict-free addressing scheme, the reading and storing of the FFT operand data proceed as shown in the following tables:
Table 2. Data read address sequence
Figure BDA0000458458540000072
Table 3. Data store address sequence
Figure BDA0000458458540000073
According to the rules for reading and storing operand data, at the moment the results of the current stage have been completely written to the destination memory, the number of next-stage operands already pre-read must be less than N/2; this guarantees that every next-stage operand that is read has already been written to memory.
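One way to connect this rule to the bound that follows is sketched below. It assumes, as stated for the FFT arithmetic unit, that 4 operands are read per clock cycle, and that the pre-read starts $(M_d + C_d)$ cycles before the current stage's last result is written, so that $4(M_d + C_d)$ next-stage operands have been pre-read at that moment. This is a reading of the text, not a derivation given in the source.

```latex
4\,(M_d + C_d) < \frac{N}{2}
\quad\Longrightarrow\quad
M_d + C_d < \frac{N}{8}
```

For the two-dimensional mode the same argument applies to each row or column transform, whose length is at least $\min\{L, C\}$.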
The one-dimensional FFT operation mode must satisfy formula (1):
$$M_d + C_d < N/8 \tag{1}$$
The two-dimensional FFT operation mode must satisfy formula (2):
$$M_d + C_d < \min\{L, C\}/8 \tag{2}$$
In this design $M_d = 3$ and $C_d = 12$, so the formulas give $N > 120$ or $\min\{L, C\} > 120$. When the one-dimensional FFT operation mode processes FFTs of fewer than 128 points, formula (1) is not satisfied; when the two-dimensional FFT operation mode processes FFTs of fewer than 16K points, formula (2) is not satisfied. In these cases the pre-read operation cannot hide the butterfly unit pipeline delay, and the operation time increases significantly.
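The two conditions are easy to check for every supported size. The sketch below applies formulas (1) and (2) with the design values $M_d = 3$ and $C_d = 12$; the function names are assumptions, and the 64*128 case is included only as an example of a matrix smaller than 16K points.

```python
M_D = 3    # memory access delay in cycles
C_D = 12   # butterfly unit pipeline delay in cycles


def preread_hides_delay_1d(n_points):
    """Formula (1): M_d + C_d < N / 8."""
    return M_D + C_D < n_points / 8


def preread_hides_delay_2d(rows, cols):
    """Formula (2): M_d + C_d < min(L, C) / 8."""
    return M_D + C_D < min(rows, cols) / 8


one_d_sizes = [2 ** exponent for exponent in range(5, 14)]   # 32 points to 8K points
hidden = [n for n in one_d_sizes if preread_hides_delay_1d(n)]
not_hidden = [n for n in one_d_sizes if not preread_hides_delay_1d(n)]

print("1-D hidden:    ", hidden)
print("1-D not hidden:", not_hidden)
print("2-D 128*128:   ", preread_hides_delay_2d(128, 128))
print("2-D 64*128:    ", preread_hides_delay_2d(64, 128))
```

The output agrees with the text: only the 32-point and 64-point one-dimensional FFTs fail the condition, and in the two-dimensional mode the 128*128 arrangement for 16K points satisfies it while a smaller matrix does not.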
The traditional 1-D FFT processor is fast but needs more data storage. The 2-D FFT processor is slower because it spends extra time, but its data storage is greatly reduced.
The 1-D/2-D hybrid architecture FFT processor of the present invention combines the advantages of both, with excellent processing speed and small storage resources. See Table 4 for a detailed comparison. When the 1-D FFT operation mode processes 128-point to 8K-point FFTs, the pre-read operation hides the pipeline delay of the butterfly units and the memory access delay, and the number of computation clock cycles exceeds the theoretical value (T-V) by 16 clock cycles. For FFTs of fewer than 128 points, however, the pre-read operation cannot hide these delays, so the 32-point and 64-point FFTs take somewhat longer than the theoretical value. When the 2-D FFT operation mode performs FFTs below 16K points, the pre-read operation likewise cannot hide these delays, so the operation time increases significantly compared with the theoretical value.
Table 4. Comparison of FFT computation clock cycles
From the ratio of the number of computation clock cycles for each FFT size to the corresponding theoretical value, we can draw the relative speedup ratio curves of the FFT, as shown in Fig. 8. The one-dimensional FFT processor and the hybrid architecture FFT processor have ideal processing speed, and their relative speedup ratio curves almost coincide. For 128-point to 16K-point FFTs, the relative speedup ratio coefficients of both are close to 1. In the two-dimensional FFT processor, the largest one-dimensional FFT that can be processed is 1K points. If the number of FFT points is less than 2K, N is configured as a matrix of 1 row and N columns, and multiple groups of one-dimensional FFTs can be computed at the same time in two-dimensional form; otherwise N is configured as a matrix of L rows and C columns. This causes a turning point at 2K points in the relative speedup ratio curve of the two-dimensional FFT processor. The two-dimensional FFT processor is slower, but its speed approaches the theoretical speed at the maximum size, the 16K-point FFT. The three relative speedup ratio curves intersect at 16K points.
All three FFT processors above were implemented on a Virtex-6 XC6VLX760 FPGA. Table 5 gives a detailed comparison, which shows that the hybrid FFT processor achieves a balance between high speed and resources.
Table 5. Comparison of FFT synthesis resources and maximum frequency
By dividing the relative speedup ratio coefficient of each FFT processor by its corresponding block RAM usage and normalizing, we can draw the relative speedup ratio curves per unit of storage resource, as shown in Fig. 9. Compared with the one-dimensional and two-dimensional FFT processors, the hybrid architecture FFT processor has a higher relative speedup ratio coefficient per unit of storage resource. At the special 16K point, however, the coefficient of the 2-D FFT processor is larger than that of the hybrid architecture FFT processor and is close to 1. Overall, for FFT or IFFT operations on single-precision floating-point data of any $2^n$ points (n = 5, 6, ..., 14), the hybrid architecture FFT processor has a higher performance-to-resource ratio.

Claims (6)

1. A 1-D/2-D hybrid architecture FFT processor, characterized in that the processor comprises a controller, a storage system, a data switching network, a data transmission unit and an FFT arithmetic unit;
The controller is used to control the data transmission unit and the FFT arithmetic unit;
The storage system comprises 2 groups of 4*2K on-chip data memories and 3 groups of 4*32 temporary memories, all of which are implemented as simple dual-port RAM; the data switching network comprises an internal memory control unit, an external memory control unit and a multiplexer;
The internal memory control unit performs address allocation and management for the 2 groups of 4*2K on-chip data memories and the 3 groups of 4*32 temporary memories, and provides a unified interface through which the FFT arithmetic unit and the data transmission unit access the 5 groups of data memories;
The external memory control unit provides a unified interface through which the data transmission unit accesses the 2 groups of 4*2K on-chip data memories;
The multiplexer selects whether the internal memory control unit or the external memory control unit accesses the 2 groups of 4*2K on-chip data memories;
The data transmission unit carries out data transfers between the 2 groups of 4*2K on-chip data memories and the 3 groups of 4*32 temporary memories, moving row or column data in real time, supplying the FFT arithmetic unit with the operands of the row or column FFT to be performed, and writing the results of the FFT arithmetic unit back to the 2 groups of 4*2K on-chip data memories;
The FFT arithmetic unit performs the FFT or IFFT operation on the data.
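As a quick consistency check on the storage sizes in claim 1, the following Python sketch computes the capacities implied by the "4*2K" and "4*32" figures. It assumes that the leading 4 is the number of parallel banks in a group and that 2K means 2048 words; the constant and function names are hypothetical.

```python
BANKS_PER_GROUP = 4                    # the "4" in 4*2K and 4*32 (assumed: banks)
DATA_GROUPS, DATA_DEPTH = 2, 2 * 1024  # 2 groups of 4*2K on-chip data memories
TEMP_GROUPS, TEMP_DEPTH = 3, 32        # 3 groups of 4*32 temporary memories


def group_words(depth):
    """Number of words held by one memory group of the given bank depth."""
    return BANKS_PER_GROUP * depth


data_words = DATA_GROUPS * group_words(DATA_DEPTH)  # whole on-chip data store
temp_group_words = group_words(TEMP_DEPTH)          # one temporary group
total_groups = DATA_GROUPS + TEMP_GROUPS            # the "5 groups" in claim 1
```

Under these assumptions the two data groups together hold 16384 words, which matches a 128*128 matrix, and each temporary group holds 128 words, which matches one row or column of that matrix.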
2. The 1-D/2-D hybrid architecture FFT processor according to claim 1, characterized in that the processor uses a radix-2 algorithm and a fixed addressing structure; the processor uses the pre-read operation of the simple dual-port RAM to hide the butterfly unit pipeline delay and the memory access delay, and performs ping-pong operation on the 3 groups of 4*32 temporary memories to hide the time spent moving data; the processor uses a twiddle factor compression algorithm to reduce the storage resources of the external twiddle factor ROM.
3. The 1-D/2-D hybrid architecture FFT processor according to claim 1, characterized in that the processor integrates a hybrid one-dimensional and two-dimensional FFT processing mode: the one-dimensional FFT operation mode is used for FFT or IFFT operations on 32-bit single-precision floating-point numbers of any 2^n points from 32 points to 8K points, and the two-dimensional FFT operation mode is used for FFT or IFFT operations on 16K points of 32-bit single-precision floating-point numbers.
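The mode selection in claim 3 can be illustrated with a short Python sketch. The function name and the string labels are hypothetical; only the size ranges come from the claim.

```python
def select_fft_mode(num_points):
    """Return "1-D" for 32 to 8K points and "2-D" for 16K points.

    Sizes outside 2**5 .. 2**14, or sizes that are not powers of two,
    are rejected because the claim does not cover them.
    """
    is_power_of_two = num_points > 0 and num_points & (num_points - 1) == 0
    if not is_power_of_two or not 32 <= num_points <= 16 * 1024:
        raise ValueError("supported sizes are 2**n points for n = 5..14")
    return "2-D" if num_points == 16 * 1024 else "1-D"
```

For example, `select_fft_mode(8192)` selects the one-dimensional mode, while `select_fft_mode(16384)` selects the two-dimensional mode.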
4. The 1-D/2-D hybrid architecture FFT processor according to claim 1, characterized in that:
The processor performs FFT or IFFT operations on 32-bit single-precision floating-point numbers of any 2^n points from 32 points to 8K points in the following one-dimensional FFT operation mode: the 2 groups of 4*2K on-chip data memories operate in ping-pong fashion and are configured by the controller, in turn, as the operand memory group and the result memory group; the FFT arithmetic unit accesses the operand memory group and the result memory group simultaneously through the internal memory control unit and the multiplexer, reading 4 operands in parallel from the operand memory group in each clock cycle to perform the FFT or IFFT operation, while producing 4 intermediate results and writing them to the result memory group; each time one FFT stage completes, the 2 groups of 4*2K on-chip data memories perform one ping-pong swap, exchanging the roles of the operand memory group and the result memory group; the final result data are obtained after n stages of ping-pong operation;
The processor performs FFT or IFFT operations on 16K points of 32-bit single-precision floating-point numbers in the following two-dimensional FFT operation mode: the two groups of 4*2K on-chip data memories serve as a 16K buffer organized as a 128*128 matrix; the 3 groups of 4*32 temporary memories perform inner and outer ping-pong operations and are configured by the controller, in turn, as the operand memory group, the result memory group and the data transmission memory group;
Through the external memory control unit and the multiplexer, the data transmission unit reads 2 raw data items of the first column or row of the 16K buffer matrix in parallel in each clock cycle and writes them to the data transmission memory group through the internal memory control unit, until all 128 data items of the first column or row have been transferred;
After the first column or row has been transferred, the operand memory group, the result memory group and the data transmission memory group perform one outer ping-pong swap: the data transmission memory group becomes the operand memory group, the operand memory group becomes the result memory group, and the result memory group becomes the data transmission memory group; the data transmission unit reads the raw data of the second column or row and writes them to the data transmission memory group; while this transfer is in progress, the FFT arithmetic unit reads 4 operands in parallel from the operand memory group in each clock cycle, produces 4 result data items and writes them to the result memory group; the operand memory group and the result memory group perform inner ping-pong operations to complete the FFT or IFFT operation of the first column or row;
When the FFT or IFFT operation of the preceding column or row is complete, the 3 groups of 4*32 temporary memories perform another outer ping-pong swap and exchange roles; in each clock cycle the data transmission unit reads 2 FFT results of the preceding column or row in parallel through the external memory control unit, and then writes them to the corresponding column or row of the 16K buffer matrix through the external memory control unit and the multiplexer; the data transmission unit then reads 2 raw data items of the next column or row in parallel in each clock cycle through the external memory control unit and the multiplexer, and writes them to the data transmission memory group through the internal memory control unit; while this transfer is in progress, the FFT arithmetic unit performs inner ping-pong operations on the operand memory group and the result memory group to complete the FFT or IFFT operation of the current column or row; at the next step, the operand memory group, the result memory group and the data transmission memory group perform another outer ping-pong swap, and so on until all one-dimensional column or row FFT or IFFT operations are complete;
The controller coordinates the data transmission unit and the FFT arithmetic unit so that they work in sequence, completing the FFT or IFFT operations of all groups of columns or rows in turn and finally completing the FFT or IFFT operation on 16K points of 32-bit single-precision floating-point numbers.
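The two operation modes of claim 4 can be modelled in software with the Python sketch below. It is a behavioural sketch under several assumptions, not the patented hardware: `fft_pingpong` uses the Stockham autosort formulation (a substitute chosen because it naturally swaps two buffers once per stage, not the fixed addressing structure of claim 2), and `fft_two_dimensional` uses the standard row/column decomposition of a 1-D FFT, including a twiddle multiplication between the two passes that the claim text does not spell out. All function names are hypothetical, and the cycle-level behaviour (4 operands per clock, the three rotating temporary memory roles) is not modelled.

```python
import cmath


def fft_pingpong(samples):
    """Radix-2 FFT that alternates between two buffers, one swap per stage."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    operand, result = list(samples), [0j] * n
    length, stride = n, 1
    while length > 1:
        half = length // 2
        for p in range(half):
            w = cmath.exp(-2j * cmath.pi * p / length)  # stage twiddle factor
            for q in range(stride):
                a = operand[q + stride * p]
                b = operand[q + stride * (p + half)]
                result[q + stride * (2 * p)] = a + b
                result[q + stride * (2 * p + 1)] = (a - b) * w
        operand, result = result, operand  # ping-pong swap after each stage
        length, stride = half, stride * 2
    return operand


def fft_two_dimensional(samples, rows, cols):
    """FFT of rows*cols points computed as row FFTs, twiddles, column FFTs."""
    n = rows * cols
    if len(samples) != n:
        raise ValueError("len(samples) must equal rows * cols")
    # Lay the input out as a matrix: element (r, c) is samples[r + rows * c].
    matrix = [[samples[r + rows * c] for c in range(cols)] for r in range(rows)]
    # First pass: one short FFT per row.
    matrix = [fft_pingpong(row) for row in matrix]
    # Inter-pass twiddle multiplication (assumed; standard for this split).
    for r in range(rows):
        for k in range(cols):
            matrix[r][k] *= cmath.exp(-2j * cmath.pi * r * k / n)
    # Second pass: one short FFT per column, written back in natural order.
    output = [0j] * n
    for k in range(cols):
        column = fft_pingpong([matrix[r][k] for r in range(rows)])
        for j in range(rows):
            output[cols * j + k] = column[j]
    return output


def naive_dft(samples):
    """Direct O(n^2) DFT, used only as a reference for checking results."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * t * k / n)
                for t in range(n)) for k in range(n)]
```

In this model, the 16K-point case of the claim would correspond to `fft_two_dimensional(x, 128, 128)`, with each 128-point row or column small enough to fit one temporary memory group.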
5. The 1-D/2-D hybrid architecture FFT processor according to claim 3 or 4, characterized in that:
The one-dimensional FFT operation mode must satisfy formula 1:
$M_d + C_d < n/8$ (1)
The two-dimensional FFT operation mode must satisfy formula 2:
$M_d + C_d < \min\{L, C\}/8$ (2)
where $M_d$ is the memory access delay; $C_d$ is the butterfly unit pipeline delay; $n$ is the number of FFT points; $L$ is the number of rows; and $C$ is the number of columns.
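Formulas 1 and 2 can be checked for a given configuration with a few lines of Python. This is a minimal sketch: the function and parameter names are hypothetical, the delays are assumed to be expressed in clock cycles, and the delay values used in the example are invented.

```python
def satisfies_formula_1(mem_delay, pipeline_delay, num_points):
    """Formula 1 (one-dimensional mode): M_d + C_d < n / 8."""
    return mem_delay + pipeline_delay < num_points / 8


def satisfies_formula_2(mem_delay, pipeline_delay, rows, cols):
    """Formula 2 (two-dimensional mode): M_d + C_d < min(L, C) / 8."""
    return mem_delay + pipeline_delay < min(rows, cols) / 8


# Hypothetical delays: 5 cycles of memory latency, 10 cycles of pipeline.
fits_16k_matrix = satisfies_formula_2(5, 10, rows=128, cols=128)
fits_32_points = satisfies_formula_1(5, 10, num_points=32)
```

The smallest transform is the tightest case: for 32 points the bound is 4, and for the 128*128 matrix it is 16.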
6. The 1-D/2-D hybrid architecture FFT processor according to claim 1, characterized in that the FFT arithmetic unit comprises an internally integrated read/write address generation unit, 2 butterfly arithmetic units and a twiddle factor generator.
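Two radix-2 butterfly units consume 4 operands and produce 4 results per step, which is consistent with the 4 operands read per clock cycle in claim 4. The Python sketch below illustrates this; the decimation-in-frequency form of the butterfly and the operand ordering are assumptions, and the function names are hypothetical.

```python
def butterfly(a, b, w):
    """One radix-2 butterfly (assumed decimation-in-frequency form)."""
    return a + b, (a - b) * w


def butterfly_cycle(operands, twiddles):
    """Model one step of 2 parallel butterflies: 4 operands in, 4 results out."""
    a0, b0, a1, b1 = operands
    w0, w1 = twiddles
    return butterfly(a0, b0, w0) + butterfly(a1, b1, w1)
```

In this model the twiddle factor generator of claim 6 would supply the pair `twiddles` for each step, and the address generation unit would decide which four memory words form `operands`.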
CN201410023273.8A 2014-01-17 2014-01-17 1-D/2-D hybrid architecture FFT processor Active CN103699517B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410023273.8A CN103699517B (en) 2014-01-17 2014-01-17 1-D/2-D hybrid architecture FFT processor

Publications (2)

Publication Number Publication Date
CN103699517A (en) 2014-04-02
CN103699517B (en) 2016-06-29

Family

ID=50361049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410023273.8A Active CN103699517B (en) 1-D/2-D hybrid architecture FFT processor 2014-01-17 2014-01-17

Country Status (1)

Country Link
CN (1) CN103699517B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040143616A1 (en) * 2002-12-27 2004-07-22 Lg Electronics Inc. Fast fourier transform processor
WO2008132510A2 (en) * 2007-04-27 2008-11-06 University Of Bradford Fft processor
CN101504638A (en) * 2009-03-19 2009-08-12 北京理工大学 Point-variable assembly line FFT processor
CN103034621A (en) * 2012-12-13 2013-04-10 合肥工业大学 Address mapping method and system of radix-2*K parallel FFT (fast Fourier transform) architecture

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DUO-LI ZHANG 等: "Design and implementation of a large points FFT acceleration unit in multi-processor system based on FPGA", 《INTERNATIONAL SYMPOSIUM ON COMPUTER, COMMUNICATION, CONTROL AND AUTOMATION(ISCCCA-2013)》, 30 August 2013 (2013-08-30), pages 0830 - 0833 *
李小进 等: "高速基2FFT处理器的结构设计与FPGA实现", 《电路与系统学报》, vol. 10, no. 5, 30 October 2005 (2005-10-30), pages 49 - 53 *
高振斌 等: "超长可变点数FFT处理器设计与实现", 《电讯技术》, 28 August 2005 (2005-08-28), pages 92 - 96 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902505A (en) * 2014-04-12 2014-07-02 复旦大学 Configurable FFT processor circuit structure based on switching network
CN105630738A (en) * 2016-01-11 2016-06-01 北京北方烽火科技有限公司 FFT (Fast Fourier Transform)/IFFT (Inverse Fast Fourier Transform) device based on LTE (Long Term Evolution) system
CN105630738B (en) * 2016-01-11 2019-01-08 北京北方烽火科技有限公司 FFT/IFFT converting means based on LTE system

Also Published As

Publication number Publication date
CN103699517B (en) 2016-06-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant