CN114422315B - Ultra-high throughput IFFT/FFT modulation and demodulation method - Google Patents


Info

Publication number
CN114422315B
Authority
CN
China
Prior art keywords
fft
ifft
sequence
dft
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210315781.8A
Other languages
Chinese (zh)
Other versions
CN114422315A (en)
Inventor
李天瑞
黄以华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202210315781.8A
Publication of CN114422315A
Application granted
Publication of CN114422315B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L27/00 Modulated-carrier systems
    • H04L27/26 Systems using multi-frequency codes
    • H04L27/2601 Multicarrier modulation systems
    • H04L27/2626 Arrangements specific to the transmitter only
    • H04L27/2627 Modulators
    • H04L27/2628 Inverse Fourier transform modulators, e.g. inverse fast Fourier transform [IFFT] or inverse discrete Fourier transform [IDFT] modulators
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L27/00 Modulated-carrier systems
    • H04L27/26 Systems using multi-frequency codes
    • H04L27/2601 Multicarrier modulation systems
    • H04L27/2647 Arrangements specific to the receiver only
    • H04L27/2649 Demodulators
    • H04L27/265 Fourier transform demodulators, e.g. fast Fourier transform [FFT] or discrete Fourier transform [DFT] demodulators

Landscapes

  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Complex Calculations (AREA)
  • Error Detection And Correction (AREA)

Abstract

The invention relates to the technical field of communication, and in particular to an ultra-high throughput IFFT/FFT modulation and demodulation method. The method takes a fully unrolled architecture with specially optimized twiddle factors as its implementation basis and uses a row-column interleaver for compact data rearrangement, so that the IFFT/FFT architecture can be decomposed. The invention has smaller area overhead, a shorter critical path, a higher operating frequency and lower power consumption, and can be used on different architectures.

Description

Ultra-high throughput IFFT/FFT modulation and demodulation method
Technical Field
The invention relates to the technical field of communication, in particular to an ultra-high throughput IFFT/FFT modulation and demodulation method.
Background
An ultra-high-speed multi-carrier communication system needs an ultra-high throughput IFFT/FFT module. Taking a 50G access network baseband chip that adopts SEFDM as an example, when the baseband receiver performs iterative decoding, a 512-point IFFT demodulation module and a 512-point FFT modulation module with a throughput of more than 250 million transforms per second must be implemented, and no method that directly generates an IFFT/FFT design with such high throughput has been disclosed so far. An alternative that the industry may adopt is to combine several commercially available radix-2/radix-4 serial N-point IFFT/FFT IP blocks in place of the target block, caching multiple sets of FFT input data so that, averaged over a longer period, the combination reaches the same computational capacity as the target. Using multiple low-throughput (serial or partially parallel) modules has the following disadvantages: larger chip area overhead; longer critical path delay; several consecutive sets of FFT inputs are required before the target throughput is gradually reached; more buffers are needed to store the multiple sets of FFT inputs; more complex power control logic is needed, otherwise power consumption is higher; the overall modulation and demodulation latency increases; and, because it depends on purchased IP, the complete module may be restricted to a particular architecture (e.g., an FPGA from a particular vendor).
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides an ultra-high throughput IFFT/FFT modulation and demodulation method, which reduces the area overhead, reduces the power consumption and improves the operating frequency.
In order to solve the technical problems, the invention adopts the technical scheme that: an ultra-high throughput IFFT/FFT modulation and demodulation method comprises the following steps:
Demodulation FFT process:
S11. The N-point DFT is defined as:
$$V[k] = \sum_{i=0}^{N-1} v[i]\, W_N^{ik}, \qquad k = 0, 1, \dots, N-1$$
where
$$W_N = e^{-j 2\pi / N}$$
v is the demodulation input, V is the demodulation output, j is the imaginary unit, and N is the length of the FFT/IFFT; $W_N$ is the base of the Fourier transform twiddle factors; i is the index of the time-domain signal; k is the index of the frequency-domain signal;
S12. The N-point DFT is decomposed into several DFT sub-modules. Let
$$N = N_1 N_2$$
and introduce the following index substitution:
$$i = N_2 i_1 + i_2, \qquad k = k_1 + N_1 k_2, \qquad i_1, k_1 \in [0, N_1), \quad i_2, k_2 \in [0, N_2)$$
S13. After the substitution, a row-column interleaver is used to change the order of the output variables, and two-dimensional indices replace the one-dimensional index:
$$v[i_1, i_2] = v[N_2 i_1 + i_2], \qquad V[k_1, k_2] = V[k_1 + N_1 k_2]$$
S14. Substituting the indices into the original formula and noting that
$$W_N^{N_2} = W_{N_1}, \qquad W_N^{N_1} = W_{N_2}, \qquad W_N^{N_1 N_2} = 1$$
gives:
$$V[k_1, k_2] = \sum_{i_2=0}^{N_2-1} \left[ \left( \sum_{i_1=0}^{N_1-1} v[i_1, i_2]\, W_{N_1}^{i_1 k_1} \right) W_N^{i_2 k_1} \right] W_{N_2}^{i_2 k_2}$$
where the part in parentheses,
$$\sum_{i_1=0}^{N_1-1} v[i_1, i_2]\, W_{N_1}^{i_1 k_1}$$
is an $N_1$-point DFT whose result is multiplied by the twiddle factor $W_N^{i_2 k_1}$;
the whole transform is an $N_2$-point DFT, so the original N-point DFT is decomposed into a nested structure of $N_2$ transforms of $N_1$ points and $N_1$ transforms of $N_2$ points;
Modulation IFFT process:
S21. The N-point IDFT is defined as:
$$v[i] = \frac{1}{N} \sum_{k=0}^{N-1} V[k]\, W_N^{-ik}, \qquad i = 0, 1, \dots, N-1$$
S22. The N-point IDFT is decomposed into several IDFT sub-modules. Let
$$N = N_1 N_2$$
and introduce the following index substitution:
$$k = N_2 k_1 + k_2, \qquad i = i_1 + N_1 i_2, \qquad k_1, i_1 \in [0, N_1), \quad k_2, i_2 \in [0, N_2)$$
S23. After the substitution, a row-column interleaver is used to change the order of the output variables, and two-dimensional indices replace the one-dimensional index:
$$V[k_1, k_2] = V[N_2 k_1 + k_2], \qquad v[i_1, i_2] = v[i_1 + N_1 i_2]$$
S24. Substituting the indices into the original formula and noting that
$$W_N^{-N_2} = W_{N_1}^{-1}, \qquad W_N^{-N_1} = W_{N_2}^{-1}, \qquad W_N^{-N_1 N_2} = 1$$
gives:
$$v[i_1, i_2] = \frac{1}{N} \sum_{k_2=0}^{N_2-1} \left[ \left( \sum_{k_1=0}^{N_1-1} V[k_1, k_2]\, W_{N_1}^{-k_1 i_1} \right) W_N^{-k_2 i_1} \right] W_{N_2}^{-k_2 i_2}$$
The IDFT process is the same as the DFT process except that the twiddle factors are changed and a final scaling by 1/N is applied; when N is an integer power of 2, 1/N does not need to be computed, because the scaling reduces to a bit shift.
Further, in steps S14 and S24, the multiplication by the twiddle factor is implemented by a complex multiplier.
In one embodiment, the complex multiplication is written as:
$$p = a \cdot b$$
where p is the product of the complex multiplication, and a and b are the operands of the complex multiplier;
the real and imaginary parts are written respectively as
$$p = p_r + j p_i, \qquad a = a_r + j a_i, \qquad b = b_r + j b_i$$
For a complex multiplier whose operands take no special value, a fast algorithm using 3 multiplications and 5 additions is used:
$$k_1 = b_r (a_r + a_i), \quad k_2 = a_r (b_i - b_r), \quad k_3 = a_i (b_r + b_i), \qquad p_r = k_1 - k_3, \quad p_i = k_1 + k_2$$
In one embodiment, when the design is not fully unrolled, i.e. the parallelism is less than P, the row-column interleaver is implemented by combining barrel shifters and RAMs as follows: during the input cycles, data enters the RAM group through a barrel shifter whose shift amount increments every cycle; after all input cycles, the data to be interleaved is fully stored in the RAMs without conflicts; once the output cycles start, the interleaved data is output through a barrel shifter.
In one embodiment, the row-column interleaver uses the following address mapping scheme, so that the inputs and outputs of each cycle use different RAM ports and collisions are avoided; the address mapping scheme comprises:
the number of RAMs equals the larger of the number of rows and the number of columns;
data belonging to the same row is allocated to different RAMs, with the RAM index increasing from the head of the row;
data belonging to the same column is allocated to different RAMs, with the RAM index increasing from the head of the column;
for data belonging to the same RAM, the address increases over time.
In one embodiment, for an FFT/IFFT that is not fully unrolled, the fully unrolled FFT/IFFT implementation is used as the basic implementation unit: the FFT/IFFT is decomposed into several basic implementation units according to its target parallelism, and each unit is then implemented with the fully unrolled FFT/IFFT implementation method.
In one embodiment, when the IFFT/FFT is used for modulation and demodulation in a multi-carrier system, the IFFT input is a conjugate-symmetric sequence obtained by extension and its output is a real sequence, while the FFT input is a real sequence and its output is a conjugate-symmetric sequence. For the IFFT, the following algorithm is used for simplification. For two conjugate-symmetric sequences to be transformed,
$$X_1[k], \quad X_2[k]$$
Pre-processing: construct the sequence
$$D[k] = X_1[k] + j X_2[k]$$
and perform an IFFT on D to obtain d;
Post-processing: $x_1$ and $x_2$ are to be separated from d. Since the IDFT is a linear transform,
$$d[i] = x_1[i] + j x_2[i]$$
and since $x_1$ and $x_2$ are both real sequences, the real part and the imaginary part of the result are output separately;
For the real-sequence FFT, the following algorithm is used for simplification. For two real sequences to be transformed,
$$x_1[i], \quad x_2[i]$$
Pre-processing: construct the sequence
$$d[i] = x_1[i] + j x_2[i]$$
and perform an FFT on d to obtain D;
Post-processing: $X_1$ and $X_2$ are to be separated from D. Since the DFT is a linear transform,
$$D[k] = X_1[k] + j X_2[k]$$
and since $X_1$ and $X_2$ are conjugate symmetric (indices taken modulo N):
$$X_1[k] = \frac{D[k] + D^*[N-k]}{2}, \qquad X_2[k] = \frac{D[k] - D^*[N-k]}{2j}$$
The invention also provides an ultra-high throughput IFFT/FFT modulation and demodulation system, comprising an FFT module and an IFFT module, wherein:
the FFT module comprises:
a DFT definition unit: the N-point DFT is defined as:
$$V[k] = \sum_{i=0}^{N-1} v[i]\, W_N^{ik}, \qquad k = 0, 1, \dots, N-1$$
where
$$W_N = e^{-j 2\pi / N}$$
v is the demodulation input, V is the demodulation output, j is the imaginary unit, and N is the length of the FFT/IFFT; $W_N$ is the base of the Fourier transform twiddle factors; i is the index of the time-domain signal; k is the index of the frequency-domain signal;
a DFT sub-unit: decomposes the N-point DFT into several DFT sub-modules, setting
$$N = N_1 N_2$$
and introducing the following index substitution:
$$i = N_2 i_1 + i_2, \qquad k = k_1 + N_1 k_2$$
a row-column interleaver unit: changes the order of the output variables, replacing the one-dimensional index with two-dimensional indices:
$$v[i_1, i_2] = v[N_2 i_1 + i_2], \qquad V[k_1, k_2] = V[k_1 + N_1 k_2]$$
a complex multiplier unit: substitutes the indices into the original formula and, noting that
$$W_N^{N_2} = W_{N_1}, \qquad W_N^{N_1} = W_{N_2}, \qquad W_N^{N_1 N_2} = 1$$
obtains:
$$V[k_1, k_2] = \sum_{i_2=0}^{N_2-1} \left[ \left( \sum_{i_1=0}^{N_1-1} v[i_1, i_2]\, W_{N_1}^{i_1 k_1} \right) W_N^{i_2 k_1} \right] W_{N_2}^{i_2 k_2}$$
where the part in parentheses is an $N_1$-point DFT whose result is multiplied by the twiddle factor $W_N^{i_2 k_1}$; the multiplication by the twiddle factor is implemented by a complex multiplier;
the IFFT module comprises:
an IDFT definition unit: the N-point IDFT is defined as:
$$v[i] = \frac{1}{N} \sum_{k=0}^{N-1} V[k]\, W_N^{-ik}, \qquad i = 0, 1, \dots, N-1$$
an IDFT sub-unit: decomposes the N-point IDFT into several IDFT sub-modules, setting
$$N = N_1 N_2$$
and introducing the following index substitution:
$$k = N_2 k_1 + k_2, \qquad i = i_1 + N_1 i_2$$
a row-column interleaver unit: changes the order of the output variables, replacing the one-dimensional index with two-dimensional indices:
$$V[k_1, k_2] = V[N_2 k_1 + k_2], \qquad v[i_1, i_2] = v[i_1 + N_1 i_2]$$
a complex multiplier unit: substitutes the indices into the original formula and, noting that
$$W_N^{-N_2} = W_{N_1}^{-1}, \qquad W_N^{-N_1} = W_{N_2}^{-1}, \qquad W_N^{-N_1 N_2} = 1$$
obtains:
$$v[i_1, i_2] = \frac{1}{N} \sum_{k_2=0}^{N_2-1} \left[ \left( \sum_{k_1=0}^{N_1-1} V[k_1, k_2]\, W_{N_1}^{-k_1 i_1} \right) W_N^{-k_2 i_1} \right] W_{N_2}^{-k_2 i_2}$$
where the part in parentheses is an $N_1$-point IDFT whose result is multiplied by the twiddle factor $W_N^{-k_2 i_1}$; the multiplication by the twiddle factor is implemented by a complex multiplier.
The present invention also provides an electronic device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the ultra-high throughput IFFT/FFT modem method.
The present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the ultra high throughput IFFT/FFT modem method described above.
Compared with the prior art, the beneficial effects are as follows: the invention provides an ultra-high throughput IFFT/FFT modulation and demodulation method that takes a fully unrolled architecture with specially optimized twiddle factors as its implementation basis and uses a row-column interleaver for compact data rearrangement, so that the IFFT/FFT architecture can be decomposed; the invention has smaller area overhead, a shorter critical path, a higher operating frequency and lower power consumption, and can be used on different architectures.
Drawings
Fig. 1 is a schematic diagram of FFT calculation in embodiment 1 of the present invention.
FIG. 2 is a simplified FFT calculation flow chart in embodiment 1 of the present invention, and is supplemented with the row-column interleaving required when the input and output are in natural order.
Fig. 3 is a schematic diagram of a recursive decomposition of an FFT algorithm.
Fig. 4 is a schematic diagram of a complex multiplier implementation.
Fig. 5 is a diagram of a specific twiddle factor.
FIG. 6 is a DFT-2 (butterfly unit) schematic.
FIG. 7 is a DFT-4 schematic.
FIG. 8 is a schematic diagram of a fully expanded row-column interleaver.
FIG. 9 is a schematic diagram of a row-column interleaver implementation architecture.
Fig. 10 is a schematic diagram of a memory mapping scheme of a column interleaver.
Figure 11 is a schematic diagram of an IFFT/FFT implementation architecture with factor parallelism.
Fig. 12 is a schematic diagram of a real sequence/conjugate symmetric sequence optimization architecture.
Fig. 13 is a schematic diagram of a transmitter IFFT modulation module architecture.
Figure 14 is a schematic diagram of a receiver IFFT module architecture.
Fig. 15 is a schematic diagram of a receiver FFT module architecture.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. The drawings are for illustration only; they are schematic rather than physical representations and are not to be construed as limiting this patent. To better illustrate the embodiments, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.
In the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by the terms "upper", "lower", "left", "right", etc. based on the orientation or positional relationship shown in the drawings, it is only for convenience of describing the present invention and simplifying the description, but it is not intended to indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limiting the present patent, and the specific meaning of the terms may be understood by those skilled in the art according to specific circumstances. In addition, if there is a description of "first", "second", etc. in an embodiment of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the meaning of "and/or" appearing throughout is to include three juxtapositions, exemplified by "A and/or B" including either scheme A, or scheme B, or a scheme in which both A and B are satisfied.
Example 1:
the embodiment provides an ultra-high throughput IFFT/FFT modulation and demodulation method, which comprises the following steps:
Cooley-Tukey FFT procedure:
S11. The N-point DFT is defined as:
$$V[k] = \sum_{i=0}^{N-1} v[i]\, W_N^{ik}, \qquad k = 0, 1, \dots, N-1$$
where
$$W_N = e^{-j 2\pi / N}$$
v is the demodulation input, V is the demodulation output, j is the imaginary unit, and N is the length of the FFT/IFFT; $W_N$ is the base of the Fourier transform twiddle factors; i is the index of the time-domain signal; k is the index of the frequency-domain signal;
S12. The N-point DFT is decomposed into several DFT sub-modules. Let
$$N = N_1 N_2$$
and introduce the following index substitution:
$$i = N_2 i_1 + i_2, \qquad k = k_1 + N_1 k_2, \qquad i_1, k_1 \in [0, N_1), \quad i_2, k_2 \in [0, N_2)$$
S13. After the substitution, a row-column interleaver is used to change the order of the output variables, and two-dimensional indices replace the one-dimensional index:
$$v[i_1, i_2] = v[N_2 i_1 + i_2], \qquad V[k_1, k_2] = V[k_1 + N_1 k_2]$$
S14. Substituting the indices into the original formula and noting that
$$W_N^{N_2} = W_{N_1}, \qquad W_N^{N_1} = W_{N_2}, \qquad W_N^{N_1 N_2} = 1$$
gives:
$$V[k_1, k_2] = \sum_{i_2=0}^{N_2-1} \left[ \left( \sum_{i_1=0}^{N_1-1} v[i_1, i_2]\, W_{N_1}^{i_1 k_1} \right) W_N^{i_2 k_1} \right] W_{N_2}^{i_2 k_2}$$
where the part in parentheses,
$$\sum_{i_1=0}^{N_1-1} v[i_1, i_2]\, W_{N_1}^{i_1 k_1}$$
is an $N_1$-point DFT whose result is multiplied by the twiddle factor $W_N^{i_2 k_1}$;
the whole transform is an $N_2$-point DFT, so the original N-point DFT is decomposed into a nested structure of $N_2$ transforms of $N_1$ points and $N_1$ transforms of $N_2$ points;
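The single decomposition level of steps S11 to S14 can be checked numerically. The following is a minimal sketch, assuming the standard Cooley-Tukey index maps i = N2·i1 + i2 and k = k1 + N1·k2; the function names `dft` and `dft_decomposed` and the variable names are illustrative and do not come from the patent text.

```python
import cmath

def dft(v):
    """Direct N-point DFT: V[k] = sum_i v[i] * W_N^(i*k), with W_N = exp(-2j*pi/N)."""
    n = len(v)
    return [sum(v[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def dft_decomposed(v, n1, n2):
    """One Cooley-Tukey level for N = n1 * n2 (index maps are an assumption)."""
    n = n1 * n2
    assert len(v) == n
    # Input interleave: column i2 holds v[n2*i1 + i2] for i1 = 0..n1-1.
    columns = [[v[n2 * i1 + i2] for i1 in range(n1)] for i2 in range(n2)]
    # n2 inner n1-point DFTs, each followed by the twiddle factor W_N^(i2*k1).
    inner = []
    for i2, col in enumerate(columns):
        y = dft(col)
        inner.append([y[k1] * cmath.exp(-2j * cmath.pi * i2 * k1 / n)
                      for k1 in range(n1)])
    # n1 outer n2-point DFTs taken across i2; the output index is k = k1 + n1*k2.
    out = [0j] * n
    for k1 in range(n1):
        z = dft([inner[i2][k1] for i2 in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = z[k2]
    return out
```

With N = 12 split as 3 × 4, the sketch runs four 3-point transforms followed by three 4-point transforms and reproduces the direct DFT up to rounding error.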
Cooley-Tukey IFFT process:
S21. The N-point IDFT is defined as:
$$v[i] = \frac{1}{N} \sum_{k=0}^{N-1} V[k]\, W_N^{-ik}, \qquad i = 0, 1, \dots, N-1$$
S22. The N-point IDFT is decomposed into several IDFT sub-modules. Let
$$N = N_1 N_2$$
and introduce the following index substitution:
$$k = N_2 k_1 + k_2, \qquad i = i_1 + N_1 i_2, \qquad k_1, i_1 \in [0, N_1), \quad k_2, i_2 \in [0, N_2)$$
S23. After the substitution, a row-column interleaver is used to change the order of the output variables, and two-dimensional indices replace the one-dimensional index:
$$V[k_1, k_2] = V[N_2 k_1 + k_2], \qquad v[i_1, i_2] = v[i_1 + N_1 i_2]$$
S24. Substituting the indices into the original formula and noting that
$$W_N^{-N_2} = W_{N_1}^{-1}, \qquad W_N^{-N_1} = W_{N_2}^{-1}, \qquad W_N^{-N_1 N_2} = 1$$
gives:
$$v[i_1, i_2] = \frac{1}{N} \sum_{k_2=0}^{N_2-1} \left[ \left( \sum_{k_1=0}^{N_1-1} V[k_1, k_2]\, W_{N_1}^{-k_1 i_1} \right) W_N^{-k_2 i_1} \right] W_{N_2}^{-k_2 i_2}$$
The IDFT process is the same as the DFT process except that the twiddle factors are changed and a final scaling by 1/N is applied; when N is an integer power of 2, 1/N does not need to be computed, because the scaling reduces to a bit shift.
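The relation between the two transforms described above can be sketched as follows: the inverse uses conjugated twiddle factors and a final 1/N scaling, and for a fixed-point datapath with N a power of two the scaling becomes an arithmetic right shift. The helper names `transform` and `scale_pow2` are hypothetical, and the integer model of the datapath is an assumption made only for illustration.

```python
import cmath

def transform(x, inverse=False):
    """Direct DFT, or IDFT when inverse=True (conjugated twiddles plus 1/N scaling)."""
    n = len(x)
    sign = 1 if inverse else -1
    y = [sum(x[i] * cmath.exp(sign * 2j * cmath.pi * i * k / n) for i in range(n))
         for k in range(n)]
    return [val / n for val in y] if inverse else y

def scale_pow2(value, n):
    """1/N scaling of a fixed-point integer when N is a power of two: a plain shift."""
    assert n > 0 and n & (n - 1) == 0, "N must be a power of two"
    return value >> (n.bit_length() - 1)
```

Applying `transform` and then `transform(..., inverse=True)` returns the original sequence, while `scale_pow2` shows why no divider is needed for N = 512: the division is a shift by 9 bits.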
Taking a 12-point DFT as an example, the hardware block diagram corresponding to the computation flow of the above algorithm is shown in Fig. 1. The structure in Fig. 1 can be simplified to the form shown in Fig. 2.
The decomposition in the Cooley-Tukey FFT algorithm is recursive, as shown in Fig. 3: a DFT sub-module whose point count is not prime can be decomposed again with the same algorithm. For any $2^n$-point DFT/IDFT, it is therefore sufficient to implement 2-point and 4-point DFT/IDFT modules; transforms with higher point counts are obtained by combining them.
One example is a 64-point DFT:
it is first decomposed into 16 × 4, which means that 16-point and 4-point DFT modules are needed;
when the 16-point DFT module is implemented, it is decomposed into 4 × 4, so in the end only the 4-point DFT module has to be implemented.
From the above, the hardware implementation of a $2^n$-point FFT can be decomposed into the implementation and combination of three items: the 2/4-point DFT sub-module, the complex multiplier module and the row-column interleaver module.
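The recursive decomposition down to 2-point and 4-point base modules can be sketched in software as below. This is an illustrative model only: `factor_plan` and `fft_recursive` are hypothetical names, the preference for radix 4 over radix 2 is an assumption, and the base modules are modelled by a direct DFT rather than by the hardware structures of Figs. 6 and 7.

```python
import cmath

def dft_direct(v):
    """Direct DFT, standing in for the DFT-2 / DFT-4 base modules."""
    n = len(v)
    return [sum(v[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def factor_plan(n):
    """Split a power of two into base sizes, 4 preferred, e.g. 64 -> [4, 4, 4]."""
    assert n >= 2 and n & (n - 1) == 0
    plan = []
    while n > 1:
        base = 4 if n % 4 == 0 else 2
        plan.append(base)
        n //= base
    return plan

def fft_recursive(v):
    """2^n-point FFT built only from 2/4-point DFTs, twiddles and reordering."""
    n = len(v)
    if n <= 4:
        return dft_direct(v)
    n2 = 4                      # outer base module
    n1 = n // n2                # inner size, decomposed again by recursion
    inner = [fft_recursive(v[i2::n2]) for i2 in range(n2)]  # samples v[n2*i1 + i2]
    out = [0j] * n
    for k1 in range(n1):
        col = [inner[i2][k1] * cmath.exp(-2j * cmath.pi * i2 * k1 / n)
               for i2 in range(n2)]
        z = dft_direct(col)
        for k2 in range(n2):
            out[k1 + n1 * k2] = z[k2]
    return out
```

For 64 points the plan is three stages of 4-point modules, matching the 16 × 4 and 4 × 4 steps of the example; for 512 points one 2-point stage is needed in addition.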
Implementation of the complex multiplier:
The complex multiplication is written as:
$$p = a \cdot b$$
where p is the product of the complex multiplication, and a and b are the operands of the complex multiplier;
the real and imaginary parts are written respectively as
$$p = p_r + j p_i, \qquad a = a_r + j a_i, \qquad b = b_r + j b_i$$
For a complex multiplier whose operands take no special value, a fast algorithm using 3 multiplications and 5 additions is used:
$$k_1 = b_r (a_r + a_i), \quad k_2 = a_r (b_i - b_r), \quad k_3 = a_i (b_r + b_i), \qquad p_r = k_1 - k_3, \quad p_i = k_1 + k_2$$
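A complex product with 3 real multiplications and 5 real additions can be sketched as below. The particular grouping of terms is one common variant and is an assumption; the patent's own equation is not recoverable from the extracted text, and `cmul3` is a hypothetical name.

```python
def cmul3(ar, ai, br, bi):
    """(ar + j*ai) * (br + j*bi) using 3 multiplications and 5 additions."""
    k1 = br * (ar + ai)      # multiplication 1, addition 1
    k2 = ar * (bi - br)      # multiplication 2, addition 2
    k3 = ai * (br + bi)      # multiplication 3, addition 3
    return k1 - k3, k1 + k2  # additions 4 and 5: real part, imaginary part
```

For example, `cmul3(3, 4, 5, -2)` returns `(23, 14)`, which agrees with (3 + 4j)(5 − 2j) = 23 + 14j. When b is a constant twiddle factor, the two sums that involve only b could be precomputed, which is one reason this form suits a ROM-driven multiplier.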
Fig. 4 shows the mapping of the above algorithm onto hardware. The implementation architecture can adopt different pipeline configurations for different underlying processes: designs with different numbers of pipeline stages are obtained by retiming and inserting pipeline registers (the dotted part in Fig. 3) without changing the driving relationships in the hardware.
On an FPGA, a suitable implementation uses 5 pipeline stages, and its cost is fully covered by 3 DSP slices of the FPGA. On an ASIC, depending on the process and the target operating frequency, designers can flexibly adopt a more compact pipeline configuration; Fig. 4 shows example schematics with 0, 3 and 5 pipeline stages.
Among the twiddle factors of the DFT, the computation can be simplified for 8 special values, which are uniformly distributed on the unit circle of the complex plane, as shown in Fig. 5. For the four special values on the coordinate axes (1, -1, j, -j), no multiplication is needed; only swapping and negating the real and imaginary parts is required. For the remaining four special values, whose real and imaginary parts both have absolute value $\frac{\sqrt{2}}{2}$, only two multiplications are needed; the pipeline structure is similar to that of the full complex multiplier and can likewise be configured flexibly.
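The saving for the eight special twiddle factors can be illustrated with a small model. It is a sketch under the assumption that the eight values are the powers of W_8 = exp(−2jπ/8); `mul_w8` is a hypothetical name, and the two real multiplications by √2/2 appear only for odd powers.

```python
import cmath
import math

C = math.sqrt(0.5)  # the constant sqrt(2)/2

def mul_w8(xr, xi, m):
    """Multiply xr + j*xi by W_8^m = exp(-2j*pi*m/8)."""
    m %= 8
    if m % 2 == 1:
        # Odd power: one rotation by W_8 = (1 - j)/sqrt(2), two real multiplications.
        xr, xi = C * (xr + xi), C * (xi - xr)
        m -= 1
    # Remaining factor is (-j)^(m/2): swaps and negations only, no multiplier.
    for _ in range(m // 2):
        xr, xi = xi, -xr
    return xr, xi
```

The even powers (1, −j, −1, j) never reach a multiplier in this model, which mirrors the statement that only swaps and sign changes are required on the coordinate axes.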
2/4 point DFT sub-module implementation:
The 2/4-point DFT/IDFT modules are characterized as follows: because many of their twiddle factors take special values, a direct implementation uses clearly fewer multipliers than a radix-2 decomposition; for larger point counts the special twiddle values occur less often, so this property no longer holds. The 2/4-point DFT modules are therefore always used as the basic implementation units, and their implementation architectures are shown in Figs. 6 and 7.
The IDFT implementation has the same hardware structure as the DFT and differs only slightly in output order, so it is omitted.
Implementation of the row-column interleaver module:
In the fully unrolled case, the row-column interleaving in Figs. 1 and 2 can be implemented simply as wiring in the circuit, as shown in Fig. 8.
Without full unrolling (parallelism less than P), the row-column interleaver can be implemented by combining barrel shifters and RAMs, as shown in Fig. 9: during the input cycles, data enters the RAM group through a barrel shifter whose shift amount increments every cycle; after all input cycles, the data to be interleaved is fully stored in the RAMs without conflicts; once the output cycles start, the interleaved data is output through a barrel shifter.
Accordingly, an address mapping scheme is needed so that the inputs and outputs of each cycle use different RAM ports and collisions are avoided, as shown in Fig. 10. The address mapping scheme is given by the following rules:
the number of RAMs equals the larger of the number of rows and the number of columns;
data belonging to the same row is allocated to different RAMs, with the RAM index increasing from the head of the row;
data belonging to the same column is allocated to different RAMs, with the RAM index increasing from the head of the column;
for data belonging to the same RAM, the address increases over time.
Fig. 10 shows the address mapping scheme of a (4,4) interleaver with parallelism 4; different colors indicate the RAMs to which the data belong, and the shift applied to the input data in each cycle can be read from the figure.
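The rules above can be modelled with a skewed (diagonal) bank mapping. The sketch below is an assumption that satisfies the four rules, not necessarily the exact mapping of Fig. 10: element (r, c) goes to RAM (r + c) mod B at address r, where B is the larger of the row and column counts, so the barrel shifter rotates the input row by r positions. The name `interleave` is hypothetical.

```python
def interleave(rows):
    """Write a matrix row by row, read it column by column, checking bank conflicts."""
    r_cnt, c_cnt = len(rows), len(rows[0])
    banks = max(r_cnt, c_cnt)                  # number of RAMs
    ram = [[None] * r_cnt for _ in range(banks)]
    # Input phase: one row per cycle, rotated by the cycle number (barrel shift).
    for r, row in enumerate(rows):
        used = set()
        for c, value in enumerate(row):
            b = (r + c) % banks
            assert b not in used, "two writes hit the same RAM in one cycle"
            used.add(b)
            ram[b][r] = value                  # address grows with the input cycle
    # Output phase: one column per cycle, rotated back on the way out.
    out = []
    for c in range(c_cnt):
        used = set()
        col = []
        for r in range(r_cnt):
            b = (r + c) % banks
            assert b not in used, "two reads hit the same RAM in one cycle"
            used.add(b)
            col.append(ram[b][r])
        out.append(col)
    return out
```

For a 4 × 4 block, each input cycle and each output cycle touches four different RAMs, so single-port memories suffice in this model.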
Example 2
This embodiment provides a method for implementing an FFT/IFFT that is not fully unrolled. Embodiment 1 provides a method for implementing a fully unrolled FFT/IFFT (parallelism P, one set of transform results generated per cycle), whose implementation steps are as follows (taking a 64-point FFT as an example):
determine a factorization of N, 64 = 16 × 4 = 4 × 4 × 4, so that the whole module is divided into 3 columns of DFT-4, each column containing 16 DFT-4 modules;
implement all DFT modules according to the DFT-4 implementation method;
using the implementation methods of the interleaver and the complex multiplier, combine the first two columns of DFT-4 modules into 4 FFT-16 modules through (4,4) interleavers and complex multipliers.
When the required throughput is lower than that provided by the fully unrolled FFT/IFFT module, a method is proposed for implementing an FFT/IFFT module with a given parallelism P. Its basic building blocks are the same as those of the fully unrolled FFT module; in addition, since the implementation of the fully unrolled FFT has already been discussed, the fully unrolled FFT is itself treated as a basic implementation unit. The implementation of the FFT/IFFT that is not fully unrolled comprises the following steps (taking a 512-point FFT with parallelism 64 as an example, as shown in Fig. 11):
decompose 512 into 64 (the target parallelism) × 8 according to the target parallelism;
implement 1 fully unrolled FFT-64 module;
implement 8 = 64/8 fully unrolled FFT-8 modules;
control a ROM holding pre-written twiddle factors so that it supplies different coefficients to the 64 complex multipliers that follow FFT-64;
connect the two stages through a (64,8) row-column interleaver;
perform the pre-processing at the input and the post-processing at the output with (8,64) row-column interleavers.
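The schedule of such a partially unrolled design can be sketched as a cycle-level software model. This is only an illustration under stated assumptions: one fully unrolled FFT-P unit handles P samples per input cycle, a twiddle ROM is addressed by the cycle number, a buffer stands in for the (P, M) row-column interleaver, and P/M units of size M work side by side in each output cycle. The names `dft` and `fft_partial` are hypothetical.

```python
import cmath

def dft(v):
    """Direct DFT, standing in for a fully unrolled FFT unit."""
    n = len(v)
    return [sum(v[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def fft_partial(v, p):
    """N-point FFT on a datapath of parallelism p, with N = p * m and m dividing p."""
    n = len(v)
    m = n // p
    assert p * m == n and p % m == 0
    # Twiddle ROM: one row of p coefficients per input cycle.
    rom = [[cmath.exp(-2j * cmath.pi * i2 * k1 / n) for k1 in range(p)]
           for i2 in range(m)]
    buffer = []                       # models the (p, m) row-column interleaver
    for cycle in range(m):            # input phase: one FFT-p result per cycle
        y = dft(v[cycle::m])          # samples v[m*i1 + cycle], i1 = 0..p-1
        buffer.append([y[k1] * rom[cycle][k1] for k1 in range(p)])
    out = [0j] * n
    units = p // m                    # number of FFT-m modules working in parallel
    for cycle in range(m):            # output phase: p results per cycle
        for u in range(units):
            k1 = cycle * units + u
            z = dft([buffer[i2][k1] for i2 in range(m)])
            for k2 in range(m):
                out[k1 + p * k2] = z[k2]
    return out
```

With N = 512 and p = 64 the model uses m = 8 cycles per transform and 8 units of size 8, matching the module counts in the steps above; the test below uses a smaller size to keep the direct reference DFT cheap.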
Example 3
When the IFFT/FFT is used for modulation and demodulation in a multi-carrier system, the IFFT input is a conjugate-symmetric sequence obtained by extension and its output is a real sequence; the FFT input is a real sequence and its output is a conjugate-symmetric sequence.
For two conjugate-symmetric sequences to be transformed,
$$X_1[k], \quad X_2[k]$$
Pre-processing: construct the sequence
$$D[k] = X_1[k] + j X_2[k]$$
and perform an IFFT on D to obtain d;
Post-processing: $x_1$ and $x_2$ are to be separated from d. Since the IDFT is a linear transform,
$$d[i] = x_1[i] + j x_2[i]$$
and since $x_1$ and $x_2$ are both real sequences, the real part and the imaginary part of the result are output separately;
For the real-sequence FFT, the following algorithm is used for simplification. For two real sequences to be transformed,
$$x_1[i], \quad x_2[i]$$
Pre-processing: construct the sequence
$$d[i] = x_1[i] + j x_2[i]$$
and perform an FFT on d to obtain D;
Post-processing: $X_1$ and $X_2$ are to be separated from D. Since the DFT is a linear transform,
$$D[k] = X_1[k] + j X_2[k]$$
and since $X_1$ and $X_2$ are conjugate symmetric (indices taken modulo N):
$$X_1[k] = \frac{D[k] + D^*[N-k]}{2}, \qquad X_2[k] = \frac{D[k] - D^*[N-k]}{2j}$$
In hardware, the above transformation needs only adders/subtractors, because division by 2 and by 2j reduces to wiring in binary arithmetic.
Thus, an IFFT/FFT module with parallelism P can deliver the effective throughput of parallelism 2P; the schematic diagram is shown in Fig. 12.
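The two-sequences-per-transform trick can be verified with a short model. The sketch assumes the standard packing d = x1 + j·x2 and the usual conjugate-symmetry separation; the names `fft_two_real` and `ifft_two_conj_symmetric` are illustrative, and a direct DFT stands in for the hardware FFT/IFFT module.

```python
import cmath

def dft(v, inverse=False):
    """Direct DFT, or IDFT with 1/N scaling when inverse=True."""
    n = len(v)
    sign = 1 if inverse else -1
    y = [sum(v[i] * cmath.exp(sign * 2j * cmath.pi * i * k / n) for i in range(n))
         for k in range(n)]
    return [val / n for val in y] if inverse else y

def fft_two_real(x1, x2):
    """Transform two real sequences with a single complex FFT."""
    n = len(x1)
    big_d = dft([complex(a, b) for a, b in zip(x1, x2)])   # pre-processing + FFT
    # Post-processing: only additions/subtractions plus division by 2 and 2j.
    s1 = [(big_d[k] + big_d[-k % n].conjugate()) / 2 for k in range(n)]
    s2 = [(big_d[k] - big_d[-k % n].conjugate()) / 2j for k in range(n)]
    return s1, s2

def ifft_two_conj_symmetric(s1, s2):
    """Inverse-transform two conjugate-symmetric spectra with a single complex IFFT."""
    d = dft([a + 1j * b for a, b in zip(s1, s2)], inverse=True)
    return [z.real for z in d], [z.imag for z in d]   # real part -> x1, imaginary -> x2
```

Feeding the two spectra returned by `fft_two_real` into `ifft_two_conj_symmetric` recovers both real sequences, which is the round trip a modulator/demodulator pair would perform.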
Example 4
The present embodiment provides a SEFDM modem module design for a 50G access network.
Under the running frequency of 250MHz, modulating 4 bits in each complex plane symbol, and carrying out iterative decoding for 5 times; if the target throughput is desired, at the receiver, the IFFT/FFT module is capable of generating a set of modulation/demodulation results per cycle; at the transmitter, the IFFT module is capable of generating a set of modulation/demodulation results every 5 cycles; thus, a 512-point transform with parallelism equal to 128 is required at the transmitter and a 512-point transform with parallelism 512 (fully expanded) is required at the receiver.
According to the design method in the embodiment 2, the hardware block diagrams of the corresponding modules in the transmitter and the receiver are obtained by adding the preprocessing module and the postprocessing module in the embodiment 3; as shown in fig. 13-15.
In this embodiment, the design is implemented and tested on a Xilinx FPGA and compared with the prior art. The data in the table below come from tests of the official Xilinx FFT IP, configured in pipelined_streaming_io mode on an xcvu9p chip with bit width = 18. The parallelism provided by the IP is 2 (one complete 512-point transform result is output every 256 = 512/2 cycles), which is already the highest-throughput mode of the IP.
Figure 60354DEST_PATH_IMAGE076
If this IP implementation is used as the basic unit to build the modules we need, the transmitter IFFT requires 64 such units, the receiver FFT requires 256 such units, and the receiver IFFT requires 256 such units.
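The unit counts follow from dividing each target parallelism by the parallelism of one IP instance. A small Python sketch of that arithmetic is given below; the function and variable names are illustrative only and do not come from the patent.

```python
def cycles_per_transform(n_points, parallelism):
    """Cycles needed to emit one n_points transform at the given parallelism."""
    return -(-n_points // parallelism)  # ceiling division


def units_needed(target_parallelism, unit_parallelism):
    """Number of basic units required to reach the target parallelism."""
    return -(-target_parallelism // unit_parallelism)


N_POINTS = 512          # transform length used in this example
IP_PARALLELISM = 2      # parallelism of one IP instance, as stated above

targets = {
    "transmitter IFFT": 128,
    "receiver FFT": 512,
    "receiver IFFT": 512,
}

for name, p in targets.items():
    print(
        name,
        "units:", units_needed(p, IP_PARALLELISM),
        "cycles per transform:", cycles_per_transform(N_POINTS, p),
    )
```

With parallelism 128 the transmitter finishes a 512-point transform in 4 cycles, which fits within the 5-cycle budget stated above.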
Based on the above data, the area overhead and the maximum operating frequency of the method of the present invention are compared with those of an implementation based on the above IP. The overhead estimate for the prior art (a combination of low-throughput IPs) includes neither the logic and buffering needed to combine multiple IPs nor the negative impact of these parts on fmax. Even so, the design of the present invention reduces the area overhead by about 50%-70% (the reduction grows with the implementation scale), and its advantage in on-chip storage usage is even more pronounced.
The design also provides an operating frequency comparable to that of the official IP. In practice, the operating frequency achievable with the prior art may be significantly lower than the figures below, because the buffering needed to combine multiple IPs also requires data interleaving.
Transmitter IFFT
Figure 412838DEST_PATH_IMAGE077
Receiver IFFT
Figure 355386DEST_PATH_IMAGE078
Receiver FFT
Figure 855637DEST_PATH_IMAGE079
In summary, the present invention realizes an IFFT/FFT modulation/demodulation method with a very high throughput, a smaller area overhead, a shorter critical path (thus, a higher operating frequency), and a lower power consumption, and can be used for different architectures.
It should be understood that the above-described embodiments of the present invention are merely examples given to illustrate the present invention clearly, and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (5)

1. An ultra-high throughput IFFT/FFT modulation and demodulation method, characterized by comprising the following steps:
Demodulation FFT process:
S11. The N-point DFT is defined as:
Figure DEST_PATH_IMAGE001
where
Figure DEST_PATH_IMAGE003
v is the demodulation input, V is the demodulation output, j is the imaginary unit, and N is the length of the FFT/IFFT;
Figure DEST_PATH_IMAGE005
is the base of the Fourier-transform twiddle factor; i is the index of the time-domain signal; k is the index of the frequency-domain signal;
S12. Decompose the N-point DFT into a plurality of DFT sub-modules; let
Figure DEST_PATH_IMAGE007
and introduce the following coefficient substitutions:
Figure 133587DEST_PATH_IMAGE008
S13. After the substitution, a row-column interleaver is used to change the order of the output variables, and a two-dimensional index replaces the one-dimensional index:
Figure 479118DEST_PATH_IMAGE010
S14. Substitute the coefficients into the original formula and denote
Figure 708105DEST_PATH_IMAGE012
to obtain:
Figure DEST_PATH_IMAGE013
where the part in parentheses,
Figure DEST_PATH_IMAGE015
is a DFT of
Figure DEST_PATH_IMAGE017
points, whose result is then multiplied by the following twiddle factor:
Figure DEST_PATH_IMAGE019
The whole transformation is a DFT of
Figure DEST_PATH_IMAGE021
points; the original N-point DFT is thereby decomposed into a nested structure of
Figure 286723DEST_PATH_IMAGE023
transforms of
Figure 16781DEST_PATH_IMAGE025
points and
Figure 92185DEST_PATH_IMAGE017
transforms of
Figure 616707DEST_PATH_IMAGE027
points;
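The nested decomposition in steps S11 to S14 can be illustrated with the standard two-factor (Cooley-Tukey style) index mapping. The Python sketch below assumes N = N1*N2 with the substitutions i = N2*i1 + i2 and k = k1 + N1*k2. These symbols and the function names are assumptions made for illustration, since the patent's own formulas are images and may use a different but equivalent mapping.

```python
import cmath


def dft(x):
    """Naive reference DFT of any length."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
        for k in range(n)
    ]


def dft_two_stage(v, n1, n2):
    """N-point DFT (N = n1*n2) built from n2 DFTs of n1 points,
    a twiddle multiplication, and n1 DFTs of n2 points."""
    n = n1 * n2
    assert len(v) == n

    def twiddle(e):
        return cmath.exp(-2j * cmath.pi * e / n)   # W_N ** e

    # Stage 1: for each i2, an n1-point DFT over i1, then the twiddle W_N^(i2*k1).
    stage1 = []
    for i2 in range(n2):
        inner = dft([v[n2 * i1 + i2] for i1 in range(n1)])
        stage1.append([inner[k1] * twiddle(i2 * k1) for k1 in range(n1)])

    # Stage 2: for each k1, an n2-point DFT over i2; the result is written to
    # the one-dimensional index k = k1 + n1*k2 (the row-column reordering).
    out = [0j] * n
    for k1 in range(n1):
        outer = dft([stage1[i2][k1] for i2 in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = outer[k2]
    return out
```

The reads with stride n2 in stage 1 and the writes with stride n1 in stage 2 are the data reorderings that a row-column interleaver would perform in hardware.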
Modulation IFFT process:
S21. The N-point IDFT is defined as:
Figure DEST_PATH_IMAGE028
S22. Decompose the N-point IDFT into a plurality of IDFT sub-modules; let
Figure DEST_PATH_IMAGE030
and introduce the following coefficient substitutions:
Figure 698932DEST_PATH_IMAGE031
S23. After the substitution, a row-column interleaver is used to change the order of the output variables, and a two-dimensional index replaces the one-dimensional index:
Figure DEST_PATH_IMAGE032
S24. Substitute the coefficients into the original formula and denote
Figure DEST_PATH_IMAGE034
to obtain:
Figure 154053DEST_PATH_IMAGE035
The IDFT process is the same as the DFT process, except that the twiddle factors are changed and the result is finally scaled by 1/N; when N is an integer power of 2, the scaling by 1/N requires no computation;
wherein, in steps S14 and S24, the multiplication by the twiddle factor is implemented by a complex multiplier, denoted as:
Figure 83963DEST_PATH_IMAGE037
where p is the product of the complex multiplication, and a and b are the operands of the complex multiplier;
their real and imaginary parts are denoted respectively as
Figure DEST_PATH_IMAGE038
For complex multiplications whose operands have no special value, a fast algorithm using 3 multiplications and 5 additions is used:
Figure 700758DEST_PATH_IMAGE039
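One well-known way to multiply two complex numbers with 3 real multiplications and 5 real additions is sketched below in Python. The patent's exact formula is an image and is not reproduced here, so this particular grouping of terms and the names ar, ai, br, bi are assumptions; other equivalent groupings exist.

```python
def complex_mul_3m(ar, ai, br, bi):
    """Return (pr, pi) with pr + j*pi = (ar + j*ai) * (br + j*bi),
    using 3 real multiplications and 5 real additions/subtractions."""
    s1 = ar + ai          # addition 1
    s2 = bi - br          # addition 2
    s3 = br + bi          # addition 3
    k1 = br * s1          # multiplication 1
    k2 = ar * s2          # multiplication 2
    k3 = ai * s3          # multiplication 3
    pr = k1 - k3          # addition 4: ar*br - ai*bi
    pi = k1 + k2          # addition 5: ar*bi + ai*br
    return pr, pi
```

When b is a fixed twiddle factor, s2 and s3 depend only on b and could be precomputed, leaving 3 additions at run time. Whether the patented design does this is not stated in the text.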
In the not-fully-expanded case, i.e. when the parallelism is less than P, the row-column interleaver is implemented by the following architecture, which combines barrel shifters and RAMs: during the input cycles, data enter the RAM group through a barrel shifter whose shift amount is incremented every cycle; after all input cycles, the data to be interleaved are completely stored in the RAMs without conflict; once the output cycles begin, the interleaved data are output through a barrel shifter; in the row-column interleaver, the following address mapping scheme is adopted so that the inputs and outputs of each cycle use different RAM ports, thereby avoiding conflicts; the address mapping scheme comprises:
the number of RAMs equals the larger of the number of rows and the number of columns;
data belonging to the same row are allocated to different RAMs, with the RAM index increasing from the head of the row;
data belonging to the same column are allocated to different RAMs, with the RAM index increasing from the head of the column;
for data belonging to the same RAM, the address increases over time.
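The four mapping rules above are satisfied, for example, by a diagonal bank assignment. The Python sketch below models one such mapping, assuming one full row is written per input cycle and one full column is read per output cycle, with bank = (row + column) mod B and address = row. This concrete formula and the function name are assumptions made for illustration and are not taken from the patent.

```python
def interleave_rows_to_columns(matrix):
    """Software model of a conflict-free row-column interleaver.

    Element (r, c) is stored in bank (r + c) % banks at address r, which
    corresponds to a barrel shift by r on the input side.
    """
    rows, cols = len(matrix), len(matrix[0])
    banks = max(rows, cols)                     # number of RAMs
    ram = [[None] * rows for _ in range(banks)]

    # Input cycles: one row per cycle, shift amount r grows every cycle.
    for r in range(rows):
        used = set()
        for c in range(cols):
            bank = (r + c) % banks
            assert bank not in used             # no two writes hit the same RAM
            used.add(bank)
            ram[bank][r] = matrix[r][c]         # address increases with time

    # Output cycles: one column per cycle, read back through the same mapping.
    out = []
    for c in range(cols):
        used = set()
        column = []
        for r in range(rows):
            bank = (r + c) % banks
            assert bank not in used             # no two reads hit the same RAM
            used.add(bank)
            column.append(ram[bank][r])
        out.append(column)
    return out
```

The internal assertions check that no RAM is accessed twice within a single input or output cycle, which is the conflict-free property the claim describes.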
2. The ultra-high throughput IFFT/FFT modulation and demodulation method according to claim 1, wherein, for a not-fully-expanded FFT/IFFT, the fully expanded FFT/IFFT implementation is used as the basic implementation unit: according to its target parallelism, the not-fully-expanded FFT/IFFT is decomposed into a plurality of basic implementation units and is then implemented with these units, i.e. with the fully expanded FFT/IFFT implementation method.
3. The ultra-high throughput IFFT/FFT modulation and demodulation method according to claim 2, wherein, when the IFFT/FFT is used for modulation and demodulation in a multi-carrier system, the input of the IFFT is a conjugate-symmetric sequence obtained by extension and its output is a real sequence, while the input of the FFT is a real sequence and its output is a conjugate-symmetric sequence; for the IFFT, the following algorithm is adopted for simplification: given two conjugate-symmetric sequences to be transformed
Figure 739121DEST_PATH_IMAGE041
Preprocessing: construct the sequence
Figure DEST_PATH_IMAGE042
Perform an IFFT on D to obtain d;
Post-processing: we wish to separate
Figure 686349DEST_PATH_IMAGE043
from d; since the IDFT is a linear transformation, we have
Figure 985612DEST_PATH_IMAGE045
Because
Figure 851937DEST_PATH_IMAGE047
are all real sequences, the real part and the imaginary part of the result are output separately;
for the real-sequence FFT, the following algorithm is adopted for simplification: given two real sequences to be transformed
Figure DEST_PATH_IMAGE048
Preprocessing: construct the sequence
Figure DEST_PATH_IMAGE050
Perform an FFT on d to obtain D;
Post-processing: we wish to separate
Figure 784121DEST_PATH_IMAGE052
from D; since the DFT is a linear transformation, we have
Figure 253148DEST_PATH_IMAGE054
Figure DEST_PATH_IMAGE056
Moreover, since the transform of a real sequence is conjugate symmetric, we have:
Figure 688809DEST_PATH_IMAGE057
Figure 991614DEST_PATH_IMAGE059
4. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the ultra-high throughput IFFT/FFT modulation and demodulation method of any one of claims 1 to 3.
5. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the ultra-high throughput IFFT/FFT modulation and demodulation method of any one of claims 1 to 3.
CN202210315781.8A 2022-03-29 2022-03-29 Ultra-high throughput IFFT/FFT modulation and demodulation method Active CN114422315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210315781.8A CN114422315B (en) 2022-03-29 2022-03-29 Ultra-high throughput IFFT/FFT modulation and demodulation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210315781.8A CN114422315B (en) 2022-03-29 2022-03-29 Ultra-high throughput IFFT/FFT modulation and demodulation method

Publications (2)

Publication Number Publication Date
CN114422315A CN114422315A (en) 2022-04-29
CN114422315B true CN114422315B (en) 2022-07-29

Family

ID=81263493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210315781.8A Active CN114422315B (en) 2022-03-29 2022-03-29 Ultra-high throughput IFFT/FFT modulation and demodulation method

Country Status (1)

Country Link
CN (1) CN114422315B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103636131A (en) * 2011-05-18 2014-03-12 松下电器产业株式会社 Parallel bit interleaver
CN111079075A (en) * 2019-12-19 2020-04-28 网络通信与安全紫金山实验室 Non-2-base DFT optimized signal processing method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105988973B (en) * 2015-02-13 2019-04-19 上海澜至半导体有限公司 Fast Fourier Transform (FFT)/Fast Fourier Transform (FFT) method and circuit
CN106484658B (en) * 2016-09-26 2019-01-11 西安电子科技大学 The device and method of 65536 pulses compression is realized based on FPGA

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
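Research on core-module optimization for the national digital television standard and related technologies for next-generation evolved standards (数字电视国标核心模块优化与下一代演进标准的相关技术研究); 冷继南; China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology; 2014-06-15; abstract, pp. 3-5, 12-14, 25-42 *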

Also Published As

Publication number Publication date
CN114422315A (en) 2022-04-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant