CN113094639A - DFT parallel processing method, device, equipment and storage medium - Google Patents

DFT parallel processing method, device, equipment and storage medium

Info

Publication number
CN113094639A
CN113094639A (application CN202110276067.8A)
Authority
CN
China
Prior art keywords: butterfly, unit, address, level, parallelism
Prior art date
Legal status
Granted
Application number
CN202110276067.8A
Other languages
Chinese (zh)
Other versions
CN113094639B (en)
Inventor
Liu Fuliang (刘福良)
Fang Xu (房旭)
Zhang Lijun (张丽君)
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110276067.8A
Publication of CN113094639A
Application granted
Publication of CN113094639B
Status: Active


Classifications

    • G06F 17/141 Discrete Fourier transforms (under G — Physics; G06 — Computing, calculating or counting; G06F — Electric digital data processing; G06F 17/10 — Complex mathematical operations; G06F 17/14 — Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve transforms)
    • G06F 9/3004 Arrangements for executing specific machine instructions to perform operations on memory (under G06F 9/00 — Arrangements for program control; G06F 9/30 — Arrangements for executing machine instructions, e.g. instruction decode)
    • G06F 9/3885 Concurrent instruction execution using a plurality of independent parallel functional units (under G06F 9/38 — Concurrent instruction execution, e.g. pipeline or look ahead)


Abstract

The embodiment of the application discloses a DFT parallel processing method, which comprises the following steps: determining, according to a preset parallel address-taking rule, the state information of m counters for at least one group of input data input in parallel to each stage of butterfly units; determining the storage addresses of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; reading the at least one group of input data from the storage units in parallel according to the storage addresses; sending the at least one group of input data in parallel to at least one butterfly unit of each stage of butterfly units for parallel processing to obtain at least one group of output data; and writing the at least one group of output data into the storage spaces corresponding to the input data according to the original storage addresses. In this way, through the m counters, the parallel address-taking rule and the address mapping rule, conflict-free parallel access to the plurality of storage units is achieved, improving the parallel processing efficiency of DFT and reducing the DFT processing delay.

Description

DFT parallel processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of digital signal processing, and in particular, to a Discrete Fourier Transform (DFT) parallel processing method, apparatus, device, and storage medium.
Background
The Long Term Evolution (LTE) downlink adopts an Orthogonal Frequency Division Multiplexing (OFDM) modulation scheme: the base station modulates with an Inverse Fast Fourier Transform (IFFT) and the terminal demodulates with a Fast Fourier Transform (FFT). The LTE uplink employs a Single Carrier Frequency Division Multiple Access (SC-FDMA) modulation scheme, in which the baseband signal undergoes DFT spreading before IFFT modulation; in 5G NR this is also called DFT-Spread OFDM (DFT-s-OFDM).
The number of DFT points in the 4G/5G uplink satisfies N = 2^m1 · 3^m2 · 5^m3. Since the FFT (IFFT) point number satisfies m2 = m3 = 0, the FFT (IFFT) can be regarded as a special case of the DFT point number, and both are collectively referred to as DFT herein. A DFT of N points is generally completed by dividing it into several levels of butterfly units; all levels of butterfly units share one memory, and the processing of each level of butterfly units is completed by continuously accessing the memory, so that a higher throughput rate is achieved with lower hardware resource overhead.
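To make the length constraint concrete, here is a minimal sketch (not from the patent; the function name is illustrative) that checks whether a length N satisfies N = 2^m1 · 3^m2 · 5^m3 and extracts the exponents:

```python
def radix_exponents(n):
    """Return (m1, m2, m3) if n == 2**m1 * 3**m2 * 5**m3, else None."""
    if n < 1:
        return None
    exps = []
    for p in (2, 3, 5):
        e = 0
        while n % p == 0:   # strip out each prime factor in turn
            n //= p
            e += 1
        exps.append(e)
    # anything left over means a prime factor other than 2, 3, 5
    return tuple(exps) if n == 1 else None
```

For example, 60 = 2^2 · 3 · 5 and 54 = 2 · 3^3 are valid uplink DFT sizes, while 7 is not.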
The key of the memory structure is to design a conflict-free address scheme, which can read a plurality of data from the memory as input data of one or more butterfly units and then write the output data of the one or more butterfly units back to the memory according to the original address. However, no memory-based DFT parallel processing scheme exists in the prior art, which causes great delay in DFT processing.
Disclosure of Invention
In order to solve the foregoing technical problems, embodiments of the present application desirably provide a DFT parallel processing method, apparatus, device, and storage medium.
The technical scheme of the application is realized as follows:
in a first aspect, a DFT parallel processing method is provided, where the method includes:
determining m-level butterfly units for executing Discrete Fourier Transform (DFT) parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
determining the state information of m counters of at least one group of input data input in parallel by each stage of butterfly unit according to a preset parallel address-taking rule;
determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
reading the at least one group of input data from the storage units in parallel according to the storage address;
sending the at least one group of input data to at least one butterfly unit of each stage of butterfly unit in parallel for parallel processing, and outputting at least one group of output data;
and writing the at least one group of output data into a storage space corresponding to the input data according to the original storage address.
In a second aspect, a DFT parallel processing apparatus is provided, the apparatus comprising: the device comprises a processing unit, an address management unit and a plurality of storage units; wherein,
the processing unit comprises a plurality of butterfly units with different bases;
the processing unit is used for determining an m-level butterfly unit for executing Discrete Fourier Transform (DFT) parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
the address management unit is used for determining the state information of m counters of at least one group of input data input in parallel by each stage of butterfly unit according to a preset parallel address-taking rule; determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
the processing unit is used for reading the at least one group of input data from the storage unit in parallel according to the storage address; sending the at least one group of input data to at least one butterfly unit of each stage of butterfly unit in parallel for parallel processing, and outputting at least one group of output data;
and the processing unit is also used for writing the at least one group of output data into a storage space corresponding to the input data according to the original storage address.
In a third aspect, an electronic device is provided, including: a processor and a memory configured to store a computer program capable of running on the processor,
wherein the processor is configured to perform the steps of the aforementioned method when running the computer program.
In a fourth aspect, a computer storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the aforementioned method.
The embodiment of the application provides a DFT parallel processing method, which comprises the following steps: determining m stages of butterfly units for performing DFT, each stage comprising at least one butterfly unit; determining, according to a preset parallel address-taking rule, the state information of m counters for at least one group of input data input in parallel to each stage of butterfly units; determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule, wherein the storage address comprises a storage unit identification and a storage unit offset address; reading the at least one group of input data from the storage units in parallel according to the storage address; sending the at least one group of input data in parallel to at least one butterfly unit of each stage for parallel processing, and outputting at least one group of output data; and writing the at least one group of output data into the storage space corresponding to the input data according to the original storage address. In this way, through the m counters, the parallel address-taking rule and the address mapping rule, conflict-free parallel access to the plurality of storage units is achieved, improving the parallel processing efficiency of DFT and reducing the DFT processing delay.
Drawings
FIG. 1 is a block diagram of a DFT-s-OFDM transmitter;
FIG. 2 is a block diagram of a DFT-s-OFDM receiver;
FIG. 3 is a first flowchart of a DFT parallel processing method according to an embodiment of the present application;
FIG. 4 is a diagram of a first framework of DFT parallel processing in the embodiment of the present application;
FIG. 5 is a diagram of a second framework of DFT parallel processing in the embodiment of the present application;
FIG. 6 is a second flowchart of a DFT parallel processing method according to an embodiment of the present application;
FIG. 7 is a third flowchart of a DFT parallel processing method according to an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating a structure of a DFT parallel processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
So that the manner in which the features and elements of the present embodiments can be understood in detail, a more particular description of the embodiments, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings.
The number of DFT points in the 4G/5G uplink satisfies N = 2^m1 · 3^m2 · 5^m3, and the FFT (IFFT) point number satisfies m2 = m3 = 0, so the FFT (IFFT) can be regarded as a special case of the DFT point number. That is, the DFT parallel processing method provided in the embodiment of the present application can also be applied to FFT parallel processing, but this document uniformly uses the term DFT.
Fig. 1 is a block diagram of a DFT-s-OFDM transmitter. As shown in fig. 1, a bit stream sequentially undergoes constellation mapping, serial-to-parallel conversion, N-point DFT processing, subcarrier mapping, M-point IFFT processing, parallel-to-serial conversion, cyclic prefix insertion, up-conversion and radio frequency processing to generate a transmission signal.
Fig. 2 is a block diagram of a DFT-s-OFDM receiver. As shown in fig. 2, the signal received by the antenna sequentially undergoes down-conversion and radio frequency processing, cyclic prefix removal, serial-to-parallel conversion, M-point FFT processing, subcarrier demapping, N-point IDFT processing, parallel-to-serial conversion and constellation demodulation to obtain the bit stream.
The DFT parallel processing method provided by the embodiment of the present application can be applied to the positions indicated by the dashed line in fig. 1 and fig. 2, that is, N-point DFT processing, M-point IFFT processing, M-point FFT processing, and N-point IDFT processing. The following describes an embodiment of the DFT parallel processing method in detail.
Fig. 3 is a first flowchart of a DFT parallel processing method in the embodiment of the present application, and as shown in fig. 3, the method may specifically include:
step 301: determining m-level butterfly units for executing DFT parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
specifically, the DFT operation is decomposed into m levels of butterfly operations according to the DFT data length, m is an integer greater than or equal to 1, and the first level of butterfly operations are executed by first level butterfly units, and each level of butterfly unit comprises one or more butterfly units with the same basis. The bases of the butterfly units required by the present application include 2, 4, 8, 16, 3, 9, 5, which can perform DFT processing of arbitrary data length.
Illustratively, if the DFT data length N is 60, then 60 = 3 × 4 × 5 is divided into 3 stages of butterflies: the first stage is performed by radix-3 butterfly units, the second stage by radix-4 butterfly units and the third stage by radix-5 butterfly units. If N is 54, then 54 = 2 × 3 × 9.
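The decomposition in this example can be sketched as follows, assuming a greedy split over the supported bases {16, 8, 4, 2, 9, 3, 5}; the greedy order and the function name are illustrative choices, and the patent's stage ordering (e.g. keeping non-coprime bases adjacent) is a separate step not reproduced here:

```python
def stage_bases(n):
    """Split N into butterfly-unit bases drawn from {16, 8, 4, 2, 9, 3, 5}.

    Greedy sketch: largest power-of-2 radix first, then 9 before 3, then 5.
    Returns the multiset of stage bases; ordering them into stages is a
    separate design choice in the patent.
    """
    bases = []
    for radix in (16, 8, 4, 2, 9, 3, 5):
        while n % radix == 0:
            n //= radix
            bases.append(radix)
    assert n == 1, "N must factor over 2, 3 and 5"
    return bases
```

For N = 60 this yields the bases {3, 4, 5} of the worked example, and for N = 54 the bases {2, 3, 9}.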
Step 302: determining the state information of m counters of at least one group of input data input in parallel by each stage of butterfly unit according to a preset parallel address-taking rule;
here, m counters are used as the basis for reading the memory cells, and the number of the memory cells to be read and written and the offset address inside the storage unit are calculated according to the state information of the counters and the address mapping rule. The method comprises the steps that a plurality of storage units (bank) are included, each storage unit comprises a plurality of storage spaces, the number of each storage unit is used for uniquely identifying one storage unit, and the offset address is used for identifying one storage space in one storage unit.
Specifically, the state information of the m counters comprises m bits, including a first counting bit, a second counting bit and the other counting bits. Two states that differ only in the first counting bit (with the second and other counting bits equal) indicate input data at different input positions of the same butterfly unit; two states that differ only in the second counting bit (with the first and other counting bits equal) indicate input data at the same input position of different butterfly units.
In practical application, each counter starts counting from 0; when every counter has counted to its upper limit, the storage units have been completely traversed and the processing of the current stage of butterfly units is finished, whereupon each counter returns to zero and counting begins for the input data of the next stage of butterfly units. Here, the upper limit of each counter is determined by the base N_s of the stage of butterfly units to which it corresponds.
In some embodiments, the m bits correspond in order to the m stages of butterfly units, and the radix of each bit is the base N_s of the corresponding stage of butterfly units. The first counting bit is the bit corresponding to the stage of butterfly units currently executing DFT parallel processing. When the base of the stage corresponding to the first counting bit composes the maximum parallelism on its own, the second counting bit may be any one of the m bits other than the first counting bit;
when the base of the stage corresponding to the first counting bit does not compose the maximum parallelism, or composes the maximum parallelism together with the bases of other stages, the base of the stage corresponding to the second counting bit must be divisible by the parallelism of the stage corresponding to the first counting bit and must be one of the bases composing the maximum parallelism.
Correspondingly, the parallel addressing rule comprises the following steps:
accumulating the second counting bit from 0 to N_s − 1 on the basis of the initial state information of the m counters to obtain the state information of the m counters for at least one group of input data, and carrying to the other counting bits when the second counting bit meets the carry condition;

where N_s is the base of the stage of butterfly units currently performing DFT parallel processing.
Illustratively, for the s-th stage of butterfly units, counting n_s (the first counting bit) from 0 to N_s − 1 produces the inputs of one and the same butterfly unit, while accumulating n_t (the second counting bit) produces the corresponding bank addresses for the inputs of different butterfly units; n_s carries with radix N_s and n_t carries with radix N_t. Here N_s is the base of the s-th stage of butterfly units: when N_s composes the bank parallelism on its own, N_t may be the base of any other stage of butterfly units; when N_s does not compose the bank parallelism, or does not compose it alone, N_t must be divisible by P_s and be one of the bases composing the bank parallelism. That is, the state information of the m counters corresponding to the address of the l2-th input (l2 = 0, 1, ..., N_s − 1) of the l1-th butterfly unit (l1 = 0, 1, ..., N_t − 1) is (n_0, ..., n_{s−1}, l2, ..., n_{t−1}, l1, ..., n_{m−1}). The address mapping rule maps each counter state to the corresponding bank address (i.e., the storage address); data are read from the banks according to the bank addresses and input to the corresponding positions of the butterfly units, and after the butterfly units finish processing, the results are written back to the original addresses. By accumulating n_t and carrying to the other counters, the other groups are input in parallel until all data have been traversed and the processing of the s-th stage of butterfly units is completed.
For example, with 54 = 2 × 3 × 9: for the third-stage butterfly units, N_s = 9 and n_t may be either the first bit or the second bit, i.e., N_t may be the base 2 of the first-stage butterfly units or the base 3 of the second-stage butterfly units. For the first-stage butterfly units (N_s = 2) and the second-stage butterfly units (N_s = 3), n_t is the third bit in both cases, i.e., N_t is the base 9 of the third-stage butterfly units.
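The counter-state tuples described above can be enumerated with a small sketch; the function name and the `other` argument (frozen values for the remaining counter positions) are illustrative assumptions, not the patent's notation:

```python
def group_states(bases, s, t, other):
    """Counter states for one parallel read in stage s.

    Position s sweeps l2 = 0..N_s-1 (the inputs of one butterfly unit),
    position t sweeps l1 = 0..N_t-1 (different butterfly units), and the
    remaining positions are frozen at the values given in `other`
    (a dict mapping counter index -> value; indices s and t are ignored).
    Returns one list of N_s state tuples per butterfly unit.
    """
    Ns, Nt = bases[s], bases[t]
    states = []
    for l1 in range(Nt):          # which butterfly unit
        group = []
        for l2 in range(Ns):      # which input position of that butterfly
            state = [other.get(i, 0) for i in range(len(bases))]
            state[s], state[t] = l2, l1
            group.append(tuple(state))
        states.append(group)
    return states
```

For 54 = 2 × 3 × 9 with s = 2 (base 9) and t = 0 (base 2), this yields 2 parallel butterfly units of 9 inputs each.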
Step 303: determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
here, an input terminal of one butterfly unit corresponds to one state information of m counters, an address mapping rule makes the state information and the storage space establish a one-to-one mapping relationship, and one state information is mapped onto one storage space by using the address mapping rule, that is, one data can be read from the storage space according to one state information and input to an input terminal of the corresponding butterfly unit.
That is to say, when obtaining the input of one or more butterfly units of each stage, the state information of at least one group of m counters is generated through the parallel address-taking rule and mapped into at least one group of storage addresses through the address mapping rule, and the storage units are then read in parallel to obtain at least one group of input data.
Here, the set of input data is input data of one butterfly unit.
In some embodiments, the address mapping rule comprises: determining the storage unit identification of the at least one group of input data according to the parallelism of each level of butterfly unit, the state information of the m counters and the maximum parallelism of the storage unit; and determining the offset address of the storage unit of the at least one group of input data according to the base of each stage of butterfly unit and the state information of the m counters.
In some embodiments, the method further comprises: decomposing the DFT operation into m-level butterfly operation according to the DFT data length, and determining the basis of the m-level butterfly unit; determining the maximum parallelism of a storage unit according to the basis of the m-level butterfly unit; and determining the parallelism of the m-level butterfly units according to the maximum parallelism of the storage units and the bases of the m-level butterfly units.
Specifically, the determining of the maximum parallelism of the storage units according to the m stages of butterfly-unit bases includes: taking the largest base among the m stages of butterfly-unit bases as the maximum parallelism of the storage units; or taking the product of the bases of at least two stages of butterfly units as the maximum parallelism of the storage units; wherein the maximum parallelism is greater than or equal to the base of every stage of butterfly units.
That is,

N_b = N_i, or N_b = N_i · N_j with N_i and N_j relatively prime, and N_b ≥ N_d for d = 0, 1, 2, ..., m − 1    (1)

where N_b is the maximum parallelism, N_i and N_j are the bases of the butterfly units composing N_b, N_d is the base of the d-th stage of butterfly units, m is the number of decomposition levels, and the set B = {i} or B = {i, j} is defined as the set of stage numbers used to compose the maximum parallelism of the storage units.
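Rule (1) can be sketched as follows. The rule itself only constrains the candidates; returning the largest admissible candidate that also fits a 16-bank memory (the bank count stated later in the text) is an added assumption of this sketch:

```python
from itertools import combinations
from math import gcd

def max_parallelism(bases, num_banks=16):
    """Candidate N_b values per rule (1): a single base, or the product of
    two coprime bases, that is >= every stage base.  Picking the largest
    candidate not exceeding the bank count is this sketch's own choice."""
    cands = {b for b in bases if all(b >= d for d in bases)}
    for x, y in combinations(bases, 2):
        if gcd(x, y) == 1 and all(x * y >= d for d in bases):
            cands.add(x * y)
    cands = {c for c in cands if c <= num_banks}
    return max(cands)
```

For the bases {2, 3, 9} of N = 54, this gives N_b = 9, matching the worked example later in the text.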
Specifically, the determining of the parallelism of the m stages of butterfly units according to the maximum parallelism of the storage units and the m stages of butterfly-unit bases includes: obtaining at least one candidate parallelism whose product with the base of the given stage of butterfly units is less than or equal to the maximum parallelism; and selecting, from the at least one candidate, a parallelism that evenly divides the base of the target-stage butterfly units as the parallelism of that stage; wherein the target-stage butterfly units are the stage of butterfly units composing the maximum parallelism.
That is, there exists i ∈ B such that

N_i mod P_d = 0 and P_d · N_d ≤ N_b    (2)
Illustratively, for 54 = 2 × 3 × 9 the maximum parallelism is 9, so the candidate parallelisms for stage 1 (the divisors of 9) are 1, 3 and 9; but 9 does not satisfy condition (2) (9 × 2 = 18 > 9), so the larger of 1 and 3 is taken as the parallelism of stage 1. In the same way, the parallelisms of the three stages are 3, 3 and 1 respectively.
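The stage-parallelism selection of this example can be sketched as below: for each stage base N_d, take the largest P_d that divides N_i (the base composing the maximum parallelism) and satisfies P_d · N_d ≤ N_b, as in the worked example; the function name and signature are illustrative:

```python
def stage_parallelism(bases, Nb, Ni):
    """For each stage base N_d, choose the largest P_d with
    N_i mod P_d == 0 and P_d * N_d <= N_b (condition (2))."""
    return [max(p for p in range(1, Ni + 1)
                if Ni % p == 0 and p * Nd <= Nb)
            for Nd in bases]
```

For N = 54 with bases (2, 3, 9), N_b = 9 and N_i = 9, this reproduces the parallelisms 3, 3 and 1.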
Specifically, among the m stages of butterfly units, the bases of any two stages are relatively prime; or, when the bases of two stages of butterfly units are not relatively prime, those two stages are arranged as two consecutive stages.
Illustratively, for 54 = 2 × 27 it is necessary to further decompose 27 into 3 and 9; since 3 and 9 are not relatively prime, a common factor algorithm (CFA) scheme is needed, and for hardware processing the stages whose bases are not relatively prime should be placed in consecutive positions.
In other embodiments, the method further comprises: presetting a mapping relation of at least one DFT data length, the base of the m-level butterfly unit and the parallelism of the m-level butterfly unit; and determining the basis of the m-level butterfly units and the parallelism of the m-level butterfly units according to the mapping relation and the DFT data length.
That is to say, the m stages of butterfly-unit bases and parallelisms corresponding to different data lengths can be determined in advance according to the determination method above, and the mapping relationship is established and stored; when DFT parallel processing is performed, the mapping relationship is looked up directly according to the DFT data length to obtain the bases and parallelisms of the m stages of butterfly units.
Specifically, the address mapping rule includes:
[Equation (3): the storage unit identification b, computed from the parallelisms P_i, the counter states n_i and the maximum parallelism N_b]

[Equation (4): the storage unit offset address a, computed from the set B, the counter states n_d and the bases N_s]

where b is the storage unit identification, P_i is the parallelism of the i-th stage of butterfly units, n_i is the state of the i-th counter, N_b is the maximum parallelism, a is the storage unit offset address, B is the set of stage numbers of the butterfly units composing the maximum parallelism, n_d is the state of the d-th counter, and N_s is the base of the s-th stage of butterfly units.
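The patent's own equations for b and a survive only as images. As a hedged stand-in, the classical digit-sum mapping below illustrates the same conflict-free idea for the single-radix case N = r^m with r banks; it is explicitly not the patent's mapping. Indices that differ in exactly one radix-r digit (the r inputs of one butterfly) always land in r distinct banks, while (bank, offset) pairs remain in one-to-one correspondence with indices:

```python
def classic_map(index, r, m):
    """Classical conflict-free mapping for N = r**m data over r banks:
    bank = (sum of the radix-r digits of index) mod r, offset = index // r.
    NOT the patent's equations (3)-(4); illustration only."""
    digits, x = [], index
    for _ in range(m):
        digits.append(x % r)
        x //= r
    return sum(digits) % r, index // r   # (bank, offset)
```

For r = 4, m = 2 (N = 16), the four inputs 2, 6, 10, 14 of one second-stage butterfly map to four distinct banks, and all 16 (bank, offset) pairs are distinct.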
Step 304: reading at least one group of input data from the storage unit in parallel according to the storage address;
specifically, the memory is composed of a plurality of banks (i.e., memory cells), and N may be usedbAnd executing the DTF parallel processing by each bank, wherein the depth of each bank is D, and the bit width of each address is B bits. In each clock cycle, only one memory space of each bank can be read and written, namely the maximum parallelism of reading and writing is Nb. According to the NR upstream DFT/FFT protocol the memory comprises 16 banks, each bank comprising 256 memory spaces, and up to 4096 operands can be stored.
At most N can be read in one clock cyclebAnd the number is used as input data of the first-stage butterfly unit.
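The bank organization described here can be modeled with a toy class (names and sizes are taken from the text; the class itself is illustrative). It enforces the one-access-per-bank-per-cycle constraint that the conflict-free addressing scheme is designed to satisfy:

```python
class BankedMemory:
    """Toy model of the memory in the text: 16 banks of 256 words each.
    One 'clock cycle' may touch each bank at most once, so a parallel read
    must hit pairwise-distinct banks."""

    def __init__(self, num_banks=16, depth=256):
        self.banks = [[0] * depth for _ in range(num_banks)]

    def read_parallel(self, addrs):
        """addrs: list of (bank, offset) pairs. Raises on a bank conflict."""
        banks = [b for b, _ in addrs]
        if len(set(banks)) != len(banks):
            raise ValueError("bank conflict: two accesses hit one bank")
        return [self.banks[b][a] for b, a in addrs]
```

A correct address mapping guarantees that the N_b addresses issued per cycle never trigger the conflict check.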
Step 305: at least one group of input data is parallelly sent to at least one butterfly unit of each stage of butterfly unit for parallel processing, and at least one group of output data is obtained;
step 306: and writing at least one group of output data into the storage space corresponding to the input data according to the original storage address.
The above describes one round of parallel access to the storage units within a single parallel processing pass of one stage of butterfly units; in practical application, every other parallel processing pass adopts the same parallel address-taking scheme and address mapping method to access the storage units in parallel.
In practical applications, after writing the at least one set of output data into the storage space corresponding to the input data, the method further includes: determining the state information of m counters of at least one new group of input data according to a preset parallel address-taking rule; when the state information of the m counters is not preset state information, acquiring at least one new group of input data according to the state information of the m counters; and when the state information of the m counters is preset state information, determining that the processing of the butterfly unit at the current stage is finished.
And when the butterfly unit at the current stage is determined to be processed completely, continuing to execute the butterfly unit at the next stage until the butterfly unit at the last stage is processed completely.
For example, the preset state information may be state information of m counters when the memory cell starts to be traversed, for example, each counter in the preset state information is 0.
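The increment-with-carry and the detection of the all-zero (preset) state can be sketched as follows; the exact carry order across counters is not fully specified in the text, so the index order used here is an assumption:

```python
def next_state(state, bases, t):
    """Advance the stage counters: increment position t; on overflow, carry
    into the remaining positions in index order (a sketch -- the patent's
    exact carry order is not spelled out here).  Returning to the all-zero
    state signals that the current stage has traversed all data."""
    state = list(state)
    order = [t] + [i for i in range(len(bases)) if i != t]
    for i in order:
        state[i] += 1
        if state[i] < bases[i]:   # no overflow at this position: done
            break
        state[i] = 0              # overflow: reset and carry onward
    return tuple(state)
```

For bases (2, 3, 9) the counter cycles through all 54 states and returns to (0, 0, 0), marking the end of the stage.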
FIG. 4 is a first framework diagram of DFT parallel processing in the embodiment of the present application. As shown in FIG. 4, DFT parallel processing is performed by m stages of butterfly units, where the i-th stage comprises P_i butterfly units; the i-th stage reads P_i groups of input data from the storage units in parallel and writes the P_i groups of output data back to the storage units at their original addresses.
FIG. 5 is a diagram of a second framework of DFT parallel processing in the embodiment of the present application. As shown in FIG. 5, DFT parallel processing is performed by m stages of butterfly units, from the 1st stage to the m-th stage; the 1st stage comprises P_0 butterfly units, the 2nd stage comprises P_1 butterfly units, and the m-th stage comprises P_{m−1} butterfly units. Each stage of butterfly units reads its input data from the storage units in parallel according to the read/write flow shown in FIG. 4 and writes its output data back to the storage units at the original addresses, until the processing of the m-th stage of butterfly units is completed.
Here, the execution subject of steps 301 to 306 may be a processor of an electronic device that performs the DFT parallel processing operation.
By adopting the above technical solution, conflict-free parallel access to a plurality of storage units is achieved through the m counters, the parallel address-taking rule and the address mapping rule, thereby improving the parallel processing efficiency of DFT and reducing the DFT processing delay.
Based on the above embodiments, and to further illustrate the object of the present application, fig. 6 shows a second flowchart of the method, which specifically includes:
step 601: determining m-level butterfly units for executing Discrete Fourier Transform (DFT) parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
in some embodiments, the method further comprises: decomposing the DFT operation into m-level butterfly operation according to the DFT data length, and determining the basis of the m-level butterfly unit; determining the maximum parallelism of a storage unit according to the basis of the m-level butterfly unit; and determining the parallelism of the m-level butterfly units according to the maximum parallelism of the storage units and the bases of the m-level butterfly units.
Specifically, the determining the maximum parallelism of the storage unit according to the m-level butterfly unit bases includes: taking the maximum base in the m-level butterfly unit bases as the maximum parallelism of the storage unit; or, taking the product of the bases of at least two levels of butterfly units in the m-level butterfly unit bases as the maximum parallelism of the storage unit; wherein the maximum parallelism is greater than or equal to each level of butterfly unit basis.
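The selection rule just described can be sketched in Python (our own illustration, not code from the patent; the tie-breaking choice of the smallest feasible candidate is an assumption):

```python
from math import gcd

def max_parallelism(bases):
    """Pick the storage-unit parallelism Nb: either a single per-stage base or
    the product of two relatively prime bases, and at least as large as every
    per-stage base. Taking the smallest feasible candidate is our assumption."""
    candidates = set(bases)
    # products of two relatively prime bases from different stages
    for i, ni in enumerate(bases):
        for nj in bases[i + 1:]:
            if gcd(ni, nj) == 1:
                candidates.add(ni * nj)
    # keep only candidates >= every per-stage base
    feasible = [c for c in candidates if all(c >= d for d in bases)]
    return min(feasible)

# For 54 = 2 * 3 * 9 the largest base already dominates, so Nb = 9.
print(max_parallelism([2, 3, 9]))  # 9
```

For the worked example used later in this text (54 = 2 × 3 × 9), the rule yields Nb = 9, matching the parallelism used in the read tables below.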
Specifically, the determining the parallelism of the m levels of butterfly units according to the maximum parallelism of the storage unit and the bases of the m levels of butterfly units includes: for each level, acquiring at least one parallelism whose product with the base of that level of butterfly unit is less than or equal to the maximum parallelism; selecting, from the at least one parallelism, one parallelism that evenly divides the base of the target-level butterfly unit as the parallelism of that level of butterfly unit; wherein the target-level butterfly unit is a level of butterfly unit composing the maximum parallelism.
Specifically, in the m levels of butterfly units, the bases of any two levels of butterfly units are relatively prime; or, when the bases of two levels of butterfly units are not relatively prime, those two levels are set as two consecutive levels.
In other embodiments, the method further comprises: presetting a mapping relation of at least one DFT data length, the base of the m-level butterfly unit and the parallelism of the m-level butterfly unit; and determining the basis of the m-level butterfly units and the parallelism of the m-level butterfly units according to the mapping relation and the DFT data length.
Step 602: determining, according to a preset parallel address-fetching rule, the state information of the m counters for at least one group of input data to be input in parallel to each level of butterfly units;
Here, the m counters serve as the basis for reading the storage units: the identifier of the storage unit to be read or written and the offset address inside the storage unit are calculated from the state information of the counters and the address mapping rule. The memory comprises a plurality of storage units (banks); each storage unit comprises a plurality of storage spaces; the identifier of each storage unit uniquely identifies one storage unit, and the offset address identifies one storage space within a storage unit.
Specifically, the state information of the m counters includes m bits; the m bits comprise a first counting bit, a second counting bit and other counting bits; the first counting bits are different, and the second counting bits are the same as the other counting bits and are used for indicating input data of different input positions of the same butterfly unit; the second counting bits are different, and the first counting bits are the same as the other counting bits and are used for indicating input data of the same input position of different butterfly units.
In practical application, each counter starts counting from 0; when every counter reaches its upper limit, the storage unit has been completely traversed and the processing of the current level of butterfly units is finished, so each counter returns to zero and counting starts for the input data of the next level of butterfly units. Here, the upper limit of each counter is determined by the base Ns of the corresponding level of butterfly unit.
In some embodiments, the m bits correspond in order, from top to bottom, to the m levels of butterfly units, and each bit counts modulo the base Ns of the corresponding level of butterfly unit; the first counting bit is the bit corresponding to the level of butterfly unit that is executing DFT parallel processing; when the base of the level of butterfly unit corresponding to the first counting bit composes the maximum parallelism by itself, the second counting bit is any one of the m bits other than the first counting bit;
when the base of the level of butterfly unit corresponding to the first counting bit does not compose the maximum parallelism, or composes the maximum parallelism together with the bases of other levels of butterfly units, the second counting bit is chosen so that the base of its corresponding level of butterfly unit is divisible by the parallelism of the level corresponding to the first counting bit and is used to compose the maximum parallelism.
Correspondingly, the parallel addressing rule comprises the following steps:
accumulating the second counting bit from 0 to Ns−1 on the basis of the initial state information of the m counters to obtain the state information of the m counters for at least one group of input data, and carrying to the other counting bits when the second counting bit satisfies the carry condition;
where Ns is the base of the level of butterfly unit that is executing DFT parallel processing.
Here, an input terminal of one butterfly unit corresponds to one state information of m counters, an address mapping rule makes the state information and the storage space establish a one-to-one mapping relationship, and one state information is mapped onto one storage space by using the address mapping rule, that is, one data can be read from the storage space according to one state information and input to an input terminal of the corresponding butterfly unit.
That is to say, to obtain the input of one or more butterfly units of each level, the state information of at least one group of m counters is set through the parallel address-fetching rule, that state information is mapped into at least one group of storage addresses through the address mapping rule, and the storage units are then read in parallel to obtain at least one group of input data.
Step 603: judging whether the state information of the m counters is preset state information, if so, executing step 608; if not, go to step 604;
step 604: determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
here, the set of input data is input data of one butterfly unit.
In some embodiments, the address mapping rule comprises: determining the storage unit identification of the at least one group of input data according to the parallelism of each level of butterfly unit, the state information of the m counters and the maximum parallelism of the storage unit; and determining the offset address of the storage unit of the at least one group of input data according to the base of each stage of butterfly unit and the state information of the m counters.
Step 605: reading at least one group of input data from the storage unit in parallel according to the storage address;
step 606: sending the at least one group of input data to at least one butterfly unit of each stage of butterfly unit in parallel for parallel processing, and outputting at least one group of output data;
step 607: writing the at least one group of output data into a storage space corresponding to the input data according to the original storage address; and returns to step 602 to continue to perform data processing of the butterfly unit in the current stage until all data of the storage unit is traversed.
Step 608: determining that the processing of the butterfly unit at the current stage is finished;
step 609: judging whether the current stage is the last stage, if so, executing step 610, and if not, returning to step 602;
Here, the determination may be made according to the identification information of the current level; alternatively, a counter may record the number of levels processed so far, with the determination made according to the state of that counter. Other determination methods commonly used by those skilled in the art may also be applied in the present application.
Step 610: and determining that the m-level butterfly unit processing is finished.
To further illustrate the object of the present application based on the above embodiments of the present application, as shown in fig. 7, the method specifically includes:
step 701: performing address decomposition on each data in DFT data to be processed to obtain state information of m counters corresponding to each data;
A set of base decompositions of the DFT can be represented as

N = N0 · N1 · … · Nm−1

where N is the DFT data length, m is the number of decomposed levels, the levels being numbered 0, 1, …, m−1, and the bases Ni of the levels are pairwise relatively prime. For m = 2, the PFA decomposition is

X(k0, k1) = Σ_{n1=0..N1−1} [ Σ_{n0=0..N0−1} x(n0, n1) · W_{N0}^{n0·k0} ] · W_{N1}^{n1·k1}

where W_N = e^{−j2π/N} is the twiddle factor. PFA has no inter-stage twiddle factors, unlike CFA.
The corresponding address resolution can be expressed as

n = ( Σ_{i=0..m−1} ni · pi · N/Ni ) mod N

where ni = 0, 1, …, Ni−1 is the sequence number of each data at level i, and the weight of ni is pi times the product of the bases of all the other levels, i.e. pi · N/Ni, where pi satisfies

( pi · N/Ni ) mod Ni = 1.

Correspondingly,

ni = <n>_{Ni}

where <n>_{Ni} denotes n modulo Ni. For example, for 18 = 2 × 9, the address 10 decomposes as 10 = (0, 1), since 10 mod 2 = 0 and 10 mod 9 = 1. For level i, among the Ni addresses within one butterfly, only ni differs in the decomposition (n0, n1, …, nm−1). For example, for 18 = 2 × 9: stage 0 has base 2 and is divided into 9 groups, with 0 = (0, 0) and 9 = (1, 0) inside one butterfly; stage 1 is divided into 2 groups, with 0 = (0, 0), 10 = (0, 1), 2 = (0, 2), …, 8 = (0, 8) inside one butterfly group. Accordingly, the resolution of the output address can be expressed as

k = ( Σ_{i=0..m−1} ki · N/Ni ) mod N
When the number of DFT points is large, the base of some level of butterfly unit is also large; a large butterfly base needs to be further decomposed into small butterfly bases for processing, but this cannot guarantee that the bases of the small butterfly units of each level are relatively prime. For example, for 54 = 2 × 27, 27 needs to be further decomposed into 3 and 9, while 3 and 9 are not coprime and must be handled with a CFA solution — that is, the PFA scheme between large stages and the CFA scheme inside a large stage. For this mixed PFA and CFA address scheme, the aforementioned address resolution needs further modification. For m = 3, N = N0·N1·N2, with N0 and N1 not coprime and N0·N1 coprime with N2, the new address resolution scheme can be expressed as
n = ( (n0·N1 + n1) · p01 · N2 + n2 · p2 · N0·N1 ) mod N

where p01 and p2 satisfy

( p01 · N2 ) mod (N0·N1) = 1, ( p2 · N0·N1 ) mod N2 = 1.

Correspondingly,

n0·N1 + n1 = <n>_{N0·N1}, n2 = <n>_{N2}.

For the output,

k = ( (k1·N0 + k0) · N2 + k2 · N0·N1 ) mod N.
Note that inside the CFA, the output is the inverted order of the input. For hardware processing convenience, several bases that are not coprime should be placed in sequential order of positions.
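For N = 54 = 2 × (3 × 9), the mixed PFA/CFA address decomposition just described can be sketched in Python (an illustration consistent with the worked examples in this text; the reconstruction constant 28 is inferred via the Chinese remainder theorem, not quoted from the patent):

```python
def decompose(n):
    """Split address n (0..53) into counter digits (n0, n1, n2) for
    N = 54 = 2 * 27, with 27 handled CFA-style as 3 * 9."""
    n0 = n % 2          # PFA digit: 2 is coprime with 27
    q = n % 27          # CFA part: mixed-radix digits over 3 * 9
    n1, n2 = q // 9, q % 9
    return (n0, n1, n2)

def compose(n0, n1, n2):
    """Inverse mapping via CRT over the coprime pair (2, 27)."""
    # 27 ≡ 1 (mod 2) and 28 ≡ 1 (mod 27), so this inverts decompose().
    return (27 * n0 + 28 * (9 * n1 + n2)) % 54

assert decompose(36) == (0, 1, 0)   # matches the stage-1 read table below
assert decompose(28) == (0, 0, 1)
assert all(compose(*decompose(n)) == n for n in range(54))
```

The two assertions reproduce counter states that appear verbatim in the stage-1 read listing later in this text.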
Step 702: determining the storage address of each datum according to the state information of the m counters and a preset address mapping rule;
here, m counters are used as the basis for reading the memory cells, and the number of the memory cells to be read and written and the offset address inside the storage unit are calculated according to the state information of the counters and the address mapping rule. The method comprises the steps that a plurality of storage units (bank) are included, each storage unit comprises a plurality of storage spaces, the number of each storage unit is used for uniquely identifying one storage unit, and the offset address is used for identifying one storage space in one storage unit.
Specifically, the state information of the m counters includes m bits; the m bits comprise a first counting bit, a second counting bit and other counting bits; the first counting bits are different, and the second counting bits are the same as the other counting bits and are used for indicating input data of different input positions of the same butterfly unit; the second counting bits are different, and the first counting bits are the same as the other counting bits and are used for indicating input data of the same input position of different butterfly units.
In practical application, when each counter starts counting from 0 and each counter counts to the upper limit value, the memory cell is completely traversed and currentlyAnd finishing the processing of the butterfly units, enabling each counter to return to zero, and starting to count the input data of the butterfly unit at the next stage. Here, the upper limit value of each counter is defined by the basis N of each stage of butterfly unitsAnd (6) determining.
In some embodiments, the m bits sequentially correspond to the m levels of butterfly units from top to bottom, and each bit is set to be a base N of each level of butterfly units(ii) a The first counting bit is a bit corresponding to a first-level butterfly unit which is executing DFT parallel processing, and the parallelism of the first-level butterfly unit corresponding to the second counting bit can be divided by the radix of the first-level butterfly unit forming the maximum parallelism.
Here, an input terminal of one butterfly unit corresponds to one state information of m counters, an address mapping rule makes the state information and the storage space establish a one-to-one mapping relationship, and one state information is mapped onto one storage space by using the address mapping rule, that is, one data can be read from the storage space according to one state information and input to an input terminal of the corresponding butterfly unit.
That is to say, when the input of one or more butterfly units of each stage of butterfly unit is obtained, the state information of at least one group of m counters is set through a parallel value-taking rule, the state information of at least one group of m counters is mapped into at least one group of storage addresses through an address mapping rule, and then the storage units are read in parallel to obtain at least one group of input data.
Here, the set of input data is input data of one butterfly unit.
In some embodiments, the address mapping rule comprises: determining the storage unit identification of the at least one group of input data according to the parallelism of each level of butterfly unit, the state information of the m counters and the maximum parallelism of the storage unit; and determining the offset address of the storage unit of the at least one group of input data according to the base of each stage of butterfly unit and the state information of the m counters.
In some embodiments, the method further comprises: decomposing the DFT operation into m-level butterfly operation according to the DFT data length, and determining the basis of the m-level butterfly unit; determining the maximum parallelism of a storage unit according to the basis of the m-level butterfly unit; and determining the parallelism of the m-level butterfly units according to the maximum parallelism of the storage units and the bases of the m-level butterfly units.
Specifically, the determining the maximum parallelism of the storage unit according to the m-level butterfly unit bases includes: taking the maximum base in the m-level butterfly unit bases as the maximum parallelism of the storage unit; or, taking the product of the bases of at least two levels of butterfly units in the m-level butterfly unit bases as the maximum parallelism of the storage unit; wherein the maximum parallelism is greater than or equal to each level of butterfly unit basis.
That is,

Nb = Ni or Nb = Ni · Nj, with Ni and Nj relatively prime, and Nb ≥ Nd, d = 0, 1, 2, …, m−1    (1)

where Nb is the maximum parallelism, Ni and Nj are the bases of the butterfly units composing Nb, Nd is the base of the d-th level butterfly unit, and m is the number of decomposition levels. The set B = {i} or B = {i, j} is defined as the set of level numbers used to compose the maximum parallelism of the storage unit.
Specifically, the determining the parallelism of the m-level butterfly units according to the maximum parallelism of the storage unit and the basis of the m-level butterfly units includes: acquiring at least one parallelism in which the product of the product and the basis of each stage of butterfly unit is less than or equal to the maximum parallelism; selecting one parallelism which can be evenly divided by the basis of the butterfly unit of the target level from the at least one parallelism as the parallelism of the butterfly unit of each level; and the target-level butterfly unit is a first-level butterfly unit forming the maximum parallelism.
That is, there exists i ∈ B such that

Pd evenly divides Ni, and Pd · Nd ≤ Nb    (2)
Illustratively, for 54 = 2 × 3 × 9, the maximum parallelism is 9; the candidate parallelisms for stage 1 are the divisors of 9, namely 1, 3 and 9, but 9 does not satisfy condition (2), so the larger of 1 and 3 is taken as the parallelism of stage 1, i.e. 3. In the same way, the parallelisms of the three levels are 3, 3 and 1 respectively.
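This selection can be written as a minimal sketch (our illustration, assuming, as in the example, that the candidate parallelisms are the divisors of the target-level base):

```python
def stage_parallelisms(bases, nb, target):
    """For each stage d, pick the largest parallelism P that divides the
    target-level base (the base composing Nb) and satisfies P * N_d <= Nb."""
    divisors = [p for p in range(1, bases[target] + 1) if bases[target] % p == 0]
    return [max(p for p in divisors if p * nd <= nb) for nd in bases]

# 54 = 2 * 3 * 9, Nb = 9, the target level is stage 2 (base 9)
print(stage_parallelisms([2, 3, 9], nb=9, target=2))  # [3, 3, 1]
```

The result [3, 3, 1] matches the per-stage parallelisms used in the read tables below.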
Specifically, in the m levels of butterfly units, the bases of any two levels of butterfly units are relatively prime; or, when the bases of two levels of butterfly units are not relatively prime, those two levels are set as two consecutive levels.
In other embodiments, the method further comprises: presetting a mapping relation of at least one DFT data length, the base of the m-level butterfly unit and the parallelism of the m-level butterfly unit; and determining the basis of the m-level butterfly units and the parallelism of the m-level butterfly units according to the mapping relation and the DFT data length.
That is to say, the bases of the m levels of butterfly units and the parallelisms of the m levels of butterfly units corresponding to different data lengths can be determined in advance according to the above determination method, and the mapping relationship is established and stored; when DFT parallel processing is performed, the mapping relationship is looked up directly according to the DFT data length to obtain the bases and the parallelisms of the m levels of butterfly units.
Specifically, for the address mapping rule, refer to equations (3) and (4).
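Equations (3) and (4) themselves are not reproduced in this text, so the following sketch uses one concrete conflict-free mapping that we constructed for N = 54, Nb = 9; the weights are our assumption, not the patent's formulas:

```python
def bank_address(n0, n1, n2, nb=9):
    """Map a counter state to (bank id, offset) for N = 54 = 2 * 3 * 9.
    The weights 3, 3, 1 below are one choice that avoids bank conflicts;
    the patent's own formulas (3) and (4) are not reproduced here."""
    bank = (3 * n0 + 3 * n1 + n2) % nb
    offset = 3 * n0 + n1          # mixed-radix index of the remaining digits
    return bank, offset

# Conflict-freeness check: the 9 inputs of stage-1 read 1
# (n0 = 0, n1 = 0..2, n2 = 0..2) land in 9 distinct banks.
cycle = [bank_address(0, n1, n2)[0] for n1 in range(3) for n2 in range(3)]
assert len(set(cycle)) == 9
```

The mapping is also one-to-one over all 54 counter states, which is the property proved in general form later in this text.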
Step 703: storing DFT data to be processed into a storage unit according to the storage address;
it should be noted that the data writing order and the writing method (including parallel writing and serial writing) are set by an input addressing rule, the input addressing rule is defined by an input interface, and the state information of the m counters can be set according to the input addressing rule.
That is, before the m levels of butterfly units perform DFT parallel processing, the N data numbered n = 0, 1, …, N−1 need to be written into the storage unit, with each address n decomposed into the m-bit counter (n0, n1, …, nm−1). Specifically, as n counts from 0 to N−1: if Ni and Ni+1 are relatively prime, ni accumulates and takes the modulus at every step; if Ni and Ni+1 are not coprime, ni accumulates modulo its base only when ni+1 carries. The bank identifier and the offset address inside the bank are then calculated according to equations (3) and (4), and the input data are written in sequence.
Step 704: performing DFT parallel processing;
here, the DFT parallel processing is an m-level butterfly operation performed by the m-level butterfly unit in the above embodiment of the present application, and is not described herein again.
Step 705: determining that the processing of the m-level butterfly units is finished, and determining state information of m counters outputting data according to a preset output addressing rule;
Here, the data reading order and the reading manner (including parallel reading and serial reading) are set by an output addressing rule, which is specified by the output interface; the state information of the m counters can be set according to the output addressing rule.
Step 706: determining a storage address of the output data according to the state information of the m counters and a preset address mapping rule;
here, the specific address mapping method is referred to as step 702.
Step 707: and reading the output data from the memory cell according to the memory address.
That is, the m-stage butterfly unit sets the state information of the m counters according to a preset readout order after performing DFT parallel processing, and reads out output data.
Next, according to the parallel address-fetching rule and the address mapping rule, we prove that the mapping from an address number nx to its bank address (bx, ax) is one-to-one, and that for the Ps butterfly units of the s-th level, the bank identifiers mapped from the Ps·Ns input data fetched in one cycle are all different.
(1) For any two different data addresses nx and ny, with address decompositions (nx,0, nx,1, …, nx,m−1) and (ny,0, ny,1, …, ny,m−1) respectively, the mapped bank addresses (bx, ax) and (by, ay) are also different.
For Nb = Ni: to make ax = ay, the digits of the two decompositions must be equal at every level except level i; on the premise that ax = ay, to further make bx = by requires nx,i = ny,i, so that all digits coincide. This contradicts nx ≠ ny.
For Nb = Ni·Nj: to make ax = ay, the digits must be equal at every level except levels i and j; on the premise that ax = ay, to further make bx = by, since Pi = Nj and Pj = Ni, it can also be derived that nx,i = ny,i and nx,j = ny,j. This again contradicts nx ≠ ny.
(2) Let nx and ny be the addresses of two data fetched in the same clock cycle, with address decompositions (nx,0, …, nx,m−1) and (ny,0, …, ny,m−1). To make the bank numbers bx = by, the weighted difference of the two decompositions must be congruent to 0 modulo Nb, i.e. the contribution of the differing digits must equal 0 or ±Nb. Considering that Nt and Pt are relatively prime, Ps and Pt are also relatively prime. If Pt = 1, then Nt = Nb and the required congruence cannot hold. If Pt ≠ 1, then Pt·Nt = Nb and Ps = Nt, so Pt·Ps = Nb and Nb is the least common multiple of Pt and Ps; the required congruence again cannot hold. This contradicts the assumption, so the bank numbers of the data fetched in the same clock cycle are all different. QED.
Next, data reading is exemplified according to the parallel address-fetching rule and the address mapping rule. For example, for 54 = 2 × 3 × 9, the bank number Nb = 9 and the parallelisms are 3, 3 and 1 in sequence; the bank mapping result is shown in Table (1), where the columns represent bank identifiers and the rows represent offset addresses within the banks. (Only the banks and storage spaces holding data are shown; the actual memory is much larger, supporting up to 16 banks with 256 storage spaces each.)
Table (1)
(table image not reproduced)
For example, for 54 = 2 × 3 × 9, Nb = 9 and the parallelisms of the three stages are 3, 3 and 1 in order. The parallel addressing order of stage 1 (radix 3) is shown in Table (2), where the same symbol at the lower right corner marks the input data of multiple parallel butterfly bases fetched in the same clock cycle.
Table (2)
(table image not reproduced)
The corresponding counter states (numbers in parentheses) are:
Read 1 (denoted by symbol ①)
First set of data (corresponding to butterfly unit 0): 0(0,0,0)36(0,1,0)18(0,2,0)
Second set of data (corresponding to butterfly 1): 28(0,0,1)10(0,1,1)46(0,2,1)
Third set of data (corresponding to butterfly 2): 2(0,0,2)38(0,1,2)20(0,2,2)
Read 2 (denoted by symbol ②)
First set of data (corresponding to butterfly unit 0): 30(0,0,3)12(0,1,3)48(0,2,3)
Second set of data (corresponding to butterfly 1): 4(0,0,4)40(0,1,4)22(0,2,4)
Third set of data (corresponding to butterfly 2): 32(0,0,5)14(0,1,5)50(0,2,5)
Read 3 (denoted by symbol ③)
First set of data (corresponding to butterfly unit 0): 6(0,0,6)42(0,1,6)24(0,2,6)
Second set of data (corresponding to butterfly 1): 34(0,0,7)16(0,1,7)52(0,2,7)
Third set of data (corresponding to butterfly 2): 8(0,0,8)44(0,1,8)26(0,2,8)
Read 4 (denoted by symbol ④)
First set of data (corresponding to butterfly unit 0): 27(1,0,0)9(1,1,0)45(1,2,0)
Second set of data (corresponding to butterfly 1): 1(1,0,1)37(1,1,1)19(1,2,1)
Third set of data (corresponding to butterfly 2): 29(1,0,2)11(1,1,2)47(1,2,2)
Read 5 (denoted by symbol ⑤)
First set of data (corresponding to butterfly unit 0): 3(1,0,3)39(1,1,3)21(1,2,3)
Second set of data (corresponding to butterfly 1): 31(1,0,4)13(1,1,4)49(1,2,4)
Third set of data (corresponding to butterfly 2): 5(1,0,5)41(1,1,5)23(1,2,5)
Read 6 (denoted by symbol ⑥)
First set of data (corresponding to butterfly unit 0): 33(1,0,6)15(1,1,6)51(1,2,6)
Second set of data (corresponding to butterfly 1): 7(1,0,7)43(1,1,7)25(1,2,7)
Third set of data (corresponding to butterfly 2): 35(1,0,8)17(1,1,8)53(1,2,8)
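The stage-1 read order above can be reproduced by a small generator (the loop nesting is inferred from the listed order and is our illustration, not the patent's implementation):

```python
def stage1_reads():
    """Yield, per clock cycle, 3 groups (parallel butterfly units) of 3
    counter states (the inputs of one base-3 butterfly) for N = 54 = 2*3*9
    with parallelism 3. n1 is the first counting bit (input position),
    n2 the second counting bit (parallel butterflies), n0 the remaining bit."""
    for n0 in range(2):                     # outer counter
        for chunk in range(3):              # 9 values of n2, 3 per cycle
            yield [[(n0, n1, 3 * chunk + g) for n1 in range(3)]
                   for g in range(3)]

reads = list(stage1_reads())
# Read 1, first group (butterfly unit 0): counters of data 0, 36, 18
print(reads[0][0])  # [(0, 0, 0), (0, 1, 0), (0, 2, 0)]
```

The generator's six cycles match Read 1 through Read 6 in the listing above.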
The parallel addressing order of stage 0 (radix 2) is shown in Table (3), where the same symbol at the lower right corner marks the input data of multiple parallel butterfly bases fetched in the same clock cycle.
Table (3)
(table image not reproduced)
Read 1
The 0th butterfly unit: 0(0,0,0) 27(1,0,0)
The 1st butterfly unit: 28(0,0,1) 1(1,0,1)
The 2nd butterfly unit: 2(0,0,2) 29(1,0,2)
Read 2
The 0th butterfly unit: 30(0,0,3) 3(1,0,3)
The 1st butterfly unit: 4(0,0,4) 31(1,0,4)
The 2nd butterfly unit: 32(0,0,5) 5(1,0,5)
Read 3
The 0th butterfly unit: 6(0,0,6) 33(1,0,6)
The 1st butterfly unit: 34(0,0,7) 7(1,0,7)
The 2nd butterfly unit: 8(0,0,8) 35(1,0,8)
Read 4
The 0th butterfly unit: 36(0,1,0) 9(1,1,0)
The 1st butterfly unit: 10(0,1,1) 37(1,1,1)
The 2nd butterfly unit: 38(0,1,2) 11(1,1,2)
Read 5
The 0th butterfly unit: 12(0,1,3) 39(1,1,3)
The 1st butterfly unit: 40(0,1,4) 13(1,1,4)
The 2nd butterfly unit: 14(0,1,5) 41(1,1,5)
Read 6
The 0th butterfly unit: 42(0,1,6) 15(1,1,6)
The 1st butterfly unit: 16(0,1,7) 43(1,1,7)
The 2nd butterfly unit: 44(0,1,8) 17(1,1,8)
Read 7
The 0th butterfly unit: 18(0,2,0) 45(1,2,0)
The 1st butterfly unit: 46(0,2,1) 19(1,2,1)
The 2nd butterfly unit: 20(0,2,2) 47(1,2,2)
Read 8
The 0th butterfly unit: 48(0,2,3) 21(1,2,3)
The 1st butterfly unit: 22(0,2,4) 49(1,2,4)
The 2nd butterfly unit: 50(0,2,5) 23(1,2,5)
Read 9
The 0th butterfly unit: 24(0,2,6) 51(1,2,6)
The 1st butterfly unit: 52(0,2,7) 25(1,2,7)
The 2nd butterfly unit: 26(0,2,8) 53(1,2,8)
The parallel addressing order of stage 2 (radix 9) is shown in Table (4), where the same symbol at the lower right corner marks the input data of multiple parallel butterfly bases fetched in the same clock cycle.
Table (4)
(table image not reproduced)
Read 1
The 0th butterfly unit: 0(0,0,0), 28(0,0,1), 2(0,0,2), 30(0,0,3), 4(0,0,4), 32(0,0,5), 6(0,0,6), 34(0,0,7), 8(0,0,8)
Read 2
The 0th butterfly unit: 36(0,1,0), 10(0,1,1), 38(0,1,2), 12(0,1,3), 40(0,1,4), 14(0,1,5), 42(0,1,6), 16(0,1,7), 44(0,1,8)
Read 3
The 0th butterfly unit: 18(0,2,0), 46(0,2,1), 20(0,2,2), 48(0,2,3), 22(0,2,4), 50(0,2,5), 24(0,2,6), 52(0,2,7), 26(0,2,8)
Read 4
The 0th butterfly unit: 27(1,0,0), 1(1,0,1), 29(1,0,2), 3(1,0,3), 31(1,0,4), 5(1,0,5), 33(1,0,6), 7(1,0,7), 35(1,0,8)
Read 5
The 0th butterfly unit: 9(1,1,0), 37(1,1,1), 11(1,1,2), 39(1,1,3), 13(1,1,4), 41(1,1,5), 15(1,1,6), 43(1,1,7), 17(1,1,8)
Read 6
The 0th butterfly unit: 45(1,2,0), 19(1,2,1), 47(1,2,2), 21(1,2,3), 49(1,2,4), 23(1,2,5), 51(1,2,6), 25(1,2,7), 53(1,2,8).
The DFT parallel processing scheme provided by the application can be applied to the discrete Fourier transform processing of both PFA and CFA. Compared with CFA, the bases of the butterfly units of each PFA level take values from a smaller range (the bases of the PFA levels are relatively prime), so the maximum parallelism Nb may need to be composed from the bases of several butterfly units. The proposed address mapping scheme and parallel processing scheme can also be applied to CFA (the bases of each level of butterfly unit of a CFA need not be co-prime).
In order to implement the method of the embodiment of the present application, based on the same inventive concept, an embodiment of the present application further provides a DFT parallel processing apparatus, as shown in fig. 8, the apparatus includes: a processing unit 801, an address management unit 802, and a plurality of storage units 803; wherein,
the processing unit 801 comprises a plurality of butterfly units of different bases;
the processing unit 801 is configured to determine an m-level butterfly unit that performs discrete fourier transform DFT parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
the address management unit 802 is configured to determine, according to a preset parallel address fetching rule, state information of m counters of at least one set of input data input in parallel by each stage of butterfly unit; determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
the processing unit 801 is configured to read the at least one set of input data from the storage unit 803 in parallel according to the storage address; sending the at least one group of input data to at least one butterfly unit of each stage of butterfly unit in parallel for parallel processing, and outputting at least one group of output data;
the processing unit 801 is further configured to write the at least one set of output data into a storage space corresponding to the input data according to the original storage address.
Illustratively, the basis of butterfly units required by the present application includes 2, 4, 8, 16, 3, 9, 5.
The address management unit comprises a plurality of counters; it can set the states of the counters according to the parallel address-fetching rule, and calculates the bank identifier and the offset within the bank to be read and written according to the counter states and the address mapping rule.
The device comprises a plurality of banks (i.e., storage units); Nb banks can be used to execute the DFT parallel processing, where each bank has a depth of D and each address has a bit width of B bits. In each clock cycle, only one storage space of each bank can be read or written, i.e., the maximum read-write parallelism is Nb. According to the NR uplink DFT/FFT protocol, the memory comprises 16 banks, each bank comprising 256 storage spaces, so that up to 4096 operands can be stored.
In some embodiments, the address management unit 802 is further configured to determine, according to a preset parallel addressing rule, state information of m counters of at least one new set of input data; when the state information of the m counters is not preset state information, acquiring at least one new group of input data according to the state information of the m counters; and when the state information of the m counters is preset state information, determining that the processing of the butterfly unit at the current stage is finished.
In some embodiments, the address management unit 802 is further configured to determine that the processing of the m-level butterfly unit is completed, and determine state information of m counters outputting data according to a preset output addressing rule; determining a storage address of the output data according to the state information of the m counters and a preset address mapping rule;
the processing unit 801 is further configured to read the output data from the storage unit 803 according to the storage address.
That is, the apparatus realizes three operations.
Firstly, data input: the input data to be transformed are written into the storage unit with a certain parallelism, according to the address mapping relation given by the present application.
Secondly, parallel processing of the butterfly units: at each stage, one or more groups of data are continuously read from the storage unit and fed into one or more butterfly units according to the parallel addressing rule and the address mapping relation of the present application; after the butterfly processing is finished, the data are written back to the storage unit at their original addresses, until all input data have been traversed.
Thirdly, data output: after the processing of all the butterfly units is finished, the data are read from the storage unit with a certain parallelism, according to the address mapping relation of the present application.
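The three operations above can be summarized as an in-place loop: read a group of data, run it through a butterfly unit, and write the results back to the same addresses. The sketch below illustrates only this flow; the helper names, the toy radix-2 add/subtract "butterfly" (twiddle factors omitted), and the hard-coded address groups are our assumptions, not the patent's:

```python
# Sketch of the in-place flow: per-stage butterfly passes with write-back
# to the original addresses. Helper names and the toy butterfly are ours.

def process_in_place(memory, addresses_per_stage, butterfly):
    # addresses_per_stage[s] lists the address groups for stage s, as the
    # parallel addressing and address mapping rules would produce them.
    for stage_groups in addresses_per_stage:
        for group in stage_groups:               # one group per clock cycle
            data = [memory[a] for a in group]    # parallel read
            out = butterfly(data)                # butterfly processing
            for a, v in zip(group, out):         # write back to the SAME
                memory[a] = v                    # (original) addresses
    return memory

# Tiny demo: a radix-2 add/subtract pass (twiddles omitted) on 4 values.
mem = [1.0, 2.0, 3.0, 4.0]
bf2 = lambda d: [d[0] + d[1], d[0] - d[1]]
stages = [[(0, 1), (2, 3)], [(0, 2), (1, 3)]]
process_in_place(mem, stages, bf2)
print(mem)  # [10.0, -2.0, -4.0, 0.0]
```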
In some embodiments, the processing unit 801 is further configured to perform address decomposition on each data in the DFT data to be processed, so as to obtain state information of m counters corresponding to each data;
the address management unit 802 is further configured to determine a storage address of each data according to the state information of the m counters and a preset address mapping rule;
and storing the DFT data to be processed into the storage unit 803 according to the storage address.
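The address decomposition above amounts to writing a linear data index as a mixed-radix number whose digits are the m counter states. A hedged sketch (the digit ordering is our assumption):

```python
# Mixed-radix address decomposition: split a linear index into m counter
# values, one digit per butterfly stage base. We assume the last stage is
# the least-significant digit; the patent may order the digits differently.

def decompose(index, bases):
    digits = []
    for base in reversed(bases):
        digits.append(index % base)
        index //= base
    return list(reversed(digits))       # [n_1, ..., n_m]

def compose(digits, bases):
    index = 0
    for d, base in zip(digits, bases):
        index = index * base + d
    return index

bases = [3, 4, 5]                       # e.g. a 60-point DFT as 3 x 4 x 5
print(decompose(37, bases))             # [1, 3, 2]
print(compose([1, 3, 2], bases))        # 37
```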
In some embodiments, the state information of the m counters comprises m bits;
the m bits comprise a first counting bit, a second counting bit and other counting bits;
when the first counting bits differ and the second counting bits and the other counting bits are the same, the counter states indicate input data at different input positions of the same butterfly unit;
when the second counting bits differ and the first counting bits and the other counting bits are the same, the counter states indicate input data at the same input position of different butterfly units.
In some embodiments, the m bits sequentially correspond to the m levels of butterfly units from top to bottom, and each bit counts modulo the base Ni of the corresponding level of butterfly units;
The first counting bit is a bit corresponding to a first-level butterfly unit which is executing DFT parallel processing;
when the base of the first-stage butterfly unit corresponding to the first counting bit independently forms the maximum parallelism, the second counting bit is any one bit of the m bits except the first counting bit;
when the base of the first-stage butterfly unit corresponding to the first counting bit does not form the maximum parallelism, or the base of the first-stage butterfly unit corresponding to the second counting bit and the bases of other-stage butterfly units form the maximum parallelism together, the base of the first-stage butterfly unit corresponding to the second counting bit can be divided by the parallelism of the first-stage butterfly unit corresponding to the first counting bit, and the base of the first-stage butterfly unit corresponding to the second counting bit is used for forming the maximum parallelism.
In some embodiments, the parallel addressing rules comprise:
accumulating the second counting bit from 0 to Ns-1 on the basis of the initial state information of the m counters to obtain the state information of the m counters of at least one group of input data, and carrying to the other counting bits if the second counting bit meets the carry condition;
wherein Ns is the base of the first-level butterfly unit that is executing the DFT parallel processing.
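The accumulation-and-carry rule above behaves like a mixed-radix counter: the second counting bit sweeps 0..Ns-1 to enumerate one group, and a carry into the other counting bits advances to the next group until the final state is reached. A hedged sketch (the bit roles and ordering are our assumptions):

```python
# Sketch of the parallel addressing rule: for the stage being processed,
# the second counting bit sweeps 0..Ns-1 within one group; a carry into
# the other counting bits then advances to the next group. The handling
# of the first counting bit (parallel butterflies within one cycle) is
# omitted; all conventions here are our assumptions, not the patent's.

def groups_for_stage(bases, first_bit, second_bit):
    m = len(bases)
    other = [i for i in range(m) if i not in (first_bit, second_bit)]
    state = [0] * m
    while True:
        group = []
        for v in range(bases[second_bit]):       # 0 .. Ns - 1
            state[second_bit] = v
            group.append(tuple(state))
        yield group
        for i in reversed(other):                # carry into other bits
            state[i] += 1
            if state[i] < bases[i]:
                break
            state[i] = 0
        else:
            return                               # all groups enumerated

gs = list(groups_for_stage([2, 3, 4], first_bit=0, second_bit=1))
print(len(gs), gs[0])  # 4 [(0, 0, 0), (0, 1, 0), (0, 2, 0)]
```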
In some embodiments, the address mapping rule comprises:
determining the storage unit identification of the at least one group of input data according to the parallelism of each level of butterfly unit, the state information of the m counters and the maximum parallelism of the storage unit; and determining the offset address of the storage unit of the at least one group of input data according to the base of each stage of butterfly unit and the state information of the m counters.
In some embodiments, the address mapping rule comprises:
b = (P1n1 + P2n2 + ... + Pmnm) mod Nb
a = Σd∉B ( nd × Πs∉B, s>d Ns )
wherein b is the memory cell identification, Pi is the parallelism of the i-th stage butterfly unit, ni is the state information of the i-th counter, Nb is the maximum parallelism, a is the offset address of the memory cell, B is the set of butterfly unit stage numbers forming the maximum parallelism, nd is the state information of the d-th counter, and Ns is the base of the s-th stage butterfly unit.
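Under one reading of the address mapping rule above (the published equations are images, so both formulas here are our reconstruction from the variable definitions), the mapping might be sketched as:

```python
# Hedged sketch of the address mapping: the bank identification b is a
# parallelism-weighted sum of the counters modulo Nb; the offset a is a
# mixed-radix number over the counters whose stages are NOT in the set B
# (the stages whose bases form the maximum parallelism). This is our
# reconstruction; the patent's exact formulas may differ in detail.

def map_address(n, P, N, B_set, N_b):
    # n: counter states n_1..n_m, P: per-stage parallelism P_i,
    # N: per-stage bases N_i, B_set: stage indices forming Nb.
    b = sum(P[i] * n[i] for i in range(len(n))) % N_b
    a = 0
    for i in [j for j in range(len(n)) if j not in B_set]:
        a = a * N[i] + n[i]                 # mixed-radix offset
    return b, a

# Example: 3 stages with bases 4, 3, 2; stage 0 alone forms Nb = 4.
print(map_address(n=[1, 2, 1], P=[1, 4, 2], N=[4, 3, 2],
                  B_set={0}, N_b=4))        # (3, 5)
```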
In some embodiments, the processing unit 801 is further configured to decompose the DFT operation into m-level butterfly operations according to the DFT data length, and determine a basis of the m-level butterfly unit; determining the maximum parallelism of a storage unit according to the basis of the m-level butterfly unit; and determining the parallelism of the m-level butterfly units according to the maximum parallelism of the storage units and the bases of the m-level butterfly units.
In some embodiments, among the m levels of butterfly units, the bases of any two levels of butterfly units are co-prime; or, when the bases of two levels of butterfly units among the m levels are not co-prime, the two levels of butterfly units are set as two consecutive levels.
In some embodiments, the processing unit 801 is specifically configured to use a maximum radix of the m-level butterfly unit bases as a maximum parallelism of the storage unit; or, taking the product of the bases of at least two levels of butterfly units in the m-level butterfly unit bases as the maximum parallelism of the storage unit; wherein the maximum parallelism is greater than or equal to each level of butterfly unit basis.
In some embodiments, the processing unit 801 is specifically configured to obtain at least one parallelism whose product with the base of each stage of butterfly unit is less than or equal to the maximum parallelism; select, from the at least one parallelism, one parallelism which can be evenly divided by the base of the target-level butterfly unit, as the parallelism of each level of butterfly unit; wherein the target-level butterfly unit is the first-level butterfly unit forming the maximum parallelism.
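The two-step selection described above might be sketched as follows (hedged: the direction of the divisibility test between the candidate parallelism and the target stage's base is our assumption from the translation):

```python
# Hedged sketch of per-stage parallelism selection: keep candidates P with
# P * Ni <= Nb for a stage of base Ni, then pick one compatible by even
# division with the base of the target stage (the stage forming Nb). The
# divisibility direction is our assumption from the translated text.

def stage_parallelism(base_i, target_base, max_parallelism):
    candidates = [p for p in range(1, max_parallelism + 1)
                  if p * base_i <= max_parallelism]
    for p in sorted(candidates, reverse=True):   # prefer the largest
        if target_base % p == 0:
            return p
    return 1

# Example: a base-3 stage when a base-16 target stage forms Nb = 16.
print(stage_parallelism(3, 16, 16))  # 4  (4 * 3 <= 16 and 16 % 4 == 0)
```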
In some embodiments, the processing unit 801 is further configured to preset a mapping relationship between at least one DFT data length, a base of the m-level butterfly unit, and a parallelism of the m-level butterfly unit; and determining the basis of the m-level butterfly units and the parallelism of the m-level butterfly units according to the mapping relation and the DFT data length.
Based on the above hardware implementation of each unit in DFT parallel processing, an embodiment of the present application further provides an electronic device, as shown in fig. 9, where the electronic device includes: a processor 901 and a memory 902 configured to store a computer program capable of running on the processor;
wherein the memory 902 comprises a plurality of memory units, the memory units are used for storing DFT data to be processed, and the processor 901 is configured to execute the method steps in the foregoing embodiments when running a computer program.
Of course, in actual practice, the various components of the electronic device are coupled together by a bus system 903, as shown in FIG. 9. It is understood that the bus system 903 is used to enable communications among the components. The bus system 903 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as the bus system 903 in FIG. 9.
In practical applications, the processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, and a microprocessor. It is understood that the electronic devices for implementing the above processor functions may be other devices, and the embodiments of the present application are not limited in particular.
The Memory may be a volatile Memory (volatile Memory), such as a Random-Access Memory (RAM); or a non-volatile Memory (non-volatile Memory), such as a Read-Only Memory (ROM), a flash Memory (flash Memory), a Hard Disk (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memories and provides instructions and data to the processor.
In an exemplary embodiment, the present application further provides a computer readable storage medium, such as a memory including a computer program, which is executable by a processor of an electronic device to perform the steps of the foregoing method.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The expressions "having", "may have", "include" and "contain", or "may include" and "may contain" in this application may be used to indicate the presence of corresponding features (e.g. elements such as values, functions, operations or components) but does not exclude the presence of additional features.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another, and are not necessarily used to describe a particular order or sequence. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention.
The technical solutions described in the embodiments of the present application can be arbitrarily combined without conflict.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and device may be implemented in other ways. The above-described embodiments are merely illustrative, and for example, the division of a unit is only one logical function division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application.

Claims (17)

1. A DFT parallel processing method, the method comprising:
determining m-level butterfly units for executing Discrete Fourier Transform (DFT) parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
determining the state information of m counters of at least one group of input data input in parallel by each stage of butterfly unit according to a preset parallel address-taking rule;
determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
reading the at least one group of input data from the storage units in parallel according to the storage address;
sending the at least one group of input data to at least one butterfly unit of each stage of butterfly unit in parallel for parallel processing, and outputting at least one group of output data;
and writing the at least one group of output data into a storage space corresponding to the input data according to the original storage address.
2. The method of claim 1, wherein after writing the at least one set of output data to a storage space corresponding to input data, the method further comprises:
determining the state information of m counters of at least one new group of input data according to a preset parallel address-taking rule;
when the state information of the m counters is not preset state information, acquiring at least one new group of input data according to the state information of the m counters;
and when the state information of the m counters is preset state information, determining that the processing of the butterfly unit at the current stage is finished.
3. The method of claim 2, further comprising:
determining that the processing of the m-level butterfly units is finished, and determining state information of m counters outputting data according to a preset output addressing rule;
determining a storage address of the output data according to the state information of the m counters and a preset address mapping rule;
and reading the output data from the storage unit according to the storage address.
4. The method of claim 1, further comprising:
performing address decomposition on each data in DFT data to be processed to obtain state information of m counters corresponding to each data;
determining the storage address of each datum according to the state information of the m counters and a preset address mapping rule;
and storing the DFT data to be processed to the storage unit according to the storage address.
5. The method according to any of claims 1-4, wherein the state information of the m counters comprises m bits;
the m bits comprise a first counting bit, a second counting bit and other counting bits;
the first counting bits being different while the second counting bits and the other counting bits are the same indicates input data at different input positions of the same butterfly unit;
the second counting bits being different while the first counting bits and the other counting bits are the same indicates input data at the same input position of different butterfly units.
6. The method of claim 5, wherein the m bits sequentially correspond to the m levels of butterfly units from top to bottom, and each bit counts modulo the base Ni of the corresponding level of butterfly units;
The first counting bit is a bit corresponding to a first-level butterfly unit which is executing DFT parallel processing;
when the base of the first-stage butterfly unit corresponding to the first counting bit independently forms the maximum parallelism, the second counting bit is any one bit of the m bits except the first counting bit;
when the base of the first-stage butterfly unit corresponding to the first counting bit does not form the maximum parallelism, or the base of the first-stage butterfly unit corresponding to the second counting bit and the bases of other-stage butterfly units form the maximum parallelism together, the base of the first-stage butterfly unit corresponding to the second counting bit can be divided by the parallelism of the first-stage butterfly unit corresponding to the first counting bit, and the base of the first-stage butterfly unit corresponding to the second counting bit is used for forming the maximum parallelism.
7. The method of claim 5, wherein the parallel addressing rules comprise:
accumulating the second counting bit from 0 to Ns-1 on the basis of the initial state information of the m counters to obtain the state information of the m counters of at least one group of input data, and carrying to the other counting bits if the second counting bit meets the carry condition;
wherein Ns is the base of the first-level butterfly unit that is executing the DFT parallel processing.
8. The method according to any of claims 1-4, wherein the address mapping rule comprises:
determining the storage unit identification of the at least one group of input data according to the parallelism of each level of butterfly unit, the state information of the m counters and the maximum parallelism of the storage unit;
and determining the offset address of the storage unit of the at least one group of input data according to the base of each stage of butterfly unit and the state information of the m counters.
9. The method of claim 8, wherein the address mapping rule comprises:
b = (P1n1 + P2n2 + ... + Pmnm) mod Nb
a = Σd∉B ( nd × Πs∉B, s>d Ns )
wherein b is the memory cell identification, Pi is the parallelism of the i-th stage butterfly unit, ni is the state information of the i-th counter, Nb is the maximum parallelism, a is the offset address of the memory cell, B is the set of butterfly unit stage numbers forming the maximum parallelism, nd is the state information of the d-th counter, and Ns is the base of the s-th stage butterfly unit.
10. The method of claim 8, further comprising:
decomposing the DFT operation into m-level butterfly operation according to the DFT data length, and determining the basis of the m-level butterfly unit;
determining the maximum parallelism of a storage unit according to the basis of the m-level butterfly unit;
and determining the parallelism of the m-level butterfly units according to the maximum parallelism of the storage units and the bases of the m-level butterfly units.
11. The method of claim 10, wherein, among the m levels of butterfly units, the bases of any two levels of butterfly units are co-prime; or, when the bases of two levels of butterfly units among the m levels are not co-prime, the two levels of butterfly units are set as two consecutive levels.
12. The method of claim 10, wherein determining the maximum parallelism of the memory cells according to the m-level butterfly cell bases comprises:
taking the maximum base in the m-level butterfly unit bases as the maximum parallelism of the storage unit;
or, taking the product of the bases of at least two levels of butterfly units in the m-level butterfly unit bases as the maximum parallelism of the storage unit;
wherein the maximum parallelism is greater than or equal to each level of butterfly unit basis.
13. The method of claim 10, wherein determining the parallelism of the m-level butterfly units based on the maximum parallelism of the memory units and the bases of the m-level butterfly units comprises:
acquiring at least one parallelism whose product with the base of each stage of butterfly unit is less than or equal to the maximum parallelism;
selecting one parallelism which can be evenly divided by the basis of the butterfly unit of the target level from the at least one parallelism as the parallelism of the butterfly unit of each level;
and the target-level butterfly unit is a first-level butterfly unit forming the maximum parallelism.
14. The method of claim 8, further comprising:
presetting a mapping relation of at least one DFT data length, the base of the m-level butterfly unit and the parallelism of the m-level butterfly unit;
and determining the basis of the m-level butterfly units and the parallelism of the m-level butterfly units according to the mapping relation and the DFT data length.
15. A DFT parallel processing apparatus, the apparatus comprising: the device comprises a processing unit, an address management unit and a plurality of storage units; wherein,
the processing unit comprises a plurality of butterfly units with different bases;
the processing unit is used for determining an m-level butterfly unit for executing Discrete Fourier Transform (DFT) parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
the address management unit is used for determining the state information of m counters of at least one group of input data input in parallel by each stage of butterfly unit according to a preset parallel address-taking rule; determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
the processing unit is used for reading the at least one group of input data from the storage unit in parallel according to the storage address; sending the at least one group of input data to at least one butterfly unit of each stage of butterfly unit in parallel for parallel processing, and outputting at least one group of output data;
and the processing unit is also used for writing the at least one group of output data into a storage space corresponding to the input data according to the original storage address.
16. An electronic device, characterized in that the electronic device comprises: a processor and a memory configured to store a computer program operable on the processor, wherein the processor is configured to perform the steps of the method of any one of claims 1 to 14 when the computer program is executed by the processor.
17. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 14.
CN202110276067.8A 2021-03-15 2021-03-15 DFT parallel processing method, device, equipment and storage medium Active CN113094639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110276067.8A CN113094639B (en) 2021-03-15 2021-03-15 DFT parallel processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110276067.8A CN113094639B (en) 2021-03-15 2021-03-15 DFT parallel processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113094639A true CN113094639A (en) 2021-07-09
CN113094639B CN113094639B (en) 2022-12-30

Family

ID=76667946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110276067.8A Active CN113094639B (en) 2021-03-15 2021-03-15 DFT parallel processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113094639B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010032227A1 (en) * 2000-01-25 2001-10-18 Jaber Marwan A. Butterfly-processing element for efficient fast fourier transform method and apparatus
CN101082906A (en) * 2006-05-31 2007-12-05 中国科学院微电子研究所 Fixed base FFT processor with low memory overhead and method thereof
WO2010045808A1 (en) * 2008-10-24 2010-04-29 中兴通讯股份有限公司 Hardware apparatus and method for implementing fast fourier transform and inverse fast fourier transform
US20100174769A1 (en) * 2009-01-08 2010-07-08 Cory Modlin In-Place Fast Fourier Transform Processor
CN102326154A (en) * 2008-12-23 2012-01-18 苹果公司 Architecture for address mapping of managed non-volatile memory
CN102855222A (en) * 2011-06-27 2013-01-02 中国科学院微电子研究所 Method and device for mapping addresses of FFT (fast Fourier transform) of parallel branch butterfly unit
CN104699624A (en) * 2015-03-26 2015-06-10 中国人民解放军国防科学技术大学 FFT (fast Fourier transform) parallel computing-oriented conflict-free storage access method
US20150199299A1 (en) * 2014-01-16 2015-07-16 Qualcomm Incorporated Sample process ordering for dft operations
CN106469134A (en) * 2016-08-29 2017-03-01 北京理工大学 A kind of data conflict-free access method for fft processor
CN109496306A (en) * 2016-07-13 2019-03-19 金泰亨 Multi-functional arithmetic and fast Fourier transformation operation device
CN112163184A (en) * 2020-09-02 2021-01-01 上海深聪半导体有限责任公司 Device and method for realizing FFT


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAI-FENG XIA et al.: "A Memory-Based FFT Processor Design With Generalized Efficient Conflict-Free Address Schemes", IEEE Transactions on Very Large Scale Integration (VLSI) Systems *
XIA Kaifeng et al.: "Arbitrary 2K-point memory-based Fourier processor", Journal of Zhejiang University (Engineering Science) *

Also Published As

Publication number Publication date
CN113094639B (en) 2022-12-30

Similar Documents

Publication Publication Date Title
US8549059B2 (en) In-place fast fourier transform processor
US7164723B2 (en) Modulation apparatus using mixed-radix fast fourier transform
US10152455B2 (en) Data processing method and processor based on 3072-point fast Fourier transformation, and storage medium
CN112800386B (en) Fourier transform processing method, processor, terminal, chip and storage medium
US7752249B2 (en) Memory-based fast fourier transform device
US9317481B2 (en) Data access method and device for parallel FFT computation
CN105740405B (en) Method and device for storing data
CN111737638A (en) Data processing method based on Fourier transform and related device
US12009948B2 (en) Data processing apparatus and method, base station, and storage medium
KR102348771B1 (en) resource definition
US8023401B2 (en) Apparatus and method for fast fourier transform/inverse fast fourier transform
JP2006221648A (en) Fast fourier transformation processor and fast fourier transformation method capable of reducing memory size
CN113094639B (en) DFT parallel processing method, device, equipment and storage medium
CN115544438B (en) Twiddle factor generation method and device in digital communication system and computer equipment
US20140365547A1 (en) Mixed-radix pipelined fft processor and fft processing method using the same
US7979485B2 (en) Circuit for fast fourier transform operation
WO2011102291A1 (en) Fast fourier transform circuit
Xia et al. A generalized conflict-free address scheme for arbitrary 2k-point memory-based FFT processors
US11764942B2 (en) Hardware architecture for memory organization for fully homomorphic encryption
CN111356151A (en) Data processing method and device and computer readable storage medium
KR100557160B1 (en) Modulating apparatus for using fast fourier transform of mixed-radix scheme
US20200169274A1 (en) Wireless communication device and method of operating the same
US8484275B1 (en) Reordering discrete fourier transform outputs
WO2023050623A1 (en) Signal transmission method and apparatus, and device and storage medium
US20210255804A1 (en) Data scheduling register tree for radix-2 fft architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant