CN113094639A - DFT parallel processing method, device, equipment and storage medium - Google Patents

DFT parallel processing method, device, equipment and storage medium

Info

Publication number
CN113094639A
CN113094639A (application CN202110276067.8A)
Authority
CN
China
Prior art keywords: butterfly, unit, address, level, parallelism
Prior art date
Legal status
Granted
Application number
CN202110276067.8A
Other languages
Chinese (zh)
Other versions
CN113094639B (en)
Inventor
Liu Fuliang (刘福良)
Fang Xu (房旭)
Zhang Lijun (张丽君)
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110276067.8A
Publication of CN113094639A
Application granted
Publication of CN113094639B
Status: Active


Classifications

    • G06F 17/141 Discrete Fourier transforms (under G — Physics; G06 — Computing, calculating or counting; G06F — Electric digital data processing; G06F 17/10 — Complex mathematical operations; G06F 17/14 — Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve transforms)
    • G06F 9/3004 Arrangements for executing specific machine instructions to perform operations on memory (under G06F 9/00 — Arrangements for program control; G06F 9/30 — Arrangements for executing machine instructions, e.g. instruction decode)
    • G06F 9/3885 Concurrent instruction execution using a plurality of independent parallel functional units (under G06F 9/38 — Concurrent instruction execution, e.g. pipeline or look ahead)


Abstract

The embodiment of the application discloses a DFT parallel processing method, which comprises the following steps: determining, according to a preset parallel address-taking rule, the state information of m counters for at least one group of input data input in parallel to each stage of butterfly units; determining the storage addresses of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; reading the at least one group of input data from the storage units in parallel according to the storage addresses; sending the at least one group of input data in parallel to at least one butterfly unit of each stage of butterfly units for parallel processing to obtain at least one group of output data; and writing the at least one group of output data into the storage spaces corresponding to the input data according to the original storage addresses. In this way, through the m counters, the parallel address-taking rule and the address mapping rule, conflict-free parallel access to the plurality of storage units is achieved, improving the parallel processing efficiency of DFT and reducing the DFT processing delay.

Description

DFT parallel processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of digital signal processing, and in particular, to a Discrete Fourier Transform (DFT) parallel processing method, apparatus, device, and storage medium.
Background
The Long Term Evolution (LTE) downlink adopts an Orthogonal Frequency Division Multiplexing (OFDM) modulation scheme: the base station modulates with an Inverse Fast Fourier Transform (IFFT) and the terminal demodulates with a Fast Fourier Transform (FFT). The LTE uplink employs a Single Carrier Frequency Division Multiple Access (SC-FDMA) modulation scheme, in which the baseband signal undergoes DFT spreading before IFFT modulation; in 5G NR this is also called DFT-Spread OFDM (DFT-s-OFDM).
The number of DFT points in the 4G/5G uplink satisfies N = 2^m1 · 3^m2 · 5^m3. Since the FFT (IFFT) point number satisfies m2 = m3 = 0, the FFT (IFFT) can be regarded as a special case of the DFT point number, and both are collectively referred to as DFT herein. A DFT of N points is generally completed by dividing it into several levels of butterfly units; all levels of butterfly units share one memory, and the processing of each level of butterfly units is completed by continuously accessing the memory, so that a higher throughput rate is achieved with lower hardware resource overhead.
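To make the length constraint concrete, here is a minimal sketch (not from the patent; the function name is illustrative) that checks whether a length N satisfies N = 2^m1 · 3^m2 · 5^m3 and extracts the exponents:

```python
def radix_exponents(n):
    """Return (m1, m2, m3) if n == 2**m1 * 3**m2 * 5**m3, else None."""
    if n < 1:
        return None
    exps = []
    for p in (2, 3, 5):
        e = 0
        while n % p == 0:   # strip out each prime factor in turn
            n //= p
            e += 1
        exps.append(e)
    # anything left over means a prime factor other than 2, 3, 5
    return tuple(exps) if n == 1 else None
```

For example, 60 = 2^2 · 3 · 5 and 54 = 2 · 3^3 are valid uplink DFT sizes, while 7 is not.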
The key of the memory structure is to design a conflict-free address scheme, which can read a plurality of data from the memory as input data of one or more butterfly units and then write the output data of the one or more butterfly units back to the memory according to the original address. However, no memory-based DFT parallel processing scheme exists in the prior art, which causes great delay in DFT processing.
Disclosure of Invention
In order to solve the foregoing technical problems, embodiments of the present application desirably provide a DFT parallel processing method, apparatus, device, and storage medium.
The technical scheme of the application is realized as follows:
in a first aspect, a DFT parallel processing method is provided, where the method includes:
determining m-level butterfly units for executing Discrete Fourier Transform (DFT) parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
determining the state information of m counters of at least one group of input data input in parallel by each stage of butterfly unit according to a preset parallel address-taking rule;
determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
reading the at least one group of input data from the storage units in parallel according to the storage address;
sending the at least one group of input data to at least one butterfly unit of each stage of butterfly unit in parallel for parallel processing, and outputting at least one group of output data;
and writing the at least one group of output data into a storage space corresponding to the input data according to the original storage address.
In a second aspect, a DFT parallel processing apparatus is provided, the apparatus comprising: the device comprises a processing unit, an address management unit and a plurality of storage units; wherein,
the processing unit comprises a plurality of butterfly units with different bases;
the processing unit is used for determining an m-level butterfly unit for executing Discrete Fourier Transform (DFT) parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
the address management unit is used for determining the state information of m counters of at least one group of input data input in parallel by each stage of butterfly unit according to a preset parallel address-taking rule; determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
the processing unit is used for reading the at least one group of input data from the storage unit in parallel according to the storage address; sending the at least one group of input data to at least one butterfly unit of each stage of butterfly unit in parallel for parallel processing, and outputting at least one group of output data;
and the processing unit is also used for writing the at least one group of output data into a storage space corresponding to the input data according to the original storage address.
In a third aspect, an electronic device is provided, including: a processor and a memory configured to store a computer program capable of running on the processor,
wherein the processor is configured to perform the steps of the aforementioned method when running the computer program.
In a fourth aspect, a computer storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the aforementioned method.
The embodiment of the application provides a DFT parallel processing method, which comprises the following steps: determining m stages of butterfly units for performing DFT, each stage comprising at least one butterfly unit; determining, according to a preset parallel address-taking rule, the state information of m counters for at least one group of input data input in parallel to each stage of butterfly units; determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule, wherein the storage address comprises a storage unit identification and a storage unit offset address; reading the at least one group of input data from the storage units in parallel according to the storage address; sending the at least one group of input data in parallel to at least one butterfly unit of each stage for parallel processing, and outputting at least one group of output data; and writing the at least one group of output data into the storage space corresponding to the input data according to the original storage address. In this way, through the m counters, the parallel address-taking rule and the address mapping rule, conflict-free parallel access to the plurality of storage units is achieved, improving the parallel processing efficiency of DFT and reducing the DFT processing delay.
Drawings
FIG. 1 is a block diagram of a DFT-s-OFDM transmitter;
FIG. 2 is a block diagram of a DFT-s-OFDM receiver;
FIG. 3 is a first flowchart of a DFT parallel processing method according to an embodiment of the present application;
FIG. 4 is a diagram of a first framework of DFT parallel processing in the embodiment of the present application;
FIG. 5 is a diagram of a second framework of DFT parallel processing in the embodiment of the present application;
FIG. 6 is a second flowchart of a DFT parallel processing method according to an embodiment of the present application;
FIG. 7 is a third flowchart of a DFT parallel processing method according to an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating a structure of a DFT parallel processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
So that the manner in which the features and elements of the present embodiments can be understood in detail, a more particular description of the embodiments, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings.
The number of DFT points in the 4G/5G uplink satisfies N = 2^m1 · 3^m2 · 5^m3, and the FFT (IFFT) point number satisfies m2 = m3 = 0, so the FFT (IFFT) can be regarded as a special case of the DFT point number. That is, the DFT parallel processing method provided in the embodiment of the present application can also be applied to FFT parallel processing, but this document uniformly uses the term DFT.
Fig. 1 is a block diagram of a DFT-s-OFDM transmitter. As shown in fig. 1, a bit stream sequentially undergoes constellation mapping, serial-to-parallel conversion, N-point DFT processing, subcarrier mapping, M-point IFFT processing, parallel-to-serial conversion, cyclic prefix insertion, up-conversion and radio frequency processing to generate a transmission signal.
Fig. 2 is a block diagram of a DFT-s-OFDM receiver. As shown in fig. 2, the signal received by the antenna sequentially undergoes down-conversion and radio frequency processing, cyclic prefix removal, serial-to-parallel conversion, M-point FFT processing, subcarrier demapping, N-point IDFT processing, parallel-to-serial conversion and constellation demodulation to obtain the bit stream.
The DFT parallel processing method provided by the embodiment of the present application can be applied to the positions indicated by the dashed line in fig. 1 and fig. 2, that is, N-point DFT processing, M-point IFFT processing, M-point FFT processing, and N-point IDFT processing. The following describes an embodiment of the DFT parallel processing method in detail.
Fig. 3 is a first flowchart of a DFT parallel processing method in the embodiment of the present application, and as shown in fig. 3, the method may specifically include:
step 301: determining m-level butterfly units for executing DFT parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
specifically, the DFT operation is decomposed into m levels of butterfly operations according to the DFT data length, m is an integer greater than or equal to 1, and the first level of butterfly operations are executed by first level butterfly units, and each level of butterfly unit comprises one or more butterfly units with the same basis. The bases of the butterfly units required by the present application include 2, 4, 8, 16, 3, 9, 5, which can perform DFT processing of arbitrary data length.
Illustratively, if the DFT data length N is 60, then 60 = 3 × 4 × 5 is divided into 3 stages of butterflies: the first stage is performed by radix-3 butterfly units, the second stage by radix-4 butterfly units and the third stage by radix-5 butterfly units. If N is 54, then 54 = 2 × 3 × 9.
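The decomposition in this example can be sketched as follows, assuming a greedy split over the supported bases {16, 8, 4, 2, 9, 3, 5}; the greedy order and the function name are illustrative choices, and the patent's stage ordering (e.g. keeping non-coprime bases adjacent) is a separate step not reproduced here:

```python
def stage_bases(n):
    """Split N into butterfly-unit bases drawn from {16, 8, 4, 2, 9, 3, 5}.

    Greedy sketch: largest power-of-2 radix first, then 9 before 3, then 5.
    Returns the multiset of stage bases; ordering them into stages is a
    separate design choice in the patent.
    """
    bases = []
    for radix in (16, 8, 4, 2, 9, 3, 5):
        while n % radix == 0:
            n //= radix
            bases.append(radix)
    assert n == 1, "N must factor over 2, 3 and 5"
    return bases
```

For N = 60 this yields the bases {3, 4, 5} of the worked example, and for N = 54 the bases {2, 3, 9}.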
Step 302: determining the state information of m counters of at least one group of input data input in parallel by each stage of butterfly unit according to a preset parallel address-taking rule;
here, m counters are used as the basis for reading the memory cells, and the number of the memory cells to be read and written and the offset address inside the storage unit are calculated according to the state information of the counters and the address mapping rule. The method comprises the steps that a plurality of storage units (bank) are included, each storage unit comprises a plurality of storage spaces, the number of each storage unit is used for uniquely identifying one storage unit, and the offset address is used for identifying one storage space in one storage unit.
Specifically, the state information of the m counters comprises m bits, including a first counting bit, a second counting bit and the other counting bits. Two states that differ only in the first counting bit (with the second and other counting bits equal) indicate input data at different input positions of the same butterfly unit; two states that differ only in the second counting bit (with the first and other counting bits equal) indicate input data at the same input position of different butterfly units.
In practical application, each counter starts counting from 0; when every counter has counted to its upper limit, the storage units have been completely traversed and the processing of the current stage of butterfly units is finished, whereupon each counter returns to zero and counting begins for the input data of the next stage of butterfly units. Here, the upper limit of each counter is determined by the base N_s of the stage of butterfly units to which it corresponds.
In some embodiments, the m bits correspond in order to the m stages of butterfly units, and the radix of each bit is the base N_s of the corresponding stage of butterfly units. The first counting bit is the bit corresponding to the stage of butterfly units currently executing DFT parallel processing. When the base of the stage corresponding to the first counting bit composes the maximum parallelism on its own, the second counting bit may be any one of the m bits other than the first counting bit;
when the base of the stage corresponding to the first counting bit does not compose the maximum parallelism, or composes the maximum parallelism together with the bases of other stages, the base of the stage corresponding to the second counting bit must be divisible by the parallelism of the stage corresponding to the first counting bit and must be one of the bases composing the maximum parallelism.
Correspondingly, the parallel addressing rule comprises the following steps:
accumulating the second counting bit from 0 to N_s − 1 on the basis of the initial state information of the m counters to obtain the state information of the m counters for at least one group of input data, and carrying to the other counting bits when the second counting bit meets the carry condition;

where N_s is the base of the stage of butterfly units currently performing DFT parallel processing.
Illustratively, for the s-th stage of butterfly units, counting n_s (the first counting bit) from 0 to N_s − 1 produces the inputs of one and the same butterfly unit, while accumulating n_t (the second counting bit) produces the corresponding bank addresses for the inputs of different butterfly units; n_s carries with radix N_s and n_t carries with radix N_t. Here N_s is the base of the s-th stage of butterfly units: when N_s composes the bank parallelism on its own, N_t may be the base of any other stage of butterfly units; when N_s does not compose the bank parallelism, or does not compose it alone, N_t must be divisible by P_s and be one of the bases composing the bank parallelism. That is, the state information of the m counters corresponding to the address of the l2-th input (l2 = 0, 1, ..., N_s − 1) of the l1-th butterfly unit (l1 = 0, 1, ..., N_t − 1) is (n_0, ..., n_{s−1}, l2, ..., n_{t−1}, l1, ..., n_{m−1}). The address mapping rule maps each counter state to the corresponding bank address (i.e., the storage address); data are read from the banks according to the bank addresses and input to the corresponding positions of the butterfly units, and after the butterfly units finish processing, the results are written back to the original addresses. By accumulating n_t and carrying to the other counters, the other groups are input in parallel until all data have been traversed and the processing of the s-th stage of butterfly units is completed.
For example, with 54 = 2 × 3 × 9: for the third-stage butterfly units, N_s = 9 and n_t may be either the first bit or the second bit, i.e., N_t may be the base 2 of the first-stage butterfly units or the base 3 of the second-stage butterfly units. For the first-stage butterfly units (N_s = 2) and the second-stage butterfly units (N_s = 3), n_t is the third bit in both cases, i.e., N_t is the base 9 of the third-stage butterfly units.
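The counter-state tuples described above can be enumerated with a small sketch; the function name and the `other` argument (frozen values for the remaining counter positions) are illustrative assumptions, not the patent's notation:

```python
def group_states(bases, s, t, other):
    """Counter states for one parallel read in stage s.

    Position s sweeps l2 = 0..N_s-1 (the inputs of one butterfly unit),
    position t sweeps l1 = 0..N_t-1 (different butterfly units), and the
    remaining positions are frozen at the values given in `other`
    (a dict mapping counter index -> value; indices s and t are ignored).
    Returns one list of N_s state tuples per butterfly unit.
    """
    Ns, Nt = bases[s], bases[t]
    states = []
    for l1 in range(Nt):          # which butterfly unit
        group = []
        for l2 in range(Ns):      # which input position of that butterfly
            state = [other.get(i, 0) for i in range(len(bases))]
            state[s], state[t] = l2, l1
            group.append(tuple(state))
        states.append(group)
    return states
```

For 54 = 2 × 3 × 9 with s = 2 (base 9) and t = 0 (base 2), this yields 2 parallel butterfly units of 9 inputs each.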
Step 303: determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
here, an input terminal of one butterfly unit corresponds to one state information of m counters, an address mapping rule makes the state information and the storage space establish a one-to-one mapping relationship, and one state information is mapped onto one storage space by using the address mapping rule, that is, one data can be read from the storage space according to one state information and input to an input terminal of the corresponding butterfly unit.
That is to say, when obtaining the input of one or more butterfly units of each stage, the state information of at least one group of m counters is generated through the parallel address-taking rule and mapped into at least one group of storage addresses through the address mapping rule, and the storage units are then read in parallel to obtain at least one group of input data.
Here, the set of input data is input data of one butterfly unit.
In some embodiments, the address mapping rule comprises: determining the storage unit identification of the at least one group of input data according to the parallelism of each level of butterfly unit, the state information of the m counters and the maximum parallelism of the storage unit; and determining the offset address of the storage unit of the at least one group of input data according to the base of each stage of butterfly unit and the state information of the m counters.
In some embodiments, the method further comprises: decomposing the DFT operation into m-level butterfly operation according to the DFT data length, and determining the basis of the m-level butterfly unit; determining the maximum parallelism of a storage unit according to the basis of the m-level butterfly unit; and determining the parallelism of the m-level butterfly units according to the maximum parallelism of the storage units and the bases of the m-level butterfly units.
Specifically, the determining of the maximum parallelism of the storage units according to the m stages of butterfly-unit bases includes: taking the largest base among the m stages of butterfly-unit bases as the maximum parallelism of the storage units; or taking the product of the bases of at least two stages of butterfly units as the maximum parallelism of the storage units; wherein the maximum parallelism is greater than or equal to the base of every stage of butterfly units.
That is,

N_b = N_i, or N_b = N_i · N_j with N_i and N_j relatively prime, and N_b ≥ N_d for d = 0, 1, 2, ..., m − 1    (1)

where N_b is the maximum parallelism, N_i and N_j are the bases of the butterfly units composing N_b, N_d is the base of the d-th stage of butterfly units, m is the number of decomposition levels, and the set B = {i} or B = {i, j} is defined as the set of stage numbers used to compose the maximum parallelism of the storage units.
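Rule (1) can be sketched as follows. The rule itself only constrains the candidates; returning the largest admissible candidate that also fits a 16-bank memory (the bank count stated later in the text) is an added assumption of this sketch:

```python
from itertools import combinations
from math import gcd

def max_parallelism(bases, num_banks=16):
    """Candidate N_b values per rule (1): a single base, or the product of
    two coprime bases, that is >= every stage base.  Picking the largest
    candidate not exceeding the bank count is this sketch's own choice."""
    cands = {b for b in bases if all(b >= d for d in bases)}
    for x, y in combinations(bases, 2):
        if gcd(x, y) == 1 and all(x * y >= d for d in bases):
            cands.add(x * y)
    cands = {c for c in cands if c <= num_banks}
    return max(cands)
```

For the bases {2, 3, 9} of N = 54, this gives N_b = 9, matching the worked example later in the text.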
Specifically, the determining of the parallelism of the m stages of butterfly units according to the maximum parallelism of the storage units and the m stages of butterfly-unit bases includes: obtaining at least one candidate parallelism whose product with the base of the given stage of butterfly units is less than or equal to the maximum parallelism; and selecting, from the at least one candidate, a parallelism that evenly divides the base of the target-stage butterfly units as the parallelism of that stage; wherein the target-stage butterfly units are the stage of butterfly units composing the maximum parallelism.
That is, there exists i ∈ B such that

N_i mod P_d = 0 and P_d · N_d ≤ N_b    (2)
Illustratively, for 54 = 2 × 3 × 9 the maximum parallelism is 9, so the candidate parallelisms for stage 1 (the divisors of 9) are 1, 3 and 9; but 9 does not satisfy condition (2) (9 × 2 = 18 > 9), so the larger of 1 and 3 is taken as the parallelism of stage 1. In the same way, the parallelisms of the three stages are 3, 3 and 1 respectively.
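The stage-parallelism selection of this example can be sketched as below: for each stage base N_d, take the largest P_d that divides N_i (the base composing the maximum parallelism) and satisfies P_d · N_d ≤ N_b, as in the worked example; the function name and signature are illustrative:

```python
def stage_parallelism(bases, Nb, Ni):
    """For each stage base N_d, choose the largest P_d with
    N_i mod P_d == 0 and P_d * N_d <= N_b (condition (2))."""
    return [max(p for p in range(1, Ni + 1)
                if Ni % p == 0 and p * Nd <= Nb)
            for Nd in bases]
```

For N = 54 with bases (2, 3, 9), N_b = 9 and N_i = 9, this reproduces the parallelisms 3, 3 and 1.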
Specifically, among the m stages of butterfly units, the bases of any two stages are relatively prime; or, when the bases of two stages of butterfly units are not relatively prime, those two stages are arranged as two consecutive stages.
Illustratively, for 54 = 2 × 27 it is necessary to further decompose 27 into 3 and 9; since 3 and 9 are not relatively prime, a common factor algorithm (CFA) scheme is needed, and for hardware processing the stages whose bases are not relatively prime should be placed in consecutive positions.
In other embodiments, the method further comprises: presetting a mapping relation of at least one DFT data length, the base of the m-level butterfly unit and the parallelism of the m-level butterfly unit; and determining the basis of the m-level butterfly units and the parallelism of the m-level butterfly units according to the mapping relation and the DFT data length.
That is to say, the m stages of butterfly-unit bases and parallelisms corresponding to different data lengths can be determined in advance according to the determination method above, and the mapping relationship is established and stored; when DFT parallel processing is performed, the mapping relationship is looked up directly according to the DFT data length to obtain the bases and parallelisms of the m stages of butterfly units.
Specifically, the address mapping rule includes:
[Equation (3): the storage unit identification b, computed from the parallelisms P_i, the counter states n_i and the maximum parallelism N_b]

[Equation (4): the storage unit offset address a, computed from the set B, the counter states n_d and the bases N_s]

where b is the storage unit identification, P_i is the parallelism of the i-th stage of butterfly units, n_i is the state of the i-th counter, N_b is the maximum parallelism, a is the storage unit offset address, B is the set of stage numbers of the butterfly units composing the maximum parallelism, n_d is the state of the d-th counter, and N_s is the base of the s-th stage of butterfly units.
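The patent's own equations for b and a survive only as images. As a hedged stand-in, the classical digit-sum mapping below illustrates the same conflict-free idea for the single-radix case N = r^m with r banks; it is explicitly not the patent's mapping. Indices that differ in exactly one radix-r digit (the r inputs of one butterfly) always land in r distinct banks, while (bank, offset) pairs remain in one-to-one correspondence with indices:

```python
def classic_map(index, r, m):
    """Classical conflict-free mapping for N = r**m data over r banks:
    bank = (sum of the radix-r digits of index) mod r, offset = index // r.
    NOT the patent's equations (3)-(4); illustration only."""
    digits, x = [], index
    for _ in range(m):
        digits.append(x % r)
        x //= r
    return sum(digits) % r, index // r   # (bank, offset)
```

For r = 4, m = 2 (N = 16), the four inputs 2, 6, 10, 14 of one second-stage butterfly map to four distinct banks, and all 16 (bank, offset) pairs are distinct.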
Step 304: reading at least one group of input data from the storage unit in parallel according to the storage address;
specifically, the memory is composed of a plurality of banks (i.e., memory cells), and N may be usedbAnd executing the DTF parallel processing by each bank, wherein the depth of each bank is D, and the bit width of each address is B bits. In each clock cycle, only one memory space of each bank can be read and written, namely the maximum parallelism of reading and writing is Nb. According to the NR upstream DFT/FFT protocol the memory comprises 16 banks, each bank comprising 256 memory spaces, and up to 4096 operands can be stored.
At most N can be read in one clock cyclebAnd the number is used as input data of the first-stage butterfly unit.
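The bank organization described here can be modeled with a toy class (names and sizes are taken from the text; the class itself is illustrative). It enforces the one-access-per-bank-per-cycle constraint that the conflict-free addressing scheme is designed to satisfy:

```python
class BankedMemory:
    """Toy model of the memory in the text: 16 banks of 256 words each.
    One 'clock cycle' may touch each bank at most once, so a parallel read
    must hit pairwise-distinct banks."""

    def __init__(self, num_banks=16, depth=256):
        self.banks = [[0] * depth for _ in range(num_banks)]

    def read_parallel(self, addrs):
        """addrs: list of (bank, offset) pairs. Raises on a bank conflict."""
        banks = [b for b, _ in addrs]
        if len(set(banks)) != len(banks):
            raise ValueError("bank conflict: two accesses hit one bank")
        return [self.banks[b][a] for b, a in addrs]
```

A correct address mapping guarantees that the N_b addresses issued per cycle never trigger the conflict check.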
Step 305: at least one group of input data is parallelly sent to at least one butterfly unit of each stage of butterfly unit for parallel processing, and at least one group of output data is obtained;
step 306: and writing at least one group of output data into the storage space corresponding to the input data according to the original storage address.
The above describes one round of parallel access to the storage units within a single parallel processing pass of one stage of butterfly units; in practical application, every other parallel processing pass adopts the same parallel address-taking scheme and address mapping method to access the storage units in parallel.
In practical applications, after writing the at least one set of output data into the storage space corresponding to the input data, the method further includes: determining the state information of m counters of at least one new group of input data according to a preset parallel address-taking rule; when the state information of the m counters is not preset state information, acquiring at least one new group of input data according to the state information of the m counters; and when the state information of the m counters is preset state information, determining that the processing of the butterfly unit at the current stage is finished.
And when the butterfly unit at the current stage is determined to be processed completely, continuing to execute the butterfly unit at the next stage until the butterfly unit at the last stage is processed completely.
For example, the preset state information may be state information of m counters when the memory cell starts to be traversed, for example, each counter in the preset state information is 0.
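The increment-with-carry and the detection of the all-zero (preset) state can be sketched as follows; the exact carry order across counters is not fully specified in the text, so the index order used here is an assumption:

```python
def next_state(state, bases, t):
    """Advance the stage counters: increment position t; on overflow, carry
    into the remaining positions in index order (a sketch -- the patent's
    exact carry order is not spelled out here).  Returning to the all-zero
    state signals that the current stage has traversed all data."""
    state = list(state)
    order = [t] + [i for i in range(len(bases)) if i != t]
    for i in order:
        state[i] += 1
        if state[i] < bases[i]:   # no overflow at this position: done
            break
        state[i] = 0              # overflow: reset and carry onward
    return tuple(state)
```

For bases (2, 3, 9) the counter cycles through all 54 states and returns to (0, 0, 0), marking the end of the stage.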
FIG. 4 is a first framework diagram of DFT parallel processing in the embodiment of the present application. As shown in FIG. 4, DFT parallel processing is performed by m stages of butterfly units, where the i-th stage comprises P_i butterfly units; the i-th stage reads P_i groups of input data from the storage units in parallel and writes the P_i groups of output data back to the storage units at their original addresses.
FIG. 5 is a diagram of a second framework of DFT parallel processing in the embodiment of the present application. As shown in FIG. 5, DFT parallel processing is performed by m stages of butterfly units, from the 1st stage to the m-th stage; the 1st stage comprises P_0 butterfly units, the 2nd stage comprises P_1 butterfly units, and the m-th stage comprises P_{m−1} butterfly units. Each stage of butterfly units reads its input data from the storage units in parallel according to the read/write flow shown in FIG. 4 and writes its output data back to the storage units at the original addresses, until the processing of the m-th stage of butterfly units is completed.
Here, the execution subject of steps 301 to 306 may be a processor of an electronic device that performs the DFT parallel processing operation.
By adopting the above technical solution, conflict-free parallel access to a plurality of storage units is achieved through the m counters, the parallel address-taking rule and the address mapping rule, thereby improving the parallel processing efficiency of DFT and reducing the DFT processing delay.
Based on the above embodiments, and to further illustrate the object of the present application, fig. 6 shows a second flowchart of the method, which specifically includes:
step 601: determining m-level butterfly units for executing Discrete Fourier Transform (DFT) parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
in some embodiments, the method further comprises: decomposing the DFT operation into m-level butterfly operation according to the DFT data length, and determining the basis of the m-level butterfly unit; determining the maximum parallelism of a storage unit according to the basis of the m-level butterfly unit; and determining the parallelism of the m-level butterfly units according to the maximum parallelism of the storage units and the bases of the m-level butterfly units.
Specifically, the determining the maximum parallelism of the storage unit according to the m-level butterfly unit bases includes: taking the maximum base in the m-level butterfly unit bases as the maximum parallelism of the storage unit; or, taking the product of the bases of at least two levels of butterfly units in the m-level butterfly unit bases as the maximum parallelism of the storage unit; wherein the maximum parallelism is greater than or equal to each level of butterfly unit basis.
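The selection rule just described can be sketched in Python (our own illustration, not code from the patent; the tie-breaking choice of the smallest feasible candidate is an assumption):

```python
from math import gcd

def max_parallelism(bases):
    """Pick the storage-unit parallelism Nb: either a single per-stage base or
    the product of two relatively prime bases, and at least as large as every
    per-stage base. Taking the smallest feasible candidate is our assumption."""
    candidates = set(bases)
    # products of two relatively prime bases from different stages
    for i, ni in enumerate(bases):
        for nj in bases[i + 1:]:
            if gcd(ni, nj) == 1:
                candidates.add(ni * nj)
    # keep only candidates >= every per-stage base
    feasible = [c for c in candidates if all(c >= d for d in bases)]
    return min(feasible)

# For 54 = 2 * 3 * 9 the largest base already dominates, so Nb = 9.
print(max_parallelism([2, 3, 9]))  # 9
```

For the worked example used later in this text (54 = 2 × 3 × 9), the rule yields Nb = 9, matching the parallelism used in the read tables below.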
Specifically, the determining the parallelism of the m levels of butterfly units according to the maximum parallelism of the storage unit and the bases of the m levels of butterfly units includes: for each level, acquiring at least one parallelism whose product with the base of that level of butterfly unit is less than or equal to the maximum parallelism; selecting, from the at least one parallelism, one parallelism that evenly divides the base of the target-level butterfly unit as the parallelism of that level of butterfly unit; wherein the target-level butterfly unit is a level of butterfly unit composing the maximum parallelism.
Specifically, in the m levels of butterfly units, the bases of any two levels of butterfly units are relatively prime; or, when the bases of two levels of butterfly units are not relatively prime, those two levels are set as two consecutive levels.
In other embodiments, the method further comprises: presetting a mapping relation of at least one DFT data length, the base of the m-level butterfly unit and the parallelism of the m-level butterfly unit; and determining the basis of the m-level butterfly units and the parallelism of the m-level butterfly units according to the mapping relation and the DFT data length.
Step 602: determining, according to a preset parallel address-fetching rule, the state information of the m counters for at least one group of input data to be input in parallel to each level of butterfly units;
Here, the m counters serve as the basis for reading the storage units: the identifier of the storage unit to be read or written and the offset address inside the storage unit are calculated from the state information of the counters and the address mapping rule. The memory comprises a plurality of storage units (banks); each storage unit comprises a plurality of storage spaces; the identifier of each storage unit uniquely identifies one storage unit, and the offset address identifies one storage space within a storage unit.
Specifically, the state information of the m counters includes m bits; the m bits comprise a first counting bit, a second counting bit and other counting bits; the first counting bits are different, and the second counting bits are the same as the other counting bits and are used for indicating input data of different input positions of the same butterfly unit; the second counting bits are different, and the first counting bits are the same as the other counting bits and are used for indicating input data of the same input position of different butterfly units.
In practical application, each counter starts counting from 0; when every counter reaches its upper limit, the storage unit has been completely traversed and the processing of the current level of butterfly units is finished, so each counter returns to zero and counting starts for the input data of the next level of butterfly units. Here, the upper limit of each counter is determined by the base Ns of the corresponding level of butterfly unit.
In some embodiments, the m bits correspond in order, from top to bottom, to the m levels of butterfly units, and each bit counts modulo the base Ns of the corresponding level of butterfly unit; the first counting bit is the bit corresponding to the level of butterfly unit that is executing DFT parallel processing; when the base of the level of butterfly unit corresponding to the first counting bit composes the maximum parallelism by itself, the second counting bit is any one of the m bits other than the first counting bit;
when the base of the level of butterfly unit corresponding to the first counting bit does not compose the maximum parallelism, or composes the maximum parallelism together with the bases of other levels of butterfly units, the second counting bit is chosen so that the base of its corresponding level of butterfly unit is divisible by the parallelism of the level corresponding to the first counting bit and is used to compose the maximum parallelism.
Correspondingly, the parallel addressing rule comprises the following steps:
accumulating the second counting bit from 0 to Ns−1 on the basis of the initial state information of the m counters to obtain the state information of the m counters for at least one group of input data, and carrying to the other counting bits when the second counting bit satisfies the carry condition;
where Ns is the base of the level of butterfly unit that is executing DFT parallel processing.
Here, an input terminal of one butterfly unit corresponds to one state information of m counters, an address mapping rule makes the state information and the storage space establish a one-to-one mapping relationship, and one state information is mapped onto one storage space by using the address mapping rule, that is, one data can be read from the storage space according to one state information and input to an input terminal of the corresponding butterfly unit.
That is to say, to obtain the input of one or more butterfly units of each level, the state information of at least one group of m counters is set through the parallel address-fetching rule, that state information is mapped into at least one group of storage addresses through the address mapping rule, and the storage units are then read in parallel to obtain at least one group of input data.
Step 603: judging whether the state information of the m counters is preset state information, if so, executing step 608; if not, go to step 604;
step 604: determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
here, the set of input data is input data of one butterfly unit.
In some embodiments, the address mapping rule comprises: determining the storage unit identification of the at least one group of input data according to the parallelism of each level of butterfly unit, the state information of the m counters and the maximum parallelism of the storage unit; and determining the offset address of the storage unit of the at least one group of input data according to the base of each stage of butterfly unit and the state information of the m counters.
Step 605: reading at least one group of input data from the storage unit in parallel according to the storage address;
step 606: sending the at least one group of input data to at least one butterfly unit of each stage of butterfly unit in parallel for parallel processing, and outputting at least one group of output data;
step 607: writing the at least one group of output data into a storage space corresponding to the input data according to the original storage address; and returns to step 602 to continue to perform data processing of the butterfly unit in the current stage until all data of the storage unit is traversed.
Step 608: determining that the processing of the butterfly unit at the current stage is finished;
step 609: judging whether the current stage is the last stage, if so, executing step 610, and if not, returning to step 602;
Here, the determination may be made according to the identification information of the current level; alternatively, a counter may record the number of levels processed so far, with the determination made according to the state of that counter. Other determination methods commonly used by those skilled in the art may also be applied in the present application.
Step 610: and determining that the m-level butterfly unit processing is finished.
To further illustrate the object of the present application based on the above embodiments of the present application, as shown in fig. 7, the method specifically includes:
step 701: performing address decomposition on each data in DFT data to be processed to obtain state information of m counters corresponding to each data;
A set of base decompositions of the DFT can be represented as

N = N0 · N1 · … · Nm−1

where N is the DFT data length, m is the number of decomposed levels, the levels being numbered 0, 1, …, m−1, and the bases Ni of the levels are pairwise relatively prime. For m = 2, the PFA decomposition is

X(k0, k1) = Σ_{n1=0..N1−1} [ Σ_{n0=0..N0−1} x(n0, n1) · W_{N0}^{n0·k0} ] · W_{N1}^{n1·k1}

where W_N = e^{−j2π/N} is the twiddle factor. PFA has no inter-stage twiddle factors, unlike CFA.
The corresponding address resolution can be expressed as

n = ( Σ_{i=0..m−1} ni · pi · N/Ni ) mod N

where ni = 0, 1, …, Ni−1 is the sequence number of each data at level i, and the weight of ni is pi times the product of the bases of all the other levels, i.e. pi · N/Ni, where pi satisfies

( pi · N/Ni ) mod Ni = 1.

Correspondingly,

ni = <n>_{Ni}

where <n>_{Ni} denotes n modulo Ni. For example, for 18 = 2 × 9, the address 10 decomposes as 10 = (0, 1), since 10 mod 2 = 0 and 10 mod 9 = 1. For level i, among the Ni addresses within one butterfly, only ni differs in the decomposition (n0, n1, …, nm−1). For example, for 18 = 2 × 9: stage 0 has base 2 and is divided into 9 groups, with 0 = (0, 0) and 9 = (1, 0) inside one butterfly; stage 1 is divided into 2 groups, with 0 = (0, 0), 10 = (0, 1), 2 = (0, 2), …, 8 = (0, 8) inside one butterfly group. Accordingly, the resolution of the output address can be expressed as

k = ( Σ_{i=0..m−1} ki · N/Ni ) mod N
When the number of DFT points is large, the base of some level of butterfly unit is also large; a large butterfly base needs to be further decomposed into small butterfly bases for processing, but this cannot guarantee that the bases of the small butterfly units of each level are relatively prime. For example, for 54 = 2 × 27, 27 needs to be further decomposed into 3 and 9, while 3 and 9 are not coprime and must be handled with a CFA solution — that is, the PFA scheme between large stages and the CFA scheme inside a large stage. For this mixed PFA and CFA address scheme, the aforementioned address resolution needs further modification. For m = 3, N = N0·N1·N2, with N0 and N1 not coprime and N0·N1 coprime with N2, the new address resolution scheme can be expressed as
n = ( (n0·N1 + n1) · p01 · N2 + n2 · p2 · N0·N1 ) mod N

where p01 and p2 satisfy

( p01 · N2 ) mod (N0·N1) = 1, ( p2 · N0·N1 ) mod N2 = 1.

Correspondingly,

n0·N1 + n1 = <n>_{N0·N1}, n2 = <n>_{N2}.

For the output,

k = ( (k1·N0 + k0) · N2 + k2 · N0·N1 ) mod N.
Note that inside the CFA, the output is the inverted order of the input. For hardware processing convenience, several bases that are not coprime should be placed in sequential order of positions.
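For N = 54 = 2 × (3 × 9), the mixed PFA/CFA address decomposition just described can be sketched in Python (an illustration consistent with the worked examples in this text; the reconstruction constant 28 is inferred via the Chinese remainder theorem, not quoted from the patent):

```python
def decompose(n):
    """Split address n (0..53) into counter digits (n0, n1, n2) for
    N = 54 = 2 * 27, with 27 handled CFA-style as 3 * 9."""
    n0 = n % 2          # PFA digit: 2 is coprime with 27
    q = n % 27          # CFA part: mixed-radix digits over 3 * 9
    n1, n2 = q // 9, q % 9
    return (n0, n1, n2)

def compose(n0, n1, n2):
    """Inverse mapping via CRT over the coprime pair (2, 27)."""
    # 27 ≡ 1 (mod 2) and 28 ≡ 1 (mod 27), so this inverts decompose().
    return (27 * n0 + 28 * (9 * n1 + n2)) % 54

assert decompose(36) == (0, 1, 0)   # matches the stage-1 read table below
assert decompose(28) == (0, 0, 1)
assert all(compose(*decompose(n)) == n for n in range(54))
```

The two assertions reproduce counter states that appear verbatim in the stage-1 read listing later in this text.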
Step 702: determining the storage address of each datum according to the state information of the m counters and a preset address mapping rule;
here, m counters are used as the basis for reading the memory cells, and the number of the memory cells to be read and written and the offset address inside the storage unit are calculated according to the state information of the counters and the address mapping rule. The method comprises the steps that a plurality of storage units (bank) are included, each storage unit comprises a plurality of storage spaces, the number of each storage unit is used for uniquely identifying one storage unit, and the offset address is used for identifying one storage space in one storage unit.
Specifically, the state information of the m counters includes m bits; the m bits comprise a first counting bit, a second counting bit and other counting bits; the first counting bits are different, and the second counting bits are the same as the other counting bits and are used for indicating input data of different input positions of the same butterfly unit; the second counting bits are different, and the first counting bits are the same as the other counting bits and are used for indicating input data of the same input position of different butterfly units.
In practical application, when each counter starts counting from 0 and each counter counts to the upper limit value, the memory cell is completely traversed and currentlyAnd finishing the processing of the butterfly units, enabling each counter to return to zero, and starting to count the input data of the butterfly unit at the next stage. Here, the upper limit value of each counter is defined by the basis N of each stage of butterfly unitsAnd (6) determining.
In some embodiments, the m bits sequentially correspond to the m levels of butterfly units from top to bottom, and each bit is set to be a base N of each level of butterfly units(ii) a The first counting bit is a bit corresponding to a first-level butterfly unit which is executing DFT parallel processing, and the parallelism of the first-level butterfly unit corresponding to the second counting bit can be divided by the radix of the first-level butterfly unit forming the maximum parallelism.
Here, an input terminal of one butterfly unit corresponds to one state information of m counters, an address mapping rule makes the state information and the storage space establish a one-to-one mapping relationship, and one state information is mapped onto one storage space by using the address mapping rule, that is, one data can be read from the storage space according to one state information and input to an input terminal of the corresponding butterfly unit.
That is to say, when the input of one or more butterfly units of each stage of butterfly unit is obtained, the state information of at least one group of m counters is set through a parallel value-taking rule, the state information of at least one group of m counters is mapped into at least one group of storage addresses through an address mapping rule, and then the storage units are read in parallel to obtain at least one group of input data.
Here, the set of input data is input data of one butterfly unit.
In some embodiments, the address mapping rule comprises: determining the storage unit identification of the at least one group of input data according to the parallelism of each level of butterfly unit, the state information of the m counters and the maximum parallelism of the storage unit; and determining the offset address of the storage unit of the at least one group of input data according to the base of each stage of butterfly unit and the state information of the m counters.
In some embodiments, the method further comprises: decomposing the DFT operation into m-level butterfly operation according to the DFT data length, and determining the basis of the m-level butterfly unit; determining the maximum parallelism of a storage unit according to the basis of the m-level butterfly unit; and determining the parallelism of the m-level butterfly units according to the maximum parallelism of the storage units and the bases of the m-level butterfly units.
Specifically, the determining the maximum parallelism of the storage unit according to the m-level butterfly unit bases includes: taking the maximum base in the m-level butterfly unit bases as the maximum parallelism of the storage unit; or, taking the product of the bases of at least two levels of butterfly units in the m-level butterfly unit bases as the maximum parallelism of the storage unit; wherein the maximum parallelism is greater than or equal to each level of butterfly unit basis.
That is,

Nb = Ni or Nb = Ni · Nj, with Ni and Nj relatively prime, and Nb ≥ Nd, d = 0, 1, 2, …, m−1    (1)

where Nb is the maximum parallelism, Ni and Nj are the bases of the butterfly units composing Nb, Nd is the base of the d-th level butterfly unit, and m is the number of decomposition levels. The set B = {i} or B = {i, j} is defined as the set of level numbers used to compose the maximum parallelism of the storage unit.
Specifically, the determining the parallelism of the m-level butterfly units according to the maximum parallelism of the storage unit and the basis of the m-level butterfly units includes: acquiring at least one parallelism in which the product of the product and the basis of each stage of butterfly unit is less than or equal to the maximum parallelism; selecting one parallelism which can be evenly divided by the basis of the butterfly unit of the target level from the at least one parallelism as the parallelism of the butterfly unit of each level; and the target-level butterfly unit is a first-level butterfly unit forming the maximum parallelism.
That is, there exists i ∈ B such that

Pd evenly divides Ni, and Pd · Nd ≤ Nb    (2)
Illustratively, for 54 = 2 × 3 × 9, the maximum parallelism is 9; the candidate parallelisms for stage 1 are the divisors of 9, namely 1, 3 and 9, but 9 does not satisfy condition (2), so the larger of 1 and 3 is taken as the parallelism of stage 1, i.e. 3. In the same way, the parallelisms of the three levels are 3, 3 and 1 respectively.
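This selection can be written as a minimal sketch (our illustration, assuming, as in the example, that the candidate parallelisms are the divisors of the target-level base):

```python
def stage_parallelisms(bases, nb, target):
    """For each stage d, pick the largest parallelism P that divides the
    target-level base (the base composing Nb) and satisfies P * N_d <= Nb."""
    divisors = [p for p in range(1, bases[target] + 1) if bases[target] % p == 0]
    return [max(p for p in divisors if p * nd <= nb) for nd in bases]

# 54 = 2 * 3 * 9, Nb = 9, the target level is stage 2 (base 9)
print(stage_parallelisms([2, 3, 9], nb=9, target=2))  # [3, 3, 1]
```

The result [3, 3, 1] matches the per-stage parallelisms used in the read tables below.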
Specifically, in the m levels of butterfly units, the bases of any two levels of butterfly units are relatively prime; or, when the bases of two levels of butterfly units are not relatively prime, those two levels are set as two consecutive levels.
In other embodiments, the method further comprises: presetting a mapping relation of at least one DFT data length, the base of the m-level butterfly unit and the parallelism of the m-level butterfly unit; and determining the basis of the m-level butterfly units and the parallelism of the m-level butterfly units according to the mapping relation and the DFT data length.
That is to say, the bases of the m levels of butterfly units and the parallelisms of the m levels of butterfly units corresponding to different data lengths can be determined in advance according to the above determination method, and the mapping relationship is established and stored; when DFT parallel processing is performed, the mapping relationship is looked up directly according to the DFT data length to obtain the bases and the parallelisms of the m levels of butterfly units.
Specifically, for the address mapping rule, refer to equations (3) and (4).
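Equations (3) and (4) themselves are not reproduced in this text, so the following sketch uses one concrete conflict-free mapping that we constructed for N = 54, Nb = 9; the weights are our assumption, not the patent's formulas:

```python
def bank_address(n0, n1, n2, nb=9):
    """Map a counter state to (bank id, offset) for N = 54 = 2 * 3 * 9.
    The weights 3, 3, 1 below are one choice that avoids bank conflicts;
    the patent's own formulas (3) and (4) are not reproduced here."""
    bank = (3 * n0 + 3 * n1 + n2) % nb
    offset = 3 * n0 + n1          # mixed-radix index of the remaining digits
    return bank, offset

# Conflict-freeness check: the 9 inputs of stage-1 read 1
# (n0 = 0, n1 = 0..2, n2 = 0..2) land in 9 distinct banks.
cycle = [bank_address(0, n1, n2)[0] for n1 in range(3) for n2 in range(3)]
assert len(set(cycle)) == 9
```

The mapping is also one-to-one over all 54 counter states, which is the property proved in general form later in this text.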
Step 703: storing DFT data to be processed into a storage unit according to the storage address;
it should be noted that the data writing order and the writing method (including parallel writing and serial writing) are set by an input addressing rule, the input addressing rule is defined by an input interface, and the state information of the m counters can be set according to the input addressing rule.
That is, before the m levels of butterfly units perform DFT parallel processing, the N data numbered n = 0, 1, …, N−1 need to be written into the storage unit, with each address n decomposed into the m-bit counter (n0, n1, …, nm−1). Specifically, as n counts from 0 to N−1: if Ni and Ni+1 are relatively prime, ni accumulates and takes the modulus at every step; if Ni and Ni+1 are not coprime, ni accumulates modulo its base only when ni+1 carries. The bank identifier and the offset address inside the bank are then calculated according to equations (3) and (4), and the input data are written in sequence.
Step 704: performing DFT parallel processing;
here, the DFT parallel processing is an m-level butterfly operation performed by the m-level butterfly unit in the above embodiment of the present application, and is not described herein again.
Step 705: determining that the processing of the m-level butterfly units is finished, and determining state information of m counters outputting data according to a preset output addressing rule;
Here, the data reading order and the reading manner (including parallel reading and serial reading) are set by an output addressing rule, which is specified by the output interface; the state information of the m counters can be set according to the output addressing rule.
Step 706: determining a storage address of the output data according to the state information of the m counters and a preset address mapping rule;
here, the specific address mapping method is referred to as step 702.
Step 707: and reading the output data from the memory cell according to the memory address.
That is, the m-stage butterfly unit sets the state information of the m counters according to a preset readout order after performing DFT parallel processing, and reads out output data.
Next, according to the parallel address-fetching rule and the address mapping rule, we prove that the mapping from an address number nx to its bank address (bx, ax) is one-to-one, and that for the Ps butterfly units of the s-th level, the bank identifiers mapped from the Ps·Ns input data fetched in one cycle are all different.
(1) For any two different data addresses nx and ny, with address decompositions (nx,0, nx,1, …, nx,m−1) and (ny,0, ny,1, …, ny,m−1) respectively, the mapped bank addresses (bx, ax) and (by, ay) are also different.
For Nb = Ni: to make ax = ay, the digits of the two decompositions must be equal at every level except level i; on the premise that ax = ay, to further make bx = by requires nx,i = ny,i, so that all digits coincide. This contradicts nx ≠ ny.
For Nb = Ni·Nj: to make ax = ay, the digits must be equal at every level except levels i and j; on the premise that ax = ay, to further make bx = by, since Pi = Nj and Pj = Ni, it can also be derived that nx,i = ny,i and nx,j = ny,j. This again contradicts nx ≠ ny.
(2) Let nx and ny be the addresses of two data fetched in the same clock cycle, with address decompositions (nx,0, …, nx,m−1) and (ny,0, …, ny,m−1). To make the bank numbers bx = by, the weighted difference of the two decompositions must be congruent to 0 modulo Nb, i.e. the contribution of the differing digits must equal 0 or ±Nb. Considering that Nt and Pt are relatively prime, Ps and Pt are also relatively prime. If Pt = 1, then Nt = Nb and the required congruence cannot hold. If Pt ≠ 1, then Pt·Nt = Nb and Ps = Nt, so Pt·Ps = Nb and Nb is the least common multiple of Pt and Ps; the required congruence again cannot hold. This contradicts the assumption, so the bank numbers of the data fetched in the same clock cycle are all different. QED.
Next, data reading is exemplified according to the parallel address-fetching rule and the address mapping rule. For example, for 54 = 2 × 3 × 9, the bank number Nb = 9 and the parallelisms are 3, 3 and 1 in sequence; the bank mapping result is shown in Table (1), where the columns represent bank identifiers and the rows represent offset addresses within the banks. (Only the banks and storage spaces holding data are shown; the actual memory is much larger, supporting up to 16 banks with 256 storage spaces each.)
Table (1)
(table image not reproduced)
For example, for 54 = 2 × 3 × 9, Nb = 9 and the parallelisms of the three stages are 3, 3 and 1 in order. The parallel addressing order of stage 1 (radix 3) is shown in Table (2), where the same symbol at the lower right corner marks the input data of multiple parallel butterfly bases fetched in the same clock cycle.
Table (2)
(table image not reproduced)
The corresponding counter states (numbers in parentheses) are:
Read 1 (denoted by symbol ①)
First set of data (corresponding to butterfly unit 0): 0(0,0,0)36(0,1,0)18(0,2,0)
Second set of data (corresponding to butterfly 1): 28(0,0,1)10(0,1,1)46(0,2,1)
Third set of data (corresponding to butterfly 2): 2(0,0,2)38(0,1,2)20(0,2,2)
Read 2 (denoted by symbol ②)
First set of data (corresponding to butterfly unit 0): 30(0,0,3)12(0,1,3)48(0,2,3)
Second set of data (corresponding to butterfly 1): 4(0,0,4)40(0,1,4)22(0,2,4)
Third set of data (corresponding to butterfly 2): 32(0,0,5)14(0,1,5)50(0,2,5)
Read 3 (denoted by symbol ③)
First set of data (corresponding to butterfly unit 0): 6(0,0,6)42(0,1,6)24(0,2,6)
Second set of data (corresponding to butterfly 1): 34(0,0,7)16(0,1,7)52(0,2,7)
Third set of data (corresponding to butterfly 2): 8(0,0,8)44(0,1,8)26(0,2,8)
Read 4 (denoted by symbol ④)
First set of data (corresponding to butterfly unit 0): 27(1,0,0)9(1,1,0)45(1,2,0)
Second set of data (corresponding to butterfly 1): 1(1,0,1)37(1,1,1)19(1,2,1)
Third set of data (corresponding to butterfly 2): 29(1,0,2)11(1,1,2)47(1,2,2)
Read 5 (denoted by symbol ⑤)
First set of data (corresponding to butterfly unit 0): 3(1,0,3)39(1,1,3)21(1,2,3)
Second set of data (corresponding to butterfly 1): 31(1,0,4)13(1,1,4)49(1,2,4)
Third set of data (corresponding to butterfly 2): 5(1,0,5)41(1,1,5)23(1,2,5)
Read 6 (denoted by symbol ⑥)
First set of data (corresponding to butterfly unit 0): 33(1,0,6)15(1,1,6)51(1,2,6)
Second set of data (corresponding to butterfly 1): 7(1,0,7)43(1,1,7)25(1,2,7)
Third set of data (corresponding to butterfly 2): 35(1,0,8)17(1,1,8)53(1,2,8)
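The stage-1 read order above can be reproduced by a small generator (the loop nesting is inferred from the listed order and is our illustration, not the patent's implementation):

```python
def stage1_reads():
    """Yield, per clock cycle, 3 groups (parallel butterfly units) of 3
    counter states (the inputs of one base-3 butterfly) for N = 54 = 2*3*9
    with parallelism 3. n1 is the first counting bit (input position),
    n2 the second counting bit (parallel butterflies), n0 the remaining bit."""
    for n0 in range(2):                     # outer counter
        for chunk in range(3):              # 9 values of n2, 3 per cycle
            yield [[(n0, n1, 3 * chunk + g) for n1 in range(3)]
                   for g in range(3)]

reads = list(stage1_reads())
# Read 1, first group (butterfly unit 0): counters of data 0, 36, 18
print(reads[0][0])  # [(0, 0, 0), (0, 1, 0), (0, 2, 0)]
```

The generator's six cycles match Read 1 through Read 6 in the listing above.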
The parallel addressing order of stage 0 (radix 2) is shown in Table (3), where the same symbol at the lower right corner marks the input data of multiple parallel butterfly bases fetched in the same clock cycle.
Table (3)
(table image not reproduced)
Read 1
The 0th butterfly unit: 0(0,0,0) 27(1,0,0)
The 1st butterfly unit: 28(0,0,1) 1(1,0,1)
The 2nd butterfly unit: 2(0,0,2) 29(1,0,2)
Read 2
The 0th butterfly unit: 30(0,0,3) 3(1,0,3)
The 1st butterfly unit: 4(0,0,4) 31(1,0,4)
The 2nd butterfly unit: 32(0,0,5) 5(1,0,5)
Read 3
The 0th butterfly unit: 6(0,0,6) 33(1,0,6)
The 1st butterfly unit: 34(0,0,7) 7(1,0,7)
The 2nd butterfly unit: 8(0,0,8) 35(1,0,8)
Read 4
The 0th butterfly unit: 36(0,1,0) 9(1,1,0)
The 1st butterfly unit: 10(0,1,1) 37(1,1,1)
The 2nd butterfly unit: 38(0,1,2) 11(1,1,2)
Read 5
The 0th butterfly unit: 12(0,1,3) 39(1,1,3)
The 1st butterfly unit: 40(0,1,4) 13(1,1,4)
The 2nd butterfly unit: 14(0,1,5) 41(1,1,5)
Read 6
The 0th butterfly unit: 42(0,1,6) 15(1,1,6)
The 1st butterfly unit: 16(0,1,7) 43(1,1,7)
The 2nd butterfly unit: 44(0,1,8) 17(1,1,8)
Read 7
The 0th butterfly unit: 18(0,2,0) 45(1,2,0)
The 1st butterfly unit: 46(0,2,1) 19(1,2,1)
The 2nd butterfly unit: 20(0,2,2) 47(1,2,2)
Read 8
The 0th butterfly unit: 48(0,2,3) 21(1,2,3)
The 1st butterfly unit: 22(0,2,4) 49(1,2,4)
The 2nd butterfly unit: 50(0,2,5) 23(1,2,5)
Read 9
The 0th butterfly unit: 24(0,2,6) 51(1,2,6)
The 1st butterfly unit: 52(0,2,7) 25(1,2,7)
The 2nd butterfly unit: 26(0,2,8) 53(1,2,8)
The parallel addressing order of stage 2 (radix 9) is shown in Table (4), where the same symbol at the lower right corner marks the input data of multiple parallel butterfly bases fetched in the same clock cycle.
Table (4)
(table image not reproduced)
Read 1
The 0th butterfly unit: 0(0,0,0), 28(0,0,1), 2(0,0,2), 30(0,0,3), 4(0,0,4), 32(0,0,5), 6(0,0,6), 34(0,0,7), 8(0,0,8)
Read 2
The 0th butterfly unit: 36(0,1,0), 10(0,1,1), 38(0,1,2), 12(0,1,3), 40(0,1,4), 14(0,1,5), 42(0,1,6), 16(0,1,7), 44(0,1,8)
Read 3
The 0th butterfly unit: 18(0,2,0), 46(0,2,1), 20(0,2,2), 48(0,2,3), 22(0,2,4), 50(0,2,5), 24(0,2,6), 52(0,2,7), 26(0,2,8)
Read 4
The 0th butterfly unit: 27(1,0,0), 1(1,0,1), 29(1,0,2), 3(1,0,3), 31(1,0,4), 5(1,0,5), 33(1,0,6), 7(1,0,7), 35(1,0,8)
Read 5
The 0th butterfly unit: 9(1,1,0), 37(1,1,1), 11(1,1,2), 39(1,1,3), 13(1,1,4), 41(1,1,5), 15(1,1,6), 43(1,1,7), 17(1,1,8)
Read 6
The 0th butterfly unit: 45(1,2,0), 19(1,2,1), 47(1,2,2), 21(1,2,3), 49(1,2,4), 23(1,2,5), 51(1,2,6), 25(1,2,7), 53(1,2,8).
The DFT parallel processing scheme provided by the application can be applied to the discrete Fourier transform processing of both PFA and CFA. Compared with CFA, the bases of the butterfly units of each PFA level take values from a smaller range (the bases of the PFA levels are relatively prime), so the maximum parallelism Nb may need to be composed from the bases of several butterfly units. The proposed address mapping scheme and parallel processing scheme can also be applied to CFA (the bases of each level of butterfly unit of a CFA need not be co-prime).
In order to implement the method of the embodiment of the present application, based on the same inventive concept, an embodiment of the present application further provides a DFT parallel processing apparatus, as shown in fig. 8, the apparatus includes: a processing unit 801, an address management unit 802, and a plurality of storage units 803; wherein,
the processing unit 801 comprises a plurality of butterfly units of different bases;
the processing unit 801 is configured to determine an m-level butterfly unit that performs discrete fourier transform DFT parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
the address management unit 802 is configured to determine, according to a preset parallel address fetching rule, state information of m counters of at least one set of input data input in parallel by each stage of butterfly unit; determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
the processing unit 801 is configured to read the at least one set of input data from the storage unit 803 in parallel according to the storage address; sending the at least one group of input data to at least one butterfly unit of each stage of butterfly unit in parallel for parallel processing, and outputting at least one group of output data;
the processing unit 801 is further configured to write the at least one set of output data into a storage space corresponding to the input data according to the original storage address.
Illustratively, the basis of butterfly units required by the present application includes 2, 4, 8, 16, 3, 9, 5.
The address management unit comprises a plurality of counters; it can set the states of the counters according to the parallel address-fetching rule, and calculates the bank identifier and the offset within the bank to be read and written according to the counter states and the address mapping rule.
The device comprises a plurality of banks (i.e., storage units); Nb banks can be used to execute the DFT parallel processing, where each bank has a depth of D and each address has a bit width of B bits. In each clock cycle, only one storage space of each bank can be read or written, i.e., the maximum read-write parallelism is Nb. According to the NR uplink DFT/FFT protocol, the memory comprises 16 banks, each bank comprising 256 storage spaces, so that up to 4096 operands can be stored.
In some embodiments, the address management unit 802 is further configured to determine, according to a preset parallel addressing rule, state information of m counters of at least one new set of input data; when the state information of the m counters is not preset state information, acquiring at least one new group of input data according to the state information of the m counters; and when the state information of the m counters is preset state information, determining that the processing of the butterfly unit at the current stage is finished.
In some embodiments, the address management unit 802 is further configured to determine that the processing of the m-level butterfly unit is completed, and determine state information of m counters outputting data according to a preset output addressing rule; determining a storage address of the output data according to the state information of the m counters and a preset address mapping rule;
the processing unit 801 is further configured to read the output data from the storage unit 803 according to the storage address.
That is, the apparatus realizes three operations.
Firstly, data input: the input data to be transformed are written into the storage unit with a certain parallelism, according to the address mapping relation given by the present application.
Secondly, parallel processing of the butterfly units: at each stage, one or more groups of data are continuously read from the storage unit and fed into one or more butterfly units according to the parallel addressing rule and the address mapping relation of the present application; after the butterfly processing is finished, the data are written back to the storage unit at their original addresses, until all input data have been traversed.
Thirdly, data output: after the processing of all the butterfly units is finished, the data are read from the storage unit with a certain parallelism, according to the address mapping relation of the present application.
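The three operations above can be summarized as an in-place loop: read a group of data, run it through a butterfly unit, and write the results back to the same addresses. The sketch below illustrates only this flow; the helper names, the toy radix-2 add/subtract "butterfly" (twiddle factors omitted), and the hard-coded address groups are our assumptions, not the patent's:

```python
# Sketch of the in-place flow: per-stage butterfly passes with write-back
# to the original addresses. Helper names and the toy butterfly are ours.

def process_in_place(memory, addresses_per_stage, butterfly):
    # addresses_per_stage[s] lists the address groups for stage s, as the
    # parallel addressing and address mapping rules would produce them.
    for stage_groups in addresses_per_stage:
        for group in stage_groups:               # one group per clock cycle
            data = [memory[a] for a in group]    # parallel read
            out = butterfly(data)                # butterfly processing
            for a, v in zip(group, out):         # write back to the SAME
                memory[a] = v                    # (original) addresses
    return memory

# Tiny demo: a radix-2 add/subtract pass (twiddles omitted) on 4 values.
mem = [1.0, 2.0, 3.0, 4.0]
bf2 = lambda d: [d[0] + d[1], d[0] - d[1]]
stages = [[(0, 1), (2, 3)], [(0, 2), (1, 3)]]
process_in_place(mem, stages, bf2)
print(mem)  # [10.0, -2.0, -4.0, 0.0]
```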
In some embodiments, the processing unit 801 is further configured to perform address decomposition on each data in the DFT data to be processed, so as to obtain state information of m counters corresponding to each data;
the address management unit 802 is further configured to determine a storage address of each data according to the state information of the m counters and a preset address mapping rule;
and storing the DFT data to be processed into the storage unit 803 according to the storage address.
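The address decomposition above amounts to writing a linear data index as a mixed-radix number whose digits are the m counter states. A hedged sketch (the digit ordering is our assumption):

```python
# Mixed-radix address decomposition: split a linear index into m counter
# values, one digit per butterfly stage base. We assume the last stage is
# the least-significant digit; the patent may order the digits differently.

def decompose(index, bases):
    digits = []
    for base in reversed(bases):
        digits.append(index % base)
        index //= base
    return list(reversed(digits))       # [n_1, ..., n_m]

def compose(digits, bases):
    index = 0
    for d, base in zip(digits, bases):
        index = index * base + d
    return index

bases = [3, 4, 5]                       # e.g. a 60-point DFT as 3 x 4 x 5
print(decompose(37, bases))             # [1, 3, 2]
print(compose([1, 3, 2], bases))        # 37
```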
In some embodiments, the state information of the m counters comprises m bits;
the m bits comprise a first counting bit, a second counting bit and other counting bits;
when the first counting bits differ and the second counting bits and the other counting bits are the same, the counter states indicate input data at different input positions of the same butterfly unit;
when the second counting bits differ and the first counting bits and the other counting bits are the same, the counter states indicate input data at the same input position of different butterfly units.
In some embodiments, the m bits sequentially correspond to the m levels of butterfly units from top to bottom, and each bit counts modulo the base Ni of the corresponding level of butterfly units;
The first counting bit is a bit corresponding to a first-level butterfly unit which is executing DFT parallel processing;
when the base of the first-stage butterfly unit corresponding to the first counting bit independently forms the maximum parallelism, the second counting bit is any one bit of the m bits except the first counting bit;
when the base of the first-stage butterfly unit corresponding to the first counting bit does not form the maximum parallelism, or the base of the first-stage butterfly unit corresponding to the second counting bit and the bases of other-stage butterfly units form the maximum parallelism together, the base of the first-stage butterfly unit corresponding to the second counting bit can be divided by the parallelism of the first-stage butterfly unit corresponding to the first counting bit, and the base of the first-stage butterfly unit corresponding to the second counting bit is used for forming the maximum parallelism.
In some embodiments, the parallel addressing rules comprise:
accumulating the second counting bit from 0 to Ns-1 on the basis of the initial state information of the m counters to obtain the state information of the m counters of at least one group of input data, and carrying to the other counting bits if the second counting bit meets the carry condition;
wherein Ns is the base of the first-level butterfly unit that is executing the DFT parallel processing.
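The accumulation-and-carry rule above behaves like a mixed-radix counter: the second counting bit sweeps 0..Ns-1 to enumerate one group, and a carry into the other counting bits advances to the next group until the final state is reached. A hedged sketch (the bit roles and ordering are our assumptions):

```python
# Sketch of the parallel addressing rule: for the stage being processed,
# the second counting bit sweeps 0..Ns-1 within one group; a carry into
# the other counting bits then advances to the next group. The handling
# of the first counting bit (parallel butterflies within one cycle) is
# omitted; all conventions here are our assumptions, not the patent's.

def groups_for_stage(bases, first_bit, second_bit):
    m = len(bases)
    other = [i for i in range(m) if i not in (first_bit, second_bit)]
    state = [0] * m
    while True:
        group = []
        for v in range(bases[second_bit]):       # 0 .. Ns - 1
            state[second_bit] = v
            group.append(tuple(state))
        yield group
        for i in reversed(other):                # carry into other bits
            state[i] += 1
            if state[i] < bases[i]:
                break
            state[i] = 0
        else:
            return                               # all groups enumerated

gs = list(groups_for_stage([2, 3, 4], first_bit=0, second_bit=1))
print(len(gs), gs[0])  # 4 [(0, 0, 0), (0, 1, 0), (0, 2, 0)]
```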
In some embodiments, the address mapping rule comprises:
determining the storage unit identification of the at least one group of input data according to the parallelism of each level of butterfly unit, the state information of the m counters and the maximum parallelism of the storage unit; and determining the offset address of the storage unit of the at least one group of input data according to the base of each stage of butterfly unit and the state information of the m counters.
In some embodiments, the address mapping rule comprises:
b = (P1n1 + P2n2 + ... + Pmnm) mod Nb
a = Σd∉B ( nd × Πs∉B, s>d Ns )
wherein b is the memory cell identification, Pi is the parallelism of the i-th stage butterfly unit, ni is the state information of the i-th counter, Nb is the maximum parallelism, a is the offset address of the memory cell, B is the set of butterfly unit stage numbers forming the maximum parallelism, nd is the state information of the d-th counter, and Ns is the base of the s-th stage butterfly unit.
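Under one reading of the address mapping rule above (the published equations are images, so both formulas here are our reconstruction from the variable definitions), the mapping might be sketched as:

```python
# Hedged sketch of the address mapping: the bank identification b is a
# parallelism-weighted sum of the counters modulo Nb; the offset a is a
# mixed-radix number over the counters whose stages are NOT in the set B
# (the stages whose bases form the maximum parallelism). This is our
# reconstruction; the patent's exact formulas may differ in detail.

def map_address(n, P, N, B_set, N_b):
    # n: counter states n_1..n_m, P: per-stage parallelism P_i,
    # N: per-stage bases N_i, B_set: stage indices forming Nb.
    b = sum(P[i] * n[i] for i in range(len(n))) % N_b
    a = 0
    for i in [j for j in range(len(n)) if j not in B_set]:
        a = a * N[i] + n[i]                 # mixed-radix offset
    return b, a

# Example: 3 stages with bases 4, 3, 2; stage 0 alone forms Nb = 4.
print(map_address(n=[1, 2, 1], P=[1, 4, 2], N=[4, 3, 2],
                  B_set={0}, N_b=4))        # (3, 5)
```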
In some embodiments, the processing unit 801 is further configured to decompose the DFT operation into m-level butterfly operations according to the DFT data length, and determine a basis of the m-level butterfly unit; determining the maximum parallelism of a storage unit according to the basis of the m-level butterfly unit; and determining the parallelism of the m-level butterfly units according to the maximum parallelism of the storage units and the bases of the m-level butterfly units.
In some embodiments, among the m levels of butterfly units, the bases of any two levels of butterfly units are co-prime; or, when the bases of two levels of butterfly units among the m levels are not co-prime, the two levels of butterfly units are set as two consecutive levels.
In some embodiments, the processing unit 801 is specifically configured to use a maximum radix of the m-level butterfly unit bases as a maximum parallelism of the storage unit; or, taking the product of the bases of at least two levels of butterfly units in the m-level butterfly unit bases as the maximum parallelism of the storage unit; wherein the maximum parallelism is greater than or equal to each level of butterfly unit basis.
In some embodiments, the processing unit 801 is specifically configured to obtain at least one parallelism whose product with the base of each stage of butterfly unit is less than or equal to the maximum parallelism; select, from the at least one parallelism, one parallelism which can be evenly divided by the base of the target-level butterfly unit, as the parallelism of each level of butterfly unit; wherein the target-level butterfly unit is the first-level butterfly unit forming the maximum parallelism.
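The two-step selection described above might be sketched as follows (hedged: the direction of the divisibility test between the candidate parallelism and the target stage's base is our assumption from the translation):

```python
# Hedged sketch of per-stage parallelism selection: keep candidates P with
# P * Ni <= Nb for a stage of base Ni, then pick one compatible by even
# division with the base of the target stage (the stage forming Nb). The
# divisibility direction is our assumption from the translated text.

def stage_parallelism(base_i, target_base, max_parallelism):
    candidates = [p for p in range(1, max_parallelism + 1)
                  if p * base_i <= max_parallelism]
    for p in sorted(candidates, reverse=True):   # prefer the largest
        if target_base % p == 0:
            return p
    return 1

# Example: a base-3 stage when a base-16 target stage forms Nb = 16.
print(stage_parallelism(3, 16, 16))  # 4  (4 * 3 <= 16 and 16 % 4 == 0)
```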
In some embodiments, the processing unit 801 is further configured to preset a mapping relationship between at least one DFT data length, a base of the m-level butterfly unit, and a parallelism of the m-level butterfly unit; and determining the basis of the m-level butterfly units and the parallelism of the m-level butterfly units according to the mapping relation and the DFT data length.
Based on the above hardware implementation of each unit in DFT parallel processing, an embodiment of the present application further provides an electronic device, as shown in fig. 9, where the electronic device includes: a processor 901 and a memory 902 configured to store a computer program capable of running on the processor;
wherein the memory 902 comprises a plurality of memory units, the memory units are used for storing DFT data to be processed, and the processor 901 is configured to execute the method steps in the foregoing embodiments when running a computer program.
Of course, in actual practice, the various components of the electronic device are coupled together by a bus system 903, as shown in FIG. 9. It is understood that the bus system 903 is used to enable communications among the components. The bus system 903 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as the bus system 903 in FIG. 9.
In practical applications, the processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, and a microprocessor. It is understood that the electronic devices for implementing the above processor functions may be other devices, and the embodiments of the present application are not limited in particular.
The Memory may be a volatile Memory (volatile Memory), such as a Random-Access Memory (RAM); or a non-volatile Memory (non-volatile Memory), such as a Read-Only Memory (ROM), a flash Memory (flash Memory), a Hard Disk (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memories and provides instructions and data to the processor.
In an exemplary embodiment, the present application further provides a computer readable storage medium, such as a memory including a computer program, which is executable by a processor of an electronic device to perform the steps of the foregoing method.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The expressions "having", "may have", "include" and "contain", or "may include" and "may contain" in this application may be used to indicate the presence of corresponding features (e.g. elements such as values, functions, operations or components) but does not exclude the presence of additional features.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another, and are not necessarily used to describe a particular order or sequence. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention.
The technical solutions described in the embodiments of the present application can be arbitrarily combined without conflict.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and device may be implemented in other ways. The above-described embodiments are merely illustrative, and for example, the division of a unit is only one logical function division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application.

Claims (17)

1. A DFT parallel processing method, the method comprising:
determining m-level butterfly units for executing Discrete Fourier Transform (DFT) parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
determining the state information of m counters of at least one group of input data input in parallel by each stage of butterfly unit according to a preset parallel address-taking rule;
determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
reading the at least one group of input data from the storage units in parallel according to the storage address;
sending the at least one group of input data to at least one butterfly unit of each stage of butterfly unit in parallel for parallel processing, and outputting at least one group of output data;
and writing the at least one group of output data into a storage space corresponding to the input data according to the original storage address.
2. The method of claim 1, wherein after writing the at least one set of output data to a storage space corresponding to input data, the method further comprises:
determining the state information of m counters of at least one new group of input data according to a preset parallel address-taking rule;
when the state information of the m counters is not preset state information, acquiring at least one new group of input data according to the state information of the m counters;
and when the state information of the m counters is preset state information, determining that the processing of the butterfly unit at the current stage is finished.
3. The method of claim 2, further comprising:
determining that the processing of the m-level butterfly units is finished, and determining state information of m counters outputting data according to a preset output addressing rule;
determining a storage address of the output data according to the state information of the m counters and a preset address mapping rule;
and reading the output data from the storage unit according to the storage address.
4. The method of claim 1, further comprising:
performing address decomposition on each data in DFT data to be processed to obtain state information of m counters corresponding to each data;
determining the storage address of each datum according to the state information of the m counters and a preset address mapping rule;
and storing the DFT data to be processed to the storage unit according to the storage address.
5. The method according to any of claims 1-4, wherein the state information of the m counters comprises m bits;
the m bits comprise a first counting bit, a second counting bit and other counting bits;
the first counting bits being different while the second counting bits and the other counting bits are the same indicates input data at different input positions of the same butterfly unit;
the second counting bits being different while the first counting bits and the other counting bits are the same indicates input data at the same input position of different butterfly units.
6. The method of claim 5, wherein the m bits sequentially correspond to the m levels of butterfly units from top to bottom, and each bit counts modulo the base Ni of the corresponding level of butterfly units;
The first counting bit is a bit corresponding to a first-level butterfly unit which is executing DFT parallel processing;
when the base of the first-stage butterfly unit corresponding to the first counting bit independently forms the maximum parallelism, the second counting bit is any one bit of the m bits except the first counting bit;
when the base of the first-stage butterfly unit corresponding to the first counting bit does not form the maximum parallelism, or the base of the first-stage butterfly unit corresponding to the second counting bit and the bases of other-stage butterfly units form the maximum parallelism together, the base of the first-stage butterfly unit corresponding to the second counting bit can be divided by the parallelism of the first-stage butterfly unit corresponding to the first counting bit, and the base of the first-stage butterfly unit corresponding to the second counting bit is used for forming the maximum parallelism.
7. The method of claim 5, wherein the parallel addressing rules comprise:
accumulating the second counting bit from 0 to Ns-1 on the basis of the initial state information of the m counters to obtain the state information of the m counters of at least one group of input data, and carrying to the other counting bits if the second counting bit meets the carry condition;
wherein Ns is the base of the first-level butterfly unit that is executing the DFT parallel processing.
8. The method according to any of claims 1-4, wherein the address mapping rule comprises:
determining the storage unit identification of the at least one group of input data according to the parallelism of each level of butterfly unit, the state information of the m counters and the maximum parallelism of the storage unit;
and determining the offset address of the storage unit of the at least one group of input data according to the base of each stage of butterfly unit and the state information of the m counters.
9. The method of claim 8, wherein the address mapping rule comprises:
b = (P1n1 + P2n2 + ... + Pmnm) mod Nb
a = Σd∉B ( nd × Πs∉B, s>d Ns )
wherein b is the memory cell identification, Pi is the parallelism of the i-th stage butterfly unit, ni is the state information of the i-th counter, Nb is the maximum parallelism, a is the offset address of the memory cell, B is the set of butterfly unit stage numbers forming the maximum parallelism, nd is the state information of the d-th counter, and Ns is the base of the s-th stage butterfly unit.
10. The method of claim 8, further comprising:
decomposing the DFT operation into m-level butterfly operation according to the DFT data length, and determining the basis of the m-level butterfly unit;
determining the maximum parallelism of a storage unit according to the basis of the m-level butterfly unit;
and determining the parallelism of the m-level butterfly units according to the maximum parallelism of the storage units and the bases of the m-level butterfly units.
11. The method of claim 10, wherein, among the m levels of butterfly units, the bases of any two levels of butterfly units are co-prime; or, when the bases of two levels of butterfly units among the m levels are not co-prime, the two levels of butterfly units are set as two consecutive levels.
12. The method of claim 10, wherein determining the maximum parallelism of the memory cells according to the m-level butterfly cell bases comprises:
taking the maximum base in the m-level butterfly unit bases as the maximum parallelism of the storage unit;
or, taking the product of the bases of at least two levels of butterfly units in the m-level butterfly unit bases as the maximum parallelism of the storage unit;
wherein the maximum parallelism is greater than or equal to each level of butterfly unit basis.
13. The method of claim 10, wherein determining the parallelism of the m-level butterfly units based on the maximum parallelism of the memory units and the bases of the m-level butterfly units comprises:
acquiring at least one parallelism whose product with the base of each stage of butterfly unit is less than or equal to the maximum parallelism;
selecting one parallelism which can be evenly divided by the basis of the butterfly unit of the target level from the at least one parallelism as the parallelism of the butterfly unit of each level;
and the target-level butterfly unit is a first-level butterfly unit forming the maximum parallelism.
14. The method of claim 8, further comprising:
presetting a mapping relation of at least one DFT data length, the base of the m-level butterfly unit and the parallelism of the m-level butterfly unit;
and determining the basis of the m-level butterfly units and the parallelism of the m-level butterfly units according to the mapping relation and the DFT data length.
15. A DFT parallel processing apparatus, the apparatus comprising: the device comprises a processing unit, an address management unit and a plurality of storage units; wherein,
the processing unit comprises a plurality of butterfly units with different bases;
the processing unit is used for determining an m-level butterfly unit for executing Discrete Fourier Transform (DFT) parallel processing; each stage of butterfly unit comprises at least one butterfly unit;
the address management unit is used for determining the state information of m counters of at least one group of input data input in parallel by each stage of butterfly unit according to a preset parallel address-taking rule; determining the storage address of the at least one group of input data according to the state information of the m counters and a preset address mapping rule; wherein the memory address comprises a memory cell identification and a memory cell offset address;
the processing unit is used for reading the at least one group of input data from the storage unit in parallel according to the storage address; sending the at least one group of input data to at least one butterfly unit of each stage of butterfly unit in parallel for parallel processing, and outputting at least one group of output data;
and the processing unit is also used for writing the at least one group of output data into a storage space corresponding to the input data according to the original storage address.
16. An electronic device, characterized in that the electronic device comprises: a processor and a memory configured to store a computer program operable on the processor, wherein the processor is configured to perform the steps of the method of any one of claims 1 to 14 when the computer program is executed by the processor.
17. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 14.
CN202110276067.8A 2021-03-15 2021-03-15 DFT parallel processing method, device, equipment and storage medium Active CN113094639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110276067.8A CN113094639B (en) 2021-03-15 2021-03-15 DFT parallel processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110276067.8A CN113094639B (en) 2021-03-15 2021-03-15 DFT parallel processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113094639A true CN113094639A (en) 2021-07-09
CN113094639B CN113094639B (en) 2022-12-30

Family

ID=76667946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110276067.8A Active CN113094639B (en) 2021-03-15 2021-03-15 DFT parallel processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113094639B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010032227A1 (en) * 2000-01-25 2001-10-18 Jaber Marwan A. Butterfly-processing element for efficient fast fourier transform method and apparatus
CN101082906A (en) * 2006-05-31 2007-12-05 中国科学院微电子研究所 Fixed base FFT processor with low memory overhead and method thereof
WO2010045808A1 (en) * 2008-10-24 2010-04-29 中兴通讯股份有限公司 Hardware apparatus and method for implementing fast fourier transform and inverse fast fourier transform
US20100174769A1 (en) * 2009-01-08 2010-07-08 Cory Modlin In-Place Fast Fourier Transform Processor
CN102326154A (en) * 2008-12-23 2012-01-18 苹果公司 Architecture for address mapping of managed non-volatile memory
CN102855222A (en) * 2011-06-27 2013-01-02 中国科学院微电子研究所 Method and device for mapping addresses of FFT (fast Fourier transform) of parallel branch butterfly unit
CN104699624A (en) * 2015-03-26 2015-06-10 中国人民解放军国防科学技术大学 FFT (fast Fourier transform) parallel computing-oriented conflict-free storage access method
US20150199299A1 (en) * 2014-01-16 2015-07-16 Qualcomm Incorporated Sample process ordering for dft operations
CN106469134A (en) * 2016-08-29 2017-03-01 北京理工大学 A kind of data conflict-free access method for fft processor
CN109496306A (en) * 2016-07-13 2019-03-19 金泰亨 Multi-functional arithmetic and fast Fourier transformation operation device
CN112163184A (en) * 2020-09-02 2021-01-01 上海深聪半导体有限责任公司 Device and method for realizing FFT


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAI-FENG XIA et al.: "A Memory-Based FFT Processor Design With Generalized Efficient Conflict-Free Address Schemes", IEEE Transactions on Very Large Scale Integration (VLSI) Systems *
XIA Kaifeng et al.: "Arbitrary 2K-point memory-based Fourier processor", Journal of Zhejiang University (Engineering Science) *

Also Published As

Publication number Publication date
CN113094639B (en) 2022-12-30

Similar Documents

Publication Publication Date Title
US8549059B2 (en) In-place fast fourier transform processor
US7164723B2 (en) Modulation apparatus using mixed-radix fast fourier transform
US10152455B2 (en) Data processing method and processor based on 3072-point fast Fourier transformation, and storage medium
CN112800386B (en) Fourier transform processing method, processor, terminal, chip and storage medium
US7752249B2 (en) Memory-based fast fourier transform device
US9317481B2 (en) Data access method and device for parallel FFT computation
CN105740405B (en) Method and device for storing data
CN111737638A (en) Data processing method based on Fourier transform and related device
US12009948B2 (en) Data processing apparatus and method, base station, and storage medium
KR102348771B1 (en) resource definition
US8023401B2 (en) Apparatus and method for fast fourier transform/inverse fast fourier transform
JP2006221648A (en) Fast fourier transformation processor and fast fourier transformation method capable of reducing memory size
CN113094639B (en) DFT parallel processing method, device, equipment and storage medium
CN115544438B (en) Twiddle factor generation method and device in digital communication system and computer equipment
US20140365547A1 (en) Mixed-radix pipelined fft processor and fft processing method using the same
US7979485B2 (en) Circuit for fast fourier transform operation
WO2011102291A1 (en) Fast fourier transform circuit
Xia et al. A generalized conflict-free address scheme for arbitrary 2k-point memory-based FFT processors
US11764942B2 (en) Hardware architecture for memory organization for fully homomorphic encryption
CN111356151A (en) Data processing method and device and computer readable storage medium
KR100557160B1 (en) Modulating apparatus for using fast fourier transform of mixed-radix scheme
US20200169274A1 (en) Wireless communication device and method of operating the same
US8484275B1 (en) Reordering discrete fourier transform outputs
WO2023050623A1 (en) Signal transmission method and apparatus, and device and storage medium
US20210255804A1 (en) Data scheduling register tree for radix-2 fft architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant