CN115080915A - Vectorization decomposition method, vectorization decomposition device, vectorization decomposition chip, vectorization chip module and storage medium
- Publication number
- CN115080915A (application CN202210712417.5A)
- Authority
- CN
- China
- Prior art keywords
- decomposition
- dft
- vectorization
- memory
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Algebra (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Complex Calculations (AREA)
Abstract
The application discloses a vectorization decomposition method, a vectorization decomposition device, a chip, a chip module and a storage medium. The method is applied to vectorization decomposition of N-point data and comprises the following steps: inputting the N-point data to a memory; reading L-point data from the memory for each of M sub discrete Fourier transform (DFT) units and caching the L-point data in the M sub-DFT units, where N = M × L; and performing an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results. Corresponding devices, chips, chip modules and storage media are also disclosed. With the scheme of the application, high-speed, low-latency DFT calculation is realized.
Description
Technical Field
The present application relates to the field of computers, and in particular, to a vectorization decomposition method, device, chip module, and storage medium.
Background
A fifth generation (5G) communication system needs to perform discrete Fourier transforms (DFT) of non-power-of-2 lengths as well as fast Fourier transforms (FFT) and inverse fast Fourier transforms (IFFT) of power-of-2 lengths. These transforms have very strict latency requirements and consume substantial resources. However, no high-speed, low-latency calculation method is currently available.
Disclosure of Invention
The application provides a vectorization decomposition method, a vectorization decomposition device, a chip module and a storage medium, so as to realize high-speed and low-delay DFT calculation.
In a first aspect, a vectorization decomposition method is provided, which is applied to vectorization decomposition of N-point data and includes:
inputting the N-point data to a memory;
reading L-point data from the memory for each of M sub discrete Fourier transform (DFT) units, and caching the L-point data in the M sub-DFT units, where N = M × L;
and performing an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results.
In one possible implementation, the method further includes:
multiplying the M first vectorized decomposition results by twiddle factors respectively to obtain M second vectorized decomposition results.
In another possible implementation, the method further includes:
performing a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result, and storing the third vectorized decomposition result in the memory.
In yet another possible implementation, the method further includes:
performing a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result, and storing the fourth vectorized decomposition result in the memory.
In yet another possible implementation, the method further includes:
storing the M second vectorized decomposition results in the memory.
In yet another possible implementation, N is 24, M is 6, and L is 4; or
N is 24, M is 8, and L is 3.
In yet another possible implementation, where N is 720, M is 8, and L is 90, performing the L-point DFT operation in each sub-DFT unit to obtain the M first vectorized decomposition results includes:
serially extracting 9 numbers, one every 10 points, and performing a 9-point DFT operation on them to obtain the M first vectorized decomposition results.
In yet another possible implementation, where N is 720, M is 6, and L is 120, performing the L-point DFT operation in each sub-DFT unit to obtain the M first vectorized decomposition results includes:
serially extracting 12 numbers, one every 10 points, and performing a 12-point DFT operation on them to obtain the M first vectorized decomposition results.
In a second aspect, a vectorization decomposition apparatus is provided, which is applied to vectorization decomposition of N-point data and includes:
an input unit, configured to input the N-point data to a memory;
a cache unit, configured to read L-point data from the memory for each of M sub discrete Fourier transform (DFT) units and cache the L-point data in the M sub-DFT units, where N = M × L;
and a first operation unit, configured to perform an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results.
In one possible implementation, the apparatus further includes:
a second operation unit, configured to multiply the M first vectorized decomposition results by twiddle factors respectively to obtain M second vectorized decomposition results.
In another possible implementation, the apparatus further includes:
a third operation unit, configured to perform a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result; and a first storage unit, configured to store the third vectorized decomposition result in the memory.
In yet another possible implementation, the apparatus further includes:
a fourth operation unit, configured to perform a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result; and a second storage unit, configured to store the fourth vectorized decomposition result in the memory.
In yet another possible implementation, the apparatus further includes:
a third storage unit, configured to store the M second vectorized decomposition results in the memory.
In yet another possible implementation, N is 24, M is 6, and L is 4; or
N is 24, M is 8, and L is 3.
In yet another possible implementation, N is 720, M is 8, L is 90, and the first operation unit is configured to serially extract 9 numbers, one every 10 points, and perform a 9-point DFT operation on them to obtain the M first vectorized decomposition results.
In yet another possible implementation, N is 720, M is 6, L is 120, and the first operation unit is configured to serially extract 12 numbers, one every 10 points, and perform a 12-point DFT operation on them to obtain the M first vectorized decomposition results.
In a third aspect, a vectorization decomposition apparatus is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements:
inputting N-point data to a memory;
reading L-point data from the memory for each of M sub discrete Fourier transform (DFT) units, and caching the L-point data in the M sub-DFT units, where N = M × L;
and performing an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results.
In one possible implementation, the processor is further configured to implement:
multiplying the M first vectorized decomposition results by twiddle factors respectively to obtain M second vectorized decomposition results.
In another possible implementation, the processor is further configured to implement:
performing a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result, and storing the third vectorized decomposition result in the memory.
In yet another possible implementation, the processor is further configured to implement:
performing a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result, and storing the fourth vectorized decomposition result in the memory.
In yet another possible implementation, the processor is further configured to implement:
storing the M second vectorized decomposition results in the memory.
In yet another possible implementation, N is 24, M is 6, and L is 4; or
N is 24, M is 8, and L is 3.
In yet another possible implementation, where N is 720, M is 8, and L is 90, the step, performed by the processor, of performing the L-point DFT operation in each sub-DFT unit to obtain the M first vectorized decomposition results includes:
serially extracting 9 numbers, one every 10 points, and performing a 9-point DFT operation on them to obtain the M first vectorized decomposition results.
In yet another possible implementation, where N is 720, M is 6, and L is 120, the step, performed by the processor, of performing the L-point DFT operation in each sub-DFT unit to obtain the M first vectorized decomposition results includes:
serially extracting 12 numbers, one every 10 points, and performing a 12-point DFT operation on them to obtain the M first vectorized decomposition results.
In a fourth aspect, a chip is provided for performing the method described in the first aspect or any implementation of the first aspect.
In a fifth aspect, a chip module is provided, which includes an interface component and a chip and is configured to perform the method described in the first aspect or any implementation of the first aspect.
In a sixth aspect, a computer-readable storage medium is provided, storing a computer program or instructions which, when executed by a vectorization decomposition apparatus, implement the method described in the first aspect or any implementation of the first aspect.
The vectorization decomposition scheme provided by the application has the following beneficial effects:
when vectorization decomposition is performed on N-point data, the N-point data is input into a memory; L-point data is read from the memory for each of M sub discrete Fourier transform (DFT) units and cached in the M sub-DFT units, where N = M × L; and an L-point DFT operation is performed in each of the M sub-DFT units to obtain M first vectorized decomposition results. High-speed, low-latency DFT calculation is thereby realized.
Drawings
Fig. 1 is a schematic flowchart of a vectorization decomposition method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the generation process of a DFT-s-OFDM symbol in NR according to an embodiment of the present application;
Fig. 3a is a schematic structural diagram of a vectorization decomposition apparatus according to an embodiment of the present application;
Fig. 3b is a schematic structural diagram of another vectorization decomposition apparatus according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a 720-point DFT operation according to an embodiment of the present application;
Fig. 5 is a schematic diagram of the input flow of a vectorization decomposition scheme according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of another vectorization decomposition apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of another vectorization decomposition apparatus according to an embodiment of the present application.
Detailed Description
For better understanding of the technical solutions of the present application, the following detailed descriptions of the embodiments of the present application are provided with reference to the accompanying drawings.
It should be clear that the described embodiments are only a part of the embodiments of the present application, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The embodiments of the application provide a vectorization decomposition scheme in which N-point data is input into a memory; L-point data is read from the memory for each of M sub discrete Fourier transform (DFT) units and cached in the M sub-DFT units, where N = M × L; and an L-point DFT operation is performed in each of the M sub-DFT units to obtain M first vectorized decomposition results. High-speed, low-latency DFT calculation is thereby realized.
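The scheme is consistent with the standard Cooley-Tukey decimation-in-time identity. The following is a sketch only: it assumes that input sample x[Ml + m] is assigned to sub-DFT unit m (an index mapping the text does not state explicitly) and uses the notation W_N = e^{-2\pi i/N}, which is introduced here for illustration.

```latex
X[k + Lq] \;=\; \sum_{m=0}^{M-1} W_M^{\,mq}\; W_N^{\,mk} \sum_{l=0}^{L-1} x[Ml + m]\; W_L^{\,lk},
\qquad 0 \le k < L,\quad 0 \le q < M,\quad N = ML .
```

Under this assumption, the inner sum is the L-point DFT computed by sub-DFT unit m, the factor W_N^{mk} is the twiddle factor, and the outer sum over m is a radix-M butterfly.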
Fig. 1 is a schematic flowchart of a vectorization decomposition method according to an embodiment of the present application. The method includes the following steps:
S101, inputting N-point data into a memory.
After the N-point data is acquired, it is input to the memory. The memory may be, for example, a random access memory (RAM).
S102, reading L-point data from the memory for each of M sub-DFT units, and caching the L-point data in the M sub-DFT units, where N = M × L.
Fig. 2 is a schematic diagram of the generation flow of a DFT-s-OFDM symbol in NR. In Fig. 2, for an orthogonal frequency division multiplexing (OFDM) waveform, the modulation symbols are mapped directly to subcarriers, and the vector after subcarrier mapping is subjected to an IFFT and cyclic prefix (CP) addition to form an OFDM symbol. For a DFT-s-OFDM waveform, the modulation symbols are first subjected to a DFT and then mapped to subcarriers, and the vector after subcarrier mapping is subjected to an IFFT and CP addition to form a DFT-s-OFDM symbol.
As shown in Fig. 2, compared with the OFDM waveform, the DFT-s-OFDM waveform involves one additional DFT operation at the transmitting device (also referred to as the "transmitter"). Correspondingly, an additional inverse discrete Fourier transform (IDFT) operation is required at the receiving device (also referred to as the "receiver").
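The two transmit chains described above can be sketched as follows. This is an illustrative model only: the function names, the localized mapping onto the first subcarriers, and the sizes (24 modulation symbols, a 32-point IFFT, a 4-sample CP) are assumptions made for the example and do not come from the application.

```python
import cmath

def dft(x, inverse=False):
    """Direct O(n^2) DFT; with inverse=True, the inverse DFT scaled by 1/n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def make_symbol(mod_symbols, nfft, cp_len, dft_spread):
    """Build one time-domain symbol: optional DFT, subcarrier mapping, IFFT, CP."""
    # DFT-s-OFDM applies a DFT before mapping; plain OFDM maps the symbols directly.
    freq = dft(mod_symbols) if dft_spread else list(mod_symbols)
    # Hypothetical mapping: occupy the first len(freq) subcarriers, zeros elsewhere.
    grid = freq + [0j] * (nfft - len(freq))
    time = dft(grid, inverse=True)
    # The cyclic prefix is a copy of the last cp_len time-domain samples.
    return time[-cp_len:] + time

qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j] * 6  # 24 modulation symbols
ofdm = make_symbol(qpsk, nfft=32, cp_len=4, dft_spread=False)
dfts = make_symbol(qpsk, nfft=32, cp_len=4, dft_spread=True)
```

The only difference between the two calls is the `dft_spread` flag, which corresponds to the additional DFT at the transmitter.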
Therefore, in a 5G system, the DFT must be performed, and a 6-4096-point FFT/IFFT processor is needed to implement the OFDM transform and inverse transform of the signal. The DFT processor is critical for both the transmitting device and the receiving device: it must satisfy the ultra-low latency requirement of 5G while consuming as few resources as possible. The DFT processor of this embodiment performs the Fourier transform and is also compatible with the 6-4096-point FFT/IFFT, which makes it very challenging to implement.
This embodiment is applied to vectorization decomposition of N-point data. Assuming that N contains a factor M, the N-point DFT is divided into M equal-length parts. The M equal-length parts correspond to M sub-DFT units (sub_DFT). The M sub-DFT units have the same data stream form. Therefore, L-point data is read from the memory for each of the M sub-DFT units and cached in it, where N = M × L.
For example, if N contains a factor of 6, the length of the DFT can be divided into 6 equal-length parts. These 6 equal-length parts correspond to 6 sub-DFT units. The 6 sub-DFT units have the same data stream form.
As another example, since the point counts of the FFT/IFFT of the N-point data all contain a factor of 8, the length of the DFT can be divided into 8 equal-length parts. These 8 equal-length parts correspond to 8 sub-DFT units. The 8 sub-DFT units have the same data stream form.
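The division into M equal-length parts can be sketched as below. The interleaved assignment (part i receives samples i, i + M, i + 2M, and so on) is an assumption chosen to match a decimation-in-time decomposition; the text only states that the parts have equal length. The name `split_into_banks` is hypothetical.

```python
def split_into_banks(x, m):
    """Distribute N samples over m banks so that bank i holds x[i], x[i+m], ..."""
    if len(x) % m != 0:
        raise ValueError("N must be a multiple of M")
    # Assumed layout: interleaved (stride-m) assignment of samples to banks.
    return [x[i::m] for i in range(m)]

banks = split_into_banks(list(range(24)), 6)  # N = 24, M = 6, L = 4
```

Each of the 6 banks then holds L = 4 samples, ready to be consumed by one sub-DFT unit.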
Fig. 3a is a schematic structural diagram of a vectorization decomposition apparatus provided in an embodiment of the present application, in which the length of the DFT is divided into 8 equal-length parts. These 8 equal-length parts correspond to 8 sub-DFT units (sub_DFT0 to sub_DFT7). The 8 sub-DFT units have the same data stream form. Before vectorization decomposition, data is input into the RAM sequentially, from top to bottom and from left to right. During vectorization decomposition, the same number of points of data is read from the RAM for each of the 8 sub-DFT units and cached in it.
Fig. 3b is a schematic structural diagram of another vectorization decomposition apparatus provided in an embodiment of the present application, in which the length of the DFT is divided into 6 equal-length parts. These 6 equal-length parts correspond to 6 sub-DFT units (sub_DFT0 to sub_DFT5). The 6 sub-DFT units have the same data stream form. Before vectorization decomposition, data is input into the RAM sequentially, from top to bottom and from left to right. During vectorization decomposition, the same number of points of data is read from the RAM for each of the 6 sub-DFT units and cached in it.
Fig. 3a shows 8 sub-DFT units with the same structure, and Fig. 3b shows 6. In both cases the units work simultaneously, which constitutes the vectorization of the DFT.
Each sub-DFT unit operates in serial mode. The L-point data read from each memory is input to the corresponding sub-DFT unit.
S103, performing an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results.
After each sub-DFT unit obtains its L-point data, it performs an L-point DFT operation, so that M first vectorized decomposition results are obtained.
The M sub-DFT units perform their L-point DFT operations in parallel.
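Step S103 can be modelled in software as follows. This is a minimal sketch: a direct O(L^2) DFT stands in for each hardware sub-DFT unit, and the list comprehension only models the M independent units (it does not execute them in parallel). The names `dft` and `run_sub_dfts` are hypothetical.

```python
import cmath

def dft(x):
    """Direct L-point DFT of a list of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def run_sub_dfts(banks):
    """Model of M sub-DFT units; each processes its own L-point buffer."""
    return [dft(bank) for bank in banks]

banks = [[1, 0, 0, 0], [1, 1, 1, 1]]  # two toy 4-point buffers
first_results = run_sub_dfts(banks)
```

The first toy buffer is an impulse, whose DFT is constant; the second is a constant, whose DFT is concentrated in bin 0.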
After one or more rounds of operation, the DFT of the point count corresponding to each buffer is completed.
Further, the method may further comprise the following steps (indicated by dashed lines in the figure):
S104, multiplying the M first vectorized decomposition results by twiddle factors respectively to obtain M second vectorized decomposition results.
After the first vectorized decomposition result of each sub-DFT unit is obtained, the first vectorized decomposition results may each be multiplied by a twiddle factor to obtain M second vectorized decomposition results. The twiddle factor is also called a root of unity of the N-point DFT operation. As shown in Fig. 3a, after the 8 first vectorized decomposition results of the 8 sub-DFT units are obtained, they may each be multiplied by a twiddle factor to obtain 8 second vectorized decomposition results. As shown in Fig. 3b, after the 6 first vectorized decomposition results of the 6 sub-DFT units are obtained, they may each be multiplied by a twiddle factor to obtain 6 second vectorized decomposition results.
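The twiddle multiplication of step S104 can be sketched as below, assuming the usual decimation-in-time convention in which output k of sub-DFT unit m is multiplied by exp(-2πi·m·k/N). The application does not spell out the exponent, so this convention and the name `apply_twiddles` are assumptions.

```python
import cmath

def apply_twiddles(first_results, n):
    """Multiply output k of sub-DFT unit m by W_N^(m*k) = exp(-2*pi*i*m*k/N)."""
    return [[v * cmath.exp(-2j * cmath.pi * m * k / n) for k, v in enumerate(row)]
            for m, row in enumerate(first_results)]

# M = 6 units, L = 4 outputs each, N = 24; all-ones input exposes the raw factors.
second_results = apply_twiddles([[1, 1, 1, 1]] * 6, 24)
```

Unit 0 and output 0 of every unit are left unchanged, because the exponent m·k is zero there.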
As shown in Fig. 3a and Fig. 3b, after the second vectorized decomposition results are obtained, one of the following steps S105a and S105b is performed:
S105a, performing a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result, and storing the third vectorized decomposition result in the memory.
In the last round of operation, the result can be input directly into the radix-6 butterfly node (rdx6), which reduces data write-back and extraction operations. rdx6 is a parallel operation structure, and its result is written directly back to the memory as the final operation result.
For example, for a 6-4096-point FFT/IFFT, the last round of operation is an rdx6 operation on the factor 6; for a DFT of non-power-of-2 length, the last round may also be an rdx6 operation.
Through the vectorization decomposition, the hardware structure becomes regular, and data does not need to be packed multiple times. The rdx6 of the last round of operation enables resource reuse.
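The final butterfly can be sketched as an M-point DFT taken across the M units for each output index k. The sketch assumes that the result for index pair (k, q) is stored at position k + L·q; this output ordering and the name `radix_butterfly` are illustrative assumptions.

```python
import cmath

def radix_butterfly(second_results):
    """Combine M twiddled L-point results into one N-point result (N = M * L)."""
    m, l = len(second_results), len(second_results[0])
    out = [0j] * (m * l)
    for k in range(l):
        # Gather output k from every unit; hardware would do this in parallel.
        column = [second_results[i][k] for i in range(m)]
        for q in range(m):
            # M-point DFT across the units: rdx6 when M = 6, rdx8 when M = 8.
            out[k + l * q] = sum(column[i] * cmath.exp(-2j * cmath.pi * i * q / m)
                                 for i in range(m))
    return out

third_result = radix_butterfly([[1, 2, 3, 4]] * 6)  # six identical units
```

With six identical inputs, only the q = 0 outputs are non-zero, since an M-point DFT of a constant is concentrated in bin 0.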
For example, step S105a may also be replaced by: performing a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result, and storing the fourth vectorized decomposition result in the memory.
In the last round of operation, the result can be input directly into the radix-8 butterfly node (rdx8), which reduces data write-back and extraction operations. rdx8 is a parallel operation structure, and its result is written directly back to the memory as the final operation result.
For example, for a 6-4096-point FFT/IFFT, the last round of operation is an rdx8 operation on the factor 8; for a DFT, the last round may also be an rdx8 operation.
Through the vectorization decomposition, the hardware structure becomes regular, and data does not need to be packed multiple times. The rdx8 of the last round of operation enables resource reuse.
S105b, storing the M second vectorized decomposition results in the memory.
As shown in Fig. 3a or Fig. 3b, if the rdx8 or rdx6 operation is not needed, the second vectorized decomposition results may be written back directly after they are obtained and stored in the memory corresponding to each sub-DFT unit.
After step S105a or S105b is executed, the vectorized decomposition results are written back to the memory through gating.
The following examples illustrate the scheme:
In one example, N = 24. A 24-point DFT can be decomposed into 3 × 8 or 4 × 6. The DFT vectorization decomposition is briefly described here for the 4 × 6 decomposition: 6 RAMs are read simultaneously, and the read data is sent serially to sub_DFT0, sub_DFT1, sub_DFT2, sub_DFT3, sub_DFT4 and sub_DFT5, respectively. Each of the 6 sub-DFT units processes its 4-point DFT operation serially; the results are multiplied by the twiddle factors and sent simultaneously to rdx6, and the parallel operation results are written back to the 6 RAMs simultaneously. This approach minimizes the processing time of the system link and saves resources.
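The 24-point example can be checked numerically with a small end-to-end model of steps S102 to S105a. As before, this is a sketch under assumptions: the interleaved bank layout, the twiddle exponent m·k, and the output position k + L·q follow the standard decimation-in-time convention rather than anything stated in the application, and `vectorized_dft` is a hypothetical name.

```python
import cmath

def dft(x):
    """Direct DFT used both as the sub-DFT model and as the reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def vectorized_dft(x, m):
    """N-point DFT as M sub-DFTs of L points, twiddles, and a radix-M butterfly."""
    n = len(x)
    l = n // m
    banks = [x[i::m] for i in range(m)]            # S102 (assumed layout)
    first = [dft(b) for b in banks]                # S103: M L-point DFTs
    second = [[v * cmath.exp(-2j * cmath.pi * i * k / n)
               for k, v in enumerate(row)]
              for i, row in enumerate(first)]      # S104: twiddle factors
    out = [0j] * n
    for k in range(l):                             # S105a: radix-M butterfly
        for q in range(m):
            out[k + l * q] = sum(second[i][k] * cmath.exp(-2j * cmath.pi * i * q / m)
                                 for i in range(m))
    return out

x = [complex(t % 5, -t) for t in range(24)]
err_4x6 = max(abs(a - b) for a, b in zip(vectorized_dft(x, 6), dft(x)))
err_3x8 = max(abs(a - b) for a, b in zip(vectorized_dft(x, 8), dft(x)))
```

Both decompositions mentioned in the text (4 × 6 with rdx6 and 3 × 8 with rdx8) should agree with the direct 24-point DFT up to floating-point rounding.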
In another example, N = 720. Fig. 4 is a schematic diagram of a 720-point DFT operation provided in an embodiment of the present application. The 720-point DFT can be decomposed into 9 × 10 × 8. The DFT vectorization decomposition proceeds as follows. First, the 90-point DFT operation for each RAM is completed serially. Specifically, 9 numbers are serially extracted, one every 10 points, and subjected to a 9-point DFT; 10 such 9-point DFTs are performed in total, and each result is serially multiplied by the twiddle factor and written back to the memory (RAM0 to RAM7 shown in the figure). Then, 10 points of data are serially extracted in sequence and subjected to a 10-point DFT; 9 such 10-point DFTs are performed in total, and the results are multiplied by the twiddle factor and sent to the dedicated (private) rdx module for an 8-point parallel DFT operation, whose results are written back in parallel to the 8 RAMs (RAM0 to RAM7 shown in the figure).
In yet another example, N = 720. The 720-point DFT can be decomposed into 6 × 10 × 12. The DFT vectorization decomposition proceeds as follows. First, the 120-point DFT operation for each RAM is completed serially. Specifically, 12 numbers are serially extracted, one every 10 points, and subjected to a 12-point DFT; 10 such 12-point DFTs are performed in total, and each result is serially multiplied by the twiddle factor and written back to the memory. Then, 10 points of data are serially extracted in sequence and subjected to a 10-point DFT; 12 such 10-point DFTs are performed in total (since 120 = 12 × 10), and the results are multiplied by the twiddle factor and sent to the dedicated (private) rdx module for a 6-point parallel DFT operation, whose results are written back in parallel to the 6 RAMs.
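The serial two-stage processing inside one sub-DFT unit can be sketched as below for a buffer of p × q points: q strided p-point DFTs, an in-place twiddle multiplication, then p contiguous q-point DFTs. With p = 9, q = 10 this models the 90-point case, and with p = 12, q = 10 the 120-point case. The twiddle exponent b·k1 and the output position k1 + p·k2 are assumptions based on the standard decimation-in-time identity, and `two_stage_dft` is a hypothetical name.

```python
import cmath

def dft(x):
    """Direct DFT used for the small transforms and as the reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def two_stage_dft(y, p, q):
    """(p*q)-point DFT: q strided p-point DFTs, twiddles, p contiguous q-point DFTs."""
    n = p * q
    buf = list(y)
    # Stage 1: for each offset b, extract p samples spaced q apart ("one every q").
    for b in range(q):
        z = dft(buf[b::q])
        for k1 in range(p):
            # Twiddle, then write back in place to the positions just read.
            buf[q * k1 + b] = z[k1] * cmath.exp(-2j * cmath.pi * b * k1 / n)
    # Stage 2: q consecutive points at a time, p transforms in total.
    out = [0j] * n
    for k1 in range(p):
        z = dft(buf[q * k1: q * k1 + q])
        for k2 in range(q):
            out[k1 + p * k2] = z[k2]
    return out

y90 = [complex((3 * t) % 7, t % 4) for t in range(90)]
y120 = [complex(t % 11, -(t % 3)) for t in range(120)]
err90 = max(abs(a - b) for a, b in zip(two_stage_dft(y90, 9, 10), dft(y90)))
err120 = max(abs(a - b) for a, b in zip(two_stage_dft(y120, 12, 10), dft(y120)))
```

In the full 720-point flow, the outputs of this stage would still need the outer twiddle multiplication and the rdx8 or rdx6 butterfly across the RAMs.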
Fig. 5 is a schematic diagram of the input flow of a vectorization decomposition scheme according to an embodiment of the present application. It is assumed that the length of the DFT is divided into 8 equal-length parts, and that the 8 equal-length parts correspond to 8 RAMs (RAM0 to RAM7). Before vectorization decomposition, data is input into the 8 RAMs sequentially, from top to bottom and from left to right. In the above examples, when data is serially extracted in sequence, it is extracted in the order in which it was input.
The above processing flows of the 24-point DFT and the 720-point DFT are merely examples, and the processing flows of other points are similar thereto.
Table 1 below shows exemplary decompositions of some DFT point counts:
TABLE 1
It can be seen that each of the above DFT point counts, after decomposition, contains a factor of 6 or 8. There are 67 point counts, and the minimum supported point count is 6. Therefore, a DFT vectorization decomposition structure similar to the one described above can be used for the vectorization decomposition.
According to the vectorization decomposition method provided by the embodiment of the application, N-point data is input into a memory; L-point data is read from the memory and cached in each of M caches, where N = M × L; the L-point data read from each of the M caches is input to the sub-DFT unit corresponding to that cache; and an L-point DFT operation is performed in each sub-DFT unit to obtain M first vectorized decomposition results. High-speed, low-latency DFT calculation is thereby realized.
It is understood that, in order to implement the functions in the above embodiments, the vectorization decomposition apparatus includes a corresponding hardware structure and/or software module for executing each function. Those of skill in the art will readily appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software driven hardware depends on the particular application scenario and design constraints imposed on the solution.
Fig. 6 and Fig. 7 are schematic structural diagrams of possible vectorization decomposition apparatuses according to embodiments of the present application. These apparatuses can be used to implement the functions of the vectorization decomposition apparatus in the above method embodiments, and can therefore also achieve the beneficial effects of the above method embodiments. In the embodiments of the present application, the vectorization decomposition apparatus may be an electronic device, or a module (e.g., a chip module) applied to an electronic device. For example, the vectorization decomposition apparatus may be a modem in a communication system.
Fig. 6 is a schematic structural diagram of another vectorization decomposition device according to an embodiment of the present application. The apparatus 600 comprises:
an input unit 601 for inputting N-point data to a memory;
a cache unit 602, configured to read L-point data from the memory for each of M sub-DFT units and cache the L-point data in the M sub-DFT units, where N = M × L;
a first operation unit 603, configured to perform an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results.
In one possible implementation, the apparatus further includes (indicated by the dashed line in the figure):
a second operation unit 604, configured to multiply the M first vectorized decomposition results by twiddle factors respectively to obtain M second vectorized decomposition results.
In another possible implementation, the apparatus further includes (indicated by the dashed line in the figure):
a third operation unit 605, configured to perform a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result; and a first storage unit 606, configured to store the third vectorized decomposition result in the memory.
In yet another possible implementation, the apparatus further includes (not shown in the figure):
a fourth operation unit, configured to perform a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result; and a second storage unit, configured to store the fourth vectorized decomposition result in the memory.
In yet another possible implementation, the apparatus further includes (indicated by the dashed line in the figure):
a third storage unit 607, configured to store the M second vectorized decomposition results in the memory.
In yet another possible implementation, N is 24, M is 6, and L is 4; or
N is 24, M is 8, and L is 3.
In yet another possible implementation, N is 720, M is 8, L is 90, and the first operation unit 603 is configured to serially extract 9 numbers, one every 10 points, and perform a 9-point DFT operation on them to obtain the M first vectorized decomposition results.
In yet another possible implementation, N is 720, M is 6, L is 120, and the first operation unit 603 is configured to serially extract 12 numbers, one every 10 points, and perform a 12-point DFT operation on them to obtain the M first vectorized decomposition results.
For the specific implementation of the above units, reference may be made to the description of the embodiment shown in fig. 1, and details are not repeated here.
According to the vectorization decomposition apparatus provided by the embodiment of the application, N-point data is input into a memory; L-point data is read from the memory for each of M sub-DFT units and cached in the M sub-DFT units, where N = M × L; and an L-point DFT operation is performed in each of the M sub-DFT units to obtain M first vectorized decomposition results. High-speed, low-latency DFT calculation is thereby realized.
Fig. 7 is a schematic structural diagram of another vectorization decomposition device according to an embodiment of the present application. The apparatus 700 includes at least a processor 701, an input device 702, an output device 703, and a computer storage medium 704. The processor 701, the input device 702, the output device 703, and the computer storage medium 704 in the apparatus may be connected by a bus or other means.
A computer storage medium 704 may be stored in the memory of the apparatus, the computer storage medium 704 being for storing a computer program comprising program instructions, the processor 701 being for executing the program instructions stored by the computer storage medium 704. The processor 701 is the computational core and control core of the apparatus, and is adapted to implement one or more instructions, and in particular to load and execute the one or more instructions to implement the corresponding method flows or the corresponding functions.
In one embodiment, the processor 701 according to the embodiment of the present application may be configured to load and execute the method steps according to the embodiment shown in Fig. 1.
Specifically, the processor 701, when executing the computer program, implements:
inputting N-point data to a memory;
reading L-point data from the memory and caching the L-point data to each of M sub-DFT units, where N = M × L; and
performing an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorization decomposition results.
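The three steps above can be sketched in a few lines. This is a minimal illustration, not the claimed implementation: the assignment of samples m, m + M, m + 2M, ... to sub-DFT unit m is an assumption, and `dft`, `first_stage`, `memory`, and `first_results` are hypothetical names with placeholder data.

```python
import cmath


def dft(points):
    """Direct O(n^2) DFT of a short sequence (reference implementation)."""
    n = len(points)
    return [
        sum(points[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def first_stage(memory, m_units):
    """Split N points into M sub-sequences of L points and DFT each one."""
    n = len(memory)
    if n % m_units:
        raise ValueError("N must be a multiple of M")
    # Assumption: sub-DFT unit m is fed samples m, m + M, m + 2M, ...
    cached = [memory[m::m_units] for m in range(m_units)]
    # Each sub-DFT unit performs one L-point DFT on its cached data.
    return [dft(sub) for sub in cached]


memory = [complex(i % 5, 0) for i in range(24)]  # N = 24 placeholder samples
first_results = first_stage(memory, m_units=6)   # M = 6, so L = 4
```

In hardware the M sub-DFT units would run in parallel; the list comprehension here only models the data flow.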
In one possible implementation, the processor 701 is further configured to implement:
multiplying the M first vectorization decomposition results by twiddle factors respectively to obtain M second vectorization decomposition results.
In another possible implementation, the processor 701 is further configured to implement:
performing a radix-6 butterfly operation on the M second vectorization decomposition results to obtain a third vectorization decomposition result, and storing the third vectorization decomposition result in the memory.
In yet another possible implementation, the processor 701 is further configured to implement:
performing a radix-8 butterfly operation on the M second vectorization decomposition results to obtain a fourth vectorization decomposition result, and storing the fourth vectorization decomposition result in the memory.
In yet another possible implementation, the processor 701 is further configured to implement:
storing the M second vectorization decomposition results in the memory.
In yet another possible implementation, N is 24, M is 6, and L is 4; or
N is 24, M is 8, and L is 3.
In yet another possible implementation, where N is 720, M is 8, and L is 90, the step of performing, by the processor 701, the L-point DFT operation in each sub-DFT unit to obtain the M first vectorization decomposition results includes:
serially extracting 9 numbers every 10 points to perform a 9-point DFT operation to obtain the M first vectorization decomposition results.
In yet another possible implementation, where N is 720, M is 6, and L is 120, the step of performing, by the processor 701, the L-point DFT operation in each sub-DFT unit to obtain the M first vectorization decomposition results includes:
serially extracting 12 numbers every 10 points to perform a 12-point DFT operation to obtain the M first vectorization decomposition results.
According to the vectorization decomposition device provided by this embodiment of the application, N-point data is input into a memory; L-point data is read from the memory and cached to each of M sub-DFT units, where N = M × L; and an L-point DFT operation is performed in each of the M sub-DFT units to obtain M first vectorization decomposition results. High-speed, low-latency DFT calculation is thereby achieved.
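To show how the optional twiddle-factor and butterfly steps complete the transform, the following sketch chains all three stages and compares the result with a direct N-point DFT for the two N = 24 configurations. It rests on assumptions not stated in the text: sub-DFT unit m is fed samples m, m + M, ..., the twiddle factor for unit m and bin k is exp(-2πi·mk/N), and the radix-M butterfly is modeled as an M-point DFT across the units. All function and variable names are hypothetical.

```python
import cmath


def dft(points):
    """Direct O(n^2) DFT, used both as sub-DFT and as reference."""
    n = len(points)
    return [
        sum(points[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def twiddle(exponent, size):
    """Return exp(-2*pi*i * exponent / size)."""
    return cmath.exp(-2j * cmath.pi * exponent / size)


def decomposed_dft(x, m_units):
    """N-point DFT via M sub-DFTs of L points, twiddles, and a radix-M butterfly."""
    n = len(x)
    l_points = n // m_units
    # Stage 1: M first results, one L-point DFT per sub-DFT unit.
    first = [dft(x[m::m_units]) for m in range(m_units)]
    # Stage 2: M second results, each bin scaled by its twiddle factor.
    second = [
        [first[m][k] * twiddle(m * k, n) for k in range(l_points)]
        for m in range(m_units)
    ]
    # Stage 3: radix-M butterfly across the M units for every bin k.
    out = [0j] * n
    for k in range(l_points):
        butterfly = dft([second[m][k] for m in range(m_units)])
        for q in range(m_units):
            out[k + l_points * q] = butterfly[q]
    return out


x = [complex((i * i) % 7, i % 3) for i in range(24)]  # placeholder input, N = 24
reference = dft(x)
errors = {
    m_units: max(abs(a - b) for a, b in zip(decomposed_dft(x, m_units), reference))
    for m_units in (6, 8)  # (M, L) = (6, 4) and (8, 3)
}
```

Each entry of `errors` holds the largest deviation from the direct DFT for one configuration; values at floating-point rounding level indicate that the decomposition reproduces the full transform.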
It should be noted that one or more of the above modules or units may be implemented in software, hardware, or a combination of both. When any of the above modules or units is implemented in software, the software exists as computer program instructions stored in a memory, and a processor may execute the program instructions to implement the above method flows. The processor may be built into a system on chip (SoC) or an application-specific integrated circuit (ASIC), or may be a separate semiconductor chip. In addition to a core for executing software instructions to perform operations or processing, the processor may further include a necessary hardware accelerator, such as a field programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit implementing dedicated logic operations.
When the above modules or units are implemented in hardware, the hardware may be any one or any combination of a CPU, a microprocessor, a digital signal processing (DSP) chip, a microcontroller unit (MCU), an artificial intelligence processor, an ASIC, an SoC, an FPGA, a PLD, a dedicated digital circuit, a hardware accelerator, or a non-integrated discrete device, which may run necessary software or operate independently of software to perform the above method flows.
Each module/unit included in each apparatus and product described in the above embodiments may be a software module/unit or a hardware module/unit; or partly as software modules/units and partly as hardware modules/units. For example, for each device or product applied to or integrated into a chip, each module/unit included in the device or product may be implemented by hardware such as a circuit, or at least a part of the module/unit may be implemented by a software program running on a processor integrated within the chip, and the rest (if any) part of the module/unit may be implemented by hardware such as a circuit; for each device or product corresponding to or integrated with the chip module, each module/unit included in the device or product may be implemented by hardware such as a circuit, and different modules/units may be located in the same component (e.g., a chip, a circuit module, etc.) or different components of the chip module, or at least some of the modules/units may be implemented by a software program running on a processor integrated within the chip module, and the rest (if any) of the modules/units may be implemented by hardware such as a circuit; for each device and product applied to or integrated in the terminal, each module/unit included in the device and product may be implemented by hardware such as a circuit, different modules/units may be located in the same component (e.g., a chip, a circuit module, etc.) or different components in the terminal, or at least part of the modules/units may be implemented by a software program running on a processor integrated in the terminal, and the rest (if any) part of the modules/units may be implemented by hardware such as a circuit.
The method steps in the embodiments of the present application may be implemented by hardware, or by software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in random access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, electrically erasable programmable read-only memory, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in an access network device or terminal. Of course, the processor and the storage medium may also reside as discrete components in an access network device or terminal.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, an access network device, a user device, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; an optical medium, such as a digital video disc; or a semiconductor medium, such as a solid-state disk.
In the embodiments of the present application, unless otherwise specified or conflicting with respect to logic, the terms and/or descriptions in different embodiments have consistency and may be mutually cited, and technical features in different embodiments may be combined to form a new embodiment according to their inherent logic relationship.
In this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. In the text of the present application, the character "/" generally indicates that the associated objects before and after it are in an "or" relationship; in the formulas of the present application, the character "/" indicates that the associated objects before and after it are in a "division" relationship.
It is to be understood that the various numerical references referred to in the embodiments of the present application are merely for descriptive convenience and are not intended to limit the scope of the embodiments of the present application. The sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of the processes should be determined by their functions and inherent logic.
Claims (14)
1. A vectorization decomposition method applied to vectorization decomposition of N-point data, the method comprising:
inputting N point data to a memory;
reading L-point data from the memory and caching the L-point data to each of M sub Discrete Fourier Transform (DFT) units, where N = M × L; and
performing an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorization decomposition results.
2. The method of claim 1, further comprising:
multiplying the M first vectorization decomposition results by twiddle factors respectively to obtain M second vectorization decomposition results.
3. The method of claim 2, further comprising:
performing a radix-6 butterfly operation on the M second vectorization decomposition results to obtain a third vectorization decomposition result; and
storing the third vectorization decomposition result in the memory.
4. The method of claim 2, further comprising:
performing a radix-8 butterfly operation on the M second vectorization decomposition results to obtain a fourth vectorization decomposition result; and
storing the fourth vectorization decomposition result in the memory.
5. The method of claim 2, further comprising:
storing the M second vectorization decomposition results in the memory.
6. A vectorization decomposition apparatus applied to vectorization decomposition of N-point data, the apparatus comprising:
an input unit, configured to input the N-point data to a memory;
a cache unit, configured to read L-point data from the memory and cache the L-point data to each of M sub Discrete Fourier Transform (DFT) units, where N = M × L; and
a first operation unit, configured to perform an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorization decomposition results.
7. The apparatus of claim 6, further comprising:
a second operation unit, configured to multiply the M first vectorization decomposition results by twiddle factors respectively to obtain M second vectorization decomposition results.
8. The apparatus of claim 7, further comprising:
a third operation unit, configured to perform a radix-6 butterfly operation on the M second vectorization decomposition results to obtain a third vectorization decomposition result; and
a first storage unit, configured to store the third vectorization decomposition result in the memory.
9. The apparatus of claim 7, further comprising:
a fourth operation unit, configured to perform a radix-8 butterfly operation on the M second vectorization decomposition results to obtain a fourth vectorization decomposition result; and
a second storage unit, configured to store the fourth vectorization decomposition result in the memory.
10. The apparatus of claim 7, further comprising:
a third storage unit, configured to store the M second vectorization decomposition results in the memory.
11. A vectorization decomposition apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to any of claims 1 to 5 when executing the computer program.
12. A chip for performing the method of any one of claims 1-5.
13. A chip module comprising an interface component and a chip for performing the method of any one of claims 1-5.
14. A computer-readable storage medium, in which a computer program or instructions are stored which, when executed by a vectorization decomposition apparatus, implement the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210712417.5A CN115080915A (en) | 2022-06-22 | 2022-06-22 | Vectorization decomposition method, vectorization decomposition device, vectorization decomposition chip, vectorization chip module and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210712417.5A CN115080915A (en) | 2022-06-22 | 2022-06-22 | Vectorization decomposition method, vectorization decomposition device, vectorization decomposition chip, vectorization chip module and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115080915A true CN115080915A (en) | 2022-09-20 |
Family
ID=83252726
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210712417.5A Pending CN115080915A (en) | 2022-06-22 | 2022-06-22 | Vectorization decomposition method, vectorization decomposition device, vectorization decomposition chip, vectorization chip module and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115080915A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9588991B2 (en) | Image search device, image search method, program, and computer-readable storage medium | |
US10152455B2 (en) | Data processing method and processor based on 3072-point fast Fourier transformation, and storage medium | |
EP2408158B1 (en) | Circuit and method for implementing fft/ifft transform | |
CN105740405B (en) | Method and device for storing data | |
CN111737638A (en) | Data processing method based on Fourier transform and related device | |
CN112383497B (en) | OFDM conversion method in 5G system and related product | |
CN117708164A (en) | Data storage method, device and equipment based on parallel processing database | |
CN111221501B (en) | Number theory conversion circuit for large number multiplication | |
US9268744B2 (en) | Parallel bit reversal devices and methods | |
CN115080915A (en) | Vectorization decomposition method, vectorization decomposition device, vectorization decomposition chip, vectorization chip module and storage medium | |
CN115344526B (en) | Hardware acceleration method and device of data flow architecture | |
KR20140142927A (en) | Mixed-radix pipelined fft processor and method using the same | |
CN113591022B (en) | Method and device for processing read-write scheduling of decomposable data | |
WO2022100584A1 (en) | Twice fft and ifft method, and related product | |
CN115878949A (en) | Signal processing method and related equipment | |
CN114297570A (en) | FFT realizing device and method for communication system | |
CN116805027A (en) | DFT multiplexing method and device, communication equipment and storage medium | |
US8601045B2 (en) | Apparatus and method for split-radix-2/8 fast fourier transform | |
CN112306453A (en) | FFT operation control device | |
CN104572578B (en) | Novel method for significantly improving FFT performance in microcontrollers | |
Xiu-fang et al. | Design and Implement of FFT Processor for OFDMA system using FPGA | |
CN115391727B (en) | Calculation method, device and equipment of neural network model and storage medium | |
CN118277710A (en) | FFT/IFFT processing method and processing device, electronic equipment, chip and storage medium | |
CN116775510B (en) | Data access method, device, server and computer readable storage medium | |
CN112163187B (en) | Ultra-long point high-performance FFT (fast Fourier transform) computing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||