CN115080915A - Vectorized decomposition method, device, chip, chip module and storage medium - Google Patents

Publication number
CN115080915A
Authority
CN
China
Prior art keywords
decomposition
dft
vectorized
memory
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210712417.5A
Other languages
Chinese (zh)
Inventor
顾明飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Semiconductor Chengdu Co Ltd
Original Assignee
Spreadtrum Semiconductor Chengdu Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Semiconductor Chengdu Co Ltd
Priority to CN202210712417.5A
Publication of CN115080915A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/14: Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The present application discloses a vectorized decomposition method, device, chip, chip module and storage medium. The method is applied to the vectorized decomposition of N-point data and includes: inputting N-point data into a memory; reading L-point data from the memory for each of M sub-discrete Fourier transform (sub-DFT) units and buffering it into those units, where N=M*L; and performing an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results. Corresponding devices, chips, chip modules and storage media are also disclosed. The solution of the present application achieves high-speed, low-latency DFT computation.

Description

Vectorized decomposition method, device, chip, chip module and storage medium

Technical Field

The present application relates to the field of computers, and in particular to a vectorized decomposition method, device, chip, chip module and storage medium.

Background

A fifth-generation (5G) communication system must perform discrete Fourier transforms (DFT) whose lengths are not powers of 2, as well as fast Fourier transforms (FFT) and inverse fast Fourier transforms (IFFT) whose lengths are powers of 2. These transforms have very strict latency requirements and tight resource budgets. However, no high-speed, low-latency computation scheme is currently available.

Summary of the Invention

The present application provides a vectorized decomposition method, device, chip, chip module and storage medium to achieve high-speed, low-latency DFT computation.

In a first aspect, a vectorized decomposition method is provided, applied to the vectorized decomposition of N-point data. The method includes:

inputting N-point data into a memory;

reading L-point data from the memory for each of M sub-discrete Fourier transform (sub-DFT) units and buffering it into those units, where N=M*L; and

performing an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results.

In one possible implementation, the method further includes:

multiplying the M first vectorized decomposition results by twiddle factors, respectively, to obtain M second vectorized decomposition results.

In another possible implementation, the method further includes:

performing a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result, and storing the third vectorized decomposition result in the memory.

In yet another possible implementation, the method further includes:

performing a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result, and storing the fourth vectorized decomposition result in the memory.

In yet another possible implementation, the method further includes:

storing the M second vectorized decomposition results in the memory.

In yet another possible implementation, N is 24, M is 6, and L is 4; or

N is 24, M is 8, and L is 3.

In yet another possible implementation, N is 720, M is 8, and L is 90, and performing the L-point DFT operation in each sub-DFT unit to obtain the M first vectorized decomposition results includes:

serially extracting 9 samples at a stride of 10 points and performing a 9-point DFT operation on them, to obtain the M first vectorized decomposition results.

In yet another possible implementation, N is 720, M is 6, and L is 120, and performing the L-point DFT operation in each sub-DFT unit to obtain the M first vectorized decomposition results includes:

serially extracting 12 samples at a stride of 10 points and performing a 12-point DFT operation on them, to obtain the M first vectorized decomposition results.

In a second aspect, a vectorized decomposition device is provided, applied to the vectorized decomposition of N-point data. The device includes:

an input unit, configured to input N-point data into a memory;

a buffer unit, configured to read L-point data from the memory for each of M sub-discrete Fourier transform (sub-DFT) units and buffer it into those units, where N=M*L; and

a first operation unit, configured to perform an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results.

In one possible implementation, the device further includes:

a second operation unit, configured to multiply the M first vectorized decomposition results by twiddle factors, respectively, to obtain M second vectorized decomposition results.

In another possible implementation, the device further includes:

a third operation unit, configured to perform a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result; and a first storage unit, configured to store the third vectorized decomposition result in the memory.

In yet another possible implementation, the device further includes:

a fourth operation unit, configured to perform a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result; and a second storage unit, configured to store the fourth vectorized decomposition result in the memory.

In yet another possible implementation, the device further includes:

a third storage unit, configured to store the M second vectorized decomposition results in the memory.

In yet another possible implementation, N is 24, M is 6, and L is 4; or

N is 24, M is 8, and L is 3.

In yet another possible implementation, N is 720, M is 8, and L is 90, and the first operation unit is configured to serially extract 9 samples at a stride of 10 points and perform a 9-point DFT operation on them, to obtain the M first vectorized decomposition results.

In yet another possible implementation, N is 720, M is 6, and L is 120, and the first operation unit is configured to serially extract 12 samples at a stride of 10 points and perform a 12-point DFT operation on them, to obtain the M first vectorized decomposition results.

In a third aspect, a vectorized decomposition device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements:

inputting N-point data into a memory;

reading L-point data from the memory for each of M sub-discrete Fourier transform (sub-DFT) units and buffering it into those units, where N=M*L; and

performing an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results.

In one possible implementation, the processor is further configured to implement:

multiplying the M first vectorized decomposition results by twiddle factors, respectively, to obtain M second vectorized decomposition results.

In another possible implementation, the processor is further configured to implement:

performing a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result, and storing the third vectorized decomposition result in the memory.

In yet another possible implementation, the processor is further configured to implement:

performing a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result, and storing the fourth vectorized decomposition result in the memory.

In yet another possible implementation, the processor is further configured to implement:

storing the M second vectorized decomposition results in the memory.

In yet another possible implementation, N is 24, M is 6, and L is 4; or

N is 24, M is 8, and L is 3.

In yet another possible implementation, N is 720, M is 8, and L is 90, and the step, executed by the processor, of performing the L-point DFT operation in each sub-DFT unit to obtain the M first vectorized decomposition results includes:

serially extracting 9 samples at a stride of 10 points and performing a 9-point DFT operation on them, to obtain the M first vectorized decomposition results.

In yet another possible implementation, N is 720, M is 6, and L is 120, and the step, executed by the processor, of performing the L-point DFT operation in each sub-DFT unit to obtain the M first vectorized decomposition results includes:

serially extracting 12 samples at a stride of 10 points and performing a 12-point DFT operation on them, to obtain the M first vectorized decomposition results.

In a fourth aspect, a chip is provided, the chip being configured to perform the method of the first aspect or of any implementation of the first aspect.

In a fifth aspect, a chip module is provided, including an interface component and a chip, the chip being configured to perform the method of the first aspect or of any implementation of the first aspect.

In a sixth aspect, a computer-readable storage medium is provided. The storage medium stores a computer program or instructions which, when executed by a vectorized decomposition device, implement the method of the first aspect or of any implementation of the first aspect.

The vectorized decomposition scheme provided by the present application has the following beneficial effects:

When N-point data is vectorized and decomposed, the N-point data is input into a memory; L-point data is read from the memory for each of M sub-discrete Fourier transform (sub-DFT) units and buffered into those units, where N=M*L; and an L-point DFT operation is performed in each of the M sub-DFT units to obtain M first vectorized decomposition results. High-speed, low-latency DFT computation is thereby achieved.

Description of Drawings

FIG. 1 is a schematic flowchart of a vectorized decomposition method provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of generating a DFT-s-OFDM symbol in NR, provided by an embodiment of the present application;

FIG. 3a is a schematic structural diagram of a vectorized decomposition device provided by an embodiment of the present application;

FIG. 3b is a schematic structural diagram of another vectorized decomposition device provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a 720-point DFT operation provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of the data input flow of a vectorized decomposition scheme provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of another vectorized decomposition device provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of yet another vectorized decomposition device provided by an embodiment of the present application.

Detailed Description

For a better understanding of the technical solutions of the present application, the embodiments of the present application are described in detail below with reference to the accompanying drawings.

It should be clear that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application, without creative effort, fall within the protection scope of the present application.

The terms used in the embodiments of the present application are for the purpose of describing particular embodiments only and are not intended to limit the present application. The singular forms "a", "an" and "the" used in the embodiments and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The embodiments of the present application provide a vectorized decomposition scheme: N-point data is input into a memory; L-point data is read from the memory for each of M sub-discrete Fourier transform (sub-DFT) units and buffered into those units, where N=M*L; and an L-point DFT operation is performed in each of the M sub-DFT units to obtain M first vectorized decomposition results. High-speed, low-latency DFT computation is thereby achieved.

FIG. 1 is a schematic flowchart of a vectorized decomposition method provided by an embodiment of the present application. The method may include the following steps:

S101. Input N-point data into a memory.

After the N-point data is acquired, it is input into the memory. The memory may be, for example, a random access memory (RAM).

S102. Read L-point data from the memory for each of M sub-DFT units and buffer it into those units, where N=M*L.

FIG. 2 is a schematic flowchart of generating a DFT-s-OFDM symbol in NR. In FIG. 2, for an orthogonal frequency division multiplexing (OFDM) waveform, the modulation symbols are mapped directly to subcarriers, and the subcarrier-mapped vector passes through an IFFT and cyclic prefix (CP) insertion to form one OFDM symbol. For a DFT-s-OFDM waveform, the modulation symbols first pass through a DFT and are then mapped to subcarriers; the subcarrier-mapped vector passes through an IFFT and CP insertion to form one DFT-s-OFDM symbol.

As shown in FIG. 2, compared with the OFDM waveform, the DFT-s-OFDM waveform involves one additional DFT operation at the transmitting device (also called the "transmitter"). Correspondingly, the receiving device (also called the "receiver") must perform one additional inverse discrete Fourier transform (IDFT) operation.
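The two waveform paths of FIG. 2 can be sketched as follows. This is a toy illustration rather than the patent's implementation: the function names `dft` and `make_symbol`, the 16-point IFFT size, the 4-sample CP and the contiguous subcarrier mapping are all assumptions chosen for readability, and a direct O(n^2) transform stands in for the hardware FFT.

```python
import cmath

def dft(x, inverse=False):
    """Direct O(n^2) DFT (or IDFT with 1/n scaling); fine for toy sizes."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def make_symbol(mod_symbols, ifft_size, cp_len, transform_precoding=True):
    """Build one symbol: optional DFT, subcarrier mapping, IFFT, CP insertion."""
    # DFT-s-OFDM applies a DFT first; plain OFDM maps the symbols directly.
    freq = dft(mod_symbols) if transform_precoding else list(mod_symbols)
    # Assumed mapping: occupy the first len(freq) subcarriers, zero elsewhere.
    grid = [0j] * ifft_size
    for i, value in enumerate(freq):
        grid[i] = value
    time = dft(grid, inverse=True)
    # The cyclic prefix is a copy of the last cp_len time-domain samples.
    return time[ifft_size - cp_len:] + time

qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j]
sym = make_symbol(qpsk, ifft_size=16, cp_len=4)
print(len(sym))
```

Passing `transform_precoding=False` gives the plain OFDM path, so the only difference between the two waveforms in this sketch is the extra 6-point DFT, which is the operation the rest of the description accelerates.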

Therefore, a 5G system must perform DFT operations and needs a 6-point to 4096-point FFT/IFFT processor to implement the OFDM transform and inverse transform of signals. The DFT processor is critical to both the transmitting device and the receiving device: it must meet the ultra-low-latency requirements of 5G while consuming as few resources as possible. The DFT processor of this embodiment performs the Fourier transform and is also compatible with 6-point to 4096-point FFT/IFFT, which makes it highly challenging to implement.

This embodiment is applied to the vectorized decomposition of N-point data. Assuming that N contains the factor M, the N-point DFT can be divided into M parts of equal length. The M parts correspond to M sub-DFT units (sub_DFT), which all have the same data-flow form. Therefore, L-point data is read from the memory for each of the M sub-DFT units and buffered into them, where N=M*L.

For example, if N contains the factor 6, the DFT length can be divided into 6 parts of equal length. The 6 parts correspond to 6 sub-DFT units, which all have the same data-flow form.

As another example, if the FFT/IFFT length N contains the factor 8, the DFT length can be divided into 8 parts of equal length. The 8 parts correspond to 8 sub-DFT units, which all have the same data-flow form.

FIG. 3a is a schematic structural diagram of a vectorized decomposition device provided by an embodiment of the present application, in which the DFT length is divided into 8 parts of equal length. The 8 parts correspond to 8 sub-DFT units (sub_DFT0 to sub_DFT7), which all have the same data-flow form. Before the vectorized decomposition, the data is written into the RAM in order, from top to bottom and from left to right. During the vectorized decomposition, the same number of points is read from the RAM and buffered into each of the 8 sub-DFT units.

FIG. 3b is a schematic structural diagram of another vectorized decomposition device provided by an embodiment of the present application, in which the DFT length is divided into 6 parts of equal length. The 6 parts correspond to 6 sub-DFT units (sub_DFT0 to sub_DFT5), which all have the same data-flow form. Before the vectorized decomposition, the data is written into the RAM in order, from top to bottom and from left to right. During the vectorized decomposition, the same number of points is read from the RAM and buffered into each of the 6 sub-DFT units.

In FIG. 3a there are 8 sub-DFT units of identical structure, and in FIG. 3b there are 6. Operating these identical units simultaneously is what constitutes the vectorized processing of the DFT.

Each sub-DFT unit operates in serial mode. The L-point data read from each memory is input to the corresponding sub-DFT unit.
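A minimal sketch of the split into M equal parts is given here. The text does not spell out which samples go to which unit, so the round-robin layout used below is an assumption: it is the layout under which "L-point sub-DFT, then twiddle multiplication, then a radix-M butterfly across the units" reproduces the N-point DFT in the standard Cooley-Tukey factorization. The name `split_into_banks` is hypothetical.

```python
def split_into_banks(samples, m):
    """Deal N samples round-robin into m banks: bank r holds x[r], x[r+m], ..."""
    if len(samples) % m != 0:
        raise ValueError("N must be a multiple of M")
    return [samples[r::m] for r in range(m)]

# N = 24 split for M = 6 sub-DFT units, L = 4 points each.
banks = split_into_banks(list(range(24)), 6)
for r, bank in enumerate(banks):
    print(f"sub_DFT{r} <- samples {bank}")
```

Each bank has the same length L = N/M, which is why all M sub-DFT units can share one data-flow form.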

S103. Perform an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results.

After each sub-DFT unit obtains its L-point data, it performs an L-point DFT operation, yielding M first vectorized decomposition results in total.

The M sub-DFT units perform their L-point DFT operations in parallel.

After one or more rounds of operations, the DFT points corresponding to each buffer have all been computed.

Further, the method may also include the following steps (shown with dashed lines in the figure):

S104. Multiply the M first vectorized decomposition results by twiddle factors, respectively, to obtain M second vectorized decomposition results.

After the first vectorized decomposition result of each sub-DFT unit is obtained, each first vectorized decomposition result can be multiplied by twiddle factors to obtain M second vectorized decomposition results. The twiddle factors are roots of unity of the N-point DFT. As shown in FIG. 3a, after the 8 first vectorized decomposition results of the 8 sub-DFT units are obtained, they can each be multiplied by twiddle factors to obtain 8 second vectorized decomposition results. As shown in FIG. 3b, after the 6 first vectorized decomposition results of the 6 sub-DFT units are obtained, they can each be multiplied by twiddle factors to obtain 6 second vectorized decomposition results.
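For reference, the role of the twiddle factors can be written out with the standard Cooley-Tukey factorization. The index convention below (input index n = Ml + m, output index k = k_1 + L k_2) is an assumption, since the text does not state one; the symbols N, M and L are those of the method.

```latex
% N = M L, \quad W_N = e^{-j 2\pi / N}, \quad
% 0 \le m, k_2 < M, \quad 0 \le l, k_1 < L
X[k_1 + L k_2]
  = \sum_{m=0}^{M-1} W_M^{m k_2}
    \Bigl(
      \underbrace{W_N^{m k_1}}_{\text{twiddle factor (S104)}}
      \underbrace{\sum_{l=0}^{L-1} x[M l + m]\, W_L^{l k_1}}_{\text{L-point DFT in sub-DFT unit } m \text{ (S103)}}
    \Bigr)
```

The identity follows from expanding W_N^{(Ml+m)(k_1+Lk_2)} = W_L^{l k_1} W_N^{m k_1} W_M^{m k_2}, using W_N^{MLlk_2} = 1. The outer sum over m is an M-point DFT across the M units, which corresponds to the radix-6 or radix-8 butterfly of the following step.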

As shown in FIG. 3a and FIG. 3b, after the second vectorized decomposition results are obtained, one of the following steps S105a and S105b is performed:

S105a. Perform a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result, and store the third vectorized decomposition result in the memory.

In the last round of operations, the results can be fed directly into the radix-6 butterfly unit (rdx6), which reduces data write-back and extraction. The rdx6 unit is a parallel structure; its results are written directly back to the memory as the final result.

For example, for 6-point to 4096-point FFT/IFFT, the last round performs the rdx6 operation for the factor 6; for a DFT whose length is not a power of 2, the last round can also be an rdx6 operation.

With this vectorized decomposition, the hardware structure becomes regular and the data does not need to be repacked multiple times. The rdx6 unit of the last round also enables resource reuse.

Exemplarily, step S105a may also be replaced by: performing a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result, and storing the fourth vectorized decomposition result in the memory.

In the last round of operations, the results can be fed directly into the radix-8 butterfly unit (rdx8), which reduces data write-back and extraction. The rdx8 unit is a parallel structure; its results are written directly back to the memory as the final result.

For example, for 6-point to 4096-point FFT/IFFT, the last round performs the rdx8 operation for the factor 8; for a DFT, the last round can also be an rdx8 operation.

With this vectorized decomposition, the hardware structure becomes regular and the data does not need to be repacked multiple times. The rdx8 unit of the last round also enables resource reuse.

S105b. Store the M second vectorized decomposition results in the memory.

As shown in FIG. 3a or FIG. 3b, if no rdx8 or rdx6 operation is needed, the data can be written back directly once the second vectorized decomposition results are obtained, storing each result in the memory corresponding to its sub-DFT unit.

After step S105a or S105b is performed, one of the two paths is selected to write the vectorized decomposition result back to the memory.

Examples are given below.

In one example, N=24. A 24-point DFT can be decomposed as 3*8 or 4*6. Taking 4*6 as an example, the vectorized DFT decomposition proceeds as follows: the 6 RAMs are read simultaneously, and the data read is fed serially into sub_DFT0, sub_DFT1, sub_DFT2, sub_DFT3, sub_DFT4 and sub_DFT5, respectively. Each of the 6 sub-DFT units serially processes a 4-point DFT; the results are multiplied by twiddle factors and fed simultaneously into rdx6, whose parallel results are written back to the 6 RAMs simultaneously. This approach minimizes the processing time of the system link and saves resources.
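The 24-point flow can be checked numerically with a small software model. This is a sketch under assumptions, not the hardware design: the sub-DFT units and the rdx6/rdx8 butterfly are both modelled with a direct DFT, samples are assumed to be dealt round-robin to the units, and the names `dft` and `vectorized_dft` are hypothetical.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, used as the reference and as the small DFT kernel."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def vectorized_dft(x, m):
    """N-point DFT as M sub-DFTs of L points, twiddles, then an M-point butterfly."""
    n = len(x)
    sub_len = n // m  # L
    # S102/S103: unit r gets x[r], x[r+m], ... (assumed layout) and runs an L-point DFT.
    first = [dft(x[r::m]) for r in range(m)]
    # S104: multiply output k1 of unit r by the twiddle factor W_N^(r*k1).
    second = [[first[r][k1] * cmath.exp(-2j * cmath.pi * r * k1 / n)
               for k1 in range(sub_len)] for r in range(m)]
    # S105a: an M-point butterfly across the units, one per output index k1.
    out = [0j] * n
    for k1 in range(sub_len):
        column = dft([second[r][k1] for r in range(m)])
        for k2 in range(m):
            out[k1 + sub_len * k2] = column[k2]
    return out

x = [complex(t % 5, -(t % 3)) for t in range(24)]
ref = dft(x)
err_6x4 = max(abs(a - b) for a, b in zip(vectorized_dft(x, 6), ref))  # 4*6 split
err_8x3 = max(abs(a - b) for a, b in zip(vectorized_dft(x, 8), ref))  # 3*8 split
print(err_6x4, err_8x3)
```

Both splits agree with the direct 24-point DFT up to floating-point rounding. In the model, the list comprehension over `r` is the part that the M hardware units would execute in parallel.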

In another example, N=720. FIG. 4 is a schematic diagram of a 720-point DFT operation provided by an embodiment of the present application. A 720-point DFT can be decomposed as 9*10*8. The vectorized DFT decomposition proceeds as follows. First, each RAM serially completes a 90-point DFT. Specifically, 9 samples are serially extracted at a stride of 10 points for a 9-point DFT, and 10 such 9-point DFTs are performed in total; the results are serially multiplied by twiddle factors and written back to the memory (RAM0 to RAM7 in the figure). Then, 10 consecutive samples are serially extracted for a 10-point DFT, and 9 such 10-point DFTs are performed in total; the results are multiplied by twiddle factors and fed into the primitive rdx module for an 8-point parallel DFT, and the results are written back in parallel to the 8 RAMs (RAM0 to RAM7 in the figure).
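The three-stage flow can be modelled in software as a sanity check. The sketch below is one plausible reading of the description, with assumptions stated plainly: samples are dealt round-robin to the RAMs, each RAM works in place with a strided pass followed by a sequential pass, and the output index order follows the standard Cooley-Tukey mapping. The names `w`, `sub_dft_two_pass` and `staged_dft` are hypothetical, and a direct DFT stands in for every small DFT kernel.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, used as reference and as the small DFT kernel."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def w(n, e):
    """Twiddle factor W_n^e = exp(-2*pi*j*e/n)."""
    return cmath.exp(-2j * cmath.pi * e / n)

def sub_dft_two_pass(ram, n1, n2):
    """(n1*n2)-point DFT of one RAM: strided n1-point pass, then sequential n2-point pass."""
    length = n1 * n2
    buf = list(ram)
    # Pass 1: for each offset b, read n1 samples at stride n2, run an n1-point DFT,
    # multiply by twiddles and write back to the same addresses.
    for b in range(n2):
        col = dft(buf[b::n2])
        for k1 in range(n1):
            buf[k1 * n2 + b] = col[k1] * w(length, b * k1)
    # Pass 2: read n2 consecutive samples, run an n2-point DFT; n1 groups in total.
    out = [0j] * length
    for k1 in range(n1):
        row = dft(buf[k1 * n2:(k1 + 1) * n2])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out

def staged_dft(x, n1, n2, m):
    """N = n1*n2*m: per-RAM two-pass sub-DFT, twiddles, then an m-point butterfly."""
    n, sub_len = len(x), n1 * n2
    second = []
    for r in range(m):  # the m RAMs / sub-DFT units run in parallel in hardware
        sub = sub_dft_two_pass(x[r::m], n1, n2)
        second.append([sub[k] * w(n, r * k) for k in range(sub_len)])
    out = [0j] * n
    for k in range(sub_len):
        col = dft([second[r][k] for r in range(m)])
        for q in range(m):
            out[k + sub_len * q] = col[q]
    return out

x = [complex((3 * t) % 7, (5 * t) % 11) for t in range(720)]
ref = dft(x)
err_9_10_8 = max(abs(a - b) for a, b in zip(staged_dft(x, 9, 10, 8), ref))
err_12_10_6 = max(abs(a - b) for a, b in zip(staged_dft(x, 12, 10, 6), ref))
print(err_9_10_8, err_12_10_6)
```

The call with `(9, 10, 8)` models the 9*10*8 decomposition, and the call with `(12, 10, 6)` models a 6-RAM variant with 120 points per RAM. Both match the direct 720-point DFT up to rounding error.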

In yet another example, N=720. A 720-point DFT can be decomposed as 6*10*12. The vectorized DFT decomposition proceeds as follows. First, each RAM serially completes a 120-point DFT. Specifically, 12 values are serially extracted at a stride of 10 points for a 12-point DFT, for a total of ten 12-point DFTs; the results are serially multiplied by twiddle factors and written back to the memory. Then, 10 points of data are extracted sequentially and serially for a 10-point DFT, for a total of twelve 10-point DFTs; the results are multiplied by twiddle factors and fed into the primitive rdx module for a 6-point parallel DFT, and the results are written back in parallel to the 6 RAMs.

FIG. 5 is a schematic diagram of the data input flow of a vectorized decomposition scheme provided by an embodiment of the present application. Assume the DFT length is divided into 8 parts of equal length, corresponding to 8 RAMs (RAM0 to RAM7). Before the vectorized decomposition, data is written into the 8 RAMs in order, from top to bottom and from left to right. In the examples above, wherever data is extracted sequentially and serially, it is extracted in this input order.
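FIG. 5 is not reproduced in this text, so the exact write order cannot be confirmed here. One layout that is consistent with the flow described earlier (sub-DFT inside each RAM, then twiddle factors, then a butterfly across RAMs) is an interleaved one, where sample n goes to RAM n mod 8 at address n div 8. The sketch below illustrates that assumed layout; `fill_rams` is a hypothetical name.

```python
def fill_rams(samples, num_rams=8):
    """Distribute samples over RAMs, assuming an interleaved (round-robin) layout."""
    depth, rem = divmod(len(samples), num_rams)
    if rem:
        raise ValueError("length must be a multiple of the number of RAMs")
    rams = [[None] * depth for _ in range(num_rams)]
    for n, s in enumerate(samples):
        # Assumption: sample n is written to RAM (n mod num_rams), address (n div num_rams).
        rams[n % num_rams][n // num_rams] = s
    return rams

rams = fill_rams(list(range(24)))
print(rams[0])  # [0, 8, 16]
print(rams[7])  # [7, 15, 23]
```

With this layout, each RAM holds exactly the sub-sequence that its sub-DFT unit needs, so all RAMs can be read at the same time.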

The processing flows for the 24-point and 720-point DFTs above are only examples; other DFT sizes are processed similarly.

Table 1 below lists example decompositions for some DFT sizes:

Table 1

[Table 1 appears as images in the original publication: Figure BDA0003708563620000061 and Figure BDA0003708563620000071.]

It can be seen that, after decomposition, every DFT size above contains a factor of 6 or 8. There are 67 supported sizes, and the smallest supported size is 6. Therefore, a DFT vectorized decomposition structure similar to the one described above can be used for each of them.
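Because Table 1 is only available as an image, the sketch below does not reproduce it. It merely shows how a size N could be checked for a factor of 6 or 8 and split as N = M*L; the function name `parallel_splits` and its `radices` parameter are hypothetical.

```python
def parallel_splits(n, radices=(6, 8)):
    """Return every (M, L) with n == M * L where M is a supported parallel radix."""
    return [(m, n // m) for m in radices if n % m == 0]

print(parallel_splits(24))   # [(6, 4), (8, 3)]
print(parallel_splits(720))  # [(6, 120), (8, 90)]
print(parallel_splits(10))   # []
```

The two splits returned for 24 and for 720 match the (M, L) pairs used in the examples in this description.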

In the vectorized decomposition method provided by the embodiments of the present application, N-point data is input to a memory; L-point data is read from the memory and buffered into M buffers, where N=M*L; the L-point data read from each of the M buffers is input to the sub-DFT unit corresponding to that buffer; and an L-point DFT is performed in each sub-DFT unit to obtain M first vectorized decomposition results. This achieves high-speed, low-latency DFT computation.
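The identity behind this split can be written out explicitly. The following is the standard Cooley-Tukey index mapping rather than a formula quoted from the application; the symbols $W_N$, $k_1$, $k_2$, $m$ and $l$ are notation introduced here, and the formula assumes that buffer $m$ holds the samples $x[Ml+m]$.

```latex
\begin{aligned}
W_N &= e^{-2\pi i/N}, \qquad N = M L,\\
X[k_1 + L k_2] &= \sum_{m=0}^{M-1} W_M^{m k_2}
  \Bigl( W_N^{m k_1} \sum_{l=0}^{L-1} x[Ml+m]\, W_L^{l k_1} \Bigr),
  \qquad 0 \le k_1 < L,\quad 0 \le k_2 < M .
\end{aligned}
```

Read from the inside out, the inner sum is the L-point DFT computed in sub-DFT unit $m$, the factor $W_N^{m k_1}$ is the twiddle factor, and the outer sum over $m$ is the radix-M butterfly (M = 6 or 8 in the examples).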

It can be understood that, to implement the functions in the above embodiments, the vectorized decomposition apparatus includes hardware structures and/or software modules corresponding to each function. Those skilled in the art will readily appreciate that the units and method steps of the examples described in connection with the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.

FIG. 6 and FIG. 7 are schematic structural diagrams of possible vectorized decomposition apparatuses provided by embodiments of the present application. These apparatuses can implement the functions of the vectorized decomposition apparatus in the above method embodiments and can therefore also achieve the beneficial effects of those embodiments. In the embodiments of the present application, the vectorized decomposition apparatus may be an electronic device, or a module (for example, a chip or a chip module) applied to an electronic device. By way of example, the vectorized decomposition apparatus may be a modem in a communication system.

FIG. 6 is a schematic structural diagram of another vectorized decomposition apparatus provided by an embodiment of the present application. The apparatus 600 includes:

an input unit 601, configured to input N-point data to a memory;

a buffer unit 602, configured to read L-point data from the memory and buffer it into M sub-DFT units, where N=M*L; and

a first operation unit 603, configured to perform an L-point DFT in each of the M sub-DFT units to obtain M first vectorized decomposition results.

In one possible implementation, the apparatus further includes (shown with dashed lines in the figure):

a second operation unit 604, configured to multiply the M first vectorized decomposition results by twiddle factors, respectively, to obtain M second vectorized decomposition results.

In another possible implementation, the apparatus further includes (shown with dashed lines in the figure):

a third operation unit 605, configured to perform a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result; and a first storage unit 606, configured to store the third vectorized decomposition result to the memory.

In yet another possible implementation, the apparatus further includes (not shown in the figure):

a third operation unit, configured to perform a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result; and a second storage unit, configured to store the fourth vectorized decomposition result to the memory.

In yet another possible implementation, the apparatus further includes (shown with dashed lines in the figure):

a third storage unit 607, configured to store the M second vectorized decomposition results to the memory.

In yet another possible implementation, N is 24, M is 6, and L is 4; or

N is 24, M is 8, and L is 3.

In yet another possible implementation, N is 720, M is 8, and L is 90, and the first operation unit 603 is configured to serially extract 9 values at a stride of 10 points and perform a 9-point DFT, to obtain the M first vectorized decomposition results.

In yet another possible implementation, N is 720, M is 6, and L is 120, and the first operation unit 603 is configured to serially extract 12 values at a stride of 10 points and perform a 12-point DFT, to obtain the M first vectorized decomposition results.

For the specific implementation of the above units, refer to the description of the embodiment shown in FIG. 1; details are not repeated here.

In the vectorized decomposition apparatus provided by the embodiments of the present application, N-point data is input to a memory; L-point data is read from the memory and buffered into M sub-DFT units, where N=M*L; and an L-point DFT is performed in each of the M sub-DFT units to obtain M first vectorized decomposition results. This achieves high-speed, low-latency DFT computation.

FIG. 7 is a schematic structural diagram of yet another vectorized decomposition apparatus provided by an embodiment of the present application. The apparatus 700 includes at least a processor 701, an input device 702, an output device 703 and a computer storage medium 704, which may be connected by a bus or by other means.

The computer storage medium 704 may be stored in the memory of the apparatus. It is used to store a computer program that includes program instructions, and the processor 701 is used to execute the program instructions stored in the computer storage medium 704. The processor 701 is the computing core and control core of the apparatus; it is adapted to implement one or more instructions, and specifically to load and execute one or more instructions so as to implement the corresponding method flow or function.

In one embodiment, the processor 701 may be used to load and execute the method steps of the embodiment shown in FIG. 1.

Specifically, when executing the computer program, the processor 701 implements:

inputting N-point data to a memory;

reading L-point data from the memory and buffering it into M sub-DFT units, where N=M*L; and

performing an L-point DFT in each of the M sub-DFT units to obtain M first vectorized decomposition results.

In one possible implementation, the processor 701 is further configured to implement:

multiplying the M first vectorized decomposition results by twiddle factors, respectively, to obtain M second vectorized decomposition results.

In another possible implementation, the processor 701 is further configured to implement:

performing a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result, and storing the third vectorized decomposition result to the memory.

In yet another possible implementation, the processor 701 is further configured to implement:

performing a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result, and storing the fourth vectorized decomposition result to the memory.

In yet another possible implementation, the processor 701 is further configured to implement:

storing the M second vectorized decomposition results to each of the memories.

In yet another possible implementation, N is 24, M is 6, and L is 4; or

N is 24, M is 8, and L is 3.

In yet another possible implementation, N is 720, M is 8, and L is 90, and the step, executed by the processor 701, of performing an L-point DFT in each sub-DFT unit to obtain M first vectorized decomposition results includes:

serially extracting 9 values at a stride of 10 points and performing a 9-point DFT, to obtain the M first vectorized decomposition results.

In yet another possible implementation, N is 720, M is 6, and L is 120, and the step, executed by the processor 701, of performing an L-point DFT in each sub-DFT unit to obtain M first vectorized decomposition results includes:

serially extracting 12 values at a stride of 10 points and performing a 12-point DFT, to obtain the M first vectorized decomposition results.

In the vectorized decomposition apparatus provided by the embodiments of the present application, N-point data is input to a memory; L-point data is read from the memory and buffered into M sub-DFT units, where N=M*L; and an L-point DFT is performed in each of the M sub-DFT units to obtain M first vectorized decomposition results. This achieves high-speed, low-latency DFT computation.

It should be noted that one or more of the above units may be implemented in software, hardware, or a combination of both. When any of the above units is implemented in software, the software exists as computer program instructions stored in a memory, and a processor can execute the program instructions to implement the above method flow. The processor may be built into a system on chip (SoC) or an application-specific integrated circuit (ASIC), or may be a standalone semiconductor chip. In addition to the core that executes software instructions to perform operations or processing, the processor may further include necessary hardware accelerators, such as a field programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit implementing dedicated logic operations.

When the above units are implemented in hardware, the hardware may be any one or any combination of a CPU, a microprocessor, a digital signal processing (DSP) chip, a microcontroller unit (MCU), an artificial intelligence processor, an ASIC, an SoC, an FPGA, a PLD, a dedicated digital circuit, a hardware accelerator, or non-integrated discrete devices, and it may run the necessary software or operate independently of software to execute the above method flow.

The modules/units included in the apparatuses and products described in the above embodiments may be software modules/units, hardware modules/units, or partly software and partly hardware. For example, for an apparatus or product applied to or integrated in a chip, all of its modules/units may be implemented in hardware such as circuits; alternatively, at least some of the modules/units may be implemented as a software program running on a processor integrated in the chip, with the remaining modules/units (if any) implemented in hardware such as circuits. For an apparatus or product applied to or integrated in a chip module, all of its modules/units may be implemented in hardware such as circuits, and different modules/units may be located in the same component (for example, a chip or a circuit module) or in different components of the chip module; alternatively, at least some of the modules/units may be implemented as a software program running on a processor integrated in the chip module, with the remaining modules/units (if any) implemented in hardware such as circuits. For an apparatus or product applied to or integrated in a terminal, all of its modules/units may be implemented in hardware such as circuits, and different modules/units may be located in the same component (for example, a chip or a circuit module) or in different components of the terminal; alternatively, at least some of the modules/units may be implemented as a software program running on a processor integrated in the terminal, with the remaining modules/units (if any) implemented in hardware such as circuits.

The method steps in the embodiments of the present application may be implemented in hardware or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, electrically erasable programmable read-only memory, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may be located in an access network device or a terminal. The processor and the storage medium may also exist as discrete components in an access network device or a terminal.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part as a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are executed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, an access network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired or wireless means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium, such as a floppy disk, hard disk or magnetic tape; an optical medium, such as a digital video disc; or a semiconductor medium, such as a solid-state drive.

In the embodiments of the present application, unless otherwise specified or logically inconsistent, terms and descriptions are consistent across embodiments and may be cross-referenced, and technical features of different embodiments may be combined according to their inherent logical relationships to form new embodiments.

In this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean A alone, both A and B, or B alone, where A and B may each be singular or plural. In the text of this application, the character "/" generally indicates an "or" relationship between the objects before and after it; in the formulas of this application, the character "/" indicates division.

It can be understood that the various numerical labels used in the embodiments of the present application are only for convenience of description and do not limit the scope of the embodiments. The sequence numbers of the above processes do not imply an order of execution; the order of execution of each process should be determined by its function and internal logic.

Claims (14)

1. A vectorized decomposition method applied to vectorized decomposition of N-point data, the method comprising:
inputting the N-point data to a memory;
reading L-point data from the memory, respectively, and buffering the L-point data into M sub discrete Fourier transform (DFT) units, wherein N = M*L; and
performing an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results.
2. The method of claim 1, further comprising:
multiplying the M first vectorized decomposition results by twiddle factors, respectively, to obtain M second vectorized decomposition results.
3. The method of claim 2, further comprising:
performing a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result; and
storing the third vectorized decomposition result to the memory.
4. The method of claim 2, further comprising:
performing a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result; and
storing the fourth vectorized decomposition result to the memory.
5. The method of claim 2, further comprising:
storing the M second vectorized decomposition results to the memory.
6. A vectorized decomposition apparatus applied to vectorized decomposition of N-point data, the apparatus comprising:
an input unit, configured to input the N-point data to a memory;
a buffer unit, configured to read L-point data from the memory, respectively, and buffer the L-point data into M sub discrete Fourier transform (DFT) units, wherein N = M*L; and
a first operation unit, configured to perform an L-point DFT operation in each of the M sub-DFT units to obtain M first vectorized decomposition results.
7. The apparatus of claim 6, further comprising:
a second operation unit, configured to multiply the M first vectorized decomposition results by twiddle factors, respectively, to obtain M second vectorized decomposition results.
8. The apparatus of claim 7, further comprising:
a third operation unit, configured to perform a radix-6 butterfly operation on the M second vectorized decomposition results to obtain a third vectorized decomposition result; and
a first storage unit, configured to store the third vectorized decomposition result to the memory.
9. The apparatus of claim 7, further comprising:
a fourth operation unit, configured to perform a radix-8 butterfly operation on the M second vectorized decomposition results to obtain a fourth vectorized decomposition result; and
a second storage unit, configured to store the fourth vectorized decomposition result to the memory.
10. The apparatus of claim 7, further comprising:
a third storage unit, configured to store the M second vectorized decomposition results to the memory.
11. A vectorized decomposition apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 5 when executing the computer program.
12. A chip for performing the method of any one of claims 1 to 5.
13. A chip module comprising an interface component and a chip, the chip being for performing the method of any one of claims 1 to 5.
14. A computer-readable storage medium storing a computer program or instructions which, when executed by a vectorized decomposition apparatus, implement the method of any one of claims 1 to 5.
CN202210712417.5A 2022-06-22 2022-06-22 Vectorized decomposition method, device, chip, chip module and storage medium Pending CN115080915A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210712417.5A CN115080915A (en) 2022-06-22 2022-06-22 Vectorized decomposition method, device, chip, chip module and storage medium


Publications (1)

Publication Number Publication Date
CN115080915A true CN115080915A (en) 2022-09-20

Family

ID=83252726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210712417.5A Pending CN115080915A (en) 2022-06-22 2022-06-22 Vectorized decomposition method, device, chip, chip module and storage medium

Country Status (1)

Country Link
CN (1) CN115080915A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763337A (en) * 2008-12-25 2010-06-30 上海明波通信技术有限公司 N-point FFT/IFFT/IFFT/IFFT method and device
CN102004720A (en) * 2010-11-09 2011-04-06 无锡中星微电子有限公司 Variable-length fast fourier transform circuit and implementation method
CN103020015A (en) * 2012-11-30 2013-04-03 桂林卡尔曼通信技术有限公司 Realization method for fast computation of discrete Fourier transform with non-second power points
CN104268124A (en) * 2014-09-26 2015-01-07 中国人民解放军国防科学技术大学 FFT (Fast Fourier Transform) implementing device and method
CN112822139A (en) * 2021-02-04 2021-05-18 展讯半导体(成都)有限公司 Data input and data conversion method and device
CN114356409A (en) * 2021-01-29 2022-04-15 展讯半导体(成都)有限公司 Method, device and apparatus for processing zero point DFT based on modulo-6 and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination