CN105846873A - Triangular systolic array structure QR decomposition device based on advanced iteration and decomposition method thereof - Google Patents

Info

Publication number
CN105846873A
Authority
CN
China
Prior art keywords
signal
module
output
multiplier
input
Prior art date
Legal status
Granted
Application number
CN201610173392.0A
Other languages
Chinese (zh)
Other versions
CN105846873B (en)
Inventor
邢座程
刘苍
原略超
唐川
张洋
王庆林
王�锋
汤先拓
危乐
吕朝
董永旺
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201610173392.0A
Publication of CN105846873A
Application granted
Publication of CN105846873B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413 MIMO systems
    • H04B7/08 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station
    • H04B7/0837 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station using pre-detection combining
    • H04B7/0842 Weighted combining
    • H04B7/0848 Joint weighting
    • H04B7/0854 Joint weighting using error minimizing algorithms, e.g. minimum mean squared error [MMSE], "cross-correlation" or matrix inversion

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)

Abstract

A QR decomposition device with a triangular systolic array structure based on advanced iteration, and a corresponding decomposition method, for the QR decomposition of an n×n matrix A. The device comprises diagonal processing modules, iterative processing modules and triangular processing modules. The first diagonal processing module receives the first column vector $a_1$ of A from outside; its results $q_1$ and $r_{11}$ are outputs of the QR decomposition module, $q_1$ is passed to the triangular processing modules of the next step, and the signal $r_{11}^2$ it generates is sent to all iterative processing modules of the first step. The $(j-1)$-th iterative processing module takes as inputs the j-th column vector $a_j$ of A received from outside, the first column vector $a_1$ of A and $r_{11}^2$ from the first diagonal processing module, and produces the j-th column vector $a_j^{(1)}$ of the next iteration matrix $A^{(1)}$. The process continues in the same way, and finally a triangular processing module produces the output signal $r_{n-1,n}$ of the QR decomposition module. The decomposition method is carried out on the above device. The invention is simple in principle, decomposes quickly and is highly efficient.

Description

QR decomposition device with a triangular systolic array structure based on advanced iteration, and decomposition method thereof

Technical field

The invention relates mainly to baseband signal processing in wireless communication systems, and in particular to a QR decomposition device with a triangular systolic array structure based on advanced iteration, and a corresponding decomposition method.

Background art

Orthogonal frequency division multiplexing (OFDM) and multiple-input multiple-output (MIMO) technologies have received wide attention because of their high spectral efficiency and high transmission rates. A series of recent advances in precoding have made it possible for multi-user wireless communication systems based on MIMO-OFDM to serve several users simultaneously. However, the computational complexity of the baseband signal processing algorithms in such systems increases greatly, which poses unprecedented challenges for the design of baseband signal processors.

In the baseband signal processing chain of a MIMO-OFDM wireless communication system, the precoding algorithm and the MIMO detection algorithm are the two most complex baseband algorithms and have attracted wide research attention in recent years. The dirty paper coding algorithm proposed by Costa in his classic 1983 paper "Writing on dirty paper" is regarded as the best-performing nonlinear precoding algorithm, but its computational complexity is so high that it can hardly be executed in real time in a hardware circuit. In 2005, Wei Yu et al., in "Trellis and Convolutional Precoding for Transmitter-Based Interference Presubtraction", applied the THP (Tomlinson-Harashima Precoding) algorithm to nonlinear precoding and obtained good interference cancellation. Although its performance is somewhat below that of dirty paper coding, its computational complexity is much lower, which makes a hardware implementation of nonlinear precoding feasible. The most computationally expensive part of THP is the QR decomposition of the channel matrix H, so an efficient, fast QR decomposition unit improves the overall performance of THP precoding. Maximum-likelihood detection gives the highest detection accuracy of all MIMO detection algorithms, but its computational complexity is very high. M. Shabany et al., in "A 0.13μm CMOS 655Mb/s 4×4 64-QAM k-best MIMO detector", therefore used sphere detection (SD), an approximation of maximum-likelihood detection, and achieved good detection performance; QR decomposition is one of the bottlenecks of the SD algorithm and limits its execution speed.

Because QR decomposition is widely used in multi-user baseband processors based on MIMO-OFDM and is in many cases the bottleneck that limits processing speed, many baseband processor designs optimize it as an important computing unit. QR decomposition factors an n×n matrix A into an n×n unitary matrix Q and an n×n upper triangular matrix R. Current QR decomposition algorithms fall into three categories, based respectively on the Householder transformation, Givens rotations and the MGS (modified Gram-Schmidt) algorithm. QR decomposition based on the Householder transformation is difficult to implement in hardware and is therefore rarely used. QR decomposition based on Givens rotations greatly reduces the hardware resources required, but its long execution time does not meet the real-time requirements of communication systems. QR decomposition based on the MGS algorithm uses few hardware resources and executes quickly, so it meets the practical needs of communication systems.
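For reference, the MGS algorithm that the device builds on can be sketched in a few lines. This is a plain software model, not the patented hardware: it assumes a real-valued, full-rank n×n matrix stored as a list of rows (complex channel matrices would need conjugated inner products), and the name `mgs_qr` is illustrative.

```python
import math


def mgs_qr(A):
    """QR decomposition by modified Gram-Schmidt (MGS), software model.

    A: real n x n matrix as a list of rows, assumed to have full column rank.
    Returns (Q, R) as lists of rows with A = Q R, Q orthonormal, R upper triangular.
    """
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]   # working copy, by column
    q = [None] * n
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        R[k][k] = math.sqrt(sum(x * x for x in cols[k]))      # r_kk = ||a_k^(k-1)||
        q[k] = [x / R[k][k] for x in cols[k]]
        # MGS: remove the q_k component from every remaining column right away.
        for j in range(k + 1, n):
            R[k][j] = sum(qi * x for qi, x in zip(q[k], cols[j]))
            cols[j] = [x - R[k][j] * qi for qi, x in zip(q[k], cols[j])]
    Q = [[q[j][i] for j in range(n)] for i in range(n)]      # columns q_j -> rows of Q
    return Q, R
```

In this sequential form, every update of a remaining column waits for $q_k$, and therefore for the square root and division that produce it.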

R.-H. Chang et al., in "Iterative QR decomposition architecture using the modified Gram-Schmidt algorithm for MIMO systems", proposed a QR decomposition hardware circuit with a triangular systolic array structure based on the MGS algorithm. It completes the QR decomposition of a square matrix of order n (n a positive integer, n ≥ 2) in only 2n-1 time units. For example, when this circuit decomposes a 4×4 matrix A, the MGS-based iterative structure needs seven steps of one time unit each, seven time units in total. Thus, although the triangular systolic array method of R.-H. Chang et al. greatly reduces computation time, baseband signal processing in practical communication systems calls for an even faster QR decomposition structure. Moreover, the existing literature only covers QR decomposition of 4×4 matrices; no QR decomposition hardware circuit for general n×n matrices has been published.

Summary of the invention

The technical problem to be solved by the invention is, in view of the shortcomings of the prior art, to provide a QR decomposition device with a triangular systolic array structure based on advanced iteration, and a decomposition method, that are simple in principle, easy to implement, fast and efficient.

To solve the above technical problem, the invention adopts the following technical solution:

A QR decomposition device with a triangular systolic array structure based on advanced iteration, for the QR decomposition of an n×n matrix A, comprises diagonal processing modules, iterative processing modules and triangular processing modules: n diagonal processing modules; $(n-1)+(n-2)+\dots+1 = n(n-1)/2$ iterative processing modules; and $n/2+(n-2)+(n-4)+(n-6)+\dots+2 = n^2/4$ triangular processing modules when n is even, or $(n-1)+(n-3)+(n-5)+\dots+2 = (n+1)(n-1)/4$ triangular processing modules when n is odd. The first diagonal processing module receives the first column vector $a_1$ of A from outside; its results $q_1$ and $r_{11}$ are outputs of the whole QR decomposition module, $q_1$ is passed to the triangular processing modules of the next step, and the signal $r_{11}^2$ produced during the computation is sent to all iterative processing modules of the first step. The $(j-1)$-th iterative processing module ($2 \le j \le n$) takes as inputs the j-th column vector $a_j$ of A received from outside, the first column vector $a_1$ of A and $r_{11}^2$ from the first diagonal processing module, and computes the j-th column vector $a_j^{(1)}$ of the next iteration matrix $A^{(1)}$. Of these, $a_2^{(1)}$ is the input of the second diagonal processing module, and the remaining column vectors of $A^{(1)}$ are the inputs of the iterative processing modules of the second step and, at the same time, of the triangular processing modules of the third step. The process continues in the same way, and finally a triangular processing module produces the output signal $r_{n-1,n}$ of the QR decomposition module.
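The module counts above can be checked numerically. The hypothetical helpers below compare the closed forms with the series as written and with a per-row count. The per-row view is an assumption based on the triangular-module description: each triangular module produces at most two entries of one row of R.

```python
def module_counts(n):
    """(diagonal, iterative, triangular) module counts from the closed forms."""
    diagonal = n
    iterative = n * (n - 1) // 2                 # (n-1) + (n-2) + ... + 1
    if n % 2 == 0:
        triangular = n * n // 4                  # n even
    else:
        triangular = (n + 1) * (n - 1) // 4      # n odd
    return diagonal, iterative, triangular


def triangular_series(n):
    """Triangular-module count summed term by term, as the series is written."""
    if n % 2 == 0:
        return n // 2 + sum(range(2, n - 1, 2))  # n/2 + (n-2) + ... + 2
    return sum(range(2, n, 2))                   # (n-1) + (n-3) + ... + 2


def triangular_by_rows(n):
    """Assumed per-row view: row k of R has n-k off-diagonal entries and each
    triangular module covers at most two of them, so row k needs ceil((n-k)/2)."""
    return sum((n - k + 1) // 2 for k in range(1, n))


print(module_counts(4))
```

For n = 4 the closed forms give 4 diagonal, 6 iterative and 4 triangular modules, and all three triangular counts agree for every n ≥ 2.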

As a further improvement of the decomposition device: the i-th diagonal processing module processes the signal $a_i^{(i-1)}$ output by step $i-1$ to obtain the outputs $r_{ii}$ and $q_i$ of the QR decomposition module, and also computes $r_{ii}^2$. The vector $q_i$ is the input of the triangular processing modules of the next step, and $r_{ii}^2$ is an input of all iterative processing modules of step i. Each iterative processing module receives the column vectors $a_i^{(i-1)}$ and $a_{i_1}^{(i-1)}$ from step $i-1$ and the signal $r_{ii}^2$ from the diagonal processing module, and produces the $i_1$-th column vector $a_{i_1}^{(i)}$ of the next iteration matrix $A^{(i)}$. Of these, $a_{i+1}^{(i)}$ is the input of the $(i+1)$-th diagonal processing module, and the remaining column vectors of $A^{(i)}$ are the inputs of the iterative processing modules of the next step and, at the same time, of the triangular processing modules of step $i+2$. Each triangular processing module receives the input signal $q_{i-1}$ from step $i-1$ together with the signals $a_{i_2}^{(i-2)}$ and $a_{i_2+1}^{(i-2)}$ from step $i-2$, and produces the outputs $r_{i-1,i_2}$ and $r_{i-1,i_2+1}$ of the QR decomposition module.

The n-th diagonal processing module processes the signal $a_n^{(n-1)}$ output by step $n-1$ to obtain the outputs $r_{nn}$ and $q_n$ of the QR decomposition module, and the $k_4$-th triangular processing module receives the signal $q_{n-1}$ from step $n-1$ and the signal $a_n^{(n-2)}$ from step $n-2$ and produces the output $r_{n-1,n}$ of the QR decomposition module.

As a further improvement of the decomposition device: the diagonal processing module comprises multipliers, an adder, a square-root unit and dividers. Multiplier e ($1 \le e \le n$) receives the e-th element of the input vector $a_j$ from outside, squares it and passes the result to the adder. The adder accumulates the outputs of multipliers 1 to n and passes the sum to the square-root unit; the same sum is the module output $r_{jj}^2$. The square-root unit takes the square root of the adder output and sends it to dividers 1 to n as their divisor; the same value is the module output $r_{jj}$. Divider $e_1$ ($1 \le e_1 \le n$) takes the $e_1$-th element of the input vector $a_j$, received from outside, as the dividend and the output of the square-root unit as the divisor; the quotient is the $e_1$-th element of the module output vector $q_j$.

As a further improvement of the decomposition device: the iterative processing module comprises a first shared hardware block consisting of a multiplexer and multipliers 1 to n. The multiplexer selects the operand fed to multipliers 1 to n; its inputs are the external vector $a_{j_3}^{(p)}$ and the output of the divider. When the enable signal is '0', multiplier $e_2$ ($1 \le e_2 \le n$) multiplies the value received from the multiplexer (an element of $a_{j_3}^{(p)}$) by the $e_2$-th element of the vector $a_j^{(p)}$ received from outside and passes the product to the adder module. The adder module accumulates the outputs of multipliers 1 to n and passes the sum to the divider, which divides it (the dividend) by the externally received signal $r_{jj}^2$ (the divisor) and feeds the quotient back to an input of the multiplexer. When the enable signal is '1', the multiplexer passes the quotient, and multiplier $e_2$ sends its product to subtractor $e_3$ ($1 \le e_3 \le n$), which subtracts it (the subtrahend) from the $e_3$-th element of the externally received vector $a_{j_3}^{(p)}$ (the minuend); the difference is the $e_3$-th element of the module output vector $a_{j_3}^{(p+1)}$.
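A minimal software model of the iterative processing module follows. It assumes real-valued data and that the multiplexer passes the elements of $a_{j_3}^{(p)}$ when the enable signal is '0' and the divider's quotient when it is '1'; the function and argument names are illustrative.

```python
def iterative_module(a_pivot, a_col, r_sq):
    """Software model of one iterative processing module (hypothetical helper).

    a_pivot: pivot column a_j^(p), the same vector fed to the diagonal module
    a_col:   column a_{j3}^(p) to be updated
    r_sq:    r_jj^2 = ||a_pivot||^2, taken from the diagonal module's adder
    Returns a_{j3}^(p+1) = a_col - (<a_pivot, a_col> / r_sq) * a_pivot.
    """
    # Enable = '0': n multipliers form the products, the adder accumulates them,
    # and the divider scales the sum by 1 / r_jj^2.
    coeff = sum(x * y for x, y in zip(a_col, a_pivot)) / r_sq
    # Enable = '1' (assumed): the multiplexer now feeds coeff to the multipliers,
    # and n subtractors form the updated column element by element.
    return [y - coeff * x for x, y in zip(a_pivot, a_col)]
```

The updated column is orthogonal to the pivot column, exactly as after a standard MGS step, but no square root or division by $r_{jj}$ is needed to obtain it.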

As a further improvement of the decomposition device: the triangular processing module comprises a second shared hardware block. The inputs of multiplexer 1 are the n elements of vector $a_{j_3}$ and the n elements of vector $a_{j_3+1}$. When the multiplexer enable signal is '0', multiplexer 1 routes the elements of $a_{j_3}$ to multipliers 1 to n; when it is '1', it routes the elements of $a_{j_3+1}$. Multiplier $e_4$ multiplies the value received from the multiplexer by the $e_4$-th element of the vector $q_{j_2}$ received from outside and passes the product to the adder, which accumulates the products of the multipliers. When the enable signal is '0', the accumulator output is the module output $r_{j_2,j_3}$; when it is '1', the accumulator output is the module output $r_{j_2,j_3+1}$.
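The triangular processing module can be modelled as two dot products that share one set of multipliers and one adder. In the sketch below (hypothetical names, real-valued data), passing `a_second=None` stands for a module with only one column to process; how the hardware handles the last module of a row with an odd number of entries is an assumption here.

```python
def triangular_module(q, a_first, a_second=None):
    """Software model of one triangular processing module (hypothetical helper).

    Computes r_{j2,j3} = <q_{j2}, a_{j3}> with select = '0' and, if a second
    column is supplied, r_{j2,j3+1} = <q_{j2}, a_{j3+1}> with select = '1',
    reusing the same multipliers and adder for both dot products.
    """
    outputs = []
    for col in (a_first, a_second):   # select = '0', then select = '1'
        if col is None:               # assumption: only one column left in this row
            break
        outputs.append(sum(qi * ai for qi, ai in zip(q, col)))
    return tuple(outputs)
```

Time-sharing the multipliers between $a_{j_3}$ and $a_{j_3+1}$ is what halves the number of triangular modules relative to one module per entry of R.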

A QR decomposition method based on the above decomposition device comprises the following steps:

Step S1: the n column vectors $a_1, \dots, a_n$ of matrix A are the input signals of the QR decomposition module. $a_1$ is the input of the first diagonal processing module, whose outputs are $r_{11}$ and $q_1$. The iterative processing modules compute the next iteration matrix: their inputs are $a_1$ and $a_j$ ($1 < j < n+1$, j a positive integer), and their outputs are the columns $a_j^{(1)}$ of the next iteration matrix.

Steps S2 to Sj ($2 \le j < n$): the signals $a_j^{(j-2)}, \dots, a_n^{(j-2)}$ input to step $j-1$ and the signals $q_{j-1}, a_j^{(j-1)}, \dots, a_n^{(j-1)}$ output by step $j-1$ are the input signals of step j. $a_j^{(j-1)}$ is the input of the diagonal processing module, which computes $r_{jj}$ and $q_j$. The input signals of the $k_3$-th triangular processing module are $q_{j-1}$, $a_{j_3}^{(j-2)}$ and $a_{j_3+1}^{(j-2)}$, where $j_3$ is a positive integer taken in steps of 2 with $j \le j_3 \le n-1$ when $n-j$ is odd and $j \le j_3 \le n$ when $n-j$ is even; these modules compute $r_{j-1,j_3}$ and $r_{j-1,j_3+1}$. As in the first step, the iterative processing modules compute the next iteration matrix: their inputs are $a_j^{(j-1)}, \dots, a_n^{(j-1)}$ and their outputs are $a_{j+1}^{(j)}, \dots, a_n^{(j)}$.

Step Sn: the input $a_n^{(n-2)}$ of step $n-1$ and the outputs $q_{n-1}$ and $a_n^{(n-1)}$ of step $n-1$ are the inputs. $a_n^{(n-1)}$ is the input of block1 (the diagonal processing module), whose outputs are $r_{nn}$ and $q_n$; $q_{n-1}$ and $a_n^{(n-2)}$ are the inputs of block3 (the triangular processing module), whose output is $r_{n-1,n}$.
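The three kinds of modules can be combined into a step-by-step model of steps S1 to Sn. This is a sketch under stated assumptions: real-valued full-rank input, one time unit per step, and the iterative update modelled as the $r_{ii}^2$-based projection of the iterative module description. It checks data dependencies and the step count, not hardware timing or fixed-point behaviour, and `lookahead_qr` is an illustrative name.

```python
import math


def lookahead_qr(A):
    """Software model of the n-step schedule S1..Sn (hypothetical helper).

    A: real n x n matrix (list of rows), assumed full rank.
    Returns (Q, R, steps). In step i the hardware runs in parallel:
      diagonal module:    a_i^(i-1)                 -> r_ii^2, r_ii, q_i
      iterative modules:  a_i^(i-1), a_j^(i-1)      -> a_j^(i)   for j > i
      triangular modules: q_{i-1}, a_j^(i-2)        -> r_{i-1,j} for j >= i
    """
    n = len(A)
    # ver[p][j] holds column a_j^(p) (0-based j); version 0 is A itself.
    ver = [{j: [A[r][j] for r in range(n)] for j in range(n)}]
    q = [None] * n
    R = [[0.0] * n for _ in range(n)]
    steps = 0
    for i in range(n):                          # 0-based index of step S_{i+1}
        prev = ver[i]
        # Diagonal module.
        r_sq = sum(x * x for x in prev[i])
        R[i][i] = math.sqrt(r_sq)
        q[i] = [x / R[i][i] for x in prev[i]]
        # Iterative modules: they need r_sq only, not q_i (the look-ahead).
        nxt = {}
        for j in range(i + 1, n):
            c = sum(x * y for x, y in zip(prev[i], prev[j])) / r_sq
            nxt[j] = [y - c * x for x, y in zip(prev[i], prev[j])]
        ver.append(nxt)
        # Triangular modules: row i-1 of R from q_{i-1} and columns of version i-1.
        if i >= 1:
            for j in range(i, n):
                R[i - 1][j] = sum(u * v for u, v in zip(q[i - 1], ver[i - 1][j]))
        steps += 1
    Q = [[q[j][r] for j in range(n)] for r in range(n)]
    return Q, R, steps
```

In this model, row k of R is completed in step $k+1$, one step after $q_k$ appears, so the last entry $r_{n-1,n}$ is produced in step Sn and the whole decomposition takes n steps.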

Compared with the prior art, the invention has the following advantages: the QR decomposition device and method with a triangular systolic array structure based on advanced iteration are simple in principle and easy to implement, and they significantly speed up QR decomposition. For the QR decomposition of an n×n matrix, the proposed structure needs only n time units, whereas the triangular systolic array structure of R.-H. Chang et al. needs 2n-1 time units. For the 4×4 matrix A mentioned above, the invention completes the QR decomposition in 4 time units instead of 7, saving 3 time units.
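The latency comparison reduces to simple arithmetic. The illustrative helpers below only encode the two figures quoted above (2n-1 time units for the structure of R.-H. Chang et al., n for the proposed one).

```python
def chang_time_units(n):
    """Time units of the prior-art triangular systolic array: 2n - 1."""
    return 2 * n - 1


def lookahead_time_units(n):
    """Time units of the proposed structure: n."""
    return n


for n in (2, 4, 8, 16):
    saved = chang_time_units(n) - lookahead_time_units(n)   # always n - 1
    print(f"n={n:2d}: {chang_time_units(n):2d} -> {lookahead_time_units(n):2d} time units, {saved} saved")
```

The saving is n-1 time units, so the speed-up $(2n-1)/n$ approaches 2 for large n.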

Description of drawings

Fig. 1 is a schematic diagram of the topology of the decomposition device of the invention.

Fig. 2 is a schematic diagram of the structure of the diagonal processing module in a specific application example of the invention.

Fig. 3 is a schematic diagram of the structure of the iterative processing module in a specific application example of the invention.

Fig. 4 is a schematic diagram of the structure of the triangular processing module in a specific application example of the invention.

Detailed description

The invention is described in further detail below with reference to the drawings and specific embodiments.

As shown in Fig. 1, the QR decomposition device of the invention with a triangular systolic array structure based on advanced iteration, used for the QR decomposition of an n×n matrix A, comprises diagonal processing modules, iterative processing modules and triangular processing modules: n diagonal processing modules; $(n-1)+(n-2)+\dots+1 = n(n-1)/2$ iterative processing modules; and $n/2+(n-2)+(n-4)+(n-6)+\dots+2 = n^2/4$ triangular processing modules when n is even, or $(n-1)+(n-3)+(n-5)+\dots+2 = (n+1)(n-1)/4$ triangular processing modules when n is odd.

The first diagonal processing module receives the first column vector $a_1$ of A from outside; its results $q_1$ and $r_{11}$ are outputs of the whole QR decomposition module, $q_1$ is passed to the triangular processing modules of the next step, and the signal $r_{11}^2$ produced during the computation is sent to all iterative processing modules of the first step. The $(j-1)$-th iterative processing module ($2 \le j \le n$) takes as inputs the j-th column vector $a_j$ of A received from outside, the first column vector $a_1$ of A and $r_{11}^2$ from the first diagonal processing module, and computes the j-th column vector $a_j^{(1)}$ of the next iteration matrix $A^{(1)}$. Of these, $a_2^{(1)}$ is the input of the second diagonal processing module, and the remaining column vectors of $A^{(1)}$ are the inputs of the iterative processing modules of the second step and, at the same time, of the triangular processing modules of the third step. When the iterative processing modules need $r_{11}^2$, the first diagonal processing module has already computed it, so the diagonal processing module and the iterative processing modules can run in parallel.
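Why the iterative modules can start before $q_1$ is ready can be seen by rewriting the MGS column update so that it depends on $r_{kk}^2$ instead of $q_k$. The derivation below is a reading of the module descriptions rather than an equation stated in the patent, and it assumes real-valued data, matching the diagonal module's self-multiplication.

```latex
% Step 1: quantities produced by the first diagonal module
r_{11}^2 = a_1^{T} a_1, \qquad r_{11} = \sqrt{r_{11}^2}, \qquad q_1 = a_1 / r_{11}

% Standard MGS update, rewritten with q_1 = a_1 / r_{11}
a_j^{(1)} = a_j - r_{1j}\, q_1
          = a_j - (q_1^{T} a_j)\, q_1
          = a_j - \frac{a_1^{T} a_j}{r_{11}^2}\, a_1, \qquad 2 \le j \le n

% General step k: only r_{kk}^2 is needed to update the remaining columns
a_j^{(k)} = a_j^{(k-1)} - \frac{\bigl(a_k^{(k-1)}\bigr)^{T} a_j^{(k-1)}}{r_{kk}^2}\, a_k^{(k-1)},
\qquad r_{kk}^2 = \bigl(a_k^{(k-1)}\bigr)^{T} a_k^{(k-1)}, \qquad k < j \le n

% The R entry itself is formed one step later by a triangular module
r_{kj} = q_k^{T} a_j^{(k-1)}
```

For complex-valued channel matrices, the transposes would become conjugate transposes.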

The i-th diagonal processing module processes the signal $a_i^{(i-1)}$ output by step $i-1$ to obtain the outputs $r_{ii}$ and $q_i$ of the QR decomposition module, and also computes $r_{ii}^2$. The vector $q_i$ is the input of the triangular processing modules of the next step, and $r_{ii}^2$ is an input of all iterative processing modules of step i. Each iterative processing module receives the column vectors $a_i^{(i-1)}$ and $a_{i_1}^{(i-1)}$ from step $i-1$ and the signal $r_{ii}^2$ from the diagonal processing module, and produces the $i_1$-th column vector $a_{i_1}^{(i)}$ of the next iteration matrix $A^{(i)}$. Of these, $a_{i+1}^{(i)}$ is the input of the $(i+1)$-th diagonal processing module, and the remaining column vectors of $A^{(i)}$ are the inputs of the iterative processing modules of the next step and, at the same time, of the triangular processing modules of step $i+2$. Each triangular processing module receives the input signal $q_{i-1}$ from step $i-1$ together with the signals $a_{i_2}^{(i-2)}$ and $a_{i_2+1}^{(i-2)}$ from step $i-2$, and produces the outputs $r_{i-1,i_2}$ and $r_{i-1,i_2+1}$ of the QR decomposition module.

The n-th diagonal processing module processes the signal $a_n^{(n-1)}$ output by step $n-1$ to obtain the outputs $r_{nn}$ and $q_n$ of the QR decomposition module, and the $k_4$-th triangular processing module receives the signal $q_{n-1}$ from step $n-1$ and the signal $a_n^{(n-2)}$ from step $n-2$ and produces the output $r_{n-1,n}$ of the QR decomposition module.

As shown in Fig. 2, in a specific application example, the diagonal processing module comprises multipliers, an adder, a square-root unit and dividers. Multiplier e ($1 \le e \le n$) receives the e-th element of the input vector $a_j$ from outside, squares it and passes the result to the adder. The adder accumulates the outputs of multipliers 1 to n and passes the sum to the square-root unit; the same sum is the module output $r_{jj}^2$. The square-root unit takes the square root of the adder output and sends it to dividers 1 to n as their divisor; the same value is the module output $r_{jj}$. Divider $e_1$ ($1 \le e_1 \le n$) takes the $e_1$-th element of the input vector $a_j$, received from outside, as the dividend and the output of the square-root unit as the divisor; the quotient is the $e_1$-th element of the module output vector $q_j$.

The diagonal processing module described above computes the j2-th column vector q_{j2} of the Q matrix, the diagonal element r_jj of the R matrix, and its square r_jj^2, which serves as an input of the iterative processing modules. Because the diagonal processing module outputs r_jj^2 at the same moment that the iterative processing modules need it, the two kinds of module can execute in parallel, which increases the speed of the QR decomposition.

As shown in Figure 3, in a specific application example, the iterative processing module comprises first shared hardware, consisting of a multiplexer and multipliers 1 to n; the multiplexer selects between different inputs to feed multipliers 1 to n as one of their operands. The multiplexer receives the external vector a_{j3}^p and the output signal of the divider, selects one of them, and outputs it to multipliers 1 to n. When the enable signal is '0', multiplier e2 (1 ≤ e2 ≤ n) multiplies the signal received from the multiplexer by the e2-th element of the externally received vector a_j^p and outputs the product to the adder module. The adder module receives the inputs from multipliers 1 to n, accumulates them, and outputs the sum to the divider module. The divider takes the signal received from the adder module as its dividend and the externally received signal r_jj^2 as its divisor, and sends the quotient back to an input of multiplexer 1. When the enable signal is '1', multiplier e2 outputs its product to subtractor e3 (1 ≤ e3 ≤ n). Subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the externally received signal a_{j3}^p as the minuend, and the difference is the e3-th element of the module's output vector a_{j3}^{p+1}.
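A minimal sketch of the two phases of the iterative processing module, assuming floating-point arithmetic. The function name `iterative_module` is hypothetical, and which signal the multiplexer passes in each phase (the elements of a_{j3}^p for enable '0', the divider output for enable '1') is inferred from the data dependencies described above rather than stated explicitly. Running both phases back to back on one set of multipliers is the time-sharing that the first shared hardware makes possible.

```python
def iterative_module(a_pivot, a_target, r_sq):
    """Hypothetical model of the iterative processing module.

    a_pivot  -- a_j^p, the column normalized by the diagonal module
    a_target -- a_{j3}^p, the column being updated
    r_sq     -- r_jj^2, supplied by the diagonal module in the same step
    Returns a_{j3}^{p+1}.
    """
    n = len(a_pivot)
    # Enable '0': the shared multipliers form a_{j3}^p[e] * a_j^p[e], the adder
    # sums the products, and the divider scales the sum by 1 / r_jj^2.
    dot = sum(a_target[e] * a_pivot[e] for e in range(n))
    coef = dot / r_sq
    # Enable '1': the same multipliers now form coef * a_j^p[e], and the
    # subtractors remove that projection from a_{j3}^p element by element.
    return [a_target[e] - coef * a_pivot[e] for e in range(n)]
```

For example, `iterative_module([1.0, 0.0], [2.0, 3.0], 1.0)` returns `[0.0, 3.0]`: the component of the target along the pivot is removed, so the output is orthogonal to a_j^p.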

The iterative processing module described above computes the j3-th column of the next iteration matrix. The two places in the figure where the first shared hardware is used are independent of each other, so hardware resources can be saved by time-sharing that hardware.

As shown in Figure 4, in a specific application example, the triangular processing module comprises second shared hardware. The inputs of multiplexer 1 are the n elements of vector a_{j3} and the n elements of vector a_{j3+1}. When the multiplexer enable signal is '0', multiplexer 1 passes the elements of a_{j3} to multipliers 1 to n; when it is '1', multiplexer 1 passes the elements of a_{j3+1} to multipliers 1 to n. Multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the externally received vector q_{j2} as the other, and outputs the product to the adder. The adder accumulates the signals received from the multipliers. When the multiplexer enable signal is '0', the accumulator output is the triangular processing module's output signal r_{j2,j3}; when it is '1', the accumulator output is the module's output signal r_{j2,j3+1}.
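The time-shared triangular processing module can be sketched the same way. This is again a hypothetical Python model (`triangular_module` is an illustrative name): the loop over the enable values 0 and 1 reuses one set of multipliers and one accumulator for the two dot products r_{j2,j3} = q_{j2}^T a_{j3} and r_{j2,j3+1} = q_{j2}^T a_{j3+1}.

```python
def triangular_module(q, a_left, a_right):
    """Hypothetical model of the triangular processing module.

    q       -- q_{j2}
    a_left  -- a_{j3}
    a_right -- a_{j3+1}
    Returns (r_{j2,j3}, r_{j2,j3+1}).
    """
    outputs = []
    for enable in (0, 1):
        # Multiplexer 1 chooses which column feeds the shared multipliers.
        selected = a_left if enable == 0 else a_right
        # Multipliers e4 = 1..n, followed by the accumulating adder.
        outputs.append(sum(qe * ae for qe, ae in zip(q, selected)))
    return outputs[0], outputs[1]
```

For example, `triangular_module([1.0, 0.0], [2.0, 5.0], [-3.0, 7.0])` returns `(2.0, -3.0)`, one R element per enable value.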

The triangular processing module described above computes the elements of matrix R at coordinates [j2, j3] and [j2, j3+1]. Comparing Figure 4 with Figures 2 and 3 shows that computing the element at [j2, j3] takes less than 50% of the execution time of the second and first basic modules; the present invention therefore time-division multiplexes the hardware that computes the element at [j2, j3], which saves hardware resources.

The present invention further provides a decomposition method based on the above decomposition device. QR decomposition of an n×n matrix A using the circuit of the device takes n steps, as follows:

Step S1: The n column vectors a_1, ..., a_n of matrix A are the input signals of the QR decomposition module. a_1 is the input of the first diagonal processing module, whose outputs are r_11 and q_1. The iterative processing modules compute the next iteration matrix: their inputs are a_1 and a_j (1 < j < n+1, j a positive integer), and their outputs are the columns a_j^1 of the next iteration matrix. The value of each output signal in the first step is given by formula (1):

$$
\begin{aligned}
r_{11} &= \sqrt{(a_{11})^2 + (a_{21})^2 + \cdots + (a_{n1})^2} \\
q_{11} &= \frac{a_{11}}{r_{11}}, \quad \ldots, \quad q_{n1} = \frac{a_{n1}}{r_{11}} \\
a_1^1 &= 0 \\
a_2^1 &= a_2 - r_{12}\, q_1 = a_2 - (q_1^T a_2)\, q_1 = a_2 - \frac{a_1^T a_2}{r_{11}^2}\, a_1 \\
&\;\;\vdots \\
a_n^1 &= a_n - \frac{a_1^T a_n}{r_{11}^2}\, a_1
\end{aligned}
\tag{1}
$$

Step S1 shows that the main difference between the present invention and the conventional QR decomposition method is that the invention computes the next iteration matrix one step ahead. Conventional QR decomposition computes the next iteration matrix in the second step because that computation needs the outputs of the first step (r_12 and q_1 in formula (1)); the present invention reformulates the computation so that the next iteration matrix is obtained from the inputs of the first step, which greatly increases the speed of QR decomposition.
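The one-step-ahead computation rests on the identity used in formula (1): because q_1 = a_1 / r_11, the conventional update r_1j q_1 = (q_1^T a_j) q_1 equals (a_1^T a_j / r_11^2) a_1, which needs only a_1, a_j and r_11^2. The hypothetical Python functions below compute the update both ways so the two can be compared numerically; they illustrate the algebra only and do not model the circuit timing.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def conventional_update(a1, aj):
    """a_j^1 computed the conventional way: wait for q_1 and r_1j first."""
    r11 = math.sqrt(dot(a1, a1))
    q1 = [x / r11 for x in a1]
    r1j = dot(q1, aj)
    return [y - r1j * qe for y, qe in zip(aj, q1)]


def advanced_update(a1, aj):
    """a_j^1 computed one step ahead from a_1, a_j and r_11^2 alone."""
    r11_sq = dot(a1, a1)  # the r_11^2 output of the diagonal module
    coef = dot(a1, aj) / r11_sq
    return [y - coef * x for y, x in zip(aj, a1)]
```

With a_1 = [1, 2, 2] and a_j = [3, 0, 4], both functions return approximately [16/9, -22/9, 14/9], agreeing up to floating-point rounding. Note that `advanced_update` never takes a square root, which is why it does not have to wait for the diagonal module's square-root unit.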

Steps S2 to Sj (2 ≤ j < n): The signals a_j^{j-2}, ..., a_n^{j-2} input to step j-1 and the signals q_{j-1}, a_j^{j-1}, ..., a_n^{j-1} output by step j-1 are the input signals of step j. a_j^{j-1} is the input of the diagonal processing module, which computes r_jj and q_j. The input signals of the k3-th triangular processing module are q_{j-1}, a_{j3}^{j-2} and a_{j3+1}^{j-2} (when n-j is odd, j3 is a positive integer with j ≤ j3 ≤ n-1; when n-j is even, j3 is a positive integer with j ≤ j3 ≤ n), and are used to compute r_{j-1,j3} and r_{j-1,j3+1}. As in the first step, the iterative processing modules compute the next iteration matrix; their inputs are a_j^{j-1}, ..., a_n^{j-1} and their outputs are a_{j+1}^j, ..., a_n^j. Each output of step j is given by formula (2):

$$
\begin{aligned}
r_{j,j} &= \sqrt{(a_{1,j}^{j-1})^2 + (a_{2,j}^{j-1})^2 + \cdots + (a_{n,j}^{j-1})^2} \\
q_{1,j} &= \frac{a_{1,j}^{j-1}}{r_{j,j}}, \quad \ldots, \quad q_{n,j} = \frac{a_{n,j}^{j-1}}{r_{j,j}} \\
r_{j-1,j3} &= q_{j-1}^T a_{j3}^{j-2} = q_{1,j-1}\, a_{1,j3}^{j-2} + \cdots + q_{n,j-1}\, a_{n,j3}^{j-2}, \quad j \le j3 \le n,\ j3 \in \mathbb{N} \\
a_{i4}^{j} &= a_{i4}^{j-1} - \frac{(a_j^{j-1})^T a_{i4}^{j-1}}{r_{j,j}^2}\, a_j^{j-1}, \quad j \le i4 \le n,\ i4 \in \mathbb{N}
\end{aligned}
\tag{2}
$$

Step Sn: The input a_n^{n-2} of step n-1 and the outputs q_{n-1} and a_n^{n-1} of step n-1 are the inputs of step n. a_n^{n-1} is the input of block1 (the diagonal processing module), whose outputs are r_{n,n} and q_n; q_{n-1} and a_n^{n-2} are the inputs of block3 (the triangular processing module), whose output is r_{n-1,n}. Each output of step n is given by formula (3):

$$
\begin{aligned}
r_{n,n} &= \sqrt{(a_{1,n}^{n-1})^2 + (a_{2,n}^{n-1})^2 + \cdots + (a_{n,n}^{n-1})^2} \\
q_{1,n} &= \frac{a_{1,n}^{n-1}}{r_{n,n}}, \quad \ldots, \quad q_{n,n} = \frac{a_{n,n}^{n-1}}{r_{n,n}} \\
r_{n-1,n} &= q_{n-1}^T a_n^{n-2} = q_{1,n-1}\, a_{1,n}^{n-2} + \cdots + q_{n,n-1}\, a_{n,n}^{n-2}
\end{aligned}
\tag{3}
$$
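Putting the three modules together, the schedule of steps S1 to Sn can be emulated sequentially. The sketch below is a hypothetical Python model (the names `advanced_iteration_qr` and `dot` are illustrative), assuming a nonsingular matrix given as a list of column vectors and 0-based indices (the text uses 1-based indices). In each loop iteration the diagonal, triangular and iterative computations are the ones that formulas (1) to (3) assign to the same step: the triangular modules combine q from the previous step with the iteration matrix from two steps back, and the iterative modules use only r_jj^2 rather than q_j. Arithmetically this is Gram-Schmidt orthogonalization in its modified form, with every remaining column updated right after each pivot column.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def advanced_iteration_qr(cols):
    """Sequential emulation of steps S1..Sn (hypothetical sketch).

    cols: the column vectors a_1..a_n of a nonsingular n x n matrix A.
    Returns (Q, R): Q[k] is the column vector q_{k+1}; R is a list of rows.
    """
    n = len(cols)
    R = [[0.0] * n for _ in range(n)]
    Q = [None] * n
    # prev: iteration matrix from two steps back; cur: from the previous
    # step. Both start as A itself.
    prev = [list(c) for c in cols]
    cur = [list(c) for c in cols]
    for j in range(n):
        # Diagonal module: r_jj, q_j and r_jj^2 from the pivot column cur[j].
        r_sq = dot(cur[j], cur[j])
        R[j][j] = math.sqrt(r_sq)
        Q[j] = [x / R[j][j] for x in cur[j]]
        # Triangular modules: complete row j-1 of R from the previous step's
        # q and the iteration matrix from two steps back.
        if j >= 1:
            for k in range(j, n):
                R[j - 1][k] = dot(Q[j - 1], prev[k])
        # Iterative modules, same step: the next iteration matrix needs only
        # the unnormalized pivot column and r_jj^2, not q_j.
        nxt = [list(c) for c in cur]
        for k in range(j + 1, n):
            coef = dot(cur[j], cur[k]) / r_sq
            nxt[k] = [y - coef * x for y, x in zip(cur[k], cur[j])]
        prev, cur = cur, nxt
    return Q, R


# Usage: columns of the 3 x 3 matrix [[2, 0, 1], [1, 3, 0], [0, 1, 4]].
A_cols = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]
Q, R = advanced_iteration_qr(A_cols)
```

The result can be checked by confirming that the columns of Q are orthonormal, that R is upper triangular with a positive diagonal, and that Q R reproduces A up to rounding.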

It follows that the proposed structure completes the QR decomposition of an n×n matrix in only n time units, whereas the triangular systolic array structure proposed by R.-H. Chang et al. requires 2n-1 time units. For the 4×4 matrix A described earlier, for example, QR decomposition with the present invention takes only 4 time units instead of 7, a saving of 3 time units. The proposed advanced-iteration triangular systolic array structure can therefore significantly speed up QR decomposition.
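The time-unit counts quoted above are easy to tabulate. The functions below are hypothetical helpers that only restate the two latency formulas given in the text (n for the proposed structure, 2n-1 for the structure of R.-H. Chang et al.); they do not model either circuit.

```python
def time_units_proposed(n):
    """Latency stated for the advanced-iteration structure: steps S1..Sn."""
    return n


def time_units_chang(n):
    """Latency stated for the triangular systolic array of R.-H. Chang et al."""
    return 2 * n - 1


for n in (2, 4, 8, 16):
    saved = time_units_chang(n) - time_units_proposed(n)
    print(f"n={n}: {time_units_proposed(n)} vs {time_units_chang(n)} time units, saving {saved}")
```

For n = 4 this prints a saving of 3 time units, matching the example above; in general the saving under these formulas is n-1 time units, so the relative gain approaches a factor of two as n grows.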

The above are only preferred embodiments of the present invention; the scope of protection of the invention is not limited to the above embodiments, and all technical solutions within the concept of the invention fall within its scope of protection. It should be noted that, for a person of ordinary skill in the art, improvements and refinements made without departing from the principle of the invention shall also be regarded as within the scope of protection of the invention.

Claims (6)

1. A triangular systolic array QR decomposition device based on advanced iteration, used to perform QR decomposition of an n × n matrix A, characterized in that it comprises diagonal processing modules, iterative processing modules and triangular processing modules, wherein there are n diagonal processing modules and (n-1) + (n-2) + ... + 1 = n × (n-1)/2 iterative processing modules; when n is even, n/2 + (n-2) + (n-4) + (n-6) + ... + 2 = n^2/4 triangular processing modules are used, and when n is odd, (n-1) + (n-3) + (n-5) + ... + 2 = (n+1)(n-1)/4 triangular processing modules are used; the first diagonal processing module receives the first column vector a_1 of matrix A from outside, outputs its results q_1 and r_11 as outputs of the whole QR decomposition module, outputs q_1 to the triangular processing module of the next step, and outputs the r_jj^2 signal generated during the calculation to all iterative processing modules in the first step; the (j-1)-th iterative processing module takes as inputs the j-th column vector a_j of matrix A received from outside, where 2 ≤ j ≤ n-1, the first column vector a_1 of matrix A, and the r_jj^2 output by the first diagonal processing module, and computes the j-th column vector a_j^1 of the next iteration matrix A^1, wherein a_1^1 is the input of the second diagonal processing module and the remaining column vectors of A^1 are inputs of the second-step iterative processing modules and, at the same time, of the third-step triangular processing modules; and so on, until finally a triangular processing module produces the output signal r_{n-1,n} of the QR decomposition module.
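The module counts in claim 1 can be cross-checked with a short calculation. This sketch assumes, consistent with Figure 4, that each triangular processing module produces two elements of R (r_{j2,j3} and r_{j2,j3+1}), so a row of R with m off-diagonal elements needs ceil(m/2) modules; the function names are hypothetical. Summing over the rows reproduces the closed forms n^2/4 (n even) and (n+1)(n-1)/4 (n odd) stated in the claim.

```python
def diagonal_modules(n):
    """One diagonal processing module per column."""
    return n


def iterative_modules(n):
    """(n-1) + (n-2) + ... + 1 iterative processing modules."""
    return n * (n - 1) // 2


def triangular_modules(n):
    """Assumes two R elements per module: row k of R has n-k off-diagonal
    elements and therefore needs ceil((n-k)/2) = (n-k+1)//2 modules."""
    return sum((n - k + 1) // 2 for k in range(1, n))


def triangular_modules_closed_form(n):
    """Closed forms quoted in claim 1."""
    return n * n // 4 if n % 2 == 0 else (n + 1) * (n - 1) // 4
```

For example, a 4×4 matrix needs 4 diagonal, 6 iterative and 4 triangular processing modules under these formulas.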
2. The triangular systolic array QR decomposition device based on advanced iteration according to claim 1, characterized in that the i-th diagonal processing module processes the signal a_i^{i-1} output by step i-1 to obtain the outputs r_ii and q_i of the QR decomposition module and also computes r_ii^2, wherein the vector q_i is an input of the triangular processing modules of the next step and r_ii^2 is an input of all iterative processing modules in step i; each iterative processing module receives the column vector signals a_i^{i-1} and a_{i1}^{i-1} from step i-1 and the signal r_ii^2 from the diagonal processing module, and after processing obtains the i1-th column vector of the next iteration matrix A^i, wherein a_{i+1}^i is the input of the (i+1)-th diagonal processing module and the remaining column vectors of A^i are inputs of the next step's iterative modules and, at the same time, of the triangular processing modules of step i+2; each triangular processing module receives the input signal q_{i-1} from step i-1 together with the signals a_{i2}^{i-2} and a_{i2+1}^{i-2} from step i-2, and after processing obtains the output signals r_{i-1,i2} and r_{i-1,i2+1} of the QR decomposition module;
the n-th diagonal processing module processes the signal a_n^{n-1} output by step n-1 to obtain the output signals r_nn and q_n of the QR decomposition module, and the k4-th triangular processing module receives the signal q_{n-1} from step n-1 and the signal a_n^{n-2} from step n-2, and after processing obtains the output signal r_{n-1,n} of the QR decomposition module.
3. The triangular systolic array QR decomposition device based on advanced iteration according to claims 1 and 2, characterized in that the diagonal processing module comprises multipliers, an adder, a square-root unit and dividers; multiplier e receives the e-th element of the input vector a_j from outside, where 1 ≤ e ≤ n, squares it and outputs the result to the adder; the adder receives the signals from multipliers 1 to n, accumulates them, outputs the sum to the square-root unit and also uses it as the output signal r_jj^2 of the whole module; after receiving the signal from the adder, the square-root unit takes its square root and outputs the result to dividers 1 to n as their divisor, and also as the output signal r_jj of the whole module; divider e1 takes the e1-th element of the externally received input vector a_j as its dividend and the signal received from the square-root unit as its divisor, where 1 ≤ e1 ≤ n, and the result is the e1-th element of the module's output vector q_{j2}.
4. The triangular systolic array QR decomposition device based on advanced iteration according to claims 1 and 2, characterized in that the iterative processing module comprises first shared hardware, which contains a multiplexer and multipliers 1 to n; the multiplexer selects different inputs for multipliers 1 to n as one of their operands, receiving the external vector a_{j3}^p and the output signal of the divider, selecting between them and outputting the result to multipliers 1 to n; when the enable signal is '0', multiplier e2, where 1 ≤ e2 ≤ n, takes the signal received from the multiplexer as one operand and the e2-th element of the externally received vector a_j^p as the other, and outputs the product to the adder module; the adder module receives the inputs from multipliers 1 to n, accumulates them and outputs the sum to the divider module; the divider takes the signal received from the adder module as its dividend and the externally received signal r_jj^2 as its divisor, and outputs the quotient to an input of multiplexer 1; when the enable signal is '1', multiplier e2 outputs its product to subtractor e3, where 1 ≤ e3 ≤ n; subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the externally received signal a_{j3}^p as the minuend, and the difference is the e3-th element of the module's output vector a_{j3}^{p+1}.
5. The triangular systolic array QR decomposition device based on advanced iteration according to claim 4, characterized in that the triangular processing module comprises second shared hardware; the inputs of multiplexer 1 are the n elements of vector a_{j3} and the n elements of vector a_{j3+1}; when the multiplexer enable signal is '0', multiplexer 1 passes the elements of a_{j3} to multipliers 1 to n, and when the multiplexer enable signal is '1', multiplexer 1 passes the elements of a_{j3+1} to multipliers 1 to n; multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the externally received vector q_{j2} as the other, and outputs the product to the adder; the adder accumulates the signals received from the multipliers; when the multiplexer enable signal is '0', the accumulator output is the output signal r_{j2,j3} of the triangular processing module, and when the multiplexer enable signal is '1', the accumulator output is the output signal r_{j2,j3+1} of the triangular processing module.
6. A QR decomposition method based on the decomposition device of any one of claims 1 to 5, characterized in that its steps are:
Step S1: the n column vectors a_1, ..., a_n of matrix A are the input signals of the QR decomposition module; a_1 is the input of the first diagonal processing module, whose outputs are r_11 and q_1; the iterative processing modules compute the next iteration matrix, with inputs a_1 and a_j, where 1 < j < n+1 and j is a positive integer, and outputs a_j^1 of the next iteration matrix;
Steps S2 to Sj, where 2 ≤ j < n: the signals a_j^{j-2}, ..., a_n^{j-2} input to step j-1 and the signals q_{j-1}, a_j^{j-1}, ..., a_n^{j-1} output by step j-1 are the input signals of step j, wherein a_j^{j-1} is the input of the diagonal processing module and is used to compute r_jj and q_j; the input signals of the k3-th triangular processing module are q_{j-1}, a_{j3}^{j-2} and a_{j3+1}^{j-2}, where j3 is a positive integer with j ≤ j3 ≤ n-1 when n-j is odd and with j ≤ j3 ≤ n when n-j is even, and are used to compute r_{j-1,j3} and r_{j-1,j3+1}; as in the first step, the iterative processing modules compute the next iteration matrix, with inputs a_j^{j-1}, ..., a_n^{j-1} and outputs a_{j+1}^j, ..., a_n^j;
Step Sn: the input a_n^{n-2} of step n-1 and the outputs q_{n-1} and a_n^{n-1} of step n-1 are taken as inputs, wherein a_n^{n-1} is the input of block1, whose outputs are r_{n,n} and q_n, and q_{n-1} and a_n^{n-2} are the inputs of block3, whose output is r_{n-1,n}.
CN201610173392.0A 2016-03-24 2016-03-24 Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative Active CN105846873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610173392.0A CN105846873B (en) 2016-03-24 2016-03-24 Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative

Publications (2)

Publication Number Publication Date
CN105846873A true CN105846873A (en) 2016-08-10
CN105846873B CN105846873B (en) 2018-12-18

Family

ID=56583444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610173392.0A Active CN105846873B (en) 2016-03-24 2016-03-24 Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative

Country Status (1)

Country Link
CN (1) CN105846873B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779501A (en) * 2021-08-23 2021-12-10 华控清交信息科技(北京)有限公司 Data processing method and device and data processing device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101170525A (en) * 2006-10-25 2008-04-30 中兴通讯股份有限公司 MLSE simplification detection method and its device based on blocked QR decomposition
US20090154608A1 (en) * 2007-12-18 2009-06-18 Electronics And Telecommunications Research Institute Receiving apparatus and method for mimo system
CN101674160A (en) * 2009-10-22 2010-03-17 复旦大学 Signal detection method and device for multiple-input-multiple-output wireless communication system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
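朱勇旭, 吴斌, 周玉梅, 蔡菁菁, 夏凯锋: "Distributed systolic array processing algorithm for QR decomposition in MIMO-OFDM systems" (用于MIMO-OFDM系统QR分解的分布式脉动阵列处理算法), Journal of Electronics & Information Technology (《电子与信息学报》) *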

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779501A (en) * 2021-08-23 2021-12-10 华控清交信息科技(北京)有限公司 Data processing method and device and data processing device
CN113779501B (en) * 2021-08-23 2024-06-04 华控清交信息科技(北京)有限公司 Data processing method and device for data processing

Also Published As

Publication number Publication date
CN105846873B (en) 2018-12-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant