CN105846873B - Triangular systolic array QR decomposer and decomposition method based on look-ahead iteration


Info

Publication number
CN105846873B
Authority
CN
China
Prior art keywords
signal
module
output
multiplier
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610173392.0A
Other languages
Chinese (zh)
Other versions
CN105846873A (en
Inventor
邢座程
刘苍
原略超
唐川
张洋
王庆林
王�锋
汤先拓
危乐
吕朝
董永旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201610173392.0A
Publication of CN105846873A
Application granted
Publication of CN105846873B


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413 MIMO systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/08 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station
    • H04B7/0837 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station using pre-detection combining
    • H04B7/0842 Weighted combining
    • H04B7/0848 Joint weighting
    • H04B7/0854 Joint weighting using error minimizing algorithms, e.g. minimum mean squared error [MMSE], "cross-correlation" or matrix inversion

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)

Abstract

A triangular systolic array QR decomposer based on look-ahead iteration, and a corresponding decomposition method, perform QR decomposition of an n × n matrix A. The decomposer comprises diagonal processing modules, iterative processing modules, and triangular processing modules. The first diagonal processing module receives the first column vector $a_1$ of A from outside; its results $q_1$ and $r_{11}$ are outputs of the QR decomposition module, $q_1$ is also passed to the triangular processing modules of the next step, and the generated $r_{11}^{2}$ signal is sent to all iterative processing modules of the first step. The (j-1)-th iterative processing module takes as inputs the j-th column vector $a_j$ of A received from outside, the first column vector $a_1$ of A, and the $r_{11}^{2}$ output by the first diagonal processing module, and produces the j-th column vector $a_j^{(1)}$ of the next iteration matrix $A^{(1)}$. The remaining steps proceed in the same way, until the triangular processing module produces the final output signal $r_{n-1,n}$ of the QR decomposition module. The decomposition method is implemented on this decomposer. The invention has the advantages of a simple principle, fast decomposition, and high efficiency.

Description

Triangular systolic array QR decomposer and decomposition method based on look-ahead iteration
Technical field
The present invention relates generally to baseband signal processing in wireless communication systems, and in particular to a triangular systolic array QR decomposer based on look-ahead iteration and a corresponding decomposition method.
Background
Orthogonal frequency division multiplexing (OFDM) and multiple-input multiple-output (MIMO) technologies have received wide attention because of their high spectrum utilization and high transmission rates. Recent progress in precoding allows multi-user wireless communication systems based on MIMO-OFDM to serve multiple users simultaneously. However, the computational complexity of the baseband signal processing algorithms in such systems increases greatly, which poses an unprecedented challenge to the design of baseband signal processors.
In the baseband signal processing chain of a MIMO-OFDM wireless communication system, precoding and MIMO detection are two of the more complex algorithms and have attracted wide research attention in recent years. The dirty paper coding algorithm proposed by Costa in his classic 1983 paper "Writing on dirty paper" is considered the best-performing nonlinear precoding algorithm, but its computational complexity is so high that it can hardly be executed in real time on a hardware circuit. In 2005, Wei Yu et al., in "Trellis and Convolutional Precoding for Transmitter-Based Interference Presubtraction", applied the THP (Tomlinson-Harashima Precoding) algorithm to nonlinear precoding and obtained good interference cancellation. Although THP performs somewhat worse than dirty paper coding, its computational complexity is much lower, which makes hardware implementation of nonlinear precoding feasible. The most computationally expensive part of THP is the QR decomposition of the channel matrix H, so an efficient, fast QR decomposition unit helps improve the overall performance of THP precoding. Maximum-likelihood detection has the highest detection accuracy of all MIMO detection algorithms, but its computational complexity is also very high. M. Shabany et al., in "A 0.13 μm CMOS 655Mb/s 4 × 4 64-QAM K-Best MIMO Detector", therefore performed MIMO detection with sphere detection (SD), an approximation of maximum-likelihood detection, and achieved good detection results. QR decomposition is one of the bottlenecks of the SD algorithm and limits its execution speed.
Because QR decomposition is widely used in multi-user baseband signal processors based on MIMO-OFDM, and in many cases is the bottleneck that limits processing speed, many baseband signal processor designs optimize QR decomposition as a key arithmetic unit. QR decomposition factors an n × n matrix A into an n × n unitary matrix Q and an n × n upper triangular matrix R. Current QR decomposition algorithms fall into three main classes, based on the Householder transformation, Givens rotation, and the MGS (modified Gram-Schmidt) algorithm. QR decomposition based on the Householder transformation is difficult to implement in hardware and is therefore rarely used. QR decomposition based on Givens rotation greatly reduces the hardware resources required, but its execution time is long and does not meet the real-time requirements of communication systems. QR decomposition based on MGS occupies few hardware resources and has a short execution time, and so meets the practical needs of communication systems.
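For reference, the conventional MGS procedure that the discussion builds on can be sketched as follows. This is a minimal illustration, not the patent's circuit: it assumes real-valued entries and a matrix stored as a list of rows, and the function name `mgs_qr` is hypothetical.

```python
import math


def mgs_qr(a):
    """Modified Gram-Schmidt QR of a square real matrix given as a list of rows."""
    n = len(a)
    # Work on columns, because MGS orthogonalises one column at a time.
    cols = [[a[i][j] for i in range(n)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Diagonal entry of R and the k-th orthonormal column of Q.
        r[k][k] = math.sqrt(sum(x * x for x in cols[k]))
        q_k = [x / r[k][k] for x in cols[k]]
        q_cols.append(q_k)
        # Remove the q_k component from every remaining column.
        for j in range(k + 1, n):
            r[k][j] = sum(qe * ce for qe, ce in zip(q_k, cols[j]))
            cols[j] = [ce - r[k][j] * qe for qe, ce in zip(q_k, cols[j])]
    # Convert the column list of Q back to a list of rows.
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r
```

In this form the update of the remaining columns depends on `q_k` and `r[k][j]`, that is, on the outputs of the current step. This dependency is what the look-ahead structure described later avoids.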
R.-H. Chang et al., in "Iterative QR decomposition architecture using the modified Gram-Schmidt algorithm for MIMO systems", proposed a triangular systolic array QR decomposition hardware circuit based on the MGS algorithm. To complete the QR decomposition of a square matrix of order n (n a positive integer greater than or equal to 2), the proposed circuit needs only 2n-1 time units. In a specific application, QR decomposition of a 4 × 4 matrix A with this MGS-based iterative structure takes seven steps, each requiring one time unit, for seven time units in total. Although the triangular systolic array QR decomposition method of R.-H. Chang et al. greatly reduces computation time, baseband signal processing in practical communication systems calls for an even faster QR decomposition structure. In addition, the existing literature covers only the QR decomposition of 4 × 4 matrices; no hardware circuit for QR decomposition of an n × n matrix has been published.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the problems in the prior art, the invention provides a triangular systolic array QR decomposer based on look-ahead iteration, and a corresponding decomposition method, that are simple in principle, easy to implement, fast, and efficient.
In order to solve the above technical problems, the invention adopts the following technical scheme:
A triangular systolic array QR decomposer based on look-ahead iteration is used to perform QR decomposition of an n × n matrix A. It comprises diagonal processing modules, iterative processing modules, and triangular processing modules: n diagonal processing modules; (n-1)+(n-2)+...+1 = n(n-1)/2 iterative processing modules; and, when n is even, n/2+(n-2)+(n-4)+(n-6)+...+2 = $n^{2}/4$ triangular processing modules, or, when n is odd, (n-1)+(n-3)+(n-5)+...+2 = (n+1)(n-1)/4 triangular processing modules. The first diagonal processing module receives the first column vector $a_1$ of A from outside. Its results $q_1$ and $r_{11}$ are outputs of the overall QR decomposition module, and $q_1$ is also passed to the triangular processing modules of the next step. The $r_{11}^{2}$ signal generated during this calculation is sent to all iterative processing modules of the first step. The (j-1)-th iterative processing module, where j is greater than or equal to 2 and less than or equal to n-1, takes as inputs the j-th column vector $a_j$ of A received from outside, the first column vector $a_1$ of A, and the $r_{11}^{2}$ output by the first diagonal processing module, and computes the j-th column vector $a_j^{(1)}$ of the next iteration matrix $A^{(1)}$. Of these, $a_2^{(1)}$ is the input of the second diagonal processing module, and the remaining column vectors of $A^{(1)}$ are inputs both to the iterative processing modules of the second step and to the triangular processing modules of the third step. The remaining steps proceed in the same way, until the triangular processing module produces the final output signal $r_{n-1,n}$ of the QR decomposition module.
As a further improvement of the decomposer: the i-th diagonal processing module processes the $a_i^{(i-1)}$ signal output by step i-1 to compute the outputs $r_{ii}$ and $q_i$ of the QR decomposition module, and also computes $r_{ii}^{2}$. The vector $q_i$ is an input to the triangular processing modules of the next step, and $r_{ii}^{2}$ is an input to all iterative processing modules of step i. An iterative processing module receives the column vectors $a_i^{(i-1)}$ and $a_{i1}^{(i-1)}$ from step i-1 and the $r_{ii}^{2}$ signal from the diagonal processing module, and from them computes the i1-th column vector of the next iteration matrix $A^{(i)}$. Of these, $a_{i+1}^{(i)}$ is the input of the (i+1)-th diagonal processing module, and the remaining column vectors of $A^{(i)}$ are inputs both to the iterative processing modules of the next step and to the triangular processing modules of step i+2. A triangular processing module receives $q_{i-1}$ from step i-1 and the signals $a_{i2}^{(i-2)}$ and $a_{i2+1}^{(i-2)}$ from step i-2, and computes the output signals $r_{i-1,i2}$ and $r_{i-1,i2+1}$ of the QR decomposition module.
The n-th diagonal processing module processes the $a_n^{(n-1)}$ signal output by step n-1 to obtain the output signals $r_{nn}$ and $q_n$ of the QR decomposition module. The k4-th triangular processing module receives $q_{n-1}$ from step n-1 and $a_n^{(n-2)}$ from step n-2, and computes the output signal $r_{n-1,n}$ of the QR decomposition module.
As a further improvement of the decomposer: the diagonal processing module comprises multipliers, an adder, a square-root unit, and dividers. Multiplier e, where e is greater than or equal to 1 and less than or equal to n, receives the e-th element of the input vector $a_j$ from outside, squares it, and passes the result to the adder. The adder receives the signals from multipliers 1 to n, accumulates them, and passes the sum to the square-root unit; the same sum is also the module output $r_{jj}^{2}$. The square-root unit takes the square root of the signal received from the adder and passes it to dividers 1 to n as their divisor; the same value is also the module output $r_{jj}$. Divider e1, where e1 is greater than or equal to 1 and less than or equal to n, takes the e1-th element of the input vector $a_j$, received from outside, as its dividend and the signal from the square-root unit as its divisor; the result is the e1-th element of the module output vector $q_{j2}$.
As a further improvement of the decomposer: the iterative processing module comprises first shared hardware, which contains a multiplexer and multipliers 1 to n. The multiplexer selects which input multipliers 1 to n use as one of their operands: it receives the vector $a_{j3}^{(p)}$ from outside and the output signal of the divider, selects between them, and passes the result to multipliers 1 to n. When the enable signal is '0', multiplier e2, where e2 is greater than or equal to 1 and less than or equal to n, multiplies the signal received from the multiplexer by the e2-th element of the vector $a_j^{(p)}$ received from outside and passes the product to the adder module. The adder module receives the signals from multipliers 1 to n, accumulates them, and passes the sum to the divider module. The divider takes the signal from the adder module as its dividend and the externally received signal $r_{jj}^{2}$ as its divisor, and passes the quotient to the input of the multiplexer. When the enable signal is '1', multiplier e2 passes its result to subtractor e3, where e3 is greater than or equal to 1 and less than or equal to n. Subtractor e3 takes the signal from multiplier e2 as its subtrahend and the e3-th element of the externally received signal $a_{j3}^{(p)}$ as its minuend; the difference is the e3-th element of the module output vector $a_{j3}^{(p+1)}$.
As a further improvement of the decomposer: the triangular processing module comprises second shared hardware. The inputs of the multiplexer are the n elements of the vector $a_{j3}$ and the n elements of the vector $a_{j3+1}$. When the multiplexer enable signal is '0', the multiplexer passes the elements of $a_{j3}$ to multipliers 1 to n; when the enable signal is '1', it passes the elements of $a_{j3+1}$ to multipliers 1 to n. Multiplier e4 multiplies the data received from the multiplexer by the e4-th element of the vector $q_{j2}$ received from outside and passes the product to the adder, which accumulates the signals received from the multipliers. When the multiplexer enable signal is '0', the accumulator output is the output signal $r_{j2,j3}$ of the triangular processing module; when the enable signal is '1', the accumulator output is the output signal $r_{j2,j3+1}$.
A QR decomposition method based on the above decomposer comprises the following steps:
Step S1: the n column vectors $a_1, \ldots, a_n$ of matrix A are the input signals of the QR decomposition module. $a_1$ is the input of the first diagonal processing module, whose outputs are $r_{11}$ and $q_1$. The iterative processing modules compute the next iteration matrix: their inputs are $a_1$ and $a_j$, where 1 < j < n+1 and j is a positive integer, and their outputs are the columns $a_j^{(1)}$ of the next iteration matrix.
Steps S2 to Sj, where j is greater than or equal to 2 and less than n: the signals $a_j^{(j-2)}, \ldots, a_n^{(j-2)}$ input to step j-1, together with the signals $q_{j-1}$ and $a_j^{(j-1)}, \ldots, a_n^{(j-1)}$ output by step j-1, are the input signals of step j. $a_j^{(j-1)}$ is the input of the diagonal processing module, which computes $r_{jj}$ and $q_j$. The inputs of the k3-th triangular processing module are $q_{j-1}$, $a_{j3}^{(j-2)}$, and $a_{j3+1}^{(j-2)}$, where j3 is a positive integer greater than or equal to j and less than or equal to n-1 when n-j is odd, and greater than or equal to j and less than or equal to n when n-j is even; this module computes $r_{j-1,j3}$ and $r_{j-1,j3+1}$. As in the first step, the iterative processing modules compute the next iteration matrix; their inputs are $a_j^{(j-1)}, \ldots, a_n^{(j-1)}$ and their outputs are $a_{j+1}^{(j)}, \ldots, a_n^{(j)}$.
Step Sn: the input $a_n^{(n-2)}$ of step n-1 and the outputs $q_{n-1}$ and $a_n^{(n-1)}$ of step n-1 are the inputs. $a_n^{(n-1)}$ is the input of block1 (the diagonal processing module), whose outputs are $r_{n,n}$ and $q_n$; $q_{n-1}$ and $a_n^{(n-2)}$ are the inputs of block3 (the triangular processing module), whose output is $r_{n-1,n}$.
Compared with the prior art, the advantages of the present invention are as follows. The triangular systolic array QR decomposer based on look-ahead iteration, and the corresponding decomposition method, are simple in principle and easy to implement, and they significantly speed up QR decomposition. To perform QR decomposition of an n × n matrix, the proposed structure needs only n time units, whereas the triangular systolic array structure proposed by R.-H. Chang et al. needs 2n-1 time units. For the 4 × 4 matrix A mentioned above, for example, QR decomposition with the present invention takes only 4 time units instead of 7, a saving of 3 time units.
Brief description of the drawings
Fig. 1 is a schematic diagram of the topology of the decomposer of the present invention.
Fig. 2 is a schematic diagram of the diagonal processing module of the present invention in a specific application example.
Fig. 3 is a schematic diagram of the iterative processing module of the present invention in a specific application example.
Fig. 4 is a schematic diagram of the triangular processing module of the present invention in a specific application example.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the triangular systolic array QR decomposer based on look-ahead iteration of the present invention is used to perform QR decomposition of an n × n matrix A. It comprises diagonal processing modules, iterative processing modules, and triangular processing modules: n diagonal processing modules; (n-1)+(n-2)+...+1 = n(n-1)/2 iterative processing modules; and, when n is even, n/2+(n-2)+(n-4)+(n-6)+...+2 = $n^{2}/4$ triangular processing modules, or, when n is odd, (n-1)+(n-3)+(n-5)+...+2 = (n+1)(n-1)/4 triangular processing modules.
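The module counts above are simple closed forms, and a short sketch makes them easy to evaluate for a given n. The function names are hypothetical. The cross-check in `triangular_by_summation` rests on an assumption not stated in this form in the text: that a step producing k off-diagonal entries of R uses ceil(k/2) triangular modules, because each module is time-shared between two entries.

```python
def module_counts(n):
    """Return (diagonal, iterative, triangular) module counts for an n x n matrix."""
    if n < 2:
        raise ValueError("n must be at least 2")
    diagonal = n
    iterative = n * (n - 1) // 2
    if n % 2 == 0:
        triangular = n * n // 4
    else:
        triangular = (n + 1) * (n - 1) // 4
    return diagonal, iterative, triangular


def triangular_by_summation(n):
    """Cross-check (assumption): k entries of R per step need ceil(k / 2) modules."""
    return sum((k + 1) // 2 for k in range(1, n))


print(module_counts(4))  # (4, 6, 4)
print(module_counts(5))  # (5, 10, 6)
```

For n = 4 this gives 4 diagonal, 6 iterative, and 4 triangular modules, and the summation agrees with the closed forms for both even and odd n.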
The first diagonal processing module receives the first column vector $a_1$ of A from outside. Its results $q_1$ and $r_{11}$ are outputs of the overall QR decomposition module, and $q_1$ is also passed to the triangular processing modules of the next step. The $r_{11}^{2}$ signal generated during this calculation is sent to all iterative processing modules of the first step. The (j-1)-th iterative processing module (j greater than or equal to 2 and less than or equal to n-1) takes as inputs the j-th column vector $a_j$ of A received from outside, the first column vector $a_1$ of A, and the $r_{11}^{2}$ output by the first diagonal processing module, and computes the j-th column vector $a_j^{(1)}$ of the next iteration matrix $A^{(1)}$. Of these, $a_2^{(1)}$ is the input of the second diagonal processing module, and the remaining column vectors of $A^{(1)}$ are inputs both to the iterative processing modules of the second step and to the triangular processing modules of the third step. By the time the iterative processing modules need the $r_{11}^{2}$ signal, the first diagonal processing module has just finished computing it, so the diagonal processing module and the iterative processing modules can execute in parallel.
The i-th diagonal processing module processes the $a_i^{(i-1)}$ signal output by step i-1 to compute the outputs $r_{ii}$ and $q_i$ of the QR decomposition module, and also computes $r_{ii}^{2}$. The vector $q_i$ is an input to the triangular processing modules of the next step, and $r_{ii}^{2}$ is an input to all iterative processing modules of step i. An iterative processing module receives the column vectors $a_i^{(i-1)}$ and $a_{i1}^{(i-1)}$ from step i-1 and the $r_{ii}^{2}$ signal from the diagonal processing module, and from them computes the i1-th column vector of the next iteration matrix $A^{(i)}$. Of these, $a_{i+1}^{(i)}$ is the input of the (i+1)-th diagonal processing module, and the remaining column vectors of $A^{(i)}$ are inputs both to the iterative processing modules of the next step and to the triangular processing modules of step i+2. A triangular processing module receives $q_{i-1}$ from step i-1 and the signals $a_{i2}^{(i-2)}$ and $a_{i2+1}^{(i-2)}$ from step i-2, and computes the output signals $r_{i-1,i2}$ and $r_{i-1,i2+1}$ of the QR decomposition module.
The n-th diagonal processing module processes the $a_n^{(n-1)}$ signal output by step n-1 to obtain the output signals $r_{nn}$ and $q_n$ of the QR decomposition module. The k4-th triangular processing module receives $q_{n-1}$ from step n-1 and $a_n^{(n-2)}$ from step n-2, and computes the output signal $r_{n-1,n}$ of the QR decomposition module.
As shown in Fig. 2, in a specific application example the diagonal processing module comprises multipliers, an adder, a square-root unit, and dividers. Multiplier e (e greater than or equal to 1 and less than or equal to n) receives the e-th element of the input vector $a_j$ from outside, squares it, and passes the result to the adder. The adder receives the signals from multipliers 1 to n, accumulates them, and passes the sum to the square-root unit; the same sum is also the module output $r_{jj}^{2}$. The square-root unit takes the square root of the signal received from the adder and passes it to dividers 1 to n as their divisor; the same value is also the module output $r_{jj}$. Divider e1 (e1 greater than or equal to 1 and less than or equal to n) takes the e1-th element of the input vector $a_j$, received from outside, as its dividend and the signal from the square-root unit as its divisor; the result is the e1-th element of the module output vector $q_{j2}$.
The diagonal processing module computes the j2-th column vector $q_{j2}$ of the matrix Q, the diagonal element $r_{jj}$ of the matrix R, and the square $r_{jj}^{2}$ of that diagonal element. $r_{jj}^{2}$ is used as an input to the iterative processing modules. Because the diagonal processing module outputs $r_{jj}^{2}$ at the same moment the iterative processing modules need it, the two kinds of module can execute in parallel, which increases the speed of QR decomposition.
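The data flow of the diagonal processing module can be mirrored in software as a minimal sketch. It assumes real-valued inputs, and the name `diagonal_module` is hypothetical; the comments map each line to the hardware unit described above.

```python
import math


def diagonal_module(a_j):
    """Return (q, r_jj, r_jj_sq) for one input column a_j."""
    squares = [x * x for x in a_j]   # multipliers 1..n square each element
    r_jj_sq = sum(squares)           # adder accumulates; also exported as r_jj^2
    r_jj = math.sqrt(r_jj_sq)        # square-root unit
    q = [x / r_jj for x in a_j]      # dividers 1..n share the same divisor
    return q, r_jj, r_jj_sq


q, r_jj, r_jj_sq = diagonal_module([3.0, 4.0])
print(r_jj, r_jj_sq)  # 5.0 25.0
```

Note that `r_jj_sq` is available before the square root and the divisions are carried out, which is what lets the iterative processing modules start in the same step.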
As shown in Fig. 3, in a specific application example the iterative processing module comprises first shared hardware, which contains a multiplexer and multipliers 1 to n. The multiplexer selects which input multipliers 1 to n use as one of their operands: it receives the vector $a_{j3}^{(p)}$ from outside and the output signal of the divider, selects between them, and passes the result to multipliers 1 to n. When the enable signal is '0', multiplier e2 (e2 greater than or equal to 1 and less than or equal to n) multiplies the signal received from the multiplexer by the e2-th element of the vector $a_j^{(p)}$ received from outside and passes the product to the adder module. The adder module receives the signals from multipliers 1 to n, accumulates them, and passes the sum to the divider module. The divider takes the signal from the adder module as its dividend and the externally received signal $r_{jj}^{2}$ as its divisor, and passes the quotient to the input of the multiplexer. When the enable signal is '1', multiplier e2 passes its result to subtractor e3 (e3 greater than or equal to 1 and less than or equal to n). Subtractor e3 takes the signal from multiplier e2 as its subtrahend and the e3-th element of the externally received signal $a_{j3}^{(p)}$ as its minuend; the difference is the e3-th element of the module output vector $a_{j3}^{(p+1)}$.
The iterative processing module computes the j3-th column of the next iteration matrix. As the figure shows, the two places where the first shared hardware is needed are independent of each other, so hardware resources can be saved by time-sharing that hardware.
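The two phases of the iterative processing module can be sketched in software as below. This is an interpretation of the description rather than a transcription of the circuit: it assumes real-valued data, treats $a_j^{(p)}$ as the pivot column and $a_{j3}^{(p)}$ as the column being updated, and uses the hypothetical name `iterative_module`.

```python
def iterative_module(pivot, column, r_sq):
    """Return column - (pivot . column / r_sq) * pivot, element by element."""
    # Phase with enable = '0': the multipliers form element-wise products,
    # the adder accumulates them, and the divider scales the sum by 1 / r_sq.
    coeff = sum(p * c for p, c in zip(pivot, column)) / r_sq
    # Phase with enable = '1': the same multipliers are reused to scale the
    # pivot by coeff, and the subtractors remove the result from the column.
    return [c - coeff * p for p, c in zip(pivot, column)]


pivot = [3.0, 4.0]
updated = iterative_module(pivot, [1.0, 2.0], 25.0)
# The updated column is orthogonal to the pivot column (up to rounding).
print(sum(p * u for p, u in zip(pivot, updated)))
```

Both phases use the same bank of n multipliers, which corresponds to the time-shared "first shared hardware" in the text.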
As shown in Fig. 4, in a specific application example the triangular processing module comprises second shared hardware. The inputs of the multiplexer are the n elements of the vector $a_{j3}$ and the n elements of the vector $a_{j3+1}$. When the multiplexer enable signal is '0', the multiplexer passes the elements of $a_{j3}$ to multipliers 1 to n; when the enable signal is '1', it passes the elements of $a_{j3+1}$ to multipliers 1 to n. Multiplier e4 multiplies the data received from the multiplexer by the e4-th element of the vector $q_{j2}$ received from outside and passes the product to the adder, which accumulates the signals received from the multipliers. When the multiplexer enable signal is '0', the accumulator output is the output signal $r_{j2,j3}$ of the triangular processing module; when the enable signal is '1', the accumulator output is the output signal $r_{j2,j3+1}$.
The triangular processing module computes the elements of the matrix R at coordinates [j2, j3] and [j2, j3+1]. Comparing Fig. 4 with Figs. 2 and 3 shows that computing the element at [j2, j3] takes less than 50% of the execution time of the second basic module and the first basic module. The invention therefore time-multiplexes the hardware that computes the element at [j2, j3], which saves hardware resources.
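A software analogue of the time-shared triangular processing module is sketched below. It assumes real-valued data; the name `triangular_module` and the use of `None` for an absent second column are illustrative choices, not part of the patent.

```python
def triangular_module(q, col_a, col_b=None):
    """Return (r_a, r_b): inner products of q with one or two columns."""
    results = []
    # enable = 0 selects col_a, enable = 1 selects col_b (the multiplexer).
    for enable, col in enumerate((col_a, col_b)):
        if col is None:
            results.append(None)   # no second column assigned to this module
            continue
        # Multipliers 1..n form the products; the adder accumulates them.
        results.append(sum(qe * ce for qe, ce in zip(q, col)))
    return tuple(results)


r_a, r_b = triangular_module([0.6, 0.8], [1.0, 2.0], [5.0, 0.0])
print(r_a, r_b)
```

The same multipliers and adder serve both columns in turn, which reflects the point in the text that one inner product fits in less than half of the time taken by the other modules.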
The present invention further provides a decomposition method based on the above decomposer. QR decomposition of an n × n matrix A with the circuit of the decomposer takes n steps in total, as follows:
Step S1: the n column vectors $a_1, \ldots, a_n$ of matrix A are the input signals of the QR decomposition module. $a_1$ is the input of the first diagonal processing module, whose outputs are $r_{11}$ and $q_1$. The iterative processing modules compute the next iteration matrix: their inputs are $a_1$ and $a_j$ (1 < j < n+1, j a positive integer), and their outputs are the columns $a_j^{(1)}$ of the next iteration matrix. The values of the output signals in the first step are given by formula (1);
Step S1 shows the main difference between the present invention and the conventional QR decomposition method: the invention computes the next iteration matrix one step ahead. The conventional QR decomposition computes the next iteration matrix in the second step because that computation needs the output results of the first step. By modifying the conventional method, the invention computes the next iteration matrix from the inputs of the first step, which greatly increases QR decomposition speed.
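The look-ahead step rests on a short identity. The derivation below is reconstructed from the standard MGS update and from the module descriptions, not quoted from the patent's formula (1), which is not reproduced in this text. It assumes real-valued data; for complex data the transpose would be a conjugate transpose.

```latex
\begin{align}
r_{11}^{2} &= a_1^{T} a_1, \qquad
q_1 = \frac{a_1}{r_{11}}, \qquad
r_{1j} = q_1^{T} a_j, \\
a_j^{(1)} &= a_j - r_{1j}\, q_1
           = a_j - \frac{a_1^{T} a_j}{r_{11}} \cdot \frac{a_1}{r_{11}}
           = a_j - \frac{a_1^{T} a_j}{r_{11}^{2}}\, a_1 .
\end{align}
```

The last form depends only on the step inputs $a_1$ and $a_j$ and on $r_{11}^{2}$, which the diagonal processing module produces before the square root and the divisions. The update therefore does not have to wait for $q_1$ or $r_{1j}$.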
Steps S2 to Sj, where j is greater than or equal to 2 and less than n: the signals $a_j^{(j-2)}, \ldots, a_n^{(j-2)}$ input to step j-1, together with the signals $q_{j-1}$ and $a_j^{(j-1)}, \ldots, a_n^{(j-1)}$ output by step j-1, are the input signals of step j. $a_j^{(j-1)}$ is the input of the diagonal processing module, which computes $r_{jj}$ and $q_j$. The inputs of the k3-th triangular processing module are $q_{j-1}$, $a_{j3}^{(j-2)}$, and $a_{j3+1}^{(j-2)}$ (j3 is a positive integer greater than or equal to j and less than or equal to n-1 when n-j is odd, and greater than or equal to j and less than or equal to n when n-j is even); this module computes $r_{j-1,j3}$ and $r_{j-1,j3+1}$. As in the first step, the iterative processing modules compute the next iteration matrix; their inputs are $a_j^{(j-1)}, \ldots, a_n^{(j-1)}$ and their outputs are $a_{j+1}^{(j)}, \ldots, a_n^{(j)}$. The outputs of step j are given by formula (2);
Step Sn: the input $a_n^{(n-2)}$ of step n-1 and the outputs $q_{n-1}$ and $a_n^{(n-1)}$ of step n-1 are the inputs. $a_n^{(n-1)}$ is the input of block1 (the diagonal processing module), whose outputs are $r_{n,n}$ and $q_n$; $q_{n-1}$ and $a_n^{(n-2)}$ are the inputs of block3 (the triangular processing module), whose output is $r_{n-1,n}$. The outputs of step n are given by formula (3);
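Steps S1 to Sn can be simulated sequentially to check that the schedule does produce a valid factorization. The sketch below is an interpretation under stated assumptions, not the patent's circuit: it uses real-valued data, a list-of-rows matrix, and the hypothetical name `lookahead_qr`. Each pass of the outer loop stands for one time unit, within which the three kinds of module would run in parallel in hardware.

```python
import math


def lookahead_qr(a):
    """QR of a square real matrix (list of rows) following the n-step schedule."""
    n = len(a)
    prev = None                                             # inputs of the previous step
    cur = [[a[i][j] for i in range(n)] for j in range(n)]   # inputs of this step
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):                                      # loop index j is step j + 1
        # Diagonal module: r_jj^2, r_jj and q_j from the pivot column.
        r_sq = sum(x * x for x in cur[j])
        r[j][j] = math.sqrt(r_sq)
        q_cols.append([x / r[j][j] for x in cur[j]])
        # Triangular modules: the previous row of R, from the q vector of the
        # previous step and the columns that were inputs of the previous step.
        if j >= 1:
            for k in range(j, n):
                r[j - 1][k] = sum(qe * ce for qe, ce in zip(q_cols[j - 1], prev[k]))
        # Iterative modules: next iteration matrix from this step's inputs and
        # r_jj^2 only, with no dependence on q_j or on the current row of R.
        nxt = [col[:] for col in cur]
        for k in range(j + 1, n):
            coeff = sum(p * c for p, c in zip(cur[j], cur[k])) / r_sq
            nxt[k] = [c - coeff * p for p, c in zip(cur[j], cur[k])]
        prev, cur = cur, nxt
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r
```

The off-diagonal entries of each row of R are produced one step after the corresponding q vector, so the last entry $r_{n-1,n}$ appears in step n and the whole factorization completes in n passes.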
As the above shows, to perform QR decomposition of an n × n matrix the proposed structure needs only n time units, whereas the triangular systolic array structure proposed by R.-H. Chang et al. needs 2n-1 time units. For the 4 × 4 matrix A mentioned above, for example, QR decomposition with the present invention takes only 4 time units instead of 7, a saving of 3 time units. The proposed triangular systolic array QR decomposition based on look-ahead iteration therefore significantly speeds up QR decomposition.
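The comparison of time units is simple arithmetic, sketched here for a few matrix sizes. The function name `time_units` is hypothetical, and the counts are the ones stated in the text (2n-1 for the structure of R.-H. Chang et al., n for the proposed structure).

```python
def time_units(n):
    """Return (baseline, proposed) time-unit counts for an n x n matrix."""
    return 2 * n - 1, n


for n in (2, 4, 8):
    baseline, proposed = time_units(n)
    # Columns: n, baseline units, proposed units, units saved.
    print(n, baseline, proposed, baseline - proposed)
```

For n = 4 this prints 7 and 4 time units with a saving of 3, matching the example in the text; in general the saving is n-1 time units.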
The above is only a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to the above embodiment; all technical solutions that fall within the concept of the present invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the present invention shall also be regarded as falling within the scope of protection of the present invention.

Claims (6)

1. A triangular systolic array QR decomposer based on advanced iteration, for performing QR decomposition on an n × n matrix A, characterized in that it comprises diagonal processing modules, iterative processing modules and triangular processing modules, namely: n diagonal processing modules; (n-1)+(n-2)+...+1 = n × (n-1)/2 iterative processing modules; and, when n is even, n/2+(n-2)+(n-4)+(n-6)+...+2 = n^2/4 triangular processing modules or, when n is odd, (n-1)+(n-3)+(n-5)+...+2 = (n+1)(n-1)/4 triangular processing modules; the first diagonal processing module receives the first column vector a_1 of matrix A from outside, its results q_1 and r_11 are outputs of the entire QR decomposition module, q_1 is also output to the triangular processing modules of the next step, and the signal r_11^2 generated during the computation is output to all iterative processing modules of the first step; the (j-1)-th iterative processing module takes as inputs the j-th column vector a_j of matrix A received from outside, where j is greater than or equal to 2 and less than or equal to n-1, the first column vector a_1 of matrix A, and r_jj^2 output by the first diagonal processing module, and computes the j-th column vector a_j^1 of the next iteration matrix A_1, where a_1^1 serves as the input of the second diagonal processing module, and the remaining column vectors of A_1 serve as inputs of the iterative processing modules of the second step and, at the same time, as inputs of the triangular processing modules of the third step; and so on, until the output signal r_(n-1,n) of the QR decomposition module is finally obtained after processing by the triangular processing module.
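The module counts in claim 1 can be cross-checked numerically. The sketch below assumes that row i of R has n-i off-diagonal entries and that each triangular processing module computes up to two of them (the two-output behaviour described in claim 5); under that assumption the per-row count is ceil((n-i)/2). The function names are hypothetical.

```python
def module_counts(n):
    """Count modules by direct enumeration: (diagonal, iterative, triangular)."""
    diagonal = n
    iterative = sum(range(1, n))  # (n-1) + (n-2) + ... + 1
    # Assumption: one triangular module serves up to two entries of a row of R,
    # so row i needs ceil((n - i) / 2) = (n - i + 1) // 2 modules.
    triangular = sum((n - i + 1) // 2 for i in range(1, n))
    return diagonal, iterative, triangular


def closed_form_counts(n):
    """Closed forms stated in claim 1."""
    if n % 2 == 0:
        triangular = n * n // 4
    else:
        triangular = (n + 1) * (n - 1) // 4
    return n, n * (n - 1) // 2, triangular


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 8):
        print(n, module_counts(n), closed_form_counts(n))
```

For the 4 × 4 case this gives 4 diagonal, 6 iterative and 4 triangular processing modules.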
2. The triangular systolic array QR decomposer based on advanced iteration according to claim 1, characterized in that the i-th diagonal processing module processes the signal a_i^(i-1) output by step i-1 to obtain the outputs r_ii and q_i of the QR decomposition module and also computes r_ii^2, where the vector q_i serves as an input of the triangular processing modules of the next step and r_ii^2 serves as an input of all iterative processing modules of step i; the iterative processing module receives the column vector signals a_i^(i-1) and a_i1^(i-1) from step i-1 and the signal r_ii^2 from the diagonal processing module as inputs, and after processing obtains the i1-th column vector of the next iteration matrix A_i, where a_(i+1)^i serves as the input of the (i+1)-th diagonal processing module, and the remaining column vectors of A_i serve as inputs of the iterative processing modules of the next step and, at the same time, as inputs of the triangular processing modules of step i+2; the triangular processing module receives the input signal q_(i-1) from step i-1 and the signals a_i2^(i-2) and a_(i2+1)^(i-2) from step i-2, and after processing obtains the output signals r_(i-1,i2) and r_(i-1,i2+1) of the QR decomposition module;
the n-th diagonal processing module processes the signal a_n^(n-1) output by step n-1 to obtain the output signals r_nn and q_n of the QR decomposition module; the k4-th triangular processing module receives the signal q_(n-1) from step n-1 and the signal a_n^(n-2) from step n-2, and after processing obtains the output signal r_(n-1,n) of the QR decomposition module.
3. The triangular systolic array QR decomposer based on advanced iteration according to claim 1 or 2, characterized in that the diagonal processing module comprises multipliers, an adder, a square-root operator and dividers; multiplier e receives the e-th element of the input vector a_j from outside, where e is greater than or equal to 1 and less than or equal to n, squares it and outputs the result to the adder; the adder receives the signals from multipliers 1 to n, accumulates them, and outputs the sum to the square-root operator and, at the same time, as the module output signal r_jj^2; the square-root operator, after receiving the signal from the adder, takes the square root and outputs the result to dividers 1 to n as their divisor and, at the same time, as the module output signal r_jj; divider e1 takes the e1-th element of the input vector a_j received from outside as the dividend and the signal received from the square-root operator as the divisor, where e1 is greater than or equal to 1 and less than or equal to n, and its result is the e1-th element of the module output vector q_j2.
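The data path of the diagonal processing module can be modelled in a few lines. This is a behavioural sketch only, assuming real-valued elements and a nonzero input vector; the function name and return order are hypothetical, and no timing or fixed-point behaviour is modelled.

```python
import math


def diagonal_module(a):
    """Behavioural model of the diagonal processing module for one column vector a.

    Returns (r_sq, r, q): the squared norm r_jj^2, the norm r_jj, and the unit vector q_j.
    """
    squares = [x * x for x in a]   # multipliers 1..n square each element
    r_sq = sum(squares)            # adder: r_jj^2, also sent to the iterative modules
    r = math.sqrt(r_sq)            # square-root operator: r_jj
    q = [x / r for x in a]         # dividers 1..n: elements of q_j (assumes r != 0)
    return r_sq, r, q


if __name__ == "__main__":
    print(diagonal_module([3.0, 4.0]))
```

Note that r_jj^2 is available one stage before r_jj, which is what lets the iterative processing modules start without waiting for the square root.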
4. The triangular systolic array QR decomposer based on advanced iteration according to claim 1 or 2, characterized in that the iterative processing module comprises first shared hardware, the first shared hardware containing a multiplexer and multipliers 1 to n; the multiplexer selects different inputs as operands for multipliers 1 to n: it receives the external vector a_j3^p and the output signal of the divider as inputs, selects between them, and outputs the result to multipliers 1 to n; when the enable signal is '0', multiplier e2 takes the signal received from the multiplexer as one operand, where e2 is greater than or equal to 1 and less than or equal to n, and the e2-th element of the externally received vector a_j^p as the other operand, performs the multiplication and outputs the result to the adder module; the adder module receives the input signals from multipliers 1 to n, accumulates them and outputs the sum to the divider module; the divider takes the signal received from the adder module as the dividend and the externally received signal r_jj^2 as the divisor, performs the division and outputs the result to the input of multiplexer 1; when the enable signal is '1', multiplier e2 outputs its result to subtractor e3, where e3 is greater than or equal to 1 and less than or equal to n; subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the externally received signal a_j3^p as the minuend, and the result of the subtraction is the e3-th element of the module output vector a_j3^(p+1).
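The two phases of the iterative processing module, which reuse the same multipliers, can be sketched as follows. This is a behavioural model under the assumption of real-valued data; the function and variable names are hypothetical, and the enable signal is represented only by the order of the two phases.

```python
def iterative_module(a_j, a_j3, r_jj_sq):
    """Behavioural model of one iterative processing module.

    a_j     : pivot column of the current iteration matrix
    a_j3    : column to be updated
    r_jj_sq : r_jj^2 received from the diagonal processing module
    Returns the updated column for the next iteration matrix.
    """
    # Phase 0 (enable = '0'): the multiplexer feeds a_j3 to the multipliers,
    # the adder accumulates the products and the divider scales by r_jj^2.
    products = [x * y for x, y in zip(a_j3, a_j)]
    coeff = sum(products) / r_jj_sq

    # Phase 1 (enable = '1'): the multiplexer feeds the divider output to the
    # same multipliers, and the subtractors form the updated column.
    scaled = [coeff * y for y in a_j]
    return [x - s for x, s in zip(a_j3, scaled)]


if __name__ == "__main__":
    a_j = [3.0, 4.0]
    updated = iterative_module(a_j, [1.0, 2.0], 25.0)
    print(updated)
```

The updated column is orthogonal to the pivot column a_j, and the computation uses only the step inputs and r_jj^2, never q_j.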
5. The triangular systolic array QR decomposer based on advanced iteration according to claim 4, characterized in that the triangular processing module comprises second shared hardware; the inputs of multiplexer 1 are the n elements of vector a_j3 and the n elements of vector a_(j3+1); when the multiplexer enable signal is '0', multiplexer 1 selects the elements of vector a_j3 and outputs them to multipliers 1 to n, and when the multiplexer enable signal is '1', multiplexer 1 selects the elements of vector a_(j3+1) and outputs them to multipliers 1 to n; multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the externally received vector q_j2 as the other operand, performs the multiplication and outputs the result to the adder; the adder accumulates the signals received from the multipliers; when the multiplexer enable signal is '0', the accumulator output signal serves as the output signal r_(j2,j3) of the triangular processing module, and when the multiplexer enable signal is '1', the accumulator output signal serves as the output signal r_(j2,j3+1) of the triangular processing module.
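The triangular processing module amounts to two inner products computed on shared multipliers. The sketch below is a behavioural model assuming real-valued data; the function name is hypothetical, and the optional second column models the case where a module serves only one entry of R.

```python
def triangular_module(q, a_first, a_second=None):
    """Behavioural model of one triangular processing module.

    q        : unit vector received from the diagonal module of the previous step
    a_first  : column selected when the multiplexer enable signal is '0'
    a_second : column selected when the enable signal is '1' (None if unused)
    Returns (r_first, r_second), the two off-diagonal entries of R.
    """
    def inner_product(column):
        # Multipliers 1..n followed by the accumulating adder.
        return sum(x * y for x, y in zip(column, q))

    r_first = inner_product(a_first)
    r_second = None if a_second is None else inner_product(a_second)
    return r_first, r_second


if __name__ == "__main__":
    print(triangular_module([0.6, 0.8], [1.0, 2.0], [5.0, 0.0]))
```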
6. A QR decomposition method based on the decomposer according to any one of claims 1 to 5, characterized in that the steps are:
Step S1: the n column vectors a_1, ..., a_n of matrix A serve as the input signals of the QR decomposition module; a_1 is the input of the first diagonal processing module, whose outputs are r_11 and q_1; the iterative processing modules compute the next iteration matrix, with inputs a_1 and a_j, where 1 < j < n+1 and j is a positive integer, and output the columns a_j^1 of the next iteration matrix;
Steps S2 to Sj: for j greater than or equal to 2 and less than n, the signals a_j^(j-2), ..., a_n^(j-2) input to step j-1 and the signals q_(j-1), a_j^(j-1), ..., a_n^(j-1) output by step j-1 serve as the input signals of step j, where a_j^(j-1) is the input of the diagonal processing module, which computes r_jj and q_j; the input signals of the k3-th triangular processing module are q_(j-1), a_j3^(j-2) and a_(j3+1)^(j-2), where, when n-j is odd, j3 is a positive integer greater than or equal to j and less than or equal to n-1, and, when n-j is even, j3 is a positive integer greater than or equal to j and less than or equal to n, and this module computes r_(j-1,j3) and r_(j-1,j3+1); as in the first step, the iterative processing modules compute the next iteration matrix, with inputs a_j^(j-1), ..., a_n^(j-1) and outputs a_(j+1)^j, ..., a_n^j;
Step Sn: the input a_n^(n-2) of step n-1 and the outputs q_(n-1) and a_n^(n-1) of step n-1 serve as inputs, where a_n^(n-1) is the input of the diagonal processing module, whose outputs are r_(n,n) and q_n, and q_(n-1) and a_n^(n-2) are the inputs of the triangular processing module, whose output is r_(n-1,n).
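Steps S1 to Sn can be simulated in software to check that they yield a valid QR factorization. The sketch below is a sequential model, assuming a real-valued, full-rank square matrix given as a list of column vectors; each loop pass stands for one time unit, and the function name and data layout are hypothetical rather than part of the claimed hardware.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def lookahead_qr(columns):
    """QR-decompose a square matrix given as a list of n column vectors.

    Returns (q_columns, R), where R is a list of rows.
    """
    n = len(columns)
    R = [[0.0] * n for _ in range(n)]
    q_columns = []
    prev = None                        # columns that were input to the previous step
    cur = [list(c) for c in columns]   # columns that are input to the current step
    for j in range(n):                 # 0-based step index; one pass = one time unit
        # Diagonal processing module: r_jj and q_j from the pivot column.
        a = cur[j]
        r_sq = dot(a, a)
        R[j][j] = math.sqrt(r_sq)
        q_columns.append([x / R[j][j] for x in a])
        # Triangular processing modules: row j-1 of R from q_(j-1) and the
        # columns that were input to the previous step.
        if j >= 1:
            for k in range(j, n):
                R[j - 1][k] = dot(q_columns[j - 1], prev[k])
        # Iterative processing modules: next iteration matrix computed from
        # this step's inputs and r_jj^2, without using q_j.
        nxt = [list(c) for c in cur]
        for k in range(j + 1, n):
            coeff = dot(a, cur[k]) / r_sq
            nxt[k] = [x - coeff * y for x, y in zip(cur[k], a)]
        prev, cur = cur, nxt
    return q_columns, R


if __name__ == "__main__":
    cols = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
    Q, R = lookahead_qr(cols)
    for row in R:
        print([round(x, 4) for x in row])
```

Multiplying the returned Q by R reproduces the input matrix up to rounding error, and the columns of Q are orthonormal.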
CN201610173392.0A 2016-03-24 2016-03-24 Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative Active CN105846873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610173392.0A CN105846873B (en) 2016-03-24 2016-03-24 Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative

Publications (2)

Publication Number Publication Date
CN105846873A (en) 2016-08-10
CN105846873B (en) 2018-12-18

Family

ID=56583444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610173392.0A Active CN105846873B (en) 2016-03-24 2016-03-24 Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative

Country Status (1)

Country Link
CN (1) CN105846873B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779501A (en) * 2021-08-23 2021-12-10 华控清交信息科技(北京)有限公司 Data processing method and device and data processing device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101170525A (en) * 2006-10-25 2008-04-30 中兴通讯股份有限公司 MLSE simplification detection method and its device based on blocked QR decomposition
CN101674160A (en) * 2009-10-22 2010-03-17 复旦大学 Signal detection method and device for multiple-input-multiple-output wireless communication system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100981121B1 (en) * 2007-12-18 2010-09-10 한국전자통신연구원 Apparatus and Method for Receiving Signal for MIMO System

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"用于MIMO-OFDM系统QR分解的分布式脉动阵列处理算法";朱勇旭,吴斌,周玉梅,蔡菁菁,夏凯锋;《电子与信息学报》;20120831;全文 *

Also Published As

Publication number Publication date
CN105846873A (en) 2016-08-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant