CN105846873A - Triangular systolic array structure QR decomposition device based on advanced iteration and decomposition method thereof - Google Patents


Info

Publication number
CN105846873A
Authority
CN
China
Prior art keywords
signal
module
output
multiplier
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610173392.0A
Other languages
Chinese (zh)
Other versions
CN105846873B (en)
Inventor
邢座程
刘苍
原略超
唐川
张洋
王庆林
王�锋
汤先拓
危乐
吕朝
董永旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201610173392.0A
Publication of CN105846873A
Application granted
Publication of CN105846873B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/02Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413MIMO systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/02Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/08Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station
    • H04B7/0837Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station using pre-detection combining
    • H04B7/0842Weighted combining
    • H04B7/0848Joint weighting
    • H04B7/0854Joint weighting using error minimizing algorithms, e.g. minimum mean squared error [MMSE], "cross-correlation" or matrix inversion

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)

Abstract

A triangular systolic array QR decomposition device based on advanced iteration, and a decomposition method thereof, are used to perform QR decomposition on an n×n matrix A. The device comprises diagonal processing modules, iterative processing modules and triangular processing modules. The first diagonal processing module receives the first column vector $a_1$ of the matrix A from outside. Its results $q_1$ and $r_{11}$ are outputs of the QR decomposition module, and $q_1$ is also passed to the triangular processing module of the next step. The $r_{jj}^2$ signal generated during this computation is output to all iterative processing modules of the first step. The (j-1)-th iterative processing module takes as input the j-th column vector $a_j$ of the matrix A received from outside, the first column vector $a_1$ of the matrix A and the $r_{jj}^2$ signal output by the first diagonal processing module, and computes the j-th column vector $a_j^1$ of the next iteration matrix $A^1$. And so on, until the output signal $r_{n-1,n}$ of the QR decomposition module is obtained after processing by the triangular processing modules. The decomposition method is based on this device. The device and method are simple in principle, fast and efficient.

Description

Triangular systolic array structure QR decomposition device based on advanced iteration and decomposition method thereof
Technical field
The present invention relates generally to the field of baseband signal processing in wireless communication systems, and in particular to a triangular systolic array QR decomposition device based on advanced iteration and a decomposition method thereof.
Background art
Orthogonal frequency division multiplexing (OFDM) and multiple-input multiple-output (MIMO) technologies have attracted wide attention because of their high spectral efficiency and high transmission rates, and a series of recent advances in precoding have enabled multi-user wireless communication systems based on MIMO-OFDM to serve multiple users simultaneously. However, the computational complexity of the baseband signal processing algorithms in such systems increases greatly, which poses unprecedented challenges for the design of baseband signal processors.
In the baseband signal processing chain of a MIMO-OFDM wireless communication system, precoding and MIMO detection are two of the more complex algorithms and have received extensive attention from researchers in recent years. The dirty paper coding algorithm proposed by Costa in 1983 in his classic paper "Writing on dirty paper" is regarded as the best-performing nonlinear precoding algorithm, but its computational complexity is so high that it can hardly be executed in real time on hardware. In 2005, Wei Yu et al., in the paper "Trellis and Convolutional Precoding for Transmitter-Based Interference Presubtraction", applied the THP (Tomlinson-Harashima Precoding) algorithm to nonlinear precoding and obtained good interference cancellation. Although its performance is somewhat lower than that of dirty paper coding, its computational complexity is much lower, which makes a hardware implementation of nonlinear precoding feasible. The most complex part of the THP algorithm is the QR decomposition of the channel matrix H, so a fast QR decomposition unit helps improve the overall performance of THP precoding. Maximum-likelihood estimation has the highest detection accuracy of all MIMO detection algorithms, but its computational complexity is also very high. M. Shabany et al., in "A 0.13 μm CMOS 655Mb/s 4 × 4 64-QAM k-best MIMO Detector", therefore used the sphere detection (SD) algorithm, an approximation of maximum-likelihood estimation, for MIMO detection and achieved good detection results. QR decomposition is one of the bottlenecks of the SD algorithm and limits its execution speed.
Because QR decomposition is widely used in multi-user baseband signal processors based on MIMO-OFDM and is in many cases the bottleneck that limits processing speed, many baseband signal processor designs optimize QR decomposition as an important arithmetic unit. QR decomposition factors an n×n matrix A into an n×n unitary matrix Q and an n×n upper triangular matrix R. Current QR decomposition algorithms fall into three classes, based respectively on the Householder transformation, Givens rotations and the modified Gram-Schmidt (MGS) algorithm. QR decomposition based on the Householder transformation is difficult to implement in hardware and is therefore rarely used. QR decomposition based on Givens rotations greatly reduces the hardware resources required, but its execution time is long and does not meet the real-time requirements of communication systems. QR decomposition based on the MGS algorithm occupies few hardware resources and has a short execution time, and so meets the practical needs of communication systems.
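The MGS-based decomposition mentioned above can be sketched in software as follows. This is a minimal illustration for real-valued matrices, not the circuit of any cited design; the function name `mgs_qr` and the list-of-rows matrix layout are assumptions made for this example.

```python
import math


def mgs_qr(A):
    """Modified Gram-Schmidt QR of a square real matrix given as a list of rows.

    Returns (Q, R) as lists of rows such that A is approximately Q times R.
    """
    n = len(A)
    # cols[k] holds the k-th column of the current iteration matrix.
    cols = [[A[i][k] for i in range(n)] for k in range(n)]
    Q = [[0.0] * n for _ in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal element: Euclidean norm of the current j-th column.
        R[j][j] = math.sqrt(sum(x * x for x in cols[j]))
        q = [x / R[j][j] for x in cols[j]]
        for i in range(n):
            Q[i][j] = q[i]
        # Remove the q component from every remaining column.
        for k in range(j + 1, n):
            R[j][k] = sum(qi * ai for qi, ai in zip(q, cols[k]))
            cols[k] = [ai - R[j][k] * qi for ai, qi in zip(cols[k], q)]
    return Q, R
```

In this textbook form, the update of the remaining columns in iteration j depends on the vector q produced in the same iteration.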
R.-H. Chang et al., in the article "Iterative QR decomposition architecture using the modified Gram-Schmidt algorithm for MIMO systems", proposed a triangular systolic array QR decomposition hardware circuit based on the MGS algorithm. To complete the QR decomposition of a square matrix of order n (n being a positive integer greater than or equal to 2), the proposed circuit needs 2n-1 time units. For example, decomposing a 4×4 matrix A with this circuit takes seven steps with the MGS-based iterative structure, each step taking one time unit, for seven time units in total. Thus, although the triangular systolic array QR decomposition method of R.-H. Chang et al. greatly reduces the computation time, baseband signal processing in practical communication systems calls for a still faster QR decomposition structure. Moreover, the existing literature covers only the QR decomposition of 4×4 matrices; no hardware circuit for the QR decomposition of n×n matrices has been published.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a triangular systolic array QR decomposition device based on advanced iteration, and a decomposition method, that are simple in principle, easy to implement, fast and efficient.
To solve the above technical problem, the present invention adopts the following technical solution:
A triangular systolic array QR decomposition device based on advanced iteration, used to perform QR decomposition on an n×n matrix A, comprises diagonal processing modules, iterative processing modules and triangular processing modules. There are n diagonal processing modules and (n-1)+(n-2)+...+1 = n×(n-1)/2 iterative processing modules; when n is even, n/2+(n-2)+(n-4)+(n-6)+...+2 = n²/4 triangular processing modules are used, and when n is odd, (n-1)+(n-3)+(n-5)+...+2 = (n+1)(n-1)/4 triangular processing modules are used. The first diagonal processing module receives the first column vector $a_1$ of the matrix A from outside; the results $q_1$ and $r_{11}$ are outputs of the whole QR decomposition module, and $q_1$ is also output to the triangular processing module of the next step. The $r_{jj}^2$ signal produced during this computation is output to all iterative processing modules of the first step. The (j-1)-th iterative processing module, where j is greater than or equal to 2 and less than or equal to n-1, takes as input the j-th column vector $a_j$ of the matrix A received from outside, the first column vector $a_1$ of the matrix A and the $r_{jj}^2$ output by the first diagonal processing module, and computes the j-th column vector $a_j^1$ of the next iteration matrix $A^1$. Of these, $a_2^1$ is the input of the second diagonal processing module, and the remaining column vectors of $A^1$ are inputs both to the iterative processing modules of the second step and to the triangular processing modules of the third step. And so on, until the output signal $r_{n-1,n}$ of the QR decomposition module is finally obtained after processing by the triangular processing modules.
As a further improvement of the device of the present invention: the i-th diagonal processing module processes the $a_i^{i-1}$ signal output by step i-1 to obtain the outputs $r_{ii}$ and $q_i$ of the QR decomposition module, and also computes $r_{ii}^2$. The vector $q_i$ is the input of the triangular processing modules of the next step, and $r_{ii}^2$ is the input of all iterative processing modules of step i. Each iterative processing module receives the column vector signals $a_i^{i-1}$ and $a_{i1}^{i-1}$ from step i-1 and the $r_{ii}^2$ signal from the diagonal processing module as inputs, and after processing yields the i1-th column vector of the next iteration matrix $A^i$. Of these, $a_{i+1}^i$ is the input of the (i+1)-th diagonal processing module, and the remaining column vectors of $A^i$ are inputs both to the iterative processing modules of the next step and to the triangular processing modules of step i+2. Each triangular processing module receives the input signal $q_{i-1}$ from step i-1 and the signals $a_{i2}^{i-2}$ and $a_{i2+1}^{i-2}$ from step i-2, and after processing yields the output signals $r_{i-1,i2}$ and $r_{i-1,i2+1}$ of the QR decomposition module.
The n-th diagonal processing module processes the $a_n^{n-1}$ signal output by step n-1 to obtain the output signals $r_{nn}$ and $q_n$ of the QR decomposition module. The k4-th triangular processing module receives the signal $q_{n-1}$ from step n-1 and the signal $a_n^{n-2}$ from step n-2, and after processing yields the output signal $r_{n-1,n}$ of the QR decomposition module.
As a further improvement of the device of the present invention: the diagonal processing module comprises multipliers, an adder, a square-root unit and dividers. Multiplier e, where e is greater than or equal to 1 and less than or equal to n, receives the e-th element of the input vector $a_j$ from outside, squares it and outputs the result to the adder. The adder receives the signals from multiplier 1 to multiplier n and accumulates them; the sum is output to the square-root unit and is also the output signal $r_{jj}^2$ of the whole module. The square-root unit takes the square root of the signal received from the adder and outputs the result to divider 1 to divider n as their divisor; this result is also the output signal $r_{jj}$ of the whole module. Divider e1, where e1 is greater than or equal to 1 and less than or equal to n, receives the e1-th element of the input vector $a_j$ from outside as the dividend and the signal from the square-root unit as the divisor; its result is the e1-th element of the module's output vector $q_{j2}$.
As a further improvement of the device of the present invention: the iterative processing module comprises first shared hardware, which contains a multiplexer and multiplier 1 to multiplier n. The multiplexer selects which input serves as one operand of multiplier 1 to multiplier n: it receives the external $a_{j3}^p$ vector and the output signal of the divider, selects one of them, and outputs the result to multiplier 1 to multiplier n. When the enable signal is '0', multiplier e2, where e2 is greater than or equal to 1 and less than or equal to n, takes the signal received from the multiplexer as one operand and the e2-th element of the externally received $a_j^p$ vector as the other operand, and outputs the product to the adder module. The adder module receives the signals from multiplier 1 to multiplier n, accumulates them and outputs the sum to the divider module. The divider takes the signal received from the adder module as the dividend and the externally received signal $r_{jj}^2$ as the divisor, and outputs the quotient to an input of multiplexer 1. When the enable signal is '1', multiplier e2 outputs its result to subtractor e3, where e3 is greater than or equal to 1 and less than or equal to n. Subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the externally received $a_{j3}^p$ signal as the minuend, and its result is the e3-th element of the module's output vector $a_{j3}^{p+1}$.
As a further improvement of the device of the present invention: the triangular processing module comprises second shared hardware. The inputs of multiplexer 1 are the n elements of the $a_{j3}$ vector and the n elements of the $a_{j3+1}$ vector. When the multiplexer enable signal is '0', multiplexer 1 passes the elements of the $a_{j3}$ vector to multiplier 1 to multiplier n; when the multiplexer enable signal is '1', multiplexer 1 passes the elements of the $a_{j3+1}$ vector to multiplier 1 to multiplier n. Multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the externally received $q_{j2}$ vector as the other operand, and outputs the product to the adder, which accumulates the signals received from the multipliers. When the multiplexer enable signal is '0', the accumulator output is the output signal $r_{j2,j3}$ of the triangular processing module; when the multiplexer enable signal is '1', the accumulator output is the output signal $r_{j2,j3+1}$ of the triangular processing module.
A QR decomposition method based on the above device comprises the following steps:
Step S1: the n column vectors $a_1, \ldots, a_n$ of the matrix A are the input signals of the QR decomposition module. $a_1$ is the input of the first diagonal processing module, whose outputs are $r_{11}$ and $q_1$. The iterative processing modules compute the next iteration matrix; their inputs are $a_1$ and $a_j$, where 1 < j < n+1 and j is a positive integer, and their outputs are the columns $a_j^1$ of the next iteration matrix;
Steps S2 to Sj, where j is greater than or equal to 2 and less than n: the signals $a_j^{j-2}, \ldots, a_n^{j-2}$ input to step j-1 and the signals $q_{j-1}, a_j^{j-1}, \ldots, a_n^{j-1}$ output by step j-1 are the input signals of this step. $a_j^{j-1}$ is the input of the diagonal processing module, which computes $r_{jj}$ and $q_j$. The inputs of the k3-th triangular processing module are $q_{j-1}$, $a_{j3}^{j-2}$ and $a_{j3+1}^{j-2}$, from which it computes $r_{j-1,j3}$ and $r_{j-1,j3+1}$; when n-j is odd, j3 is a positive integer greater than or equal to j and less than or equal to n-1, and when n-j is even, j3 is a positive integer greater than or equal to j and less than or equal to n. As in the first step, the iterative processing modules compute the next iteration matrix; their inputs are $a_j^{j-1}, \ldots, a_n^{j-1}$ and their outputs are $a_{j+1}^j, \ldots, a_n^j$;
Step Sn: the input $a_n^{n-2}$ of step n-1 and the outputs $q_{n-1}$ and $a_n^{n-1}$ of step n-1 are the inputs of this step. $a_n^{n-1}$ is the input of block1 (the diagonal processing module), whose outputs are $r_{n,n}$ and $q_n$; $q_{n-1}$ and $a_n^{n-2}$ are the inputs of block3 (the triangular processing module), whose output is $r_{n-1,n}$.
Compared with the prior art, the advantages of the present invention are as follows. The triangular systolic array QR decomposition device based on advanced iteration and its decomposition method are simple in principle, easy to implement, and markedly speed up QR decomposition. To perform QR decomposition on an n×n matrix, the proposed structure needs only n time units, whereas the triangular systolic array structure proposed by R.-H. Chang et al. needs 2n-1 time units. For the aforementioned 4×4 matrix A, QR decomposition with the present invention takes only 4 time units instead of 7, a saving of 3 time units.
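The time-unit comparison stated above is simple arithmetic; a small sketch makes the saving explicit for any n. The function name `time_units` is an assumption introduced for illustration only.

```python
def time_units(n):
    """Time units quoted in the text for the QR decomposition of an n x n matrix."""
    chang = 2 * n - 1   # triangular systolic array of R.-H. Chang et al.
    proposed = n        # structure based on advanced iteration
    return chang, proposed, chang - proposed


print(time_units(4))  # (7, 4, 3)
```

The saving is n-1 time units, so it grows linearly with the matrix order.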
Brief description of the drawings
Fig. 1 is a schematic diagram of the topology of the decomposition device of the present invention.
Fig. 2 is a schematic diagram of the structure of the diagonal processing module in a specific application example of the present invention.
Fig. 3 is a schematic diagram of the structure of the iterative processing module in a specific application example of the present invention.
Fig. 4 is a schematic diagram of the structure of the triangular processing module in a specific application example of the present invention.
Detailed description of the invention
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the triangular systolic array QR decomposition device based on advanced iteration of the present invention is used to perform QR decomposition on an n×n matrix A and comprises diagonal processing modules, iterative processing modules and triangular processing modules. It consists of n diagonal processing modules and (n-1)+(n-2)+...+1 = n×(n-1)/2 iterative processing modules; when n is even, n/2+(n-2)+(n-4)+(n-6)+...+2 = n²/4 triangular processing modules are needed, and when n is odd, (n-1)+(n-3)+(n-5)+...+2 = (n+1)(n-1)/4 triangular processing modules are needed.
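The module counts above can be tabulated with a short sketch. The closed forms are taken from the text; `triangular_by_step` is a cross-check under an assumption that is not stated in this form in the source, namely that a step which must produce m off-diagonal elements of R uses ceil(m/2) triangular processing modules, because each such module is time-shared between two elements. Both function names are hypothetical.

```python
def module_counts(n):
    """Closed-form counts quoted in the text: (diagonal, iterative, triangular)."""
    diagonal = n
    iterative = n * (n - 1) // 2
    if n % 2 == 0:
        triangular = n * n // 4
    else:
        triangular = (n + 1) * (n - 1) // 4
    return diagonal, iterative, triangular


def triangular_by_step(n):
    """Assumed per-step count: ceil(m / 2) modules for m elements, m = 1 .. n-1."""
    return sum((m + 1) // 2 for m in range(1, n))


print(module_counts(4))       # (4, 6, 4)
print(triangular_by_step(4))  # 4
```

Under that assumption the per-step sum agrees with the closed form for both even and odd n.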
The first diagonal processing module receives the first column vector $a_1$ of the matrix A from outside; the results $q_1$ and $r_{11}$ are outputs of the whole QR decomposition module, and $q_1$ is also output to the triangular processing module of the next step. The $r_{jj}^2$ signal produced during this computation is output to all iterative processing modules of the first step. The (j-1)-th iterative processing module (j is greater than or equal to 2 and less than or equal to n-1) takes as input the j-th column vector $a_j$ of the matrix A received from outside, the first column vector $a_1$ of the matrix A and the $r_{jj}^2$ output by the first diagonal processing module, and computes the j-th column vector $a_j^1$ of the next iteration matrix $A^1$. Of these, $a_2^1$ is the input of the second diagonal processing module, and the remaining column vectors of $A^1$ are inputs both to the iterative processing modules of the second step and to the triangular processing modules of the third step. By the time the iterative processing modules need the $r_{jj}^2$ signal, the first diagonal processing module has already finished computing it, so the diagonal processing module and the iterative processing modules can execute in parallel.
The i-th diagonal processing module processes the $a_i^{i-1}$ signal output by step i-1 to obtain the outputs $r_{ii}$ and $q_i$ of the QR decomposition module, and also computes $r_{ii}^2$. The vector $q_i$ is the input of the triangular processing modules of the next step, and $r_{ii}^2$ is the input of all iterative processing modules of step i. Each iterative processing module receives the column vector signals $a_i^{i-1}$ and $a_{i1}^{i-1}$ from step i-1 and the $r_{ii}^2$ signal from the diagonal processing module as inputs, and after processing yields the i1-th column vector of the next iteration matrix $A^i$. Of these, $a_{i+1}^i$ is the input of the (i+1)-th diagonal processing module, and the remaining column vectors of $A^i$ are inputs both to the iterative processing modules of the next step and to the triangular processing modules of step i+2. Each triangular processing module receives the input signal $q_{i-1}$ from step i-1 and the signals $a_{i2}^{i-2}$ and $a_{i2+1}^{i-2}$ from step i-2, and after processing yields the output signals $r_{i-1,i2}$ and $r_{i-1,i2+1}$ of the QR decomposition module.
The n-th diagonal processing module processes the $a_n^{n-1}$ signal output by step n-1 to obtain the output signals $r_{nn}$ and $q_n$ of the QR decomposition module. The k4-th triangular processing module receives the signal $q_{n-1}$ from step n-1 and the signal $a_n^{n-2}$ from step n-2, and after processing yields the output signal $r_{n-1,n}$ of the QR decomposition module.
As shown in Fig. 2, in a specific application example, the diagonal processing module comprises multipliers, an adder, a square-root unit and dividers. Multiplier e (e is greater than or equal to 1 and less than or equal to n) receives the e-th element of the input vector $a_j$ from outside, squares it and outputs the result to the adder. The adder receives the signals from multiplier 1 to multiplier n and accumulates them; the sum is output to the square-root unit and is also the output signal $r_{jj}^2$ of the whole module. The square-root unit takes the square root of the signal received from the adder and outputs the result to divider 1 to divider n as their divisor; this result is also the output signal $r_{jj}$ of the whole module. Divider e1 (e1 is greater than or equal to 1 and less than or equal to n) receives the e1-th element of the input vector $a_j$ from outside as the dividend and the signal from the square-root unit as the divisor; its result is the e1-th element of the module's output vector $q_{j2}$.
The diagonal processing module described above computes the j2-th column vector $q_{j2}$ of the matrix Q, the diagonal element $r_{jj}$ of the matrix R and the square $r_{jj}^2$ of that diagonal element, where $r_{jj}^2$ is used as an input of the iterative processing modules. Because the diagonal processing module outputs $r_{jj}^2$ at the same moment that the iterative processing module needs it, the two modules can execute in parallel, which increases the speed of QR decomposition.
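The data path of the diagonal processing module can be mirrored stage by stage in software. The following is a behavioural sketch only, assuming real-valued inputs and ignoring timing and number formats; the function name `diagonal_module` is hypothetical.

```python
import math


def diagonal_module(a):
    """Behavioural model of the diagonal processing module for one input column a."""
    squares = [x * x for x in a]   # multipliers 1..n square each element
    r_sq = sum(squares)            # adder output: r_jj^2, available before the square root
    r = math.sqrt(r_sq)            # square-root unit output: r_jj
    q = [x / r for x in a]         # dividers 1..n: elements of the q vector
    return q, r, r_sq


q, r, r_sq = diagonal_module([3.0, 4.0])
print(r, r_sq)  # 5.0 25.0
```

Note that `r_sq` is produced before the square root and the divisions, which is what allows the iterative processing modules to start without waiting for `q`.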
As shown in Fig. 3, in a specific application example, the iterative processing module comprises first shared hardware, which contains a multiplexer and multiplier 1 to multiplier n. The multiplexer selects which input serves as one operand of multiplier 1 to multiplier n: it receives the external $a_{j3}^p$ vector and the output signal of the divider, selects one of them, and outputs the result to multiplier 1 to multiplier n. When the enable signal is '0', multiplier e2 (e2 is greater than or equal to 1 and less than or equal to n) takes the signal received from the multiplexer as one operand and the e2-th element of the externally received $a_j^p$ vector as the other operand, and outputs the product to the adder module. The adder module receives the signals from multiplier 1 to multiplier n, accumulates them and outputs the sum to the divider module. The divider takes the signal received from the adder module as the dividend and the externally received signal $r_{jj}^2$ as the divisor, and outputs the quotient to an input of multiplexer 1. When the enable signal is '1', multiplier e2 outputs its result to subtractor e3 (e3 is greater than or equal to 1 and less than or equal to n). Subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the externally received $a_{j3}^p$ signal as the minuend, and its result is the e3-th element of the module's output vector $a_{j3}^{p+1}$.
The iterative processing module described above computes the j3-th column of the next iteration matrix. The first shared hardware appears at two places in the figure; because these two uses are independent of each other, the hardware can be time-shared to save hardware resources.
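The two phases of the iterative processing module, selected by the enable signal, can be modelled as two passes through the same multipliers. This is a behavioural sketch for real-valued data; the function name `iterative_module` and its argument names are assumptions introduced for illustration.

```python
def iterative_module(a_j, a_j3, r_sq):
    """Behavioural model: returns the next-iteration column derived from a_j3.

    a_j  : current pivot column (input of the diagonal processing module)
    a_j3 : current column to be updated
    r_sq : r_jj^2 received from the diagonal processing module
    """
    # Enable = 0: the multiplexer feeds a_j3 to the multipliers,
    # the adder accumulates the products and the divider scales by r_jj^2.
    products = [x * y for x, y in zip(a_j3, a_j)]
    coeff = sum(products) / r_sq
    # Enable = 1: the multiplexer feeds the divider output to the same multipliers,
    # and the subtractors remove the scaled pivot column from a_j3.
    scaled = [coeff * x for x in a_j]
    return [y - s for y, s in zip(a_j3, scaled)]


print(iterative_module([3.0, 4.0], [1.0, 2.0], 25.0))
```

The result is orthogonal to the pivot column, and the vector q of the current step is never used, only the pivot column and $r_{jj}^2$.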
As shown in Fig. 4, in a specific application example, the triangular processing module comprises second shared hardware. The inputs of multiplexer 1 are the n elements of the $a_{j3}$ vector and the n elements of the $a_{j3+1}$ vector. When the multiplexer enable signal is '0', multiplexer 1 passes the elements of the $a_{j3}$ vector to multiplier 1 to multiplier n; when the multiplexer enable signal is '1', multiplexer 1 passes the elements of the $a_{j3+1}$ vector to multiplier 1 to multiplier n. Multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the externally received $q_{j2}$ vector as the other operand, and outputs the product to the adder, which accumulates the signals received from the multipliers. When the multiplexer enable signal is '0', the accumulator output is the output signal $r_{j2,j3}$ of the triangular processing module; when the multiplexer enable signal is '1', the accumulator output is the output signal $r_{j2,j3+1}$ of the triangular processing module.
The triangular processing module described above computes the elements of the matrix R at coordinates [j2, j3] and [j2, j3+1]. A comparison of Fig. 4 with Fig. 2 and Fig. 3 shows that the time needed to compute the element at [j2, j3] is less than 50% of the execution time of the second basic module and the first basic module. The present invention therefore time-multiplexes the hardware that computes the element at [j2, j3], which saves hardware resources.
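The time-multiplexed inner product performed by the triangular processing module can be sketched as one multiply-accumulate path that is used twice, once per value of the enable signal. This is a behavioural model for real-valued data; the function name `triangular_module` is hypothetical.

```python
def triangular_module(q, a_first, a_second):
    """Behavioural model: returns (r[j2][j3], r[j2][j3 + 1]) from one shared data path.

    q        : the q vector received from the previous step
    a_first  : column j3 of the earlier iteration matrix (selected when enable = 0)
    a_second : column j3 + 1 of the same matrix (selected when enable = 1)
    """
    results = []
    for column in (a_first, a_second):   # enable = 0, then enable = 1
        # Multipliers 1..n followed by the accumulating adder.
        results.append(sum(qe * ce for qe, ce in zip(q, column)))
    return tuple(results)


print(triangular_module([0.6, 0.8], [1.0, 2.0], [5.0, 0.0]))
```

Reusing the same multipliers for two columns halves the number of modules at the cost of two passes, which the text states still fits within one step.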
The present invention further provides a decomposition method based on the above device. Performing QR decomposition on an n×n matrix A with the circuit of the above device takes n steps in total, as follows:
Step S1: the n column vectors $a_1, \ldots, a_n$ of the matrix A are the input signals of the QR decomposition module. $a_1$ is the input of the first diagonal processing module, whose outputs are $r_{11}$ and $q_1$. The iterative processing modules compute the next iteration matrix; their inputs are $a_1$ and $a_j$ (1 < j < n+1, j is a positive integer), and their outputs are the columns $a_j^1$ of the next iteration matrix. The values of the output signals of the first step are given by formula (1);
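$$
\begin{aligned}
r_{11} &= \sqrt{(a_{11})^2 + (a_{21})^2 + \cdots + (a_{n1})^2} \\
q_{11} &= \frac{a_{11}}{r_{11}},\ \ldots,\ q_{n1} = \frac{a_{n1}}{r_{11}} \\
a_1^1 &= 0 \\
a_2^1 &= a_2 - r_{12}\,q_1 = a_2 - (q_1^T a_2)\,q_1 = a_2 - \frac{(a_1^T a_2)\,a_1}{r_{11}^2} \\
&\ \ \vdots \\
a_n^1 &= a_n - \frac{(a_1^T a_n)\,a_1}{r_{11}^2}
\end{aligned}
\tag{1}
$$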
Step S1 shows the biggest difference between the present invention and the traditional QR decomposition method: the present invention computes the next iteration matrix one step ahead. Traditional QR decomposition computes the next iteration matrix in the second step because that computation uses the output of the first step. By modifying the traditional method, the present invention computes the next iteration matrix from the inputs of the first step, which greatly increases the speed of QR decomposition.
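A small worked example may help show why the two forms of the update agree. The vectors below are chosen purely for illustration and do not come from the source. Take $a_1 = (3, 4)^T$ and $a_2 = (1, 2)^T$, so that $r_{11}^2 = 25$, $r_{11} = 5$ and $q_1 = (3/5, 4/5)^T$.

```latex
\begin{aligned}
\text{traditional:}\quad
r_{12} &= q_1^T a_2 = \tfrac{3}{5}\cdot 1 + \tfrac{4}{5}\cdot 2 = \tfrac{11}{5}, &
a_2^1 &= a_2 - r_{12}\,q_1
      = \begin{pmatrix} 1 \\ 2 \end{pmatrix}
      - \begin{pmatrix} 33/25 \\ 44/25 \end{pmatrix}
      = \begin{pmatrix} -8/25 \\ 6/25 \end{pmatrix} \\
\text{advanced:}\quad
a_1^T a_2 &= 3\cdot 1 + 4\cdot 2 = 11, &
a_2^1 &= a_2 - \frac{(a_1^T a_2)\,a_1}{r_{11}^2}
      = \begin{pmatrix} 1 \\ 2 \end{pmatrix}
      - \frac{11}{25}\begin{pmatrix} 3 \\ 4 \end{pmatrix}
      = \begin{pmatrix} -8/25 \\ 6/25 \end{pmatrix}
\end{aligned}
```

Both routes give the same column because $r_{12}\,q_1 = (a_1^T a_2 / r_{11})(a_1 / r_{11})$; the second route needs only $a_1$, $a_2$ and $r_{11}^2$, all of which are available without waiting for $q_1$.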
Steps S2 to Sj, where j is greater than or equal to 2 and less than n: the signals $a_j^{j-2}, \ldots, a_n^{j-2}$ input to step j-1 and the signals $q_{j-1}, a_j^{j-1}, \ldots, a_n^{j-1}$ output by step j-1 are the input signals of this step. $a_j^{j-1}$ is the input of the diagonal processing module, which computes $r_{jj}$ and $q_j$. The inputs of the k3-th triangular processing module are $q_{j-1}$, $a_{j3}^{j-2}$ and $a_{j3+1}^{j-2}$ (when n-j is odd, j3 is a positive integer greater than or equal to j and less than or equal to n-1; when n-j is even, j3 is a positive integer greater than or equal to j and less than or equal to n), from which it computes $r_{j-1,j3}$ and $r_{j-1,j3+1}$. As in the first step, the iterative processing modules compute the next iteration matrix; their inputs are $a_j^{j-1}, \ldots, a_n^{j-1}$ and their outputs are $a_{j+1}^j, \ldots, a_n^j$. The outputs of step j are given by formula (2);
$$
\begin{aligned}
r_{j,j} &= \sqrt{(a_{1,j}^{j-1})^2 + (a_{2,j}^{j-1})^2 + \cdots + (a_{n,j}^{j-1})^2} \\
q_{1,j} &= \frac{a_{1,j}^{j-1}}{r_{j,j}},\ \ldots,\ q_{n,j} = \frac{a_{n,j}^{j-1}}{r_{j,j}} \\
r_{j-1,j3} &= q_{j-1}^T a_{j3}^{j-2} = q_{1,j-1}\,a_{1,j3}^{j-2} + \cdots + q_{n,j-1}\,a_{n,j3}^{j-2}, \quad j \le j3 \le n;\ j3 \in N \\
a_{i4}^{j} &= a_{i4}^{j-1} - \frac{\left((a_j^{j-1})^T a_{i4}^{j-1}\right) a_j^{j-1}}{r_{jj}^2}, \quad j \le i4 \le n;\ i4 \in N
\end{aligned}
\tag{2}
$$
Step Sn: the input $a_n^{n-2}$ of step n-1 and the outputs $q_{n-1}$ and $a_n^{n-1}$ of step n-1 are the inputs of this step. $a_n^{n-1}$ is the input of block1 (the diagonal processing module), whose outputs are $r_{n,n}$ and $q_n$; $q_{n-1}$ and $a_n^{n-2}$ are the inputs of block3 (the triangular processing module), whose output is $r_{n-1,n}$. The outputs of step n are given by formula (3);
$$
\begin{aligned}
r_{n,n} &= \sqrt{\left(a_{1,n}^{n-1}\right)^2 + \left(a_{2,n}^{n-1}\right)^2 + \cdots + \left(a_{n,n}^{n-1}\right)^2} \\
q_{1,n} &= \frac{a_{1,n}^{n-1}}{r_{n,n}},\ \ldots,\ q_{n,n} = \frac{a_{n,n}^{n-1}}{r_{n,n}} \\
r_{n-1,n} &= q_{n-1}^{T} a_{n}^{n-2} = q_{1,n-1}\, a_{1,n}^{n-2} + \cdots + q_{n,n-1}\, a_{n,n}^{n-2}
\end{aligned}
\tag{3}
$$
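The steps S1 to Sn can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the patented hardware: the function names (`diagonal_module`, `triangular_module`, `iterative_module`, `lookahead_qr`) are hypothetical, the matrix is assumed real and nonsingular, and the modules that run in parallel in the array are simply evaluated one after another here.

```python
import math

def diagonal_module(a_j):
    """Diagonal processing: return (r_jj^2, r_jj, q_j) for the column a_j^{j-1}."""
    rjj_sq = sum(x * x for x in a_j)            # squaring multipliers feeding the adder
    rjj = math.sqrt(rjj_sq)                     # square-root operator
    return rjj_sq, rjj, [x / rjj for x in a_j]  # dividers produce q_j

def triangular_module(q_prev, a_old):
    """Triangular processing: r_{j-1,j3} = q_{j-1}^T a_{j3}^{j-2}."""
    return sum(q * a for q, a in zip(q_prev, a_old))

def iterative_module(a_j, a_k, rjj_sq):
    """Iterative processing: next-iteration column, computed without q_j."""
    coef = sum(x * y for x, y in zip(a_j, a_k)) / rjj_sq
    return [y - coef * x for x, y in zip(a_j, a_k)]

def lookahead_qr(columns):
    """QR-decompose a nonsingular matrix given as a list of n column vectors.

    Returns (q, r): q is a list of n orthonormal columns and r is an
    n x n upper-triangular matrix stored as a list of rows.
    """
    n = len(columns)
    older = None                                         # columns of A^{j-2}
    current = [[float(x) for x in c] for c in columns]   # columns of A^{j-1}
    q = [None] * n
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):                                   # 0-based index of step S(j+1)
        rjj_sq, r[j][j], q[j] = diagonal_module(current[j])
        if j > 0:
            # Row j-1 of R is produced one step late, from the previous q
            # and the columns that were input to the previous step.
            for k in range(j, n):
                r[j - 1][k] = triangular_module(q[j - 1], older[k])
        nxt = [None] * n
        for k in range(j + 1, n):
            nxt[k] = iterative_module(current[j], current[k], rjj_sq)
        older, current = current, nxt
    return q, r
```

Calling `lookahead_qr` with the columns of a small nonsingular matrix and checking that the product of Q and R reproduces those columns is a quick way to confirm that the delayed computation of row j-1 of R does not change the result.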
From the foregoing, QR decomposition of an n × n matrix with the proposed structure needs only n time units, whereas the triangular systolic array structure proposed by R.-H. Chang et al. needs 2n-1 time units. For the aforementioned 4 × 4 matrix A, QR decomposition with the present invention completes in only 4 time units, 3 fewer than the 7 time units otherwise required. Therefore, the proposed triangular systolic array structure QR decomposition based on advanced iteration can significantly increase the speed of QR decomposition.
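The time-unit count can be illustrated with a small sketch that lists which outputs each of the steps S1 to Sn produces, following formulas (2) and (3). The function names are hypothetical, and the figure 2n-1 for the structure of R.-H. Chang et al. is simply taken from the text above rather than derived.

```python
def lookahead_schedule(n):
    """Return, for each time unit 1..n, the names of the outputs it produces."""
    schedule = []
    for j in range(1, n + 1):
        outputs = [f"r{j},{j}", f"q{j}"]          # diagonal processing module
        if j > 1:
            # triangular processing modules finish row j-1 of R one step late
            outputs += [f"r{j - 1},{k}" for k in range(j, n + 1)]
        schedule.append(outputs)
    return schedule

def conventional_time_units(n):
    """Time units quoted in the text for the structure of R.-H. Chang et al."""
    return 2 * n - 1

if __name__ == "__main__":
    n = 4
    for step, outputs in enumerate(lookahead_schedule(n), start=1):
        print(f"time unit {step}: {', '.join(outputs)}")
    print("saved time units:", conventional_time_units(n) - n)
```

For n = 4 the schedule has four entries and together they cover all ten entries of the upper triangle of R, which matches the count of 4 time units against 7 given above.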
The above are only preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above embodiments; all technical solutions that fall under the idea of the present invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications that do not depart from the principle of the present invention shall also be regarded as within the scope of protection of the present invention.

Claims (6)

1. A triangular systolic array structure QR decomposition device based on advanced iteration, used to perform QR decomposition of an n × n matrix A, characterized in that it comprises diagonal processing modules, iterative processing modules and triangular processing modules, namely: n diagonal processing modules; $(n-1)+(n-2)+\cdots+1 = n(n-1)/2$ iterative processing modules; and, when n is even, $n/2+(n-2)+(n-4)+(n-6)+\cdots+2 = n^2/4$ triangular processing modules, or, when n is odd, $(n-1)+(n-3)+(n-5)+\cdots+2 = (n+1)(n-1)/4$ triangular processing modules. The first diagonal processing module receives the first column vector $a_1$ of matrix A from outside; its results $q_1$ and $r_{11}$ are outputs of the whole QR decomposition module, $q_1$ is also output to the triangular processing modules of the next step, and the signal $r_{jj}^2$ generated during the calculation is output to all iterative processing modules of the first step. The (j-1)-th iterative processing module takes as inputs the j-th column vector $a_j$ of matrix A received from outside, where $2 \le j \le n-1$, the first column vector $a_1$ of matrix A, and the signal $r_{jj}^2$ output by the first diagonal processing module, and computes the j-th column vector $a_j^1$ of the next iteration matrix $A^1$; $a_1^1$ serves as the input of the second diagonal processing module, and the remaining column vectors of $A^1$ serve as inputs of the iterative processing modules of the second step and, at the same time, as inputs of the triangular processing modules of the third step. By analogy, the output signal $r_{n-1,n}$ of the QR decomposition module is finally obtained after processing by the triangular processing module.
2. The triangular systolic array structure QR decomposition device based on advanced iteration according to claim 1, characterized in that the i-th diagonal processing module processes the signal $a_i^{i-1}$ output by step $i-1$ to obtain the outputs $r_{ii}$ and $q_i$ of the QR decomposition module, and also computes $r_{ii}^2$; the vector $q_i$ serves as the input of the triangular processing modules of the next step, and $r_{ii}^2$ serves as the input of all iterative processing modules in step $i$. The iterative processing module receives the column vector signals $a_i^{i-1}$ and $a_{i1}^{i-1}$ from step $i-1$ and the signal $r_{ii}^2$ from the diagonal processing module as inputs, and after processing obtains the i1-th column vector of the next iteration matrix $A^i$; $a_{i+1}^{i}$ serves as the input of the (i+1)-th diagonal processing module, and the remaining column vectors of $A^i$ serve as inputs of the iterative processing modules of the next step and, at the same time, as inputs of the triangular processing modules of step $i+2$. The triangular processing module receives the input signal $q_{i-1}$ from step $i-1$ and the signals $a_{i2}^{i-2}$ and $a_{i2+1}^{i-2}$ from step $i-2$, and after processing obtains the output signals $r_{i-1,i2}$ and $r_{i-1,i2+1}$ of the QR decomposition module;
the n-th diagonal processing module processes the signal $a_n^{n-1}$ output by step $n-1$ to obtain the output signals $r_{nn}$ and $q_n$ of the QR decomposition module; the k4-th triangular processing module receives the signal $q_{n-1}$ from step $n-1$ and the signal $a_n^{n-2}$ from step $n-2$, and after processing obtains the output signal $r_{n-1,n}$ of the QR decomposition module.
3. The triangular systolic array structure QR decomposition device based on advanced iteration according to claim 1 or 2, characterized in that the diagonal processing module comprises multipliers, an adder, a square-root operator and dividers. Multiplier e receives from outside the e-th element of the input vector $a_j$, where $1 \le e \le n$, squares it, and outputs the result to the adder. The adder receives the signals from multiplier 1 to multiplier n, accumulates them, and outputs the result to the square-root operator and, at the same time, as the output signal $r_{jj}^2$ of the whole module. After receiving the signal from the adder, the square-root operator takes the square root and outputs the result to divider 1 to divider n as their divisor and, at the same time, as the output signal $r_{jj}$ of the whole module. Divider e1 receives from outside the e1-th element of the input vector $a_j$ as the dividend and takes the signal received from the square-root operator as the divisor, where $1 \le e1 \le n$; the result is the e1-th element of the output vector $q_{j2}$ of the whole module.
4. The triangular systolic array structure QR decomposition device based on advanced iteration according to claim 1 or 2, characterized in that the iterative processing module comprises first shared hardware, which contains a multiplexer and multiplier 1 to multiplier n. The multiplexer selects different inputs as factors for multiplier 1 to multiplier n: it receives from outside the vector $a_{j3}^{p}$ and the output signal of the divider and, after input selection, outputs the result to multiplier 1 to multiplier n. When the enable signal is '0', multiplier e2 takes the signal received from the multiplexer as one factor, where $1 \le e2 \le n$, and the e2-th element of the externally received vector $a_j^{p}$ as the other factor, and outputs the product to the adder module; the adder module receives the input signals from multiplier 1 to multiplier n, accumulates them, and outputs the result to the divider module; the divider takes the signal received from the adder module as the dividend and the externally received signal $r_{jj}^2$ as the divisor, and outputs the quotient to the input of multiplexer 1. When the enable signal is '1', multiplier e2 outputs its result to subtractor e3, where $1 \le e3 \le n$; subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the externally received signal $a_{j3}^{p}$ as the minuend, and the difference is the e3-th element of the output vector $a_{j3}^{p+1}$ of the whole module.
5. The triangular systolic array structure QR decomposition device based on advanced iteration according to claim 4, characterized in that the triangular processing module comprises second shared hardware. The inputs of multiplexer 1 are the n elements of vector $a_{j3}$ and the n elements of vector $a_{j3+1}$. When the multiplexer enable signal is '0', multiplexer 1 gates the elements of vector $a_{j3}$ to multiplier 1 to multiplier n; when the multiplexer enable signal is '1', multiplexer 1 gates the elements of vector $a_{j3+1}$ to multiplier 1 to multiplier n. Multiplier e4 takes the data received from the multiplexer as one factor and the e4-th element of the externally received vector $q_{j2}$ as the other factor, and outputs the product to the adder; the adder accumulates the signals received from the multipliers. When the multiplexer enable signal is '0', the accumulator output is the output signal $r_{j2,j3}$ of the triangular processing module; when the multiplexer enable signal is '1', the accumulator output is the output signal $r_{j2,j3+1}$ of the triangular processing module.
6. A QR decomposition method based on the decomposition device of any one of claims 1 to 5, characterized in that the steps are:
Step S1: the n column vectors $a_1, \ldots, a_n$ of matrix A serve as the input signals of the QR decomposition module; $a_1$ is the input of the first diagonal processing module, whose outputs are $r_{11}$ and $q_1$; the iterative processing modules compute the next iteration matrix, with inputs $a_1$ and $a_j$, where $1 < j < n+1$ and j is a positive integer, and output the next iteration matrix $a_j^1$;
Steps S2 to Sj ($2 \le j < n$): the signals $a_j^{j-2}, \ldots, a_n^{j-2}$ input to step $j-1$ and the signals $q_{j-1}, a_j^{j-1}, \ldots, a_n^{j-1}$ output by step $j-1$ serve as the input signals of step $j$; $a_j^{j-1}$ is the input of the diagonal processing module, which computes $r_{jj}$ and $q_j$; the input signals of the k3-th triangular processing module are $q_{j-1}$, $a_{j3}^{j-2}$ and $a_{j3+1}^{j-2}$, where, when $n-j$ is odd, $j3$ is a positive integer with $j \le j3 \le n-1$ and, when $n-j$ is even, $j3$ is a positive integer with $j \le j3 \le n$, and the module computes $r_{j-1,j3}$ and $r_{j-1,j3+1}$; as in the first step, the iterative processing modules compute the next iteration matrix, with inputs $a_j^{j-1}, \ldots, a_n^{j-1}$ and outputs $a_{j+1}^{j}, \ldots, a_n^{j}$;
Step Sn: the input $a_n^{n-2}$ of step $n-1$ and the outputs $q_{n-1}$ and $a_n^{n-1}$ of step $n-1$ serve as the inputs; $a_n^{n-1}$ is the input of block1, whose outputs are $r_{n,n}$ and $q_n$; $q_{n-1}$ and $a_n^{n-2}$ are the inputs of block3, whose output is $r_{n-1,n}$.
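The module counts stated in claim 1 can be checked numerically. The sketch below, with the hypothetical function name `module_counts`, adds up the series exactly as they are written in the claim so that they can be compared with the closed forms n(n-1)/2, n²/4 and (n+1)(n-1)/4; it says nothing about how the modules are wired.

```python
def module_counts(n):
    """Return (diagonal, iterative, triangular) module counts for an n x n matrix."""
    diagonal = n
    iterative = sum(range(1, n))                       # (n-1) + (n-2) + ... + 1
    if n % 2 == 0:
        triangular = n // 2 + sum(range(2, n - 1, 2))  # n/2 + (n-2) + (n-4) + ... + 2
    else:
        triangular = sum(range(2, n, 2))               # (n-1) + (n-3) + ... + 2
    return diagonal, iterative, triangular

if __name__ == "__main__":
    for n in range(2, 9):
        print(n, module_counts(n))
```

For the 4 × 4 case used in the description this gives 4 diagonal, 6 iterative and 4 triangular processing modules.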
CN201610173392.0A 2016-03-24 2016-03-24 Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative Active CN105846873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610173392.0A CN105846873B (en) 2016-03-24 2016-03-24 Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative

Publications (2)

Publication Number Publication Date
CN105846873A true CN105846873A (en) 2016-08-10
CN105846873B CN105846873B (en) 2018-12-18

Family

ID=56583444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610173392.0A Active CN105846873B (en) 2016-03-24 2016-03-24 Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative

Country Status (1)

Country Link
CN (1) CN105846873B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101170525A (en) * 2006-10-25 2008-04-30 中兴通讯股份有限公司 MLSE simplification detection method and its device based on blocked QR decomposition
US20090154608A1 (en) * 2007-12-18 2009-06-18 Electronics And Telecommunications Research Institute Receiving apparatus and method for mimo system
CN101674160A (en) * 2009-10-22 2010-03-17 复旦大学 Signal detection method and device for multiple-input-multiple-output wireless communication system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
朱勇旭,吴斌,周玉梅,蔡菁菁,夏凯锋: ""用于MIMO-OFDM系统QR分解的分布式脉动阵列处理算法"", 《电子与信息学报》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779501A (en) * 2021-08-23 2021-12-10 华控清交信息科技(北京)有限公司 Data processing method and device and data processing device
CN113779501B (en) * 2021-08-23 2024-06-04 华控清交信息科技(北京)有限公司 Data processing method and device for data processing

Also Published As

Publication number Publication date
CN105846873B (en) 2018-12-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant