CN105846873B

CN105846873B - Triangular systolic array QR decomposer and decomposition method based on look-ahead iteration

Info

Publication number: CN105846873B
Application number: CN201610173392.0A
Authority: CN
Inventors: 邢座程; 刘苍; 原略超; 唐川; 张洋; 王庆林; 王�锋; 汤先拓; 危乐; 吕朝; 董永旺
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2016-03-24
Filing date: 2016-03-24
Publication date: 2018-12-18
Anticipated expiration: 2036-03-24
Also published as: CN105846873A

Abstract

A triangular systolic array QR decomposer based on look-ahead iteration, and a corresponding decomposition method, perform QR decomposition of an n × n matrix A. The decomposer comprises diagonal processing modules, iterative processing modules and triangular processing modules. The first diagonal processing module receives the first column vector a₁ of matrix A from outside; its results q₁ and r₁₁ are outputs of the QR decomposition module, q₁ is also passed to the triangular processing modules of the next step, and the r₁₁² signal it generates is passed to all iterative processing modules of the first step. The (j-1)-th iterative processing module takes as input the j-th column vector a_j of matrix A received from outside, the first column vector a₁ of matrix A, and the r₁₁² output by the first diagonal processing module, and obtains the j-th column vector a_j^(1) of the next iteration matrix A^(1). The remaining steps proceed in the same way, until the last triangular processing module produces the output signal r_(n-1,n) of the QR decomposition module. The decomposition method is implemented on this decomposer. The invention has the advantages of a simple principle, fast decomposition and high efficiency.

Description

Triangular systolic array QR decomposer and decomposition method based on look-ahead iteration

Technical field

The present invention relates mainly to the field of baseband signal processing in wireless communication systems, and in particular to a triangular systolic array QR decomposer and decomposition method based on look-ahead iteration.

Background art

Orthogonal frequency division multiplexing (OFDM) and multiple-input multiple-output (MIMO) technologies have received wide attention because of their high spectrum utilization and high transmission rates. In recent years, a series of advances in precoding have made it possible for multi-user wireless communication systems based on MIMO-OFDM to serve multiple users simultaneously. However, the computational complexity of the baseband signal processing algorithms in such systems has increased greatly, which poses an unprecedented challenge to the design of baseband signal processors.

In the baseband signal processing chain of a MIMO-OFDM wireless communication system, precoding and MIMO detection are two of the more complex algorithms, and both have attracted wide attention from researchers in recent years. The dirty paper coding algorithm proposed by Costa in 1983 in his classic paper "Writing on dirty paper" is regarded as the best-performing nonlinear precoding algorithm, but its computational complexity is so high that it can hardly be executed in real time in hardware. In 2005, Wei Yu et al., in the paper "Trellis and Convolutional Precoding for Transmitter-Based Interference Presubtraction", used the THP (Tomlinson-Harashima Precoding) algorithm for nonlinear precoding and obtained good interference cancellation. Although THP performs somewhat worse than dirty paper coding, its computational complexity is much lower, which makes a hardware implementation of nonlinear precoding feasible. The most computationally expensive part of THP is the QR decomposition of the channel matrix H, so an efficient and fast QR decomposition unit helps improve the overall performance of THP precoding. Maximum-likelihood detection is the most accurate of all MIMO detection algorithms, but its computational complexity is also very high. M. Shabany et al., in "A 0.13 μm CMOS 655Mb/s 4 × 4 64-QAM K-Best MIMO Detector", therefore performed MIMO detection with the sphere decoding (SD) algorithm, an approximation of maximum-likelihood detection, and achieved good detection performance. QR decomposition is one of the bottlenecks of the SD algorithm and limits its execution speed.

Because QR decomposition is widely used in multi-user baseband signal processors based on MIMO-OFDM, and is in many cases the bottleneck that limits processing speed, it is treated as an important arithmetic unit and optimized in the design of many baseband signal processors. QR decomposition factors an n × n matrix A into an n × n unitary matrix Q and an n × n upper triangular matrix R. Current QR decomposition algorithms fall into three classes, based respectively on the Householder transformation, Givens rotations, and the MGS (modified Gram-Schmidt) algorithm. QR decomposition based on the Householder transformation is difficult to implement in hardware and is therefore rarely used. QR decomposition based on Givens rotations greatly reduces the hardware resources required, but its execution time is long and does not meet the real-time requirements of communication systems. QR decomposition based on the MGS algorithm occupies few hardware resources and has a short execution time, and thus meets the practical needs of communication systems.
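For reference, the MGS recurrence that the architectures discussed here build on can be sketched in a few lines of Python. This is a minimal software sketch, not a description of any hardware: it assumes a real-valued matrix stored as a list of columns, and the function name `mgs_qr` is hypothetical.

```python
import math


def mgs_qr(a_cols):
    """Column-oriented modified Gram-Schmidt for a real n x n matrix.

    a_cols is a list of n columns, each a list of n floats.
    Returns (q_cols, r), with r indexed as r[row][col].
    """
    n = len(a_cols)
    a = [col[:] for col in a_cols]            # working copy of the columns
    q = [[0.0] * n for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal entry and normalized column.
        r[i][i] = math.sqrt(sum(x * x for x in a[i]))
        q[i] = [x / r[i][i] for x in a[i]]
        for j in range(i + 1, n):
            # Off-diagonal entry, then removal of the q_i component.
            r[i][j] = sum(qe * ae for qe, ae in zip(q[i], a[j]))
            a[j] = [ae - r[i][j] * qe for ae, qe in zip(a[j], q[i])]
    return q, r
```

Note that the update of column j in this form needs q_i and r_ij, that is, results of the same iteration; this dependency is what a conventional pipeline has to wait for.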

R.-H. Chang et al., in the paper "Iterative QR decomposition architecture using the modified Gram-Schmidt algorithm for MIMO systems", proposed a triangular systolic array QR decomposition hardware circuit based on the MGS algorithm. To complete the QR decomposition of a square matrix of order n (n a positive integer greater than or equal to 2), this circuit needs only 2n-1 time units. In a specific application, QR decomposition of a 4 × 4 matrix A with this MGS-based iterative structure takes seven steps, each requiring one time unit, for a total of seven time units. Although the triangular systolic array QR decomposition method of R.-H. Chang et al. greatly reduces the computation time, baseband signal processing in practical communication systems calls for a still faster QR decomposition structure. Moreover, the existing literature covers only QR decomposition of 4 × 4 matrices; no hardware circuit for QR decomposition of a general n × n matrix has been published.

Summary of the invention

The technical problem to be solved by the present invention is as follows: in view of the technical problems in the prior art, the invention provides a triangular systolic array QR decomposer and decomposition method based on look-ahead iteration that is simple in principle, easy to implement, fast and efficient.

To solve the above technical problem, the invention adopts the following technical solution:

A triangular systolic array QR decomposer based on look-ahead iteration, used to perform QR decomposition of an n × n matrix A, comprises diagonal processing modules, iterative processing modules and triangular processing modules. It uses n diagonal processing modules and (n-1)+(n-2)+...+1 = n × (n-1)/2 iterative processing modules; when n is even it uses n/2+(n-2)+(n-4)+(n-6)+...+2 = n²/4 triangular processing modules, and when n is odd it uses (n-1)+(n-3)+(n-5)+...+2 = (n+1)(n-1)/4 triangular processing modules. The first diagonal processing module receives the first column vector a₁ of matrix A from outside. Its results q₁ and r₁₁ are outputs of the whole QR decomposition module; q₁ is also passed to the triangular processing modules of the next step, and the r₁₁² signal generated during the computation is passed to all iterative processing modules of the first step. The (j-1)-th iterative processing module, where j is greater than or equal to 2 and less than or equal to n-1, takes as input the j-th column vector a_j of matrix A received from outside, the first column vector a₁ of matrix A, and the r₁₁² output by the first diagonal processing module, and computes the j-th column vector a_j^(1) of the next iteration matrix A^(1). Here a₂^(1) is the input of the second diagonal processing module, and the remaining column vectors of A^(1) are inputs both to the iterative processing modules of the second step and to the triangular processing modules of the third step. The remaining steps proceed in the same way, until the last triangular processing module produces the output signal r_(n-1,n) of the QR decomposition module.

As a further improvement of the decomposer of the invention: the i-th diagonal processing module processes the a_i^(i-1) signal output by step i-1 to compute the outputs r_ii and q_i of the QR decomposition module, and also computes r_ii². The vector q_i is an input of the triangular processing modules of the next step, and r_ii² is an input of all iterative processing modules in step i. Each iterative processing module receives the column vector signals a_i^(i-1) and a_i1^(i-1) from step i-1 and the r_ii² signal from the diagonal processing module as inputs, and after processing produces the i1-th column vector of the next iteration matrix A^(i). Here a_(i+1)^(i) is the input of the (i+1)-th diagonal processing module, and the remaining column vectors of A^(i) are inputs both to the iterative processing modules of the next step and to the triangular processing modules of step i+2. Each triangular processing module receives the input signal q_(i-1) from step i-1 and the signals a_i2^(i-2) and a_(i2+1)^(i-2) from step i-2, and after processing produces the output signals r_(i-1,i2) and r_(i-1,i2+1) of the QR decomposition module.

The n-th diagonal processing module processes the a_n^(n-1) signal output by step n-1 to obtain the output signals r_nn and q_n of the QR decomposition module. The k4-th triangular processing module receives the signal q_(n-1) from step n-1 and the signal a_n^(n-2) from step n-2, and after processing produces the output signal r_(n-1,n) of the QR decomposition module.

As a further improvement of the decomposer of the invention: the diagonal processing module comprises multipliers, an adder, a square-root unit and dividers. Multiplier e, where e is greater than or equal to 1 and less than or equal to n, receives the e-th element of the input vector a_j from outside, squares it, and passes the result to the adder. The adder receives the signals from multiplier 1 to multiplier n and accumulates them; the sum is passed to the square-root unit and is at the same time the output signal r_jj² of the whole module. The square-root unit takes the square root of the signal received from the adder; the result is passed to divider 1 to divider n as their divisor and is at the same time the output signal r_jj of the whole module. Divider e1, where e1 is greater than or equal to 1 and less than or equal to n, takes the e1-th element of the input vector a_j received from outside as the dividend and the signal received from the square-root unit as the divisor; its result is the e1-th element of the output vector q_j2 of the whole module.

As a further improvement of the decomposer of the invention: the iterative processing module comprises a first shared hardware block, which contains one multiplexer and multiplier 1 to multiplier n. The multiplexer selects between different inputs to serve as one operand of multiplier 1 to multiplier n: it receives the external a_j3^(p) vector and the output signal of the divider, selects one of them, and passes the result to multiplier 1 to multiplier n. When the enable signal is '0', multiplier e2, where e2 is greater than or equal to 1 and less than or equal to n, takes the signal received from the multiplexer as one operand and the e2-th element of the externally received a_j^(p) vector as the other, and passes the product to the adder module. The adder module receives the input signals from multiplier 1 to multiplier n, accumulates them, and passes the sum to the divider module. The divider takes the signal received from the adder module as the dividend and the externally received signal r_jj² as the divisor, and passes the quotient to the input of the multiplexer. When the enable signal is '1', multiplier e2 passes its result to subtractor e3, where e3 is greater than or equal to 1 and less than or equal to n. Subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the externally received a_j3^(p) signal as the minuend; the difference is the e3-th element of the output vector a_j3^(p+1) of the whole module.

As a further improvement of the decomposer of the invention: the triangular processing module comprises a second shared hardware block. The inputs of the multiplexer are the n elements of the a_j3 vector and the n elements of the a_(j3+1) vector. When the multiplexer enable signal is '0', the multiplexer passes the elements of the a_j3 vector to multiplier 1 to multiplier n; when the enable signal is '1', it passes the elements of the a_(j3+1) vector to multiplier 1 to multiplier n. Multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the externally received q_j2 vector as the other, and passes the product to the adder, which accumulates the signals received from the multipliers. When the multiplexer enable signal is '0', the accumulated output is the output signal r_(j2,j3) of the triangular processing module; when the enable signal is '1', it is the output signal r_(j2,j3+1).

A QR decomposition method based on the above decomposer comprises the following steps:

Step S1: the n column vectors a₁, ..., a_n of matrix A are the input signals of the QR decomposition module. a₁ is the input of the first diagonal processing module, whose outputs are r₁₁ and q₁. The iterative processing modules compute the next iteration matrix; their inputs are a₁ and a_j, where 1 < j < n+1 and j is a positive integer, and their outputs are the columns a_j^(1) of the next iteration matrix.

Steps S2 to Sj: for j greater than or equal to 2 and less than n, the signals a_j^(j-2), ..., a_n^(j-2) input to step j-1 and the signals q_(j-1), a_j^(j-1), ..., a_n^(j-1) output by step j-1 are the input signals of step j. Here a_j^(j-1) is the input of the diagonal processing module, which computes r_jj and q_j. The input signals of the k3-th triangular processing module are q_(j-1), a_j3^(j-2) and a_(j3+1)^(j-2), from which it computes r_(j-1,j3) and r_(j-1,j3+1); when n-j is odd, j3 is a positive integer greater than or equal to j and less than or equal to n-1, and when n-j is even, j3 is a positive integer greater than or equal to j and less than or equal to n. As in the first step, the iterative processing modules compute the next iteration matrix; their inputs are a_j^(j-1), ..., a_n^(j-1) and their outputs are a_(j+1)^(j), ..., a_n^(j).

Step Sn: the input a_n^(n-2) of step n-1 and the outputs q_(n-1) and a_n^(n-1) of step n-1 are the inputs of this step. a_n^(n-1) is the input of block1, whose outputs are r_(n,n) and q_n; q_(n-1) and a_n^(n-2) are the inputs of block3, whose output is r_(n-1,n).

Compared with the prior art, the advantages of the invention are as follows: the triangular systolic array QR decomposer and decomposition method based on look-ahead iteration are simple in principle, easy to implement, and significantly speed up QR decomposition. To perform QR decomposition of an n × n matrix, the proposed structure needs only n time units, whereas the triangular systolic array structure proposed by R.-H. Chang et al. needs 2n-1 time units. For the 4 × 4 matrix A mentioned above, for example, QR decomposition with the invention takes only 4 time units instead of 7, a saving of 3 time units.

Brief description of the drawings

Fig. 1 is a schematic diagram of the topology of the decomposer of the invention.

Fig. 2 is a schematic diagram of the diagonal processing module of the invention in a specific application example.

Fig. 3 is a schematic diagram of the iterative processing module of the invention in a specific application example.

Fig. 4 is a schematic diagram of the triangular processing module of the invention in a specific application example.

Detailed description of the embodiments

The invention is described in further detail below with reference to the drawings and specific embodiments.

As shown in Fig. 1, the triangular systolic array QR decomposer based on look-ahead iteration of the invention is used to perform QR decomposition of an n × n matrix A and comprises diagonal processing modules, iterative processing modules and triangular processing modules. It consists of n diagonal processing modules, (n-1)+(n-2)+...+1 = n × (n-1)/2 iterative processing modules, and, when n is even, n/2+(n-2)+(n-4)+(n-6)+...+2 = n²/4 triangular processing modules or, when n is odd, (n-1)+(n-3)+(n-5)+...+2 = (n+1)(n-1)/4 triangular processing modules.
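The module counts above can be cross-checked with a short script. This is only a sketch: the per-step breakdown of triangular modules is an assumption (each triangular module is taken to serve two entries of one row of R, as its time-shared multiplexer suggests), and the function name `module_counts` is hypothetical.

```python
def module_counts(n):
    """Return (diagonal, iterative, triangular) module counts for an n x n matrix."""
    diagonal = n
    # Step i updates the n - i columns to the right of the pivot column.
    iterative = sum(n - i for i in range(1, n))
    # Assumption: in step j (2 <= j <= n) the n - j + 1 off-diagonal entries of
    # row j-1 of R are computed, two per triangular module (rounded up).
    triangular = sum((n - j + 2) // 2 for j in range(2, n + 1))
    return diagonal, iterative, triangular


for n in (4, 5, 8):
    print(n, module_counts(n))
```

For n = 4 this gives 4 diagonal, 6 iterative and 4 triangular modules, in agreement with the closed forms n × (n-1)/2 and n²/4.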

The first diagonal processing module receives the first column vector a₁ of matrix A from outside. Its results q₁ and r₁₁ are outputs of the whole QR decomposition module; q₁ is also passed to the triangular processing modules of the next step, and the r₁₁² signal generated during the computation is passed to all iterative processing modules of the first step. The (j-1)-th iterative processing module (j greater than or equal to 2 and less than or equal to n-1) takes as input the j-th column vector a_j of matrix A received from outside, the first column vector a₁ of matrix A, and the r₁₁² output by the first diagonal processing module, and computes the j-th column vector a_j^(1) of the next iteration matrix A^(1). Here a₂^(1) is the input of the second diagonal processing module, and the remaining column vectors of A^(1) are inputs both to the iterative processing modules of the second step and to the triangular processing modules of the third step. By the time the iterative processing modules need the r₁₁² signal, the first diagonal processing module has already finished computing it, so the diagonal processing module and the iterative processing modules can execute in parallel.

The i-th diagonal processing module processes the a_i^(i-1) signal output by step i-1 to compute the outputs r_ii and q_i of the QR decomposition module, and also computes r_ii². The vector q_i is an input of the triangular processing modules of the next step, and r_ii² is an input of all iterative processing modules in step i. Each iterative processing module receives the column vector signals a_i^(i-1) and a_i1^(i-1) from step i-1 and the r_ii² signal from the diagonal processing module as inputs, and after processing produces the i1-th column vector of the next iteration matrix A^(i). Here a_(i+1)^(i) is the input of the (i+1)-th diagonal processing module, and the remaining column vectors of A^(i) are inputs both to the iterative processing modules of the next step and to the triangular processing modules of step i+2. Each triangular processing module receives the input signal q_(i-1) from step i-1 and the signals a_i2^(i-2) and a_(i2+1)^(i-2) from step i-2, and after processing produces the output signals r_(i-1,i2) and r_(i-1,i2+1) of the QR decomposition module.

As shown in Fig. 2, in a specific application example the diagonal processing module comprises multipliers, an adder, a square-root unit and dividers. Multiplier e (e greater than or equal to 1 and less than or equal to n) receives the e-th element of the input vector a_j from outside, squares it, and passes the result to the adder. The adder receives the signals from multiplier 1 to multiplier n and accumulates them; the sum is passed to the square-root unit and is at the same time the output signal r_jj² of the whole module. The square-root unit takes the square root of the signal received from the adder; the result is passed to divider 1 to divider n as their divisor and is at the same time the output signal r_jj of the whole module. Divider e1 (e1 greater than or equal to 1 and less than or equal to n) takes the e1-th element of the input vector a_j received from outside as the dividend and the signal received from the square-root unit as the divisor; its result is the e1-th element of the output vector q_j2 of the whole module.

The diagonal processing module described above computes the j2-th column vector q_j2 of matrix Q, the diagonal element r_jj of matrix R, and the square r_jj² of that diagonal element, where r_jj² is used as an input of the iterative processing modules. Because the moment at which the diagonal processing module outputs r_jj² coincides with the moment at which the iterative processing modules need it, the two kinds of module can execute in parallel, which increases the speed of QR decomposition.
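The data flow of the diagonal processing module can be modeled in software as follows. This is a behavioral sketch only, assuming real-valued elements; the function name `diagonal_module` is hypothetical, and the comments map each line to the hardware unit named in the description.

```python
import math


def diagonal_module(a_j):
    """Behavioral model of the diagonal processing module for one real column."""
    squares = [x * x for x in a_j]       # multiplier 1..n: square each element
    r_jj_sq = sum(squares)               # adder: r_jj^2, ready before the square root
    r_jj = math.sqrt(r_jj_sq)            # square-root unit
    q_j = [x / r_jj for x in a_j]        # divider 1..n
    return r_jj_sq, r_jj, q_j


r_sq, r, q = diagonal_module([3.0, 4.0])
print(r_sq, r, q)
```

The point of the ordering is that `r_jj_sq` is produced before the square root and the divisions, so a consumer that needs only r_jj² does not have to wait for `r_jj` or `q_j`.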

As shown in Fig. 3, in a specific application example the iterative processing module comprises a first shared hardware block, which contains one multiplexer and multiplier 1 to multiplier n. The multiplexer selects between different inputs to serve as one operand of multiplier 1 to multiplier n: it receives the external a_j3^(p) vector and the output signal of the divider, selects one of them, and passes the result to multiplier 1 to multiplier n. When the enable signal is '0', multiplier e2 (e2 greater than or equal to 1 and less than or equal to n) takes the signal received from the multiplexer as one operand and the e2-th element of the externally received a_j^(p) vector as the other, and passes the product to the adder module. The adder module receives the input signals from multiplier 1 to multiplier n, accumulates them, and passes the sum to the divider module. The divider takes the signal received from the adder module as the dividend and the externally received signal r_jj² as the divisor, and passes the quotient to the input of the multiplexer. When the enable signal is '1', multiplier e2 passes its result to subtractor e3 (e3 greater than or equal to 1 and less than or equal to n). Subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the externally received a_j3^(p) signal as the minuend; the difference is the e3-th element of the output vector a_j3^(p+1) of the whole module.

The iterative processing module described above computes the j3-th column of the next iteration matrix. As the figure shows, the two points at which the first shared hardware block is used are independent of each other, so hardware resources can be saved by time-sharing that hardware.
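The two phases of the iterative processing module can be sketched as below. This is a behavioral model under the assumption of real-valued vectors; the function name `iterative_module` and the variable names are hypothetical. The same list comprehension shape is used in both phases to mirror the reuse of multiplier 1 to multiplier n.

```python
def iterative_module(a_j, a_j3, r_jj_sq):
    """Behavioral model of the iterative processing module (real vectors).

    a_j     : pivot column a_j^(p)
    a_j3    : column a_j3^(p) to be updated
    r_jj_sq : squared norm of a_j, supplied by the diagonal processing module
    """
    # Enable = '0': the multipliers form the element-wise products of the two columns.
    products = [x * y for x, y in zip(a_j3, a_j)]
    coeff = sum(products) / r_jj_sq                 # adder, then divider
    # Enable = '1': the same multipliers scale the pivot column by the quotient.
    scaled = [coeff * y for y in a_j]
    return [x - s for x, s in zip(a_j3, scaled)]    # subtractor 1..n


updated = iterative_module([3.0, 4.0], [1.0, 2.0], 25.0)
print(updated)
```

The returned column is orthogonal to the pivot column, which is the property the next step relies on.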

As shown in Fig. 4, in a specific application example the triangular processing module comprises a second shared hardware block. The inputs of the multiplexer are the n elements of the a_j3 vector and the n elements of the a_(j3+1) vector. When the multiplexer enable signal is '0', the multiplexer passes the elements of the a_j3 vector to multiplier 1 to multiplier n; when the enable signal is '1', it passes the elements of the a_(j3+1) vector to multiplier 1 to multiplier n. Multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the externally received q_j2 vector as the other, and passes the product to the adder, which accumulates the signals received from the multipliers. When the multiplexer enable signal is '0', the accumulated output is the output signal r_(j2,j3) of the triangular processing module; when the enable signal is '1', it is the output signal r_(j2,j3+1).

The triangular processing module described above computes the elements of matrix R at coordinates [j2, j3] and [j2, j3+1]. Comparing Fig. 4 with Fig. 2 and Fig. 3 shows that computing the element at coordinate [j2, j3] takes less than 50% of the execution time of the second basic module and of the first basic module. The invention therefore time-multiplexes the hardware that computes the element at coordinate [j2, j3], which saves hardware resources.
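The time-multiplexed behavior of the triangular processing module can be illustrated with a small model. It is a sketch under the assumption of real-valued vectors; the function name `triangular_module` is hypothetical, and the loop over `enable` stands in for the two time slots of the multiplexer.

```python
def triangular_module(q_j2, a_j3, a_j3_plus_1):
    """Behavioral model of the triangular processing module (real vectors).

    Returns (r_j2_j3, r_j2_j3p1), produced one after the other on shared hardware.
    """
    outputs = []
    for enable in (0, 1):
        # Multiplexer: select which column feeds multiplier 1..n in this slot.
        selected = a_j3 if enable == 0 else a_j3_plus_1
        products = [s * q for s, q in zip(selected, q_j2)]   # multiplier 1..n
        outputs.append(sum(products))                        # adder / accumulator
    return outputs[0], outputs[1]


print(triangular_module([0.6, 0.8], [1.0, 2.0], [0.0, 5.0]))
```

Each slot is a single dot product, with no square root or division, which is consistent with the statement that one entry takes less than half the time of the other two modules.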

The invention further provides a decomposition method based on the above decomposer. QR decomposition of an n × n matrix A with the circuit of the above decomposer takes n steps in total, as follows:

Step S1: the n column vectors a₁, ..., a_n of matrix A are the input signals of the QR decomposition module. a₁ is the input of the first diagonal processing module, whose outputs are r₁₁ and q₁. The iterative processing modules compute the next iteration matrix; their inputs are a₁ and a_j (1 < j < n+1, j a positive integer), and their outputs are the columns a_j^(1) of the next iteration matrix. The values of the output signals in the first step are given by formula (1);

Step S1 shows the biggest difference between the invention and the conventional QR decomposition method: the invention computes the next iteration matrix one step ahead. Conventional QR decomposition computes the next iteration matrix in the second step because that computation needs the output of the first step. By improving on the conventional method, the invention computes the next iteration matrix from the input of the first step, which greatly increases the speed of QR decomposition.
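Why the input of the first step suffices can be made explicit with a short derivation. It is not taken from the patent text; it assumes real-valued column vectors (for a complex matrix the transposes would become conjugate transposes) and starts from the standard MGS update.

```latex
\begin{aligned}
a_j^{(1)} &= a_j - r_{1j}\, q_1,
  \qquad r_{1j} = q_1^{T} a_j, \quad q_1 = \frac{a_1}{r_{11}} \\
          &= a_j - \frac{a_1^{T} a_j}{r_{11}} \cdot \frac{a_1}{r_{11}} \\
          &= a_j - \frac{a_1^{T} a_j}{r_{11}^{2}}\, a_1 .
\end{aligned}
```

The last line contains only a_1, a_j and r_11², none of which depends on the square root or the divisions that produce r_11 and q_1, so the update can run in the same step as the diagonal processing.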

Steps S2 to Sj: for j greater than or equal to 2 and less than n, the signals a_j^(j-2), ..., a_n^(j-2) input to step j-1 and the signals q_(j-1), a_j^(j-1), ..., a_n^(j-1) output by step j-1 are the input signals of step j. Here a_j^(j-1) is the input of the diagonal processing module, which computes r_jj and q_j. The input signals of the k3-th triangular processing module are q_(j-1), a_j3^(j-2) and a_(j3+1)^(j-2) (when n-j is odd, j3 is a positive integer greater than or equal to j and less than or equal to n-1; when n-j is even, j3 is a positive integer greater than or equal to j and less than or equal to n), from which it computes r_(j-1,j3) and r_(j-1,j3+1). As in the first step, the iterative processing modules compute the next iteration matrix; their inputs are a_j^(j-1), ..., a_n^(j-1) and their outputs are a_(j+1)^(j), ..., a_n^(j). The outputs of step j are given by formula (2);
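Formula (2) is not reproduced in this text. A reconstruction consistent with the module descriptions above might read as follows; it assumes real-valued vectors, takes a^(0) to mean the original columns of A, and uses k as an illustrative column index that does not appear in the original.

```latex
\begin{aligned}
r_{jj} &= \sqrt{\bigl(a_j^{(j-1)}\bigr)^{T} a_j^{(j-1)}}, &
q_j &= \frac{a_j^{(j-1)}}{r_{jj}}, \\
r_{j-1,\,j3} &= q_{j-1}^{T}\, a_{j3}^{(j-2)}, & j \le j3 &\le n, \\
a_k^{(j)} &= a_k^{(j-1)}
  - \frac{\bigl(a_j^{(j-1)}\bigr)^{T} a_k^{(j-1)}}{r_{jj}^{2}}\, a_j^{(j-1)}, & j < k &\le n .
\end{aligned}
```

The second line is the part deferred from step j-1: it combines q_(j-1), an output of step j-1, with columns that were inputs of step j-1.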

Step Sn: the input a_n^(n-2) of step n-1 and the outputs q_(n-1) and a_n^(n-1) of step n-1 are the inputs of this step. a_n^(n-1) is the input of block1, whose outputs are r_(n,n) and q_n; q_(n-1) and a_n^(n-2) are the inputs of block3, whose output is r_(n-1,n). The outputs of step n are given by formula (3);

As can be seen from the above, to perform QR decomposition of an n × n matrix, the proposed structure needs only n time units, whereas the triangular systolic array structure proposed by R.-H. Chang et al. needs 2n-1 time units. For the 4 × 4 matrix A mentioned earlier, for example, QR decomposition with the invention takes only 4 time units instead of 7, a saving of 3 time units. The triangular systolic array QR decomposition based on look-ahead iteration proposed by the invention therefore significantly speeds up QR decomposition.
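The n-step schedule can be checked end to end with a software model. The sketch below is not the patented circuit: it assumes a real-valued, nonsingular matrix stored as a list of columns, executes the three kinds of processing sequentially inside each step (in hardware they would run in parallel), and uses the hypothetical names `dot` and `lookahead_qr`.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def lookahead_qr(a_cols):
    """Step-by-step model of the look-ahead schedule for a real n x n matrix.

    Returns (q_cols, r, steps), with r indexed as r[row][col].
    """
    n = len(a_cols)
    q = [None] * n
    r = [[0.0] * n for _ in range(n)]
    prev = None                            # columns that entered the previous step
    cur = [col[:] for col in a_cols]       # columns entering the current step
    steps = 0
    for i in range(n):                     # step i+1 in the patent's numbering
        pivot = cur[i]
        r_sq = dot(pivot, pivot)           # diagonal processing: r_ii^2 first
        r[i][i] = math.sqrt(r_sq)
        q[i] = [x / r[i][i] for x in pivot]
        if i > 0:                          # triangular processing: row i-1 of R
            for j in range(i, n):
                r[i - 1][j] = dot(q[i - 1], prev[j])
        nxt = [col[:] for col in cur]      # iterative processing: look-ahead update
        for j in range(i + 1, n):
            coeff = dot(pivot, cur[j]) / r_sq
            nxt[j] = [x - coeff * p for x, p in zip(cur[j], pivot)]
        prev, cur = cur, nxt
        steps += 1
    return q, r, steps


q, r, steps = lookahead_qr([[3.0, 4.0], [1.0, 2.0]])
print(steps, r)
```

Each pass of the loop uses only values available at the start of that step plus r_ii², so the model finishes in n passes, compared with the 2n-1 time units stated for the earlier architecture.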

The above are only preferred embodiments of the invention. The scope of protection of the invention is not limited to the above embodiments; all technical solutions that fall under the concept of the invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications that do not depart from the principle of the invention shall also be regarded as falling within the scope of protection of the invention.

Claims

1. A triangular systolic array QR decomposer based on look-ahead iteration, used to perform QR decomposition of an n × n matrix A, characterized in that it comprises diagonal processing modules, iterative processing modules and triangular processing modules; it uses n diagonal processing modules and (n-1)+(n-2)+...+1 = n × (n-1)/2 iterative processing modules; when n is even it uses n/2+(n-2)+(n-4)+(n-6)+...+2 = n²/4 triangular processing modules, and when n is odd it uses (n-1)+(n-3)+(n-5)+...+2 = (n+1)(n-1)/4 triangular processing modules; the first diagonal processing module receives the first column vector a₁ of matrix A from outside, its results q₁ and r₁₁ are outputs of the whole QR decomposition module, q₁ is also passed to the triangular processing modules of the next step, and the r₁₁² signal generated during the computation is passed to all iterative processing modules of the first step; the (j-1)-th iterative processing module, where j is greater than or equal to 2 and less than or equal to n-1, takes as input the j-th column vector a_j of matrix A received from outside, the first column vector a₁ of matrix A, and the r₁₁² output by the first diagonal processing module, and computes the j-th column vector a_j^(1) of the next iteration matrix A^(1), where a₂^(1) is the input of the second diagonal processing module and the remaining column vectors of A^(1) are inputs both to the iterative processing modules of the second step and to the triangular processing modules of the third step; the remaining steps proceed in the same way, until the last triangular processing module produces the output signal r_(n-1,n) of the QR decomposition module.

2. the triangle systolic array architecture QR decomposer according to claim 1 based on advanced iterative, which is characterized in that I-th of diagonal processing module is to a exported from the (i-1)-th step_i ^i-1Signal carries out the output r that QR decomposing module is calculated_iiAnd q_i, And r is calculated_ii ², wherein q_iInput of the vector as next step triangulation process module, r_ii ²As all iteration in the i-th step The input of processing module, iterative processing module receive column vector a from the (i-1)-th step_i ^i-1Signal and a_i1 ^i-1Signal, and from diagonal angle Reason module obtains r_ii ²Signal obtains Iterative Matrix A next time as input after processingⁱThe i-th 1 column vectors, wherein a_i+1 ⁱAs the input of the diagonal processing module of i+1, AⁱRemaining column vector as next step iteration module input while make For the input of the i-th+2 step triangulation process module, triangulation process module receives input signal q from the (i-1)-th step_i-1While from I-2 step receives a_i2 ^i-2Signal and a_i2+1 ^i-2Signal obtains the output signal r of QR decomposing module after processing_i-1,i2Signal and r_i-1,i2+1；

the n-th diagonal processing module processes the a_n^{n-1} signal output from the (n-1)-th step to obtain the QR decomposition module output signals r_nn and q_n; the k4-th triangular processing module receives the signal q_{n-1} from the (n-1)-th step and the signal a_n^{n-2} from the (n-2)-th step, and after processing obtains the output signal r_{n-1,n} of the QR decomposition module.
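Read together with the module structures recited in claims 3 to 5, the signal flow of claim 2 appears to correspond to the recurrences below. This is a reconstruction offered as a reading aid, assuming that a_j^0 denotes the j-th column of A and that a superscript is an iteration index rather than a power (except in r_ii^2).

```latex
\begin{aligned}
r_{ii}^{2} &= \bigl(a_i^{\,i-1}\bigr)^{\mathsf T} a_i^{\,i-1},
\qquad r_{ii} = \sqrt{r_{ii}^{2}},
\qquad q_i = \frac{a_i^{\,i-1}}{r_{ii}}
&& \text{(diagonal module, step } i\text{)} \\
a_{i1}^{\,i} &= a_{i1}^{\,i-1}
 - \frac{\bigl(a_i^{\,i-1}\bigr)^{\mathsf T} a_{i1}^{\,i-1}}{r_{ii}^{2}}\; a_i^{\,i-1},
\qquad i1 > i
&& \text{(iterative module, step } i\text{)} \\
r_{i-1,\,i2} &= q_{i-1}^{\mathsf T}\, a_{i2}^{\,i-2},
\qquad i2 \ge i
&& \text{(triangular module, step } i\text{)}
\end{aligned}
```

In this form the column update uses r_ii^2 and the unnormalized column a_i^{i-1} rather than q_i, so the iterative modules do not have to wait for the square root and the divisions of the diagonal module.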

3. The triangular systolic array QR decomposer based on look-ahead iteration according to claim 1 or 2, characterized in that the diagonal processing module comprises multipliers, an adder, a square-root unit and dividers; multiplier e receives from outside the e-th element of the input vector a_j, where e is greater than or equal to 1 and less than or equal to n, squares it and outputs the result to the adder; the adder receives signals from multiplier 1 through multiplier n, accumulates them, and outputs the result to the square-root unit and simultaneously as the module output signal r_jj^2; the square-root unit, after receiving the signal from the adder, takes the square root and outputs the result to divider 1 through divider n as their divisor, and simultaneously as the module output signal r_jj; divider e1 takes the e1-th element of the input vector a_j received from outside as the dividend and the signal received from the square-root unit as the divisor, where e1 is greater than or equal to 1 and less than or equal to n, and the operation result serves as the e1-th element of the module output vector q_{j2}.
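The datapath of the diagonal processing module can be modelled in a few lines of software. This is a behavioural sketch only, assuming floating-point arithmetic; the function name `diagonal_module` is hypothetical and the claim does not specify number formats or timing.

```python
import math


def diagonal_module(a):
    """Behavioural model of one diagonal processing module for input column a."""
    squares = [x * x for x in a]   # multipliers 1..n square each element
    r_sq = sum(squares)            # adder accumulates the squares: r_jj^2
    r = math.sqrt(r_sq)            # square-root unit: r_jj
    q = [x / r for x in a]         # dividers 1..n: elements of the output vector q
    return r_sq, r, q


r_sq, r, q = diagonal_module([3.0, 4.0])
print(r_sq, r, q)
```

Note that r_jj^2 is available one stage earlier than r_jj and q, which is the signal the iterative processing modules consume.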

4. The triangular systolic array QR decomposer based on look-ahead iteration according to claim 1 or 2, characterized in that the iterative processing module comprises first shared hardware, the first shared hardware containing a multiplexer and multiplier 1 through multiplier n; the multiplexer selects different inputs as operands for multiplier 1 through multiplier n; the multiplexer receives as inputs the external a_{j3}^p vector and the output signal of the divider, and after selection outputs the result to multiplier 1 through multiplier n; when the enable signal is '0', multiplier e2 takes the signal received from the multiplexer as one operand, where e2 is greater than or equal to 1 and less than or equal to n, and the e2-th element of the externally received a_j^p vector as the other operand, performs the multiplication and outputs the result to the adder module; the adder module receives input signals from multiplier 1 through multiplier n, accumulates them and outputs the result to the divider module; the divider takes the signal received from the adder module as the dividend and the externally received signal r_jj^2 as the divisor, and after the division outputs the result to the input of multiplexer 1; when the enable signal is '1', multiplier e2 outputs its operation result to subtractor e3, where e3 is greater than or equal to 1 and less than or equal to n; subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the externally received a_{j3}^p signal as the minuend, and the result of the subtraction serves as the e3-th element of the module output vector a_{j3}^{p+1}.
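The two phases of the iterative processing module, selected by the enable signal, can be sketched as follows. This is a behavioural illustration under the assumption of floating-point arithmetic; the function and argument names (`iterative_module`, `a_pivot`, `a_col`) are hypothetical, with `a_pivot` standing for a_j^p and `a_col` for a_{j3}^p.

```python
def iterative_module(a_pivot, a_col, r_sq):
    """Behavioural model: returns a_col minus its component along a_pivot."""
    # Enable = '0': the multiplexer feeds a_col to the multipliers, which
    # multiply element-wise by a_pivot; the adder accumulates the products
    # and the divider divides the sum by r_jj^2.
    products = [c * p for c, p in zip(a_col, a_pivot)]
    coeff = sum(products) / r_sq

    # Enable = '1': the multiplexer feeds the divider output to the same
    # multipliers, and the subtractors remove the scaled pivot from a_col.
    scaled = [coeff * p for p in a_pivot]
    return [c - s for c, s in zip(a_col, scaled)]


a_next = iterative_module([3.0, 4.0], [1.0, 2.0], 25.0)
print(a_next)
```

Reusing the n multipliers in both phases is what the claim calls shared hardware; the sketch mirrors this by running the same element-wise multiplication twice with different operands.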

5. The triangular systolic array QR decomposer based on look-ahead iteration according to claim 4, characterized in that the triangular processing module comprises second shared hardware; the inputs of multiplexer 1 are the n elements of the a_{j3} vector and the n elements of the a_{j3+1} vector; when the multiplexer enable signal is '0', multiplexer 1 gates the elements of the a_{j3} vector to multiplier 1 through multiplier n; when the multiplexer enable signal is '1', multiplexer 1 gates the elements of the a_{j3+1} vector to multiplier 1 through multiplier n; multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the externally received q_{j2} vector as the other operand, performs the multiplication and outputs the result to the adder; the adder accumulates the signals received from the multipliers; when the multiplexer enable signal is '0', the accumulator output signal serves as the output signal r_{j2,j3} of the triangular processing module, and when the multiplexer enable signal is '1', the accumulator output signal serves as the output signal r_{j2,j3+1} of the triangular processing module.
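A behavioural sketch of the triangular processing module follows. The function name `triangular_module` is hypothetical, and allowing the second column to be absent is an assumption made here to cover the case where only one entry of the row remains (as for r_{n-1,n} in the last step).

```python
def triangular_module(q, a_first, a_second=None):
    """Behavioural model: inner products of q with one or two columns."""
    selected = [a_first] if a_second is None else [a_first, a_second]
    outputs = []
    # enable = 0 gates a_first to the multipliers, enable = 1 gates a_second;
    # the same multipliers and accumulator are reused for both columns.
    for enable, column in enumerate(selected):
        products = [x * y for x, y in zip(column, q)]
        outputs.append(sum(products))
    return tuple(outputs)


print(triangular_module([0.6, 0.8], [1.0, 2.0], [2.0, 0.0]))
```

The first returned value corresponds to r_{j2,j3} and the second, when present, to r_{j2,j3+1}.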

6. A QR decomposition method based on the decomposer according to any one of claims 1 to 5, characterized in that the steps are:

Step S1: the n column vectors a_1, ..., a_n of matrix A serve as the input signals of the QR decomposition module; a_1 serves as the input of the first diagonal processing module, whose outputs are r_11 and q_1; the iterative processing modules compute the next iteration matrix, their inputs being a_1 and a_j, where 1 < j < n+1 and j is a positive integer, and their outputs being the columns a_j^1 of the next iteration matrix;

Steps S2 to Sj: j is greater than or equal to 2 and less than n; the signals a_j^{j-2}, ..., a_n^{j-2} input to the (j-1)-th step and the signals q_{j-1}, a_j^{j-1}, ..., a_n^{j-1} output by the (j-1)-th step serve as the input signals of the j-th step, wherein a_j^{j-1} serves as the input of the diagonal processing module, for computing r_jj and q_j; the input signals of the k3-th triangular processing module are q_{j-1}, a_{j3}^{j-2} and a_{j3+1}^{j-2}, for computing r_{j-1,j3} and r_{j-1,j3+1}; when n-j is odd, j3 is a positive integer greater than or equal to j and less than or equal to n-1, and when n-j is even, j3 is a positive integer greater than or equal to j and less than or equal to n; similarly to the first step, the iterative processing modules compute the next iteration matrix, their inputs being a_j^{j-1}, ..., a_n^{j-1} and their outputs being a_{j+1}^j, ..., a_n^j;

Step Sn: the input a_n^{n-2} of the (n-1)-th step and the outputs q_{n-1} and a_n^{n-1} of the (n-1)-th step serve as inputs, wherein a_n^{n-1} serves as the input of the diagonal processing module, whose outputs are r_{n,n} and q_n; q_{n-1} and a_n^{n-2} serve as the inputs of the triangular processing module, whose output is r_{n-1,n}.
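Steps S1 to Sn can be simulated sequentially in software to check that the schedule yields a valid factorization. The sketch below is an illustration under stated assumptions, not the claimed hardware: it uses floating-point lists, runs the modules of each step one after another instead of in parallel, and the names `lookahead_qr`, `prev` and `cur` are hypothetical.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def lookahead_qr(columns):
    """columns: the n column vectors of A. Returns (Q as columns, R as rows)."""
    n = len(columns)
    R = [[0.0] * n for _ in range(n)]
    Q = []
    prev = None                          # columns one iteration behind cur
    cur = [list(c) for c in columns]     # columns fed to the current step
    for j in range(n):                   # j = 0 corresponds to step S1
        # Triangular modules: row j-1 of R from q_{j-1} and the older columns.
        if j > 0:
            for k in range(j, n):
                R[j - 1][k] = dot(Q[j - 1], prev[k])
        # Diagonal module: r_jj^2, r_jj and q_j from the pivot column.
        pivot = cur[j]
        r_sq = dot(pivot, pivot)
        R[j][j] = math.sqrt(r_sq)
        Q.append([x / R[j][j] for x in pivot])
        # Iterative modules: update the remaining columns using r_jj^2 only.
        nxt = [list(c) for c in cur]
        for k in range(j + 1, n):
            coeff = dot(pivot, cur[k]) / r_sq
            nxt[k] = [c - coeff * p for c, p in zip(cur[k], pivot)]
        prev, cur = cur, nxt
    return Q, R


Q, R = lookahead_qr([[3.0, 4.0], [1.0, 2.0]])
print(R)
```

In each step the triangular computation reads only values produced in earlier steps, and the column update does not read q_j, which matches the inputs listed for each module in steps S1 to Sn.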