CN105846873B - Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative - Google Patents
- Publication number
- CN105846873B (application CN201610173392.0A)
- Authority
- CN
- China
- Prior art keywords
- signal
- module
- output
- multiplier
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/0413—MIMO systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/08—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station
- H04B7/0837—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station using pre-detection combining
- H04B7/0842—Weighted combining
- H04B7/0848—Joint weighting
- H04B7/0854—Joint weighting using error minimizing algorithms, e.g. minimum mean squared error [MMSE], "cross-correlation" or matrix inversion
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Complex Calculations (AREA)
Abstract
A triangular systolic array QR decomposer based on advanced iteration, and a corresponding decomposition method, perform QR decomposition of an n × n matrix A. The decomposer comprises diagonal processing modules, iterative processing modules and triangular processing modules. The first diagonal processing module receives the first column vector a_1 of matrix A from outside; its results q_1 and r_{11} are outputs of the QR decomposition module, q_1 is also passed to the triangular processing modules of the next step, and the r_{11}^2 signal it generates is passed to all iterative processing modules of the first step. The (j-1)-th iterative processing module takes as inputs the j-th column vector a_j of matrix A received from outside, the first column vector a_1 of matrix A, and the r_{11}^2 output of the first diagonal processing module, and produces the j-th column vector a_j^(1) of the next iteration matrix A^(1). Processing continues in the same way, and the final output signal r_{n-1,n} of the QR decomposition module is obtained from a triangular processing module. The decomposition method is implemented on the above decomposer. The invention has the advantages of a simple principle, fast decomposition and high efficiency.
Description
Technical field
The present invention relates generally to baseband signal processing in wireless communication systems, and in particular to a triangular systolic array QR decomposer based on advanced iteration and a corresponding decomposition method.
Background technique
Orthogonal frequency division multiplexing (OFDM) and multiple-input multiple-output (MIMO) technologies have received wide attention because of their high spectral efficiency and high transmission rates. A series of advances in precoding in recent years has made it possible for multi-user wireless communication systems based on MIMO-OFDM to serve several users at the same time. However, the computational complexity of the baseband signal processing algorithms in such systems has increased greatly, which poses an unprecedented challenge to the design of baseband signal processors.
In the baseband signal processing chain of a MIMO-OFDM wireless communication system, the precoding algorithm and the MIMO detection algorithm are two of the more complex algorithms, and both have attracted wide attention from researchers in recent years. The dirty paper coding algorithm proposed by Costa in 1983 in his classic paper "Writing on dirty paper" is regarded as the best-performing nonlinear precoding algorithm, but its computational complexity is so high that it can hardly be executed in real time on a hardware circuit. In 2005, Wei Yu et al., in their paper "Trellis and Convolutional Precoding for Transmitter-Based Interference Presubtraction", applied the THP (Tomlinson-Harashima Precoding) algorithm to nonlinear precoding and obtained good interference cancellation. Although its performance is lower than that of dirty paper coding, its computational complexity is greatly reduced, which makes a hardware implementation of nonlinear precoding feasible. The most computationally expensive part of the THP algorithm is the QR decomposition of the channel matrix H, so an efficient and fast QR decomposition unit helps improve the overall performance of THP precoding.
Maximum-likelihood detection has the highest detection accuracy of all MIMO detection algorithms, but its computational complexity is also very high. For this reason M. Shabany et al., in "A 0.13 μm CMOS 655Mb/s 4 × 4 64-QAM K-best MIMO detector", used the sphere detection (SD) algorithm, an approximation of maximum-likelihood detection, for MIMO detection and achieved good detection results. QR decomposition is one of the bottlenecks of the SD algorithm and limits its execution speed.
Because QR decomposition is widely used in multi-user baseband signal processors based on MIMO-OFDM, and is in many cases the bottleneck that limits processing speed, many baseband signal processor designs treat it as an important arithmetic unit to be optimized. QR decomposition factors an n × n matrix A into an n × n unitary matrix Q and an n × n upper triangular matrix R. Current QR decomposition algorithms fall into three main classes, based respectively on the Householder transformation, Givens rotations, and the MGS (modified Gram-Schmidt) algorithm. QR decomposition based on the Householder transformation is difficult to implement in hardware and is therefore seldom used. QR decomposition based on Givens rotations greatly reduces the hardware resources required, but its execution time is long and does not meet the real-time requirements of communication systems. QR decomposition based on the MGS algorithm occupies few hardware resources and has a short execution time, and therefore meets the practical needs of communication systems.
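For orientation, the conventional MGS factorization referred to above can be sketched in software as follows. This is only an illustrative sketch, not the circuit of any cited work: it assumes a real-valued, full-rank square matrix stored as a list of rows, and the function name `mgs_qr` is chosen here for illustration (a complex channel matrix would need conjugated inner products).

```python
import math


def mgs_qr(a):
    """Modified Gram-Schmidt QR of a real, full-rank n x n matrix (list of rows)."""
    n = len(a)
    # Work column by column: cols[j] is the current j-th column vector.
    cols = [[a[i][j] for i in range(n)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal element and normalized column q_i.
        r[i][i] = math.sqrt(sum(x * x for x in cols[i]))
        q_i = [x / r[i][i] for x in cols[i]]
        q_cols.append(q_i)
        # Remove the q_i component from every remaining column.
        for j in range(i + 1, n):
            r[i][j] = sum(qe * ae for qe, ae in zip(q_i, cols[j]))
            cols[j] = [ae - r[i][j] * qe for ae, qe in zip(cols[j], q_i)]
    # Return Q as a list of rows, matching the layout of the input.
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r
```

Note that each update of the remaining columns uses q_i, so in this conventional form it cannot start until the normalization of column i has finished.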
R.-H. Chang et al., in the paper "Iterative QR decomposition architecture using the modified Gram-Schmidt algorithm for MIMO systems", proposed a QR decomposition hardware circuit with a triangular systolic array structure based on the MGS algorithm. To complete the QR decomposition of a square matrix of order n (n being a positive integer greater than or equal to 2), this circuit needs only 2n-1 time units. For example, performing QR decomposition of a 4 × 4 matrix A with this MGS-based iterative structure takes seven steps, each requiring one time unit, for a total of seven time units. Although the triangular systolic array QR decomposition method proposed by R.-H. Chang et al. greatly reduces the computation time, the baseband signal processing of practical communication systems calls for an even faster QR decomposition structure. Moreover, the existing literature covers only the QR decomposition of 4 × 4 matrices; no hardware circuit for the QR decomposition of an n × n matrix has been published.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a triangular systolic array QR decomposer based on advanced iteration, and a decomposition method, that are simple in principle, easy to implement, fast and efficient.
To solve the above technical problem, the present invention adopts the following technical solution:
A triangular systolic array QR decomposer based on advanced iteration, used to perform QR decomposition of an n × n matrix A, comprises diagonal processing modules, iterative processing modules and triangular processing modules. It uses n diagonal processing modules and (n-1)+(n-2)+...+1 = n(n-1)/2 iterative processing modules; when n is even it uses n/2+(n-2)+(n-4)+(n-6)+...+2 = n^2/4 triangular processing modules, and when n is odd it uses (n-1)+(n-3)+(n-5)+...+2 = (n+1)(n-1)/4 triangular processing modules. The first diagonal processing module receives the first column vector a_1 of matrix A from outside. Its results q_1 and r_{11} are outputs of the whole QR decomposition module, q_1 is also passed to the triangular processing modules of the next step, and the r_{11}^2 signal produced during the calculation is passed to all iterative processing modules of the first step. The (j-1)-th iterative processing module, where j is greater than or equal to 2 and less than or equal to n-1, takes as inputs the j-th column vector a_j of matrix A received from outside, the first column vector a_1 of matrix A, and the r_{11}^2 output of the first diagonal processing module, and calculates the j-th column vector a_j^(1) of the next iteration matrix A^(1). The column vector a_2^(1) is the input of the second diagonal processing module; the remaining column vectors of A^(1) are inputs of the iterative processing modules of the second step and also inputs of the triangular processing modules of the third step. Processing continues in the same way, and the final output signal r_{n-1,n} of the QR decomposition module is obtained from a triangular processing module.
As a further improvement of the decomposer of the present invention: the i-th diagonal processing module processes the a_i^(i-1) signal output by step i-1 to calculate the outputs r_{ii} and q_i of the QR decomposition module, and also calculates r_{ii}^2. The vector q_i is an input of the triangular processing modules of the next step, and r_{ii}^2 is an input of all iterative processing modules of step i. Each iterative processing module receives the column vector signals a_i^(i-1) and a_{i1}^(i-1) from step i-1 and the r_{ii}^2 signal from the diagonal processing module as inputs, and after processing produces the i1-th column vector of the next iteration matrix A^(i). The column vector a_{i+1}^(i) is the input of the (i+1)-th diagonal processing module; the remaining column vectors of A^(i) are inputs of the iterative processing modules of the next step and also inputs of the triangular processing modules of step i+2. Each triangular processing module receives the input signal q_{i-1} from step i-1 and the signals a_{i2}^(i-2) and a_{i2+1}^(i-2) from step i-2, and after processing produces the output signals r_{i-1,i2} and r_{i-1,i2+1} of the QR decomposition module;
The n-th diagonal processing module processes the a_n^(n-1) signal output by step n-1 to obtain the output signals r_{nn} and q_n of the QR decomposition module. The k4-th triangular processing module receives the signal q_{n-1} from step n-1 and the signal a_n^(n-2) from step n-2, and after processing produces the output signal r_{n-1,n} of the QR decomposition module.
As a further improvement of the decomposer of the present invention: the diagonal processing module comprises multipliers, an adder, a square-root unit and dividers. Multiplier e, where e is greater than or equal to 1 and less than or equal to n, receives the e-th element of the input vector a_j from outside, squares it, and outputs the result to the adder. The adder receives the signals from multiplier 1 to multiplier n, accumulates them, and outputs the sum both to the square-root unit and as the output signal r_{jj}^2 of the whole module. The square-root unit takes the square root of the signal received from the adder and outputs the result to divider 1 to divider n as their divisor, and also as the output signal r_{jj} of the whole module. Divider e1, where e1 is greater than or equal to 1 and less than or equal to n, takes the e1-th element of the input vector a_j received from outside as the dividend and the signal received from the square-root unit as the divisor; its result is the e1-th element of the module's output vector q_{j2}.
As a further improvement of the decomposer of the present invention: the iterative processing module comprises first shared hardware, which contains a multiplexer and multiplier 1 to multiplier n. The multiplexer selects different inputs to serve as one operand of multiplier 1 to multiplier n: it receives the external vector a_{j3}^(p) and the output signal of the divider, selects between them, and outputs the result to multiplier 1 to multiplier n. When the enable signal is '0', multiplier e2, where e2 is greater than or equal to 1 and less than or equal to n, takes the signal received from the multiplexer as one operand and the e2-th element of the vector a_j^(p) received from outside as the other operand, and outputs the product to the adder module. The adder module receives the input signals from multiplier 1 to multiplier n, accumulates them, and outputs the sum to the divider module. The divider takes the signal received from the adder module as the dividend and the signal r_{jj}^2 received from outside as the divisor, and outputs the quotient to the input of multiplexer 1. When the enable signal is '1', multiplier e2 outputs its result to subtractor e3, where e3 is greater than or equal to 1 and less than or equal to n. Subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the signal a_{j3}^(p) received from outside as the minuend, and the difference is the e3-th element of the module's output vector a_{j3}^(p+1).
As a further improvement of the decomposer of the present invention: the triangular processing module comprises second shared hardware. The inputs of multiplexer 1 are the n elements of the vector a_{j3} and the n elements of the vector a_{j3+1}. When the multiplexer enable signal is '0', multiplexer 1 gates the elements of a_{j3} to multiplier 1 to multiplier n; when the enable signal is '1', multiplexer 1 gates the elements of a_{j3+1} to multiplier 1 to multiplier n. Multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the vector q_{j2} received from outside as the other operand, and outputs the product to the adder, which accumulates the signals received from the multipliers. When the multiplexer enable signal is '0', the accumulator output is the output signal r_{j2,j3} of the triangular processing module; when the enable signal is '1', the accumulator output is the output signal r_{j2,j3+1} of the triangular processing module.
A QR decomposition method based on the above decomposer comprises the following steps:
Step S1: the n column vectors a_1, ..., a_n of matrix A are the input signals of the QR decomposition module. a_1 is the input of the first diagonal processing module, whose outputs are r_{11} and q_1. The iterative processing modules calculate the next iteration matrix; their inputs are a_1 and a_j, where 1 < j < n+1 and j is a positive integer, and their output is the column a_j^(1) of the next iteration matrix;
Steps S2 to Sj, where j is greater than or equal to 2 and less than n: the signals a_j^(j-2), ..., a_n^(j-2) input to step j-1 and the signals q_{j-1}, a_j^(j-1), ..., a_n^(j-1) output by step j-1 are the input signals of step j. a_j^(j-1) is the input of the diagonal processing module, which calculates r_{jj} and q_j. The inputs of the k3-th triangular processing module are q_{j-1}, a_{j3}^(j-2) and a_{j3+1}^(j-2), from which it calculates r_{j-1,j3} and r_{j-1,j3+1}; when n-j is odd, j3 is a positive integer greater than or equal to j and less than or equal to n-1, and when n-j is even, j3 is a positive integer greater than or equal to j and less than or equal to n. As in the first step, the iterative processing modules calculate the next iteration matrix; their inputs are a_j^(j-1), ..., a_n^(j-1) and their outputs are a_{j+1}^(j), ..., a_n^(j);
Step Sn: the input a_n^(n-2) of step n-1 and the outputs q_{n-1} and a_n^(n-1) of step n-1 are the inputs. a_n^(n-1) is the input of block1 (the diagonal processing module), whose outputs are r_{n,n} and q_n; q_{n-1} and a_n^(n-2) are the inputs of block3 (the triangular processing module), whose output is r_{n-1,n}.
Compared with the prior art, the advantages of the present invention are as follows: the triangular systolic array QR decomposer and decomposition method based on advanced iteration are simple in principle, easy to implement, and significantly speed up QR decomposition. For the QR decomposition of an n × n matrix, the structure proposed by the present invention needs only n time units, whereas the triangular systolic array structure proposed by R.-H. Chang et al. needs 2n-1 time units. For the 4 × 4 matrix A mentioned above, for example, QR decomposition with the present invention needs only 4 time units instead of 7, a saving of 3 time units.
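The time-unit comparison stated above can be written out as a small calculation. The function names are chosen here for illustration only; the two formulas (2n-1 and n) are the ones given in the text.

```python
def time_units_prior_art(n):
    """Time units stated in the text for the structure of R.-H. Chang et al."""
    return 2 * n - 1


def time_units_advanced_iteration(n):
    """Time units stated in the text for the proposed structure."""
    return n


def time_units_saved(n):
    """Difference between the two schedules, equal to n - 1."""
    return time_units_prior_art(n) - time_units_advanced_iteration(n)
```

For n = 4 this gives 7 and 4 time units respectively, a saving of 3, which matches the example in the text.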
Brief description of the drawings
Fig. 1 is a schematic diagram of the topology of the decomposer of the present invention.
Fig. 2 is a schematic diagram of the structure of the diagonal processing module of the present invention in a specific application example.
Fig. 3 is a schematic diagram of the structure of the iterative processing module of the present invention in a specific application example.
Fig. 4 is a schematic diagram of the structure of the triangular processing module of the present invention in a specific application example.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the triangular systolic array QR decomposer based on advanced iteration of the present invention is used to perform QR decomposition of an n × n matrix A and comprises diagonal processing modules, iterative processing modules and triangular processing modules. It consists of n diagonal processing modules, (n-1)+(n-2)+...+1 = n(n-1)/2 iterative processing modules, and, when n is even, n/2+(n-2)+(n-4)+(n-6)+...+2 = n^2/4 triangular processing modules or, when n is odd, (n-1)+(n-3)+(n-5)+...+2 = (n+1)(n-1)/4 triangular processing modules.
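The module counts above can be checked numerically. The following sketch evaluates the sums exactly as written in the text and can be compared against the closed forms; the function name `module_counts` is an illustrative choice, not a term from the patent.

```python
def module_counts(n):
    """Return (diagonal, iterative, triangular) module counts for an n x n matrix."""
    if n < 2:
        raise ValueError("n must be at least 2")
    diagonal = n
    # (n-1) + (n-2) + ... + 1
    iterative = sum(range(1, n))
    if n % 2 == 0:
        # n/2 + (n-2) + (n-4) + ... + 2
        triangular = n // 2 + sum(range(2, n - 1, 2))
    else:
        # (n-1) + (n-3) + ... + 2
        triangular = sum(range(2, n, 2))
    return diagonal, iterative, triangular
```

For a 4 × 4 matrix, `module_counts(4)` evaluates to 4 diagonal, 6 iterative and 4 triangular processing modules.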
The first diagonal processing module receives the first column vector a_1 of matrix A from outside. Its results q_1 and r_{11} are outputs of the whole QR decomposition module, q_1 is also passed to the triangular processing modules of the next step, and the r_{11}^2 signal produced during the calculation is passed to all iterative processing modules of the first step. The (j-1)-th iterative processing module (j is greater than or equal to 2 and less than or equal to n-1) takes as inputs the j-th column vector a_j of matrix A received from outside, the first column vector a_1 of matrix A, and the r_{11}^2 output of the first diagonal processing module, and calculates the j-th column vector a_j^(1) of the next iteration matrix A^(1). The column vector a_2^(1) is the input of the second diagonal processing module; the remaining column vectors of A^(1) are inputs of the iterative processing modules of the second step and also inputs of the triangular processing modules of the third step. By the time the iterative processing modules need the r_{11}^2 signal, the first diagonal processing module has already calculated it, so the diagonal processing module and the iterative processing modules can execute in parallel;
The i-th diagonal processing module processes the a_i^(i-1) signal output by step i-1 to calculate the outputs r_{ii} and q_i of the QR decomposition module, and also calculates r_{ii}^2. The vector q_i is an input of the triangular processing modules of the next step, and r_{ii}^2 is an input of all iterative processing modules of step i. Each iterative processing module receives the column vector signals a_i^(i-1) and a_{i1}^(i-1) from step i-1 and the r_{ii}^2 signal from the diagonal processing module as inputs, and after processing produces the i1-th column vector of the next iteration matrix A^(i). The column vector a_{i+1}^(i) is the input of the (i+1)-th diagonal processing module; the remaining column vectors of A^(i) are inputs of the iterative processing modules of the next step and also inputs of the triangular processing modules of step i+2. Each triangular processing module receives the input signal q_{i-1} from step i-1 and the signals a_{i2}^(i-2) and a_{i2+1}^(i-2) from step i-2, and after processing produces the output signals r_{i-1,i2} and r_{i-1,i2+1} of the QR decomposition module;
The n-th diagonal processing module processes the a_n^(n-1) signal output by step n-1 to obtain the output signals r_{nn} and q_n of the QR decomposition module. The k4-th triangular processing module receives the signal q_{n-1} from step n-1 and the signal a_n^(n-2) from step n-2, and after processing produces the output signal r_{n-1,n} of the QR decomposition module.
As shown in Fig. 2, in a specific application example the diagonal processing module comprises multipliers, an adder, a square-root unit and dividers. Multiplier e (e is greater than or equal to 1 and less than or equal to n) receives the e-th element of the input vector a_j from outside, squares it, and outputs the result to the adder. The adder receives the signals from multiplier 1 to multiplier n, accumulates them, and outputs the sum both to the square-root unit and as the output signal r_{jj}^2 of the whole module. The square-root unit takes the square root of the signal received from the adder and outputs the result to divider 1 to divider n as their divisor, and also as the output signal r_{jj} of the whole module. Divider e1 (e1 is greater than or equal to 1 and less than or equal to n) takes the e1-th element of the input vector a_j received from outside as the dividend and the signal received from the square-root unit as the divisor; its result is the e1-th element of the module's output vector q_{j2}.
The above diagonal processing module calculates the j2-th column vector q_{j2} of the matrix Q, the diagonal element r_{jj} of the matrix R, and the square r_{jj}^2 of the diagonal element, where r_{jj}^2 is used as an input of the iterative processing modules. Because the moment at which the diagonal processing module outputs r_{jj}^2 coincides with the moment at which the iterative processing modules need r_{jj}^2, the two kinds of module can execute in parallel, which increases the speed of QR decomposition.
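The dataflow of the diagonal processing module can be mirrored in a few lines of software. This is a behavioural sketch only, assuming real-valued elements and a nonzero input vector; the function name `diagonal_module` is hypothetical, and the comments map each line to the hardware unit named in the text.

```python
import math


def diagonal_module(a):
    """Return (q, r, r_squared) for one input column vector a."""
    squares = [x * x for x in a]   # multipliers 1..n square each element
    r_squared = sum(squares)       # adder: available before the square root
    r = math.sqrt(r_squared)       # square-root unit
    q = [x / r for x in a]         # dividers 1..n share the divisor r
    return q, r, r_squared
```

The value `r_squared` is ready after the adder, earlier than `r` and `q`, which is why the iterative processing modules that consume it can run alongside the rest of this module.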
As shown in Fig. 3, in a specific application example the iterative processing module comprises first shared hardware, which contains a multiplexer and multiplier 1 to multiplier n. The multiplexer selects different inputs to serve as one operand of multiplier 1 to multiplier n: it receives the external vector a_{j3}^(p) and the output signal of the divider, selects between them, and outputs the result to multiplier 1 to multiplier n. When the enable signal is '0', multiplier e2 (e2 is greater than or equal to 1 and less than or equal to n) takes the signal received from the multiplexer as one operand and the e2-th element of the vector a_j^(p) received from outside as the other operand, and outputs the product to the adder module. The adder module receives the input signals from multiplier 1 to multiplier n, accumulates them, and outputs the sum to the divider module. The divider takes the signal received from the adder module as the dividend and the signal r_{jj}^2 received from outside as the divisor, and outputs the quotient to the input of multiplexer 1. When the enable signal is '1', multiplier e2 outputs its result to subtractor e3 (e3 is greater than or equal to 1 and less than or equal to n). Subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the signal a_{j3}^(p) received from outside as the minuend, and the difference is the e3-th element of the module's output vector a_{j3}^(p+1).
The above iterative processing module calculates the j3-th column of the next iteration matrix. As the figure shows, the two places where the first shared hardware is used are independent of each other, so hardware resources can be saved by time-sharing this hardware.
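The two phases of the iterative processing module, selected by the enable signal, can be sketched as below. This is a behavioural model under the assumption of real-valued data; the function and argument names (`iterative_module`, `pivot`, `column`) are illustrative and do not appear in the patent.

```python
def iterative_module(pivot, column, r_squared):
    """Compute a_{j3}^(p+1) from pivot a_j^(p), column a_{j3}^(p) and r_{jj}^2."""
    # Phase with enable = '0': the multiplexer routes the elements of `column`
    # to the shared multipliers, which multiply them by the pivot elements.
    products = [c * p for c, p in zip(column, pivot)]
    # Adder followed by divider: inner product divided by r_{jj}^2.
    coefficient = sum(products) / r_squared
    # Phase with enable = '1': the multiplexer routes the divider output to the
    # same multipliers, which now scale the pivot elements.
    scaled = [coefficient * p for p in pivot]
    # Subtractors: minuend is the column element, subtrahend the scaled pivot.
    return [c - s for c, s in zip(column, scaled)]
```

Because the same multipliers serve both phases, the list comprehensions for `products` and `scaled` correspond to one bank of n multipliers used twice rather than two separate banks.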
As shown in Fig. 4, in a specific application example the triangular processing module comprises second shared hardware. The inputs of multiplexer 1 are the n elements of the vector a_{j3} and the n elements of the vector a_{j3+1}. When the multiplexer enable signal is '0', multiplexer 1 gates the elements of a_{j3} to multiplier 1 to multiplier n; when the enable signal is '1', multiplexer 1 gates the elements of a_{j3+1} to multiplier 1 to multiplier n. Multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the vector q_{j2} received from outside as the other operand, and outputs the product to the adder, which accumulates the signals received from the multipliers. When the multiplexer enable signal is '0', the accumulator output is the output signal r_{j2,j3} of the triangular processing module; when the enable signal is '1', the accumulator output is the output signal r_{j2,j3+1} of the triangular processing module.
The above triangular processing module calculates the elements of matrix R at coordinates [j2, j3] and [j2, j3+1]. A comparison of Fig. 4 with Fig. 2 and Fig. 3 shows that calculating the element at coordinate [j2, j3] takes less than 50% of the execution time of the second basic module and the first basic module. The present invention therefore time-multiplexes the hardware that calculates the element at coordinate [j2, j3], which saves hardware resources.
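A behavioural sketch of the time-multiplexed triangular processing module follows, again assuming real-valued data. The function name `triangular_module` and its argument names are illustrative; the loop variable models the multiplexer enable signal described in the text.

```python
def triangular_module(q, a_first, a_second):
    """Return (r_{j2,j3}, r_{j2,j3+1}) using one shared multiply-accumulate bank."""
    outputs = []
    for enable in (0, 1):
        # Multiplexer 1: enable '0' gates a_{j3}, enable '1' gates a_{j3+1}.
        selected = a_first if enable == 0 else a_second
        # Multipliers 1..n followed by the accumulating adder.
        outputs.append(sum(qe * ae for qe, ae in zip(q, selected)))
    return outputs[0], outputs[1]
```

Each output is a single inner product with the same vector q_{j2}, so running the bank twice within one time unit yields two elements of the same row of R.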
The present invention further provides a decomposition method based on the above decomposer. QR decomposition of an n × n matrix A with the circuit of the above decomposer takes n steps in total, as follows:
Step S1: the n column vectors a_1, ..., a_n of matrix A are the input signals of the QR decomposition module. a_1 is the input of the first diagonal processing module, whose outputs are r_{11} and q_1. The iterative processing modules calculate the next iteration matrix; their inputs are a_1 and a_j (1 < j < n+1, j is a positive integer), and their output is the column a_j^(1) of the next iteration matrix. The values of the output signals of the first step are given by formula (1);
Step S1 shows the biggest difference between the present invention and the traditional QR decomposition method: the present invention calculates the next iteration matrix one step ahead. Traditional QR decomposition calculates the next iteration matrix in the second step because that calculation needs the output results of the first step. By improving on the traditional method, the present invention calculates the next iteration matrix from the inputs of the first step, which greatly increases the speed of QR decomposition.
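Formula (1) is not reproduced in this text. The following is a reconstruction that is consistent with the module descriptions above, written for real-valued data (for a complex channel matrix the transpose would presumably become a conjugate transpose); it should be read as a sketch rather than as the patent's own formula. The second line shows why the update needs only the inputs of the first step: substituting q_1 = a_1 / r_{11} removes q_1 from the expression.

```latex
\begin{aligned}
r_{11}^{2} &= a_1^{T} a_1, \qquad
r_{11} = \sqrt{r_{11}^{2}}, \qquad
q_1 = \frac{a_1}{r_{11}}, \\
a_j^{(1)} &= a_j - \left(q_1^{T} a_j\right) q_1
           = a_j - \frac{a_1^{T} a_j}{r_{11}^{2}}\, a_1,
           \qquad 2 \le j \le n .
\end{aligned}
```

The left-hand form of the update is the conventional one and depends on the output q_1; the right-hand form depends only on a_1, a_j and r_{11}^2, which is the quantity the diagonal processing module passes to the iterative processing modules.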
Steps S2 to Sj, where j is greater than or equal to 2 and less than n: the signals a_j^(j-2), ..., a_n^(j-2) input to step j-1 and the signals q_{j-1}, a_j^(j-1), ..., a_n^(j-1) output by step j-1 are the input signals of step j. a_j^(j-1) is the input of the diagonal processing module, which calculates r_{jj} and q_j. The inputs of the k3-th triangular processing module are q_{j-1}, a_{j3}^(j-2) and a_{j3+1}^(j-2) (when n-j is odd, j3 is a positive integer greater than or equal to j and less than or equal to n-1; when n-j is even, j3 is a positive integer greater than or equal to j and less than or equal to n), from which it calculates r_{j-1,j3} and r_{j-1,j3+1}. As in the first step, the iterative processing modules calculate the next iteration matrix; their inputs are a_j^(j-1), ..., a_n^(j-1) and their outputs are a_{j+1}^(j), ..., a_n^(j). The outputs of step j are given by formula (2);
Step Sn: the input a_n^(n-2) of step n-1 and the outputs q_{n-1} and a_n^(n-1) of step n-1 are the inputs. a_n^(n-1) is the input of block1 (the diagonal processing module), whose outputs are r_{n,n} and q_n; q_{n-1} and a_n^(n-2) are the inputs of block3 (the triangular processing module), whose output is r_{n-1,n}. The outputs of step n are given by formula (3);
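Formulas (2) and (3) are likewise not reproduced in this text. A reconstruction consistent with the inputs and outputs listed for steps Sj and Sn, under the same real-valued assumption as before, might read as follows; the index k for the updated columns is introduced here for illustration.

```latex
\begin{aligned}
\text{Step } j\ (2 \le j < n):\quad
r_{jj}^{2} &= \bigl(a_j^{(j-1)}\bigr)^{T} a_j^{(j-1)}, \qquad
r_{jj} = \sqrt{r_{jj}^{2}}, \qquad
q_j = \frac{a_j^{(j-1)}}{r_{jj}}, \\
r_{j-1,\,j3} &= q_{j-1}^{T}\, a_{j3}^{(j-2)}, \qquad j \le j3 \le n, \\
a_k^{(j)} &= a_k^{(j-1)}
  - \frac{\bigl(a_j^{(j-1)}\bigr)^{T} a_k^{(j-1)}}{r_{jj}^{2}}\, a_j^{(j-1)},
  \qquad j < k \le n, \\[4pt]
\text{Step } n:\quad
r_{nn} &= \sqrt{\bigl(a_n^{(n-1)}\bigr)^{T} a_n^{(n-1)}}, \qquad
q_n = \frac{a_n^{(n-1)}}{r_{nn}}, \qquad
r_{n-1,\,n} = q_{n-1}^{T}\, a_n^{(n-2)} .
\end{aligned}
```

Row j-1 of R is computed one step after q_{j-1} becomes available, from column vectors that were inputs to step j-1, so it does not delay the diagonal and iterative processing of step j.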
As the above shows, for the QR decomposition of an n × n matrix the structure proposed by the present invention needs only n time units, whereas the triangular systolic array structure proposed by R.-H. Chang et al. needs 2n-1 time units. For the 4 × 4 matrix A mentioned above, for example, QR decomposition with the present invention needs only 4 time units instead of 7, a saving of 3 time units. The triangular systolic array QR decomposition based on advanced iteration proposed by the present invention can therefore significantly speed up QR decomposition.
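The n-step schedule described in steps S1 to Sn can be emulated in software to check that it yields a valid factorization. The sketch below is a sequential model, not the circuit: it assumes a real-valued, full-rank matrix given as a list of rows, and the names `lookahead_qr` and `dot` are invented here. Each pass of the outer loop stands for one time unit, within which the hardware modules would operate in parallel.

```python
import math


def dot(u, v):
    """Inner product of two real vectors."""
    return sum(x * y for x, y in zip(u, v))


def lookahead_qr(a):
    """QR of a real n x n matrix following the n-step schedule of the text."""
    n = len(a)
    current = [[a[i][j] for i in range(n)] for j in range(n)]  # columns of A
    previous = None                # columns that were input to the prior step
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for step in range(n):          # zero-based; one pass models one time unit
        pivot = current[step]
        # Diagonal processing module: r_jj^2, r_jj and q_j.
        r_squared = dot(pivot, pivot)
        r[step][step] = math.sqrt(r_squared)
        q_cols.append([x / r[step][step] for x in pivot])
        # Triangular processing modules: previous row of R, from the previous
        # step's q and the column vectors that were input to that step.
        if step >= 1:
            for k in range(step, n):
                r[step - 1][k] = dot(q_cols[step - 1], previous[k])
        # Iterative processing modules: use only this step's inputs
        # (pivot and r_jj^2), not q_j, so they need not wait for the divider.
        updated = list(current)
        for k in range(step + 1, n):
            coefficient = dot(pivot, current[k]) / r_squared
            updated[k] = [c - coefficient * p for c, p in zip(current[k], pivot)]
        previous, current = current, updated
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r
```

Multiplying the returned Q and R reproduces the input matrix up to rounding, and the outer loop runs exactly n times, matching the n time units stated above.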
The above are only preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above embodiments; all technical solutions that fall under the concept of the present invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.
Claims (6)
1. A triangular systolic array architecture QR decomposer based on advanced iteration, used to perform QR decomposition on an n × n matrix A, characterized in that it comprises diagonal processing modules, iterative processing modules and triangular processing modules, wherein there are n diagonal processing modules and (n-1)+(n-2)+...+1 = n × (n-1)/2 iterative processing modules; when n is even, n/2+(n-2)+(n-4)+(n-6)+...+2 = n^2/4 triangular processing modules are used, and when n is odd, (n-1)+(n-3)+(n-5)+...+2 = (n+1)(n-1)/4 triangular processing modules are used; the first diagonal processing module receives the first column vector a_1 of matrix A from outside, its calculation results q_1 and r_11 serve as outputs of the whole QR decomposition module, q_1 is output to the triangular processing modules of the next step, and the r_11^2 signal generated during the calculation is output to all iterative processing modules in the first step; the (j-1)-th iterative processing module takes as inputs the j-th column vector a_j of matrix A received from outside, where j is greater than or equal to 2 and less than or equal to n-1, the first column vector a_1 of matrix A, and the r_11^2 output by the first diagonal processing module, and calculates the j-th column vector a_j^1 of the next iteration matrix A^1, where a_2^1 serves as the input of the second diagonal processing module, and the remaining column vectors of A^1 serve as inputs of the iterative processing modules of the second step and also as inputs of the triangular processing modules of the third step; and so on, until the output signal r_{n-1,n} of the QR decomposition module is finally obtained after processing by the triangular processing module.
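The module counts in claim 1 can be tabulated with a short Python sketch. It evaluates the series exactly as written in the claim and compares them with the stated closed forms; the function names are hypothetical and chosen only for this illustration.

```python
def module_counts(n: int) -> dict:
    """Evaluate the module-count series of claim 1 for an n x n matrix."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    diagonal = n
    iterative = sum(range(1, n))  # (n-1) + (n-2) + ... + 1
    if n % 2 == 0:
        # n/2 + (n-2) + (n-4) + ... + 2
        triangular = n // 2 + sum(range(2, n - 1, 2))
    else:
        # (n-1) + (n-3) + ... + 2
        triangular = sum(range(2, n, 2))
    return {"diagonal": diagonal, "iterative": iterative, "triangular": triangular}


def module_counts_closed_form(n: int) -> dict:
    """Closed forms stated in claim 1."""
    triangular = n * n // 4 if n % 2 == 0 else (n + 1) * (n - 1) // 4
    return {"diagonal": n, "iterative": n * (n - 1) // 2, "triangular": triangular}


# Show the counts for a few matrix sizes.
for n in (3, 4, 5, 6):
    print(n, module_counts(n))
```

For the 4 × 4 case this yields 4 diagonal, 6 iterative and 4 triangular processing modules.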
2. The triangular systolic array architecture QR decomposer based on advanced iteration according to claim 1, characterized in that the i-th diagonal processing module processes the a_i^{i-1} signal output from step i-1 to calculate the outputs r_ii and q_i of the QR decomposition module, and also calculates r_ii^2, where the vector q_i serves as the input of the triangular processing modules of the next step and r_ii^2 serves as the input of all iterative processing modules in step i; each iterative processing module receives the column-vector signals a_i^{i-1} and a_{i1}^{i-1} from step i-1 and the r_ii^2 signal from the diagonal processing module as inputs, and after processing obtains the i1-th column vector of the next iteration matrix A^i, where a_{i+1}^i serves as the input of the (i+1)-th diagonal processing module, and the remaining column vectors of A^i serve as inputs of the iterative processing modules of the next step and also as inputs of the triangular processing modules of step i+2; each triangular processing module receives the input signal q_{i-1} from step i-1 and the signals a_{i2}^{i-2} and a_{i2+1}^{i-2} from step i-2, and after processing obtains the output signals r_{i-1,i2} and r_{i-1,i2+1} of the QR decomposition module;
the n-th diagonal processing module processes the a_n^{n-1} signal output from step n-1 to obtain the output signals r_nn and q_n of the QR decomposition module; the k4-th triangular processing module receives the signal q_{n-1} from step n-1 and the signal a_n^{n-2} from step n-2, and after processing obtains the output signal r_{n-1,n} of the QR decomposition module.
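The per-step data flow in claim 2 can be listed programmatically. The sketch below is an interpretation of the claim, not a description of the hardware: it records, for each step i, which R entries and which iteration-matrix columns are produced, using a hypothetical `step_schedule` helper.

```python
def step_schedule(n: int) -> list:
    """List what each of the n steps produces, following claim 2.

    Assumed reading of the claim: in step i the diagonal module yields r_ii
    (and q_i), the iterative modules yield columns a_j^i for j = i+1..n, and
    the triangular modules yield r_{i-1,j} for j = i..n (none in step 1).
    """
    steps = []
    for i in range(1, n + 1):
        steps.append({
            "step": i,
            "diagonal": (i, i),                                   # index of r_ii
            "iterative": [(j, i) for j in range(i + 1, n + 1)],   # (j, i) means a_j^i
            "triangular": [(i - 1, j) for j in range(i, n + 1)] if i > 1 else [],
        })
    return steps


# Print the schedule for a 4 x 4 matrix.
for entry in step_schedule(4):
    print(entry)
```

Under this reading every entry of the upper-triangular matrix R is produced exactly once, and the last step produces only r_nn, q_n and r_{n-1,n}.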
3. The triangular systolic array architecture QR decomposer based on advanced iteration according to claim 1 or 2, characterized in that the diagonal processing module comprises multipliers, an adder, a square-root operator and dividers; multiplier e receives from outside the e-th element of the input vector a_j, where e is greater than or equal to 1 and less than or equal to n, squares it and outputs the result to the adder; the adder receives the signals from multiplier 1 to multiplier n, accumulates them, and outputs the result to the square-root operator and, at the same time, as the output signal r_jj^2 of the whole module; the square-root operator, after receiving the signal from the adder, takes its square root and outputs the result to divider 1 to divider n as their divisor and, at the same time, as the output signal r_jj of the whole module; divider e1 receives from outside the e1-th element of the input vector a_j as the dividend and takes the signal received from the square-root operator as the divisor, where e1 is greater than or equal to 1 and less than or equal to n, and its result is the e1-th element of the output vector q_{j2} of the whole module.
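The arithmetic of the diagonal processing module in claim 3 can be sketched in a few lines of Python. This is a behavioural model only; the name `diagonal_module` is hypothetical, and the input vector is assumed to be non-zero so that the division is defined.

```python
import math


def diagonal_module(a):
    """Behavioural model of the diagonal processing module of claim 3.

    Returns (r_squared, r, q) for an input column vector a.
    Assumption: a is non-zero, otherwise the dividers would divide by zero.
    """
    squares = [x * x for x in a]   # multipliers 1..n square each element
    r_squared = sum(squares)       # adder accumulates the squares: r_jj^2
    r = math.sqrt(r_squared)       # square-root operator: r_jj
    q = [x / r for x in a]         # dividers 1..n normalise each element
    return r_squared, r, q


r_squared, r, q = diagonal_module([3.0, 4.0])
print(r_squared, r, q)
```

Outputting r_jj^2 directly from the adder is what lets the iterative processing modules start before the square root and the divisions have finished.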
4. The triangular systolic array architecture QR decomposer based on advanced iteration according to claim 1 or 2, characterized in that the iterative processing module comprises first shared hardware, which contains a multiplexer and multiplier 1 to multiplier n; the multiplexer selects different inputs as factors for multiplier 1 to multiplier n; the multiplexer receives as inputs the external a_{j3}^p vector and the output signal of the divider, selects between them, and outputs the result to multiplier 1 to multiplier n; when the enable signal is '0', multiplier e2 takes the signal received from the multiplexer as one factor, where e2 is greater than or equal to 1 and less than or equal to n, and the e2-th element of the a_j^p vector received from outside as the other factor, multiplies them, and outputs the result to the adder module; the adder module receives the input signals from multiplier 1 to multiplier n, accumulates them, and outputs the result to the divider module; the divider takes the signal received from the adder module as the dividend and the signal r_jj^2 received from outside as the divisor, performs the division, and outputs the result to the input of multiplexer 1; when the enable signal is '1', multiplier e2 outputs its result to subtracter e3, where e3 is greater than or equal to 1 and less than or equal to n; subtracter e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the a_{j3}^p signal received from outside as the minuend, and the result of the subtraction is the e3-th element of the output vector a_{j3}^{p+1} of the whole module.
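The two phases of the iterative processing module in claim 4 can be modelled as follows. This Python sketch is an interpretation of the claim: the two phases selected by the enable signal are written as two consecutive passes through the same conceptual multipliers, and the names `iterative_module`, `a_j`, `a_j3` and `r_jj_sq` are illustrative.

```python
def iterative_module(a_j, a_j3, r_jj_sq):
    """Behavioural model of the iterative processing module of claim 4.

    a_j     : pivot column a_j^p of the current iteration matrix
    a_j3    : column a_j3^p to be updated
    r_jj_sq : r_jj^2 supplied by the diagonal processing module (assumed non-zero)
    Returns the updated column a_j3^(p+1).
    """
    # Enable '0': the multiplexer feeds a_j3 to the multipliers, the adder
    # accumulates the products and the divider divides the sum by r_jj^2.
    products = [x * y for x, y in zip(a_j3, a_j)]
    coefficient = sum(products) / r_jj_sq

    # Enable '1': the multiplexer feeds the divider output to the same
    # multipliers, and the subtracters remove the scaled pivot column.
    scaled = [coefficient * y for y in a_j]
    return [x - s for x, s in zip(a_j3, scaled)]


updated = iterative_module([3.0, 4.0], [1.0, 0.0], 25.0)
print(updated)
```

Because the projection coefficient is divided by r_jj^2 rather than computed from the normalised vector q_j, the update needs neither the square root nor the divisions of the diagonal module; the updated column is orthogonal to the pivot column.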
5. The triangular systolic array architecture QR decomposer based on advanced iteration according to claim 4, characterized in that the triangular processing module comprises second shared hardware; the inputs of multiplexer 1 are the n elements of the vector a_{j3} and the n elements of the vector a_{j3+1}; when the multiplexer enable signal is '0', multiplexer 1 selects the elements of the vector a_{j3} and outputs them to multiplier 1 to multiplier n, and when the multiplexer enable signal is '1', multiplexer 1 selects the elements of the vector a_{j3+1} and outputs them to multiplier 1 to multiplier n; multiplier e4 takes the data received from the multiplexer as one factor and the e4-th element of the q_{j2} vector received from outside as the other factor, multiplies them, and outputs the result to the adder; the adder accumulates the signals received from the multipliers; when the multiplexer enable signal is '0', the accumulator output signal serves as the output signal r_{j2,j3} of the triangular processing module, and when the multiplexer enable signal is '1', the accumulator output signal serves as the output signal r_{j2,j3+1} of the triangular processing module.
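The time-multiplexed dot products of the triangular processing module in claim 5 can be sketched in Python as below. The function name `triangular_module` and its argument names are hypothetical; the loop over the enable signal stands in for the two passes through the shared multipliers.

```python
def triangular_module(q, a_j3, a_j3_plus_1):
    """Behavioural model of the triangular processing module of claim 5.

    q           : vector q_j2 from a diagonal processing module
    a_j3        : column selected when the enable signal is '0'
    a_j3_plus_1 : column selected when the enable signal is '1'
    Returns (r_{j2,j3}, r_{j2,j3+1}).
    """
    outputs = []
    for enable in (0, 1):
        selected = a_j3 if enable == 0 else a_j3_plus_1   # multiplexer 1
        products = [s * qe for s, qe in zip(selected, q)]  # multipliers 1..n
        outputs.append(sum(products))                      # adder / accumulator
    return tuple(outputs)


print(triangular_module([0.6, 0.8], [1.0, 0.0], [0.0, 1.0]))
```

Sharing one set of n multipliers between two columns is why a single triangular processing module yields two entries of R per step.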
6. A QR decomposition method based on the decomposer of any one of claims 1 to 5, characterized in that the steps are:
Step S1: the n column vectors a_1, ..., a_n of matrix A serve as the input signals of the QR decomposition module; a_1 is the input of the first diagonal processing module, whose outputs are r_11 and q_1; the iterative processing modules calculate the next iteration matrix, their inputs being a_1 and a_j, where 1 < j < n+1 and j is a positive integer, and their outputs being the columns a_j^1 of the next iteration matrix;
Steps S2 to Sj: where j is greater than or equal to 2 and less than n, the signals a_j^{j-2}, ..., a_n^{j-2} input to step j-1 and the signals q_{j-1}, a_j^{j-1}, ..., a_n^{j-1} output by step j-1 serve as the input signals of step j, where a_j^{j-1} is the input of the diagonal processing module and is used to calculate r_jj and q_j; the input signals of the k3-th triangular processing module are q_{j-1}, a_{j3}^{j-2} and a_{j3+1}^{j-2}, where j3 is a positive integer greater than or equal to j and less than or equal to n-1 when n-j is odd, and a positive integer greater than or equal to j and less than or equal to n when n-j is even; these are used to calculate r_{j-1,j3} and r_{j-1,j3+1}; as in the first step, the iterative processing modules calculate the next iteration matrix, their inputs being a_j^{j-1}, ..., a_n^{j-1} and their outputs being a_{j+1}^{j}, ..., a_n^{j};
Step Sn: the input a_n^{n-2} of step n-1 and the outputs q_{n-1} and a_n^{n-1} of step n-1 serve as inputs, where a_n^{n-1} is the input of the diagonal processing module, whose outputs are r_{n,n} and q_n, and q_{n-1} and a_n^{n-2} are the inputs of the triangular processing module, whose output is r_{n-1,n}.
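Steps S1 to Sn of claim 6 can be followed in software with the sketch below. It is a sequential Python model written for illustration, not the patented hardware: the parallel modules of each step are executed one after another, the matrix is assumed to be square with linearly independent columns, and the names `dot` and `qr_advanced_iteration` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def qr_advanced_iteration(A):
    """Sequential model of steps S1..Sn; A is a list of rows, returns (Q, R)."""
    n = len(A)
    cols = [[A[r][c] for r in range(n)] for c in range(n)]  # columns a_1..a_n
    prev_cols = None          # columns that were the input of the previous step
    q_vectors = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):        # j is the 0-based step index
        a_j = cols[j]
        # Diagonal processing module: r_jj^2, r_jj and q_j.
        r_sq = dot(a_j, a_j)
        r_jj = math.sqrt(r_sq)
        R[j][j] = r_jj
        q_vectors.append([x / r_jj for x in a_j])
        # Triangular processing modules: use q from the previous step and the
        # columns that entered the previous step.
        if j > 0:
            for k in range(j, n):
                R[j - 1][k] = dot(q_vectors[j - 1], prev_cols[k])
        # Iterative processing modules: next iteration matrix, using r_jj^2
        # and the unnormalised column a_j instead of q_j.
        next_cols = list(cols)
        for k in range(j + 1, n):
            coefficient = dot(a_j, cols[k]) / r_sq
            next_cols[k] = [x - coefficient * y for x, y in zip(cols[k], a_j)]
        prev_cols, cols = cols, next_cols
    Q = [[q_vectors[c][r] for c in range(n)] for r in range(n)]
    return Q, R


Q, R = qr_advanced_iteration([[1.0, 1.0], [0.0, 1.0]])
print(Q)
print(R)
```

In this model the iterative update of a step never waits for q_j, and each off-diagonal row of R is filled one step after the corresponding q vector is available, which mirrors the step ordering described in the claim.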
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610173392.0A CN105846873B (en) | 2016-03-24 | 2016-03-24 | Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105846873A (en) | 2016-08-10 |
CN105846873B (en) | 2018-12-18 |
Family
ID=56583444
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610173392.0A Active CN105846873B (en) | 2016-03-24 | 2016-03-24 | Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105846873B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113779501A (en) * | 2021-08-23 | 2021-12-10 | 华控清交信息科技(北京)有限公司 | Data processing method and device and data processing device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101170525A (en) * | 2006-10-25 | 2008-04-30 | 中兴通讯股份有限公司 | MLSE simplification detection method and its device based on blocked QR decomposition |
CN101674160A (en) * | 2009-10-22 | 2010-03-17 | 复旦大学 | Signal detection method and device for multiple-input-multiple-output wireless communication system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100981121B1 (en) * | 2007-12-18 | 2010-09-10 | 한국전자통신연구원 | Apparatus and Method for Receiving Signal for MIMO System |
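Non-Patent Citations (1)
Title |
---|
"Distributed systolic array processing algorithm for QR decomposition in MIMO-OFDM systems" (用于MIMO-OFDM系统QR分解的分布式脉动阵列处理算法); Zhu Yongxu, Wu Bin, Zhou Yumei, Cai Jingjing, Xia Kaifeng; Journal of Electronics & Information Technology (《电子与信息学报》); 2012-08-31; full text * |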
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10445065B2 (en) | Constant depth, near constant depth, and subcubic size threshold circuits for linear algebraic calculations | |
Wu et al. | Implementation of a high throughput soft MIMO detector on GPU | |
CN110276450A (en) | Deep neural network structural sparse system and method based on more granularities | |
CN109767000A (en) | Neural network convolution method and device based on Winograd algorithm | |
CN110361691B (en) | Implementation method of coherent source DOA estimation FPGA based on non-uniform array | |
CN108733348A (en) | The method for merging vector multiplier and carrying out operation using it | |
CN110163359A (en) | A kind of computing device and method | |
Salmela et al. | Complex-valued QR decomposition implementation for MIMO receivers | |
CN110276447A (en) | A kind of computing device and method | |
CN108733627A (en) | A kind of FPGA implementation method that positive definite matrix Cholesky is decomposed | |
CN105182378A (en) | LLL (Lenstra-Lenstra-LovaszLattice) ambiguity decorrelation algorithm | |
CN107612523A (en) | A kind of FIR filter implementation method based on software checking book method | |
CN108736935B (en) | Universal descent search method for large-scale MIMO system signal detection | |
CN104933080B (en) | A kind of method and device of determining abnormal data | |
Guenther et al. | A scalable, multimode SVD precoding ASIC based on the cyclic Jacobi method | |
CN105846873B (en) | Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative | |
CN110110285B (en) | Parallel Jacobi calculation acceleration implementation method for FPGA | |
CN111198670B (en) | Method, circuit and SOC for executing matrix multiplication operation | |
CN102567283B (en) | Method for small matrix inversion by using GPU (graphic processing unit) | |
Auger et al. | Multiplier-free divide, square root, and log algorithms [DSP tips and tricks] | |
CN108960420A (en) | Processing method and accelerator | |
CN105847200B (en) | Iteration structure QR decomposer and decomposition method based on advanced iterative | |
CN104360986B (en) | A kind of implementation method of parallelization matrix inversion hardware unit | |
Liu et al. | A high speed VLSI implementation of 256-bit scalar point multiplier for ECC over GF (p) | |
CN105335784B (en) | A kind of method of the optimal soft protection of dsp system of selection based on genetic algorithm |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||