CN105846873A - Triangular systolic array structure QR decomposition device based on advanced iteration and decomposition method thereof
- Publication number
- CN105846873A (application CN201610173392.0A)
- Authority
- CN
- China
- Prior art keywords
- signal
- module
- output
- multiplier
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/0413—MIMO systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/08—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station
- H04B7/0837—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station using pre-detection combining
- H04B7/0842—Weighted combining
- H04B7/0848—Joint weighting
- H04B7/0854—Joint weighting using error minimizing algorithms, e.g. minimum mean squared error [MMSE], "cross-correlation" or matrix inversion
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Complex Calculations (AREA)
Abstract
A triangular systolic array QR decomposition device based on advanced iteration, and a decomposition method thereof, are used to perform QR decomposition on an n×n matrix A. The device comprises diagonal processing modules, iterative processing modules and triangular processing modules. The first diagonal processing module receives the first column vector a_1 of the matrix A from outside. Its results q_1 and r_11 are output by the QR decomposition module, and q_1 is also passed to the triangular processing module of the next step. The r_jj^2 signal generated during the calculation is output to all iterative processing modules in the first step. The (j-1)-th iterative processing module takes as input the j-th column vector a_j of the matrix A received from outside, the first column vector a_1 of the matrix A and the r_jj^2 signal output by the first diagonal processing module, and obtains the j-th column vector a_j^(1) of the next iteration matrix A^(1). And so on, until the output signal r_(n-1,n) of the QR decomposition module is obtained after processing by the triangular processing modules. The decomposition method is based on the decomposition device. The device and the method have the advantages of a simple principle, high decomposition speed and high efficiency.
Description
Technical field
The present invention relates generally to the field of baseband signal processing in wireless communication systems, and in particular to a triangular systolic array QR decomposition device based on advanced iteration and a decomposition method thereof.
Background art
Orthogonal frequency division multiplexing (OFDM) and multiple-input multiple-output (MIMO) technologies have attracted wide attention because of their high spectrum utilization and high transmission rates. In recent years, a series of advances in precoding techniques have enabled multi-user wireless communication systems based on MIMO-OFDM to serve multiple users simultaneously. However, the computational complexity of the baseband signal processing algorithms in such systems has increased greatly, which poses unprecedented challenges to the design of baseband signal processors.
In the baseband signal processing chain of a MIMO-OFDM wireless communication system, the precoding algorithm and the MIMO detection algorithm are the two more complex baseband algorithms, and both have received extensive attention from researchers in recent years. In 1983, Costa proposed the dirty paper coding algorithm in his classic paper "Writing on dirty paper". It is considered the best-performing nonlinear precoding algorithm, but its computational complexity is so high that it can hardly be executed in real time on hardware circuits. In 2005, Wei Yu et al., in the paper "Trellis and Convolutional Precoding for Transmitter-Based Interference Presubtraction", applied the THP (Tomlinson-Harashima Precoding) algorithm to nonlinear precoding and obtained good interference cancellation. Although its performance is lower than that of dirty paper coding, its computational complexity is greatly reduced, which makes a hardware implementation of nonlinear precoding feasible. The most complex part of the THP algorithm is the QR decomposition of the channel matrix H, so a fast QR decomposition unit helps improve the overall performance of THP precoding.
Maximum-likelihood estimation has the highest detection accuracy of all MIMO detection algorithms, but its computational complexity is quite high. Therefore M. Shabany et al., in "A 0.13 μm CMOS 655Mb/s 4 × 4 64-QAM k-best MIMO Detector", used the sphere detection (SD) algorithm, an approximation of maximum-likelihood estimation, for MIMO detection and achieved good detection results. QR decomposition is one of the bottlenecks of the SD algorithm and limits its execution speed.
Because QR decomposition is widely used in multi-user baseband signal processors based on MIMO-OFDM, and in many cases is the bottleneck that limits processing speed, many baseband signal processor designs optimize QR decomposition as an important arithmetic unit. QR decomposition factors an n × n matrix A into an n × n unitary matrix Q and an n × n upper triangular matrix R. Current QR decomposition algorithms fall into three main classes, based respectively on Householder transformations, Givens rotations and the MGS (modified Gram-Schmidt) algorithm. QR decomposition based on Householder transformations is difficult to implement in hardware and is therefore seldom used. QR decomposition based on Givens rotations greatly reduces the hardware resources used, but its execution time is long and does not meet the real-time requirements of communication systems. QR decomposition based on the MGS algorithm occupies few hardware resources and has a short execution time, and so meets the practical needs of communication systems.
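For reference, the conventional MGS factorization mentioned above can be sketched in a few lines of Python. This is a minimal illustration that assumes a real-valued, nonsingular matrix stored as a list of rows; the function name `mgs_qr` is hypothetical and the code is not taken from the patent or from the cited papers.

```python
import math

def mgs_qr(A):
    """Modified Gram-Schmidt QR of a real n x n matrix given as a list of rows."""
    n = len(A)
    # cols[j] holds the j-th column of the current iteration matrix.
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    Q = [[0.0] * n for _ in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Diagonal element and k-th orthonormal column.
        R[k][k] = math.sqrt(sum(x * x for x in cols[k]))
        qk = [x / R[k][k] for x in cols[k]]
        for i in range(n):
            Q[i][k] = qk[i]
        # Remove the q_k component from every remaining column.
        for j in range(k + 1, n):
            R[k][j] = sum(q * x for q, x in zip(qk, cols[j]))
            cols[j] = [x - R[k][j] * q for x, q in zip(cols[j], qk)]
    return Q, R
```

Note that each column update here depends on q_k, which is only available after the square root and the divisions of the same step have finished.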
R.-H. Chang et al., in the article "Iterative QR decomposition architecture using the modified Gram-Schmidt algorithm for MIMO systems", proposed a triangular systolic array QR decomposition hardware circuit based on the MGS algorithm. To complete the QR decomposition of a square matrix of order n (n being a positive integer greater than or equal to 2), the proposed circuit needs only 2n-1 time units. For example, when this circuit performs QR decomposition on a 4 × 4 matrix A, the iterative MGS-based structure needs seven steps, each taking one time unit, for a total of seven time units. Thus, although the triangular systolic array QR decomposition method proposed by R.-H. Chang et al. greatly reduces the computation time, baseband signal processing in practical communication systems calls for an even faster QR decomposition structure. Moreover, that document covers only the QR decomposition of 4 × 4 matrices; no QR decomposition hardware circuit for n × n matrices has been disclosed.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a triangular systolic array QR decomposition device based on advanced iteration, and a decomposition method, that are simple in principle, easy to implement, fast and efficient.
To solve the above technical problem, the present invention adopts the following technical solution:
A triangular systolic array QR decomposition device based on advanced iteration is used to perform QR decomposition on an n × n matrix A. It includes diagonal processing modules, iterative processing modules and triangular processing modules: n diagonal processing modules; (n-1) + (n-2) + ... + 1 = n × (n-1)/2 iterative processing modules; and, when n is even, n/2 + (n-2) + (n-4) + (n-6) + ... + 2 = n^2/4 triangular processing modules, or, when n is odd, (n-1) + (n-3) + (n-5) + ... + 2 = (n+1)(n-1)/4 triangular processing modules. The first diagonal processing module receives the first column vector a_1 of the matrix A from outside. Its results q_1 and r_11 are outputs of the whole QR decomposition module, and q_1 is also output to the triangular processing module of the next step. The r_jj^2 signal produced during the calculation is output to all iterative processing modules in the first step. The (j-1)-th iterative processing module takes as input the j-th column vector a_j of the matrix A received from outside (where j is greater than or equal to 2 and less than or equal to n-1), the first column vector a_1 of the matrix A and the r_jj^2 signal output by the first diagonal processing module, and computes the j-th column vector a_j^(1) of the next iteration matrix A^(1). Here a_1^(1) serves as the input of the second diagonal processing module, and the remaining column vectors of A^(1) serve both as inputs of the iterative processing modules of the second step and as inputs of the triangular processing modules of the third step. And so on, until the output signal r_(n-1,n) of the QR decomposition module is finally obtained after processing by the triangular processing modules.
As a further improvement of the decomposition device of the present invention: the i-th diagonal processing module processes the a_i^(i-1) signal output by step i-1 to obtain the outputs r_ii and q_i of the QR decomposition module, and also computes r_ii^2. The vector q_i serves as the input of the triangular processing modules of the next step, and r_ii^2 serves as the input of all iterative processing modules in step i. Each iterative processing module receives the column vector signals a_i^(i-1) and a_i1^(i-1) from step i-1 and the r_ii^2 signal from the diagonal processing module as inputs, and after processing produces the i1-th column vector of the next iteration matrix A^(i). Here a_(i+1)^(i) serves as the input of the (i+1)-th diagonal processing module, and the remaining column vectors of A^(i) serve both as inputs of the iterative processing modules of the next step and as inputs of the triangular processing modules of step i+2. Each triangular processing module receives the input signal q_(i-1) from step i-1 together with the a_i2^(i-2) and a_(i2+1)^(i-2) signals from step i-2, and after processing produces the output signals r_(i-1,i2) and r_(i-1,i2+1) of the QR decomposition module;
The n-th diagonal processing module processes the a_n^(n-1) signal output by step n-1 to obtain the output signals r_nn and q_n of the QR decomposition module. The k4-th triangular processing module receives the signal q_(n-1) from step n-1 and the signal a_n^(n-2) from step n-2, and after processing produces the output signal r_(n-1,n) of the QR decomposition module.
As a further improvement of the decomposition device of the present invention: the diagonal processing module includes multipliers, an adder, a square-root unit and dividers. Multiplier e (where e is greater than or equal to 1 and less than or equal to n) receives the e-th element of the input vector a_j from outside, squares it and outputs the result to the adder. The adder receives the signals from multiplier 1 to multiplier n and accumulates them; the sum is output to the square-root unit and is also the output signal r_jj^2 of the whole module. The square-root unit takes the square root of the signal received from the adder; the result is output to divider 1 to divider n as their divisor and is also the output signal r_jj of the whole module. Divider e1 (where e1 is greater than or equal to 1 and less than or equal to n) takes the e1-th element of the input vector a_j received from outside as the dividend and the signal received from the square-root unit as the divisor; the result is the e1-th element of the output vector q_j2 of the whole module.
As a further improvement of the decomposition device of the present invention: the iterative processing module includes a first shared hardware unit, which contains a multiplexer and multiplier 1 to multiplier n. The multiplexer selects different inputs as operands for multiplier 1 to multiplier n: it receives the external a_j3^(p) vector and the output signal of the divider, selects between them and outputs the result to multiplier 1 to multiplier n. When the enable signal is '0', multiplier e2 (where e2 is greater than or equal to 1 and less than or equal to n) takes the signal received from the multiplexer as one operand and the e2-th element of the externally received a_j^(p) vector as the other operand, multiplies them and outputs the result to the adder module. The adder module receives the input signals from multiplier 1 to multiplier n, accumulates them and outputs the sum to the divider module. The divider takes the signal received from the adder module as the dividend and the externally received signal r_jj^2 as the divisor, and outputs the quotient to the input of multiplexer 1. When the enable signal is '1', multiplier e2 outputs its result to subtractor e3 (where e3 is greater than or equal to 1 and less than or equal to n). Subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the externally received a_j3^(p) signal as the minuend; the difference is the e3-th element of the output vector a_j3^(p+1) of the whole module.
As a further improvement of the decomposition device of the present invention: the triangular processing module includes a second shared hardware unit. The inputs of multiplexer 1 are the n elements of the a_j3 vector and the n elements of the a_(j3+1) vector. When the multiplexer enable signal is '0', multiplexer 1 passes the elements of the a_j3 vector to multiplier 1 to multiplier n; when the multiplexer enable signal is '1', multiplexer 1 passes the elements of the a_(j3+1) vector to multiplier 1 to multiplier n. Multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the externally received q_j2 vector as the other operand, multiplies them and outputs the product to the adder, which accumulates the signals received from the multipliers. When the multiplexer enable signal is '0', the accumulator output is the output signal r_(j2,j3) of the triangular processing module; when the multiplexer enable signal is '1', the accumulator output is the output signal r_(j2,j3+1) of the triangular processing module.
A QR decomposition method based on the above decomposition device comprises the following steps:
Step S1: the n column vectors a_1, ..., a_n of the matrix A serve as the input signals of the QR decomposition module. a_1 is the input of the first diagonal processing module, whose outputs are r_11 and q_1. The iterative processing modules compute the next iteration matrix; their inputs are a_1 and a_j, where 1 < j < n+1 and j is a positive integer, and their output is the column a_j^(1) of the next iteration matrix;
Steps S2 to Sj, where j is greater than or equal to 2 and less than n: the signals a_j^(j-2), ..., a_n^(j-2) input to step j-1 and the signals q_(j-1), a_j^(j-1), ..., a_n^(j-1) output by step j-1 serve as the input signals of this step. Here a_j^(j-1) is the input of the diagonal processing module, which computes r_jj and q_j. The input signals of the k3-th triangular processing module are q_(j-1), a_j3^(j-2) and a_(j3+1)^(j-2), which are used to compute r_(j-1,j3) and r_(j-1,j3+1); when n-j is odd, j3 is a positive integer greater than or equal to j and less than or equal to n-1, and when n-j is even, j3 is a positive integer greater than or equal to j and less than or equal to n. As in the first step, the iterative processing modules compute the next iteration matrix; their inputs are a_j^(j-1), ..., a_n^(j-1) and their outputs are a_(j+1)^(j), ..., a_n^(j);
Step Sn: the input a_n^(n-2) of step n-1 and the outputs q_(n-1) and a_n^(n-1) of step n-1 serve as inputs. a_n^(n-1) is the input of block1, whose outputs are r_(n,n) and q_n; q_(n-1) and a_n^(n-2) are the inputs of block3, whose output is r_(n-1,n).
Compared with the prior art, the advantages of the present invention are as follows: the triangular systolic array QR decomposition device based on advanced iteration and the decomposition method are simple in principle and easy to implement, and significantly speed up QR decomposition. To perform QR decomposition on an n × n matrix, the proposed structure needs only n time units, whereas the triangular systolic array structure proposed by R.-H. Chang et al. needs 2n-1 time units. For the aforementioned 4 × 4 matrix A, QR decomposition with the present invention takes only 4 time units, 3 fewer than the 7 required by the prior structure.
Brief description of the drawings
Fig. 1 is a schematic diagram of the topology of the decomposition device of the present invention.
Fig. 2 is a schematic diagram of the structure of the diagonal processing module of the present invention in a specific application example.
Fig. 3 is a schematic diagram of the structure of the iterative processing module of the present invention in a specific application example.
Fig. 4 is a schematic diagram of the structure of the triangular processing module of the present invention in a specific application example.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the triangular systolic array QR decomposition device based on advanced iteration of the present invention is used to perform QR decomposition on an n × n matrix A. It includes diagonal processing modules, iterative processing modules and triangular processing modules: n diagonal processing modules; (n-1) + (n-2) + ... + 1 = n × (n-1)/2 iterative processing modules; and, when n is even, n/2 + (n-2) + (n-4) + (n-6) + ... + 2 = n^2/4 triangular processing modules, or, when n is odd, (n-1) + (n-3) + (n-5) + ... + 2 = (n+1)(n-1)/4 triangular processing modules.
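The module counts above can be checked numerically. The following Python sketch evaluates the sums exactly as written in the text and compares them with the closed forms; the function name `module_counts` is hypothetical and used only for illustration.

```python
def module_counts(n):
    """Return (diagonal, iterative, triangular) module counts for an n x n matrix."""
    if n < 2:
        raise ValueError("n must be at least 2")
    diagonal = n
    iterative = sum(range(1, n))  # (n-1) + (n-2) + ... + 1
    if n % 2 == 0:
        # n/2 + (n-2) + (n-4) + ... + 2
        triangular = n // 2 + sum(range(2, n - 1, 2))
        closed_form = n * n // 4
    else:
        # (n-1) + (n-3) + ... + 2
        triangular = sum(range(2, n, 2))
        closed_form = (n + 1) * (n - 1) // 4
    # The sums must agree with the closed forms stated in the text.
    assert iterative == n * (n - 1) // 2
    assert triangular == closed_form
    return diagonal, iterative, triangular
```

For example, `module_counts(4)` returns `(4, 6, 4)`: four diagonal, six iterative and four triangular processing modules for a 4 × 4 matrix.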
The first diagonal processing module receives the first column vector a_1 of the matrix A from outside. Its results q_1 and r_11 are outputs of the whole QR decomposition module, and q_1 is also output to the triangular processing module of the next step. The r_jj^2 signal produced during the calculation is output to all iterative processing modules in the first step. The (j-1)-th iterative processing module (where j is greater than or equal to 2 and less than or equal to n-1) takes as input the j-th column vector a_j of the matrix A received from outside, the first column vector a_1 of the matrix A and the r_jj^2 signal output by the first diagonal processing module, and computes the j-th column vector a_j^(1) of the next iteration matrix A^(1). Here a_1^(1) serves as the input of the second diagonal processing module, and the remaining column vectors of A^(1) serve both as inputs of the iterative processing modules of the second step and as inputs of the triangular processing modules of the third step. By the time the iterative processing modules need the r_jj^2 signal, the first diagonal processing module has already finished computing it, so the diagonal processing module and the iterative processing modules can execute in parallel;
The i-th diagonal processing module processes the a_i^(i-1) signal output by step i-1 to obtain the outputs r_ii and q_i of the QR decomposition module, and also computes r_ii^2. The vector q_i serves as the input of the triangular processing modules of the next step, and r_ii^2 serves as the input of all iterative processing modules in step i. Each iterative processing module receives the column vector signals a_i^(i-1) and a_i1^(i-1) from step i-1 and the r_ii^2 signal from the diagonal processing module as inputs, and after processing produces the i1-th column vector of the next iteration matrix A^(i). Here a_(i+1)^(i) serves as the input of the (i+1)-th diagonal processing module, and the remaining column vectors of A^(i) serve both as inputs of the iterative processing modules of the next step and as inputs of the triangular processing modules of step i+2. Each triangular processing module receives the input signal q_(i-1) from step i-1 together with the a_i2^(i-2) and a_(i2+1)^(i-2) signals from step i-2, and after processing produces the output signals r_(i-1,i2) and r_(i-1,i2+1) of the QR decomposition module;
The n-th diagonal processing module processes the a_n^(n-1) signal output by step n-1 to obtain the output signals r_nn and q_n of the QR decomposition module. The k4-th triangular processing module receives the signal q_(n-1) from step n-1 and the signal a_n^(n-2) from step n-2, and after processing produces the output signal r_(n-1,n) of the QR decomposition module.
As shown in Fig. 2, in a specific application example the diagonal processing module includes multipliers, an adder, a square-root unit and dividers. Multiplier e (where e is greater than or equal to 1 and less than or equal to n) receives the e-th element of the input vector a_j from outside, squares it and outputs the result to the adder. The adder receives the signals from multiplier 1 to multiplier n and accumulates them; the sum is output to the square-root unit and is also the output signal r_jj^2 of the whole module. The square-root unit takes the square root of the signal received from the adder; the result is output to divider 1 to divider n as their divisor and is also the output signal r_jj of the whole module. Divider e1 (where e1 is greater than or equal to 1 and less than or equal to n) takes the e1-th element of the input vector a_j received from outside as the dividend and the signal received from the square-root unit as the divisor; the result is the e1-th element of the output vector q_j2 of the whole module.
The above diagonal processing module computes the j2-th column vector q_j2 of the matrix Q, the diagonal element r_jj of the matrix R and the square r_jj^2 of the diagonal element. r_jj^2 is used as an input of the iterative processing modules. Because the moment at which the diagonal processing module outputs r_jj^2 coincides with the moment at which the iterative processing modules need it, the two kinds of module can execute in parallel, which increases the speed of QR decomposition.
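The data flow of the diagonal processing module can be mirrored in software as follows. This is a behavioural sketch only, assuming real-valued inputs; the function name `diagonal_module` is hypothetical, and the code says nothing about timing or fixed-point word lengths in the actual circuit.

```python
import math

def diagonal_module(a):
    """Behavioural model of the diagonal processing module for one column a_j."""
    squares = [x * x for x in a]   # multipliers 1..n square each element
    r_sq = sum(squares)            # adder: r_jj^2, available before the square root
    r = math.sqrt(r_sq)            # square-root unit: r_jj
    q = [x / r for x in a]         # dividers 1..n: elements of the q vector
    return r_sq, r, q
```

For the column `[3.0, 4.0]` the function returns `(25.0, 5.0, [0.6, 0.8])`. The value `r_sq` is produced before the square root and the divisions, which is what allows the iterative processing modules to start early.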
As shown in Fig. 3, in a specific application example the iterative processing module includes a first shared hardware unit, which contains a multiplexer and multiplier 1 to multiplier n. The multiplexer selects different inputs as operands for multiplier 1 to multiplier n: it receives the external a_j3^(p) vector and the output signal of the divider, selects between them and outputs the result to multiplier 1 to multiplier n. When the enable signal is '0', multiplier e2 (where e2 is greater than or equal to 1 and less than or equal to n) takes the signal received from the multiplexer as one operand and the e2-th element of the externally received a_j^(p) vector as the other operand, multiplies them and outputs the result to the adder module. The adder module receives the input signals from multiplier 1 to multiplier n, accumulates them and outputs the sum to the divider module. The divider takes the signal received from the adder module as the dividend and the externally received signal r_jj^2 as the divisor, and outputs the quotient to the input of multiplexer 1. When the enable signal is '1', multiplier e2 outputs its result to subtractor e3 (where e3 is greater than or equal to 1 and less than or equal to n). Subtractor e3 takes the signal received from multiplier e2 as the subtrahend and the e3-th element of the externally received a_j3^(p) signal as the minuend; the difference is the e3-th element of the output vector a_j3^(p+1) of the whole module.
The above iterative processing module computes the j3-th column of the next iteration matrix. The two places in the figure where the first shared hardware unit is used are independent of each other, so hardware resources can be saved by time-sharing the hardware.
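The two phases of the iterative processing module can be modelled as below. This is a behavioural sketch assuming real-valued data; the name `iterative_module` and its argument names are hypothetical, and the two phases simply reuse the same multiplication pattern to reflect the time-shared multipliers.

```python
def iterative_module(a_j, a_j3, r_sq):
    """Compute a_j3^(p+1) from a_j^(p), a_j3^(p) and r_jj^2.

    a_j  : column processed by the diagonal module in the same step
    a_j3 : column to be updated
    r_sq : r_jj^2 supplied by the diagonal module
    """
    # Phase 0 (enable = '0'): multipliers form a_j[e] * a_j3[e], the adder
    # accumulates them and the divider divides the sum by r_jj^2.
    coeff = sum(x * y for x, y in zip(a_j, a_j3)) / r_sq
    # Phase 1 (enable = '1'): the same multipliers scale a_j by the quotient,
    # and the subtractors remove the result from a_j3.
    return [y - coeff * x for x, y in zip(a_j, a_j3)]
```

The returned column is orthogonal to `a_j`, which is the property the next diagonal processing module relies on.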
As shown in Fig. 4, in a specific application example the triangular processing module includes a second shared hardware unit. The inputs of multiplexer 1 are the n elements of the a_j3 vector and the n elements of the a_(j3+1) vector. When the multiplexer enable signal is '0', multiplexer 1 passes the elements of the a_j3 vector to multiplier 1 to multiplier n; when the multiplexer enable signal is '1', multiplexer 1 passes the elements of the a_(j3+1) vector to multiplier 1 to multiplier n. Multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the externally received q_j2 vector as the other operand, multiplies them and outputs the product to the adder, which accumulates the signals received from the multipliers. When the multiplexer enable signal is '0', the accumulator output is the output signal r_(j2,j3) of the triangular processing module; when the multiplexer enable signal is '1', the accumulator output is the output signal r_(j2,j3+1) of the triangular processing module.
The above triangular processing module computes the elements of the matrix R at coordinates [j2, j3] and [j2, j3+1]. Comparing Fig. 4 with Fig. 2 and Fig. 3 shows that the time needed to compute the element at [j2, j3] is less than 50% of the execution time of the second basic module and of the first basic module. The present invention therefore time-multiplexes the hardware that computes the element at [j2, j3], which saves hardware resources.
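A behavioural sketch of the triangular processing module is given below, again assuming real-valued data. The function name `triangular_module` is hypothetical; the loop over the enable value stands in for the time-multiplexed use of the shared multipliers and adder.

```python
def triangular_module(q, a_j3, a_j3_plus_1):
    """Return (r_(j2,j3), r_(j2,j3+1)) from q_j2 and two columns of an earlier step."""
    outputs = []
    for enable, column in ((0, a_j3), (1, a_j3_plus_1)):
        # The multiplexer selects one column per enable value; the multipliers
        # form q[e] * column[e] and the adder accumulates the products.
        products = [qe * ce for qe, ce in zip(q, column)]
        outputs.append(sum(products))
    return outputs[0], outputs[1]
```

Each output is an inner product with the q vector, so one set of n multipliers and one adder serves two elements of R in turn.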
The present invention further provides a decomposition method based on the above decomposition device. Performing QR decomposition on an n × n matrix A with the circuit of the above decomposition device takes n steps in total, as follows:
Step S1: the n column vectors a_1, ..., a_n of the matrix A serve as the input signals of the QR decomposition module. a_1 is the input of the first diagonal processing module, whose outputs are r_11 and q_1. The iterative processing modules compute the next iteration matrix; their inputs are a_1 and a_j (1 < j < n+1, j a positive integer), and their output is the column a_j^(1) of the next iteration matrix. The values of the output signals of the first step are given by formula (1);
Step S1 shows the biggest difference between the present invention and the traditional QR decomposition method: the present invention computes the next iteration matrix one step ahead. Traditional QR decomposition computes the next iteration matrix in the second step because that calculation uses the output of the first step. By improving on the traditional method, the present invention computes the next iteration matrix from the inputs of the first step, which greatly increases the speed of QR decomposition.
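The one-step-ahead idea can be written out as follows. This is a reconstruction from the module descriptions for a real-valued matrix, offered as an illustration; it is not the patent's own formula (1), which is not reproduced in this text.

```latex
\begin{aligned}
r_{11}^{2} &= a_1^{T} a_1, \qquad r_{11} = \sqrt{r_{11}^{2}}, \qquad q_1 = \frac{a_1}{r_{11}}, \\
\text{conventional MGS:}\quad a_j^{(1)} &= a_j - \left(q_1^{T} a_j\right) q_1, \\
\text{advanced iteration:}\quad a_j^{(1)} &= a_j - \frac{a_1^{T} a_j}{r_{11}^{2}}\, a_1 .
\end{aligned}
```

Substituting q_1 = a_1 / r_11 into the conventional update gives the second form, so the two are algebraically identical. The second form needs only the step inputs a_1 and a_j and the sum of squares r_11^2, and does not have to wait for the square root and the divisions that produce q_1.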
Steps S2 to Sj, where j is greater than or equal to 2 and less than n: the signals a_j^(j-2), ..., a_n^(j-2) input to step j-1 and the signals q_(j-1), a_j^(j-1), ..., a_n^(j-1) output by step j-1 serve as the input signals of this step. Here a_j^(j-1) is the input of the diagonal processing module, which computes r_jj and q_j. The input signals of the k3-th triangular processing module are q_(j-1), a_j3^(j-2) and a_(j3+1)^(j-2) (when n-j is odd, j3 is a positive integer greater than or equal to j and less than or equal to n-1; when n-j is even, j3 is a positive integer greater than or equal to j and less than or equal to n), which are used to compute r_(j-1,j3) and r_(j-1,j3+1). As in the first step, the iterative processing modules compute the next iteration matrix; their inputs are a_j^(j-1), ..., a_n^(j-1) and their outputs are a_(j+1)^(j), ..., a_n^(j). The outputs of step j are given by formula (2);
Step Sn: the input a_n^(n-2) of step n-1 and the outputs q_(n-1) and a_n^(n-1) of step n-1 serve as inputs. a_n^(n-1) is the input of block1, whose outputs are r_(n,n) and q_n; q_(n-1) and a_n^(n-2) are the inputs of block3, whose output is r_(n-1,n). The outputs of step n are given by formula (3);
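Steps S1 to Sn can be simulated in software to check that the n-step schedule yields a valid factorization. The sketch below assumes a real-valued, nonsingular matrix and models one loop pass per step; the name `lookahead_qr` is hypothetical, and the pairing of two columns per triangular module in hardware is not modelled (one inner product is computed per column).

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def lookahead_qr(A):
    """QR decomposition following the n-step schedule; A is a list of rows."""
    n = len(A)
    prev = None  # columns of the iteration matrix from two steps back
    cur = [[A[i][j] for i in range(n)] for j in range(n)]  # columns entering this step
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for s in range(n):  # loop pass s corresponds to step s+1 in the text
        # Diagonal module: r_ss^2, r_ss and q_s from the s-th current column.
        r_sq = dot(cur[s], cur[s])
        R[s][s] = math.sqrt(r_sq)
        q_s = [x / R[s][s] for x in cur[s]]
        # Triangular modules: row s-1 of R from the previous q vector and the
        # columns that were inputs of the previous step.
        if s >= 1:
            for j in range(s, n):
                R[s - 1][j] = dot(q_cols[s - 1], prev[j])
        # Iterative modules: next iteration matrix from this step's inputs and
        # r_ss^2 only; q_s is not needed.
        nxt = list(cur)
        for j in range(s + 1, n):
            coeff = dot(cur[s], cur[j]) / r_sq
            nxt[j] = [y - coeff * x for x, y in zip(cur[s], cur[j])]
        q_cols.append(q_s)
        prev, cur = cur, nxt
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R
```

Within one loop pass the three groups of operations do not depend on each other's results, which matches the claim that all modules of a step can work in the same time unit.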
As can be seen from the above, to perform QR decomposition on an n × n matrix, the proposed structure needs only n time units, whereas the triangular systolic array structure proposed by R.-H. Chang et al. needs 2n-1 time units. For the aforementioned 4 × 4 matrix A, QR decomposition with the present invention takes only 4 time units, 3 fewer than the 7 required by the prior structure. The proposed triangular systolic array QR decomposition based on advanced iteration therefore significantly speeds up QR decomposition.
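The comparison of time units is simple arithmetic and can be tabulated with a short helper. The function name `time_units` is hypothetical; the two expressions are the ones stated in the text.

```python
def time_units(n):
    """Return (prior_structure, proposed_structure) time units for an n x n matrix."""
    prior = 2 * n - 1   # triangular systolic array of R.-H. Chang et al.
    proposed = n        # structure based on advanced iteration
    return prior, proposed
```

For n = 4 this gives `(7, 4)`, a saving of 3 time units; in general the saving is n-1 time units.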
The above are only preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above embodiments; all technical solutions falling under the idea of the present invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.
Claims (6)
1. A triangular systolic array QR decomposition device based on advanced iteration, used to perform QR decomposition on an n × n matrix A, characterized in that it includes diagonal processing modules, iterative processing modules and triangular processing modules: n diagonal processing modules; (n-1) + (n-2) + ... + 1 = n × (n-1)/2 iterative processing modules; and, when n is even, n/2 + (n-2) + (n-4) + (n-6) + ... + 2 = n^2/4 triangular processing modules, or, when n is odd, (n-1) + (n-3) + (n-5) + ... + 2 = (n+1)(n-1)/4 triangular processing modules; the first diagonal processing module receives the first column vector a_1 of the matrix A from outside; its results q_1 and r_11 are outputs of the whole QR decomposition module, and q_1 is also output to the triangular processing module of the next step; the r_jj^2 signal produced during the calculation is output to all iterative processing modules in the first step; the (j-1)-th iterative processing module takes as input the j-th column vector a_j of the matrix A received from outside, where j is greater than or equal to 2 and less than or equal to n-1, the first column vector a_1 of the matrix A and the r_jj^2 signal output by the first diagonal processing module, and computes the j-th column vector a_j^(1) of the next iteration matrix A^(1), wherein a_1^(1) serves as the input of the second diagonal processing module, and the remaining column vectors of A^(1) serve both as inputs of the iterative processing modules of the second step and as inputs of the triangular processing modules of the third step; and so on, until the output signal r_(n-1,n) of the QR decomposition module is finally obtained after processing by the triangular processing modules.
2. The triangular systolic array structure QR decomposition device based on advanced iteration according to claim 1, characterized in that the i-th diagonal processing module processes the signal $a_i^{i-1}$ output from step i-1 to obtain the outputs $r_{ii}$ and $q_i$ of the QR decomposition module, and also calculates $r_{ii}^2$, wherein the vector $q_i$ serves as the input of the triangular processing module of the next step and $r_{ii}^2$ serves as the input of all iterative processing modules in step i; the iterative processing module receives the column vector signals $a_i^{i-1}$ and $a_{i1}^{i-1}$ from step i-1 and the signal $r_{ii}^2$ from the diagonal processing module as inputs, and after processing obtains the i1-th column vector of the next iteration matrix $A^i$, wherein $a_{i+1}^i$ serves as the input of the (i+1)-th diagonal processing module, and the remaining column vectors of $A^i$ serve as inputs of the next-step iterative modules and, at the same time, as inputs of the triangular processing module of step i+2; the triangular processing module receives the input signal $q_{i-1}$ from step i-1 and, at the same time, the signals $a_{i2}^{i-2}$ and $a_{i2+1}^{i-2}$ from step i-2, and after processing obtains the output signals $r_{i-1,i2}$ and $r_{i-1,i2+1}$ of the QR decomposition module;
the n-th diagonal processing module processes the signal $a_n^{n-1}$ output from step n-1 to obtain the output signals $r_{nn}$ and $q_n$ of the QR decomposition module; the k4-th triangular processing module receives the signal $q_{n-1}$ from step n-1 and the signal $a_n^{n-2}$ from step n-2, and after processing obtains the output signal $r_{n-1,n}$ of the QR decomposition module.
Triangle systolic array architecture QR decomposer based on advanced iterative the most according to claim 1 and 2, its feature exists
In, described diagonal angle processing module includes multiplier, adder, radical sign operator block and divider, and multiplier e is from external reception
To input vector ajThe e element, wherein e is more than or equal to 1 less than or equal to n, and after it is carried out involution process, output is to addition
Device, adder receives signal from multiplier 1 to multiplier n, and after carrying out accumulation process, the same of radical sign operator block is arrived in output
Time as output signal r of whole modulejj 2, radical sign operator block, after adder receives signal, carries out out flat
Side process after output to divider 1 to divider n as the divisor of divider 1 to divider n, defeated simultaneously as whole module
Go out signal rjj, divider e1 is received externally input vector ajThe e1 element as dividend, and will be from radical sign computing
The signal that device receives is as divisor, and wherein e1 is more than or equal to 1 less than or equal to n, and operation result is as whole module output vector
qj2The e1 element.
Triangle systolic array architecture QR decomposer based on advanced iterative the most according to claim 1 and 2, its feature exists
In, described iterative processing module includes that first shares hardware, and first shares hardware contains a MUX and multiplier
To multiplier n, MUX is that multiplier 1 selects different inputs as multiplier to multiplier n, and MUX is from outside
Aj3 pThe output signal of vector sum divider receives and outputs results to multiplier 1 after input selects to multiplier n, when
Enabling signal when be ' 0 ', the signal that multiplier e2 receives from MUX is as a multiplier, and wherein e2 is more than or equal to 1
Less than or equal to n, from a of external receptionj pThe e2 element of vector is as another multiplier, after carrying out multiplication operation, result is defeated
Going out to adder Module, adder Module receives input signal from multiplier 1 to multiplier n, defeated after carrying out accumulation process
Going out to divider module, the signal that divider receives from adder Module is as dividend, the signal r being received externallyjj 2
As divisor, after carrying out division operation, the input of MUX 1 is arrived in output, and when enabling signal and being ' 1 ', multiplier e2 will transport
Calculating result and export subtractor e3, wherein e3 receives signal work less than or equal to n, subtractor e3 from multiplier e2 more than or equal to 1
For subtrahend, it is received externally aj3 pThe e3 element of signal is as minuend, and after carrying out subtracting each other process, result is as whole mould
Block output signal aj3 p+1The e3 element of vector.
5. The triangular systolic array structure QR decomposition device based on advanced iteration according to claim 4, characterized in that the triangular processing module comprises a second shared hardware; the inputs of multiplexer 1 are the n elements of the vector $a_{j3}$ and the n elements of the vector $a_{j3+1}$; when the multiplexer enable signal is '0', multiplexer 1 selects the elements of the vector $a_{j3}$ and outputs them to multiplier 1 to multiplier n; when the multiplexer enable signal is '1', multiplexer 1 selects the elements of the vector $a_{j3+1}$ and outputs them to multiplier 1 to multiplier n; multiplier e4 takes the data received from the multiplexer as one operand and the e4-th element of the externally received vector $q_{j2}$ as the other operand, and after multiplication outputs the result to the adder; the adder accumulates the signals received from the multipliers; when the multiplexer enable signal is '0', the accumulator output signal is the output signal $r_{j2,j3}$ of the triangular processing module, and when the multiplexer enable signal is '1', the accumulator output signal is the output signal $r_{j2,j3+1}$ of the triangular processing module.
6. A QR decomposition method based on the decomposition device of any one of claims 1 to 5, characterized in that the steps are:
Step S1: the n column vectors $a_1, \ldots, a_n$ of matrix A serve as the input signals of the QR decomposition module; $a_1$ is the input of the first diagonal processing module, whose outputs are $r_{11}$ and $q_1$; the iterative processing module calculates the next iteration matrix, its inputs being $a_1$ and $a_j$, where 1 < j < n+1 and j is a positive integer, and its output being the next iteration matrix $a_j^1$;
Steps S2 to Sj: j is greater than or equal to 2 and less than n; the signals $a_j^{j-2}, \ldots, a_n^{j-2}$ input to step j-1 and the signals $q_{j-1}, a_j^{j-1}, \ldots, a_n^{j-1}$ output by step j-1 serve as the input signals of step j, wherein $a_j^{j-1}$ is the input of the diagonal processing module and is used to calculate $r_{jj}$ and $q_j$; the input signals of the k3-th diagonal processing module are $q_{j-1}$, $a_{j3}^{j-2}$ and $a_{j3+1}^{j-2}$, which are used to calculate $r_{j-1,j3}$ and $r_{j-1,j3+1}$; when n-j is odd, j3 is a positive integer greater than or equal to j and less than or equal to n-1, and when n-j is even, j3 is a positive integer greater than or equal to j and less than or equal to n; as in the first step, the iterative processing module calculates the next iteration matrix, its inputs being $a_j^{j-1}, \ldots, a_n^{j-1}$ and its outputs being $a_{j+1}^j, \ldots, a_n^j$;
Step Sn: the input $a_n^{n-2}$ of step n-1 and the outputs $q_{n-1}$ and $a_n^{n-1}$ of step n-1 serve as inputs, wherein $a_n^{n-1}$ is the input of block1, whose outputs are $r_{n,n}$ and $q_n$; $q_{n-1}$ and $a_n^{n-2}$ are the inputs of block3, whose output is $r_{n-1,n}$.
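The data flow recited in claims 3 to 6 can be checked numerically with a small sequential reference model. The sketch below is an assumption-laden reading of the claims, not the patented hardware: the function names are hypothetical, plain Python lists stand in for the signal buses, the matrix is assumed to have linearly independent columns (so no $r_{ii}$ is zero), and the triangular product, which the device evaluates one step later from a retained copy of $a_j^{i-1}$, is simply computed before the column is overwritten.

```python
import math


def diagonal_module(a):
    """Claim 3 datapath: square each element, accumulate, take the root, divide."""
    r_sq = sum(x * x for x in a)      # r_ii^2, also forwarded to the iterative modules
    r = math.sqrt(r_sq)               # r_ii
    q = [x / r for x in a]            # q_i
    return r, q, r_sq


def iterative_module(a_i, a_j, r_sq):
    """Claim 4 datapath: coefficient (a_i . a_j) / r_ii^2, then a_j - coefficient * a_i."""
    coeff = sum(x * y for x, y in zip(a_i, a_j)) / r_sq
    return [y - coeff * x for x, y in zip(a_i, a_j)]


def triangular_module(q, a_j):
    """Claim 5 datapath: inner product of q with a column of the earlier iterate."""
    return sum(x * y for x, y in zip(q, a_j))


def qr_advanced_iteration(columns):
    """Return (Q as a list of columns, R as a list of rows) for a list of columns."""
    n = len(columns)
    a = [list(c) for c in columns]    # a[j] holds a_j^{i-1} at the start of step i
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][i], q_i, r_sq = diagonal_module(a[i])
        Q.append(q_i)
        for j in range(i + 1, n):
            # r_{i,j} = q_i . a_j^{i-1}; taken before a[j] is replaced by a_j^i.
            R[i][j] = triangular_module(q_i, a[j])
            # The update needs only a_i^{i-1} and r_ii^2, not q_i.
            a[j] = iterative_module(a[i], a[j], r_sq)
    return Q, R


# Example input chosen for illustration only: three linearly independent columns.
A_cols = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
Q, R = qr_advanced_iteration(A_cols)
```

Because the iterative update depends only on $a_i^{i-1}$ and $r_{ii}^2$, it does not have to wait for the square root and the divisions that produce $q_i$; under this reading, that independence is what lets the next iterate be computed in the same step as the diagonal outputs.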
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610173392.0A CN105846873B (en) | 2016-03-24 | 2016-03-24 | Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610173392.0A CN105846873B (en) | 2016-03-24 | 2016-03-24 | Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105846873A | 2016-08-10 |
CN105846873B | 2018-12-18 |
Family
ID=56583444
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610173392.0A Active CN105846873B (en) | 2016-03-24 | 2016-03-24 | Triangle systolic array architecture QR decomposer and decomposition method based on advanced iterative |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105846873B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101170525A (en) * | 2006-10-25 | 2008-04-30 | 中兴通讯股份有限公司 | MLSE simplification detection method and its device based on blocked QR decomposition |
US20090154608A1 (en) * | 2007-12-18 | 2009-06-18 | Electronics And Telecommunications Research Institute | Receiving apparatus and method for mimo system |
CN101674160A (en) * | 2009-10-22 | 2010-03-17 | 复旦大学 | Signal detection method and device for multiple-input-multiple-output wireless communication system |
Non-Patent Citations (1)
Title |
---|
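Zhu Yongxu, Wu Bin, Zhou Yumei, Cai Jingjing, Xia Kaifeng: "Distributed systolic array processing algorithm for QR decomposition in MIMO-OFDM systems", Journal of Electronics & Information Technology (《电子与信息学报》) * |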
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113779501A (en) * | 2021-08-23 | 2021-12-10 | 华控清交信息科技(北京)有限公司 | Data processing method and device and data processing device |
CN113779501B (en) * | 2021-08-23 | 2024-06-04 | 华控清交信息科技(北京)有限公司 | Data processing method and device for data processing |
Also Published As
Publication number | Publication date |
---|---|
CN105846873B (en) | 2018-12-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yepez et al. | Stride 2 1-D, 2-D, and 3-D Winograd for convolutional neural networks | |
CN107704916B (en) | Hardware accelerator and method for realizing RNN neural network based on FPGA | |
CN110263925B (en) | Hardware acceleration implementation device for convolutional neural network forward prediction based on FPGA | |
CN108733348B (en) | Fused vector multiplier and method for performing operation using the same | |
CN101504638B (en) | Point-variable assembly line FFT processor | |
CN106445471A (en) | Processor and method for executing matrix multiplication on processor | |
CN110543939B (en) | Hardware acceleration realization device for convolutional neural network backward training based on FPGA | |
CN111966324B (en) | Implementation method and device for multi-elliptic curve scalar multiplier and storage medium | |
CN110276447A (en) | A kind of computing device and method | |
CN108170640A (en) | The method of its progress operation of neural network computing device and application | |
CN102521211B (en) | Parallel device for solving linear equation set on finite field | |
Liu et al. | WinoCNN: Kernel sharing Winograd systolic array for efficient convolutional neural network acceleration on FPGAs | |
CN110163350A (en) | A kind of computing device and method | |
CN110069444A (en) | A kind of computing unit, array, module, hardware system and implementation method | |
CN104360986B (en) | A kind of implementation method of parallelization matrix inversion hardware unit | |
Guenther et al. | A scalable, multimode SVD precoding ASIC based on the cyclic Jacobi method | |
CN110110285B (en) | Parallel Jacobi calculation acceleration implementation method for FPGA | |
CN209708122U (en) | A kind of computing unit, array, module, hardware system | |
CN112799634B (en) | Based on base 2 2 MDC NTT structured high performance loop polynomial multiplier | |
CN107992283A (en) | A kind of method and apparatus that finite field multiplier is realized based on dimensionality reduction | |
CN105846873A (en) | Triangular systolic array structure QR decomposition device based on advanced iteration and decomposition method thereof | |
CN117692126A (en) | Paillier homomorphic encryption method and system based on low-complexity modular multiplication algorithm | |
CN102624653B (en) | Extensible QR decomposition method based on pipeline working mode | |
Hu et al. | Efficient homomorphic convolution designs on FPGA for secure inference | |
CN111401533A (en) | Special calculation array for neural network and calculation method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||