CN117370717B - Iterative optimization method for binary coordinate reduction - Google Patents


Info

Publication number
CN117370717B
Authority
CN
China
Prior art keywords
matrix
iteration
input
bit
dcd
Prior art date
Legal status
Active
Application number
CN202311664792.8A
Other languages
Chinese (zh)
Other versions
CN117370717A (en)
Inventor
刘保
Current Assignee
Zhuhai Chixin Semiconductor Co ltd
Original Assignee
Zhuhai Chixin Semiconductor Co ltd
Priority date
Filing date
Publication date
Application filed by Zhuhai Chixin Semiconductor Co ltd
Priority to CN202311664792.8A
Publication of CN117370717A
Application granted
Publication of CN117370717B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Operations Research (AREA)
  • Computing Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to an iterative optimization method for dichotomous coordinate descent (DCD). Initially $V = H$ and $B = H^{H} V$; the iteration then reduces the difference between $B$ and the identity matrix $I_K$ using only column operations, in which the same operation is applied to every element of a matrix column. Given an $M \times K$ matrix $H$, the method finds a matrix $V$ of the same dimensions such that $H^{H} V = T$, where $T$ is another given $K \times K$ matrix. This reduces computational complexity: instead of first computing the matrix inverse $(H^{H} H)^{-1}$ and then performing a matrix multiplication, the method finds $V$, the pseudo-inverse of $H$, by a new DCD iteration.

Description

Iterative optimization method for binary coordinate reduction
Technical Field
The invention relates to the technical field of iterative optimization by dichotomous coordinate descent, and in particular to an iterative optimization method based on dichotomous coordinate descent.
Background
Coordinate descent (CD) is a simple yet efficient gradient-free optimization algorithm. Unlike gradient methods, which search for the minimum of a function along the direction of steepest descent, coordinate descent minimizes the objective function along one coordinate axis at a time, in sequence.
Related iterative techniques concern solving a system of equations $R h = \beta$, where $R$ is an $N \times N$ matrix and $h$ and $\beta$ are $N \times 1$ vectors; $R$ and $\beta$ are known and $h$ is unknown. Such techniques are widely used, for example for matrix inversion. However, for large systems of equations, solving the system directly is too complex.
For this reason, an indirect method is mostly used: the quadratic function $f(h) = 0.5\, h^{T} R h - h^{T} \beta$ is minimized by line-search iterations. Line-search methods include conjugate gradient (CG), coordinate descent (CD), and dichotomous coordinate descent (DCD) iterations.
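For reference, the standard derivation behind this reformulation is sketched below, assuming $R$ is symmetric positive definite (the text does not state this explicitly). It shows why minimizing $f$ solves $R h = \beta$, why exact CD needs one division per update, and when a power-of-2 DCD step still decreases $f$; here $r = \beta - R h$ is the residual and $e_n$ the $n$-th coordinate vector.

```latex
\begin{aligned}
f(h) &= \tfrac{1}{2}\, h^{T} R h - h^{T}\beta, \qquad
\nabla f(h) = R h - \beta = -r, \quad r := \beta - R h,\\
f(h + \delta e_n) &= f(h) - \delta\, r_n + \tfrac{1}{2}\,\delta^{2} R_{nn},\\
\delta_{\mathrm{CD}} &= r_n / R_{nn} \quad \text{(exact line minimum along } e_n\text{: one division)},\\
\delta_{\mathrm{DCD}} &= \operatorname{sign}(r_n)\,\alpha:\quad
f(h + \delta_{\mathrm{DCD}}\, e_n) < f(h) \iff |r_n| > \tfrac{\alpha}{2}\, R_{nn}.
\end{aligned}
```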
Coordinate descent (CD) chooses the Euclidean coordinate directions as iteration directions, which greatly simplifies the iteration. If the directions are selected in cyclic order, i.e., $n = 1, \dots, N$, the Gauss-Seidel iteration is obtained. One update of such a cyclic CD iteration requires only $N$ multiplications, $N + 1$ additions and 1 division. However, cyclic selection of the iteration direction is inefficient for systems of equations that need only a small number of updates, such as in adaptive filtering; a more efficient way of selecting the (leading) index $n$ is therefore important for accelerating convergence.
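A minimal sketch of cyclic CD (Gauss-Seidel) for $R h = \beta$ in Python with NumPy, assuming $R$ is symmetric positive definite so that the iteration converges. The name `cyclic_cd` and the fixed sweep count are illustrative choices, not taken from the patent. Each coordinate update costs one division, $N$ multiplications and $N + 1$ additions, matching the count above.

```python
import numpy as np

def cyclic_cd(R, beta, sweeps=50):
    """Cyclic coordinate descent (Gauss-Seidel) for R h = beta.

    Assumption: R is symmetric positive definite, so R[:, n] equals row n.
    """
    N = len(beta)
    h = np.zeros(N)
    r = np.asarray(beta, dtype=float).copy()   # residual r = beta - R h
    for _ in range(sweeps):
        for n in range(N):                      # coordinates visited in cyclic order
            delta = r[n] / R[n, n]              # exact line minimum: the one division
            h[n] += delta                       # 1 addition
            r -= delta * R[:, n]                # N multiplications, N additions
    return h
```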
DCD iterations avoid the division and multiplications of CD iterations by choosing a power of 2 as the step size in each iteration, so that multiplication can be implemented by a simple shift.
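The following sketch follows the commonly used cyclic DCD formulation for $R h = \beta$, assuming $R$ is symmetric positive definite and the solution entries lie roughly within a known amplitude. The parameter names `amplitude`, `num_bits` and `max_updates` are hypothetical. Floating point is used for readability; in fixed-point hardware each multiplication by $\alpha$ would be a shift.

```python
import numpy as np

def cyclic_dcd(R, beta, amplitude=1.0, num_bits=16, max_updates=100_000):
    """Cyclic DCD for R h = beta: every step is +/- alpha, alpha a power of 2.

    Assumptions: R symmetric positive definite; |h_n| roughly <= amplitude.
    """
    N = len(beta)
    h = np.zeros(N)
    r = np.asarray(beta, dtype=float).copy()   # residual r = beta - R h
    alpha = amplitude
    updates = 0
    for _ in range(num_bits):
        alpha /= 2                             # amplitude/2, amplitude/4, ...
        changed = True
        while changed and updates < max_updates:
            changed = False
            for n in range(N):                 # cyclic choice of coordinate n
                if abs(r[n]) > (alpha / 2) * R[n, n]:
                    s = np.sign(r[n])
                    h[n] += s * alpha          # no division: the step is +/- alpha
                    r -= s * alpha * R[:, n]   # a shift-and-add in fixed point
                    changed = True
                    updates += 1
    return h
```

Each accepted update strictly decreases $f(h)$, because the test $|r_n| > (\alpha/2) R_{nn}$ is exactly the descent condition for a step of size $\alpha$.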
When these methods are applied in integrated circuits, how to obtain the pseudo-inverse of a matrix, and how to use it to obtain the optimal solution of a problem, are technical problems that urgently need to be solved. An iterative optimization method for dichotomous coordinate descent is therefore provided.
Disclosure of Invention
The invention aims to provide an iterative optimization method for dichotomous coordinate descent.
In order to achieve the above object, the technical scheme of the present invention is as follows.
An iterative optimization method for dichotomous coordinate descent, used in an integrated circuit device to implement DCD iterations, comprises the following iterative steps:
I. reading the designated columns $B_{:i}$ and $V_{:i}$ of the two matrices $B$ and $V$ according to the same index $i$; passing the designated columns $B_{:i}$ and $V_{:i}$ through a shifter, adding them to or subtracting them from other columns $B_{:j}$ and $V_{:j}$ of the matrices, and updating $B_{:j}$ and $V_{:j}$ with the results;
the enable signal of the operation is generated by a comparator; one input of the comparator comes from the shifter, and the other input of the comparator generating the enable signal is generated by a round-robin scheduling module; in the iterative step, $(i, j)$ is selected such that $|B_{ij} - T_{ij}|$ is maximum;
III. taking $N$ $M_b$-bit binary numbers as input;
outputting $\log_2 N$ bits;
V. each of $M_b$ logical OR gates has an $N$-bit input, and the $j$-th input of the $i$-th OR gate comes from the $i$-th bit of the $j$-th binary number;
a priority encoder receives the outputs of the $M_b$ logical OR gates as input and outputs $\log_2 M_b$ bits, e.g., $i$;
a multiplexer selects, according to the output of the priority encoder, 1 bit (e.g., the $i$-th bit) from each of the $N$ $M_b$-bit binary numbers;
another priority encoder receives all outputs of the multiplexer as input and outputs $\log_2 N$ bits, e.g., $j$. Given an $M \times K$ matrix $H$ and another $K \times K$ matrix $T$, the iterative method finds an $M \times K$ matrix $V$ such that $H^{H} V = T$, where $H^{H}$ is the conjugate transpose of $H$, and comprises the following steps:
a. initializing $V = V_0$;
b. modifying $V$ by the column operation $V_{:j} = V_{:j} - \alpha V_{:i}$, where $V_{:i}$ and $V_{:j}$ are the $i$-th and $j$-th columns of $V$, respectively;
c. initializing $B = H^{H} V$;
d. modifying $B$ by the column operation $B_{:j} = B_{:j} - \alpha B_{:i}$, where $B_{:i}$ and $B_{:j}$ are the $i$-th and $j$-th columns of $B$, respectively;
e. in each iterative step, choosing $i, j$ such that $|B_{ij} - T_{ij}|$ is maximum; where $\alpha$ is a constant, $T = I_K$ is the identity matrix and $V_0 = H$; the value of $\alpha$ decreases by powers of 2; the matrices $H$ and $T$ contain complex values, the iteration includes steps for the real and imaginary parts, and in each iterative step $i, j$ are chosen such that the real or imaginary part of $B_{ij} - T_{ij}$ has the largest absolute value.
Preferably, the other input of the comparator generating the enable signal is generated by the device.
Preferably, the device input is generated based on a round robin scheduling module.
Preferably, the device comprises a plurality of comparators, control logic that excludes $B_{:j}$ and $V_{:j}$ awaiting update from the inputs of the comparators, and a plurality of modules for updating $B_{:j}$ and $V_{:j}$.
Preferably, the signals generated at the outputs of the plurality of comparators are stored, and the operation updating $B_{:j}$ and $V_{:j}$ is performed a certain number of clock cycles after the comparator output.
The invention extends the dichotomous coordinate descent method and its integrated-circuit implementation to matrix pseudo-inversion. Typically, for an $M \times K$ matrix $H$ with $M > K$, if an $M \times K$ matrix $V$ satisfies $H^{H} V = I_K$, i.e., the product equals the $K \times K$ identity matrix, then $V$ is the pseudo-inverse of $H$, where $H^{H}$ is the conjugate transpose of $H$. In zero-forcing (ZF) beamforming, the combining matrix is $V = H (H^{H} H)^{-1}$, which is the pseudo-inverse of $H$ because $H^{H} V = I_K$. To find this pseudo-inverse, the invention proposes an optimized DCD iteration: initially $V = H$ and $B = H^{H} V$, and the iteration then reduces the difference between $B$ and $I_K$ using only column operations, in which every element of a matrix column undergoes the same operation. More generally, the method finds a matrix $V$ with the same dimensions as a given $M \times K$ matrix $H$ such that $H^{H} V = T$, where $T$ is another given $K \times K$ matrix; this yields the matrix pseudo-inverse, which is then used to find the optimal solution of the problem.
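A quick numerical check of the zero-forcing identity with NumPy, using an arbitrary random complex $H$ (the sizes $M = 6$, $K = 3$, the seed and the helper name `zf_combiner` are illustrative). It also confirms the standard fact that $V = H(H^{H}H)^{-1}$ equals the conjugate transpose of the Moore-Penrose pseudo-inverse returned by `numpy.linalg.pinv`. This is the direct computation that the DCD iteration is designed to avoid.

```python
import numpy as np

def zf_combiner(H):
    """Direct zero-forcing combiner V = H (H^H H)^{-1}; H tall with full column rank."""
    Hh = H.conj().T                            # conjugate transpose H^H
    return H @ np.linalg.inv(Hh @ H)           # explicit K x K inversion

rng = np.random.default_rng(0)
M, K = 6, 3
H = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
V = zf_combiner(H)
print(np.allclose(H.conj().T @ V, np.eye(K)))         # H^H V = I_K
print(np.allclose(V, np.linalg.pinv(H).conj().T))     # V = (pinv(H))^H
```

Both checks should print `True`.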
The invention reduces computational complexity: instead of first computing the matrix inverse $(H^{H} H)^{-1}$ and then performing a matrix multiplication, it finds $V$, the pseudo-inverse of $H$, by a new DCD iteration. Observe that when $M > K$, the matrix $V$ has more unknown elements than there are equations in $H^{H} V = I_K$. However, $V = H (H^{H} H)^{-1}$ shows that $V$ is constrained to be a column-weighted sum of $H$: each column of $V$ is a linear combination of the columns of $H$. Therefore, in the new DCD iteration proposed by the invention, initially $V = H$ and $B = H^{H} V$, and the iteration reduces the difference between $B$ and $I_K$ using only column operations, in which the same operation is applied to every element of a matrix column.
Drawings
Fig. 1 is a flow chart of the cyclic 1-bit DCD iteration of the invention for $h$ in $R h = \beta$.
Fig. 2 is a flow chart of the leading 1-bit DCD iteration of the invention for $h$ in $R h = \beta$.
Fig. 3 is a flow chart of the cyclic 1-bit DCD iteration of the invention for $h$ in $R h = \beta$ with complex values.
Fig. 4 is a flow chart of the leading 1-bit DCD iteration of the invention for $h$ in $R h = \beta$ with complex values.
Fig. 5 is a flow chart of the cyclic DCD iteration of the invention for finding an $M \times K$ real-valued matrix $V$.
Fig. 6 is a flow chart of the leading DCD iteration of the invention for finding an $M \times K$ real-valued matrix $V$.
Fig. 7 is a flow chart of the cyclic DCD iteration of the invention for finding an $M \times K$ complex-valued matrix $V$.
Fig. 8 is a flow chart of the leading DCD iteration of the invention for finding an $M \times K$ complex-valued matrix $V$.
Fig. 9 is a flow chart of the cyclic 1-bit DCD iteration of the invention for finding an $M \times K$ real-valued matrix $V$.
Fig. 10 is a flow chart of the leading 1-bit DCD iteration of the invention for finding an $M \times K$ real-valued matrix $V$.
Fig. 11 is a flow chart of the cyclic 1-bit DCD iteration of the invention for finding an $M \times K$ complex-valued matrix $V$.
Fig. 12 is a flow chart of the leading 1-bit DCD iteration of the invention for finding an $M \times K$ complex-valued matrix $V$.
Fig. 13 shows an integrated circuit of the invention implementing the DCD iteration for finding an $M \times K$ complex-valued matrix $V$.
FIG. 14 is an illustration of three implementations of the index generation module of FIG. 13 in accordance with the present invention.
Fig. 15 is an integrated circuit implementing an approximately arg max function in accordance with the present invention.
Fig. 16 is an integrated circuit implementing a 1-bit DCD iteration of the present invention.
Fig. 17 is an integrated circuit implementing DCD iteration of the present invention.
Fig. 18 is an integrated circuit in the form of a finite state machine of the present invention.
Fig. 19 is a state diagram of the integrated circuit of fig. 18 in accordance with the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
An iterative optimization method for dichotomous coordinate descent, in which, given an $M \times K$ matrix $H$ and another $K \times K$ matrix $T$, the iterative method finds an $M \times K$ matrix $V$ such that $H^{H} V = T$, where $H^{H}$ is the conjugate transpose of $H$, the iterative method comprising the following steps:
a. initializing $V = V_0$;
b. modifying $V$ by the column operation $V_{:j} = V_{:j} - \alpha V_{:i}$, where $V_{:i}$ and $V_{:j}$ are the $i$-th and $j$-th columns of $V$, respectively;
c. initializing $B = H^{H} V$;
d. modifying $B$ by the column operation $B_{:j} = B_{:j} - \alpha B_{:i}$, where $B_{:i}$ and $B_{:j}$ are the $i$-th and $j$-th columns of $B$, respectively;
e. in each iterative step, choosing $i, j$ such that $|B_{ij} - T_{ij}|$ is maximum;
where $\alpha$ is a constant, $T = I_K$ is the identity matrix, and $V_0 = H$.
Preferably, the value of α decreases by a factor of 2.
Preferably, the matrices $H$ and $T$ contain complex values, the iteration comprises steps for the real and imaginary parts, and in each iterative step $i, j$ are chosen such that the real or imaginary part of $B_{ij} - T_{ij}$ has the largest absolute value.
Preferably, the iterative method is used in an integrated circuit device to implement DCD iterations, with the following iterative steps:
I. reading the designated columns $B_{:i}$ and $V_{:i}$ of the two matrices $B$ and $V$ according to the same index $i$;
passing the designated columns $B_{:i}$ and $V_{:i}$ through a shifter, adding them to or subtracting them from other columns $B_{:j}$ and $V_{:j}$ of the matrices, and updating $B_{:j}$ and $V_{:j}$ with the results;
the enable signal of the operation is generated by a comparator; one input of the comparator comes from the shifter, and the other input of the comparator generating the enable signal is generated by a round-robin scheduling module; in the iterative step, $i, j$ are selected such that $|B_{ij} - T_{ij}|$ is maximum;
III. taking $N$ $M_b$-bit binary numbers as input;
outputting $\log_2 N$ bits;
V. each of $M_b$ logical OR gates has an $N$-bit input, and the $j$-th input of the $i$-th OR gate comes from the $i$-th bit of the $j$-th binary number;
a priority encoder receives the outputs of the $M_b$ logical OR gates as input and outputs $\log_2 M_b$ bits, e.g., $i$;
a multiplexer selects, according to the output of the priority encoder, 1 bit (e.g., the $i$-th bit) from each of the $N$ $M_b$-bit binary numbers;
another priority encoder receives the outputs of the multiplexer as input and outputs $\log_2 N$ bits, e.g., $j$.
Preferably, the other input of the comparator generating the enable signal is generated by the device.
Preferably, the device input is generated based on a round robin scheduling module.
Preferably, the device comprises a plurality of comparators, control logic that excludes $B_{:j}$ and $V_{:j}$ awaiting update from the inputs of the comparators, and a plurality of modules for updating $B_{:j}$ and $V_{:j}$.
Preferably, the signal generated at the output of the comparator is stored, and the operation updating $B_{:j}$ and $V_{:j}$ is performed a certain number of clock cycles after the comparator output.
Example 1. Fig. 1 shows a flow chart of the cyclic 1-bit DCD iteration for $h$ in $R h = \beta$, where $R$ is an $N \times N$ real-valued matrix. Along the search direction of a coordinate, the step size is a power of 2, $\alpha$, or any multiple of $\alpha / 2^{\lambda - 1}$, which increases the parallelism of a hardware implementation. For $\lambda = 1$, this iteration reduces to the existing cyclic DCD iteration. In other cases, for a given index $n$, it performs the work of several existing cyclic DCD iterations at once, with multiple step sizes.
Example 2. Fig. 2 shows a flow chart of the leading 1-bit DCD iteration for $h$ in $R h = \beta$, where $R$ is an $N \times N$ real-valued matrix. Along the search direction of a coordinate, the step size is a power of 2, $\alpha$, or any multiple of $\alpha / 2^{\lambda - 1}$, which increases the parallelism of a hardware implementation. For $\lambda = 1$, this iteration reduces to the existing leading DCD iteration. In other cases, for a given index $n$, it performs the work of several existing leading DCD iterations at once, with multiple step sizes.
Example 3. Fig. 3 shows a flow chart of the cyclic 1-bit DCD iteration for $h$ in $R h = \beta$, where $R$ is an $N/2 \times N/2$ complex-valued matrix, $h$ and $\beta$ are $N/2 \times 1$ complex-valued vectors, $R$ and $\beta$ are known, and $h$ is unknown. Initially $h = 0$ and the residual $r = \beta$.
Example 4. Fig. 4 shows a flow chart of the leading 1-bit DCD iteration for $h$ in $R h = \beta$, where $R$ is an $N/2 \times N/2$ complex-valued matrix, $h$ and $\beta$ are $N/2 \times 1$ complex-valued vectors, $R$ and $\beta$ are known, and $h$ is unknown. Initially $h = 0$ and the residual $r = \beta$. The arg max operation finds the complex element $r_n$ of the vector $r$ whose real or imaginary part has the largest absolute value; $s = 1$ when that is the real part of $r_n$, and $s = j$ when it is the imaginary part.
Example 5. Fig. 5 shows a flow chart of the cyclic DCD iteration for finding an $M \times K$ real-valued matrix $V$ such that $B = H^{H} V$ approaches the target matrix $T$, where $H$ is an $M \times K$ real-valued matrix and $B$ and $T$ are $K \times K$ real-valued matrices.
To make $B$ as close as possible to the target matrix $T$ of the same dimensions, the iteration drives each element $B_{ij}$ toward $T_{ij}$. To reduce the complexity of the hardware implementation, for example by using shifters instead of multipliers, only powers of 2 are chosen as step sizes for updating $V$ and $B$; this is called dichotomous coordinate descent (DCD). In Fig. 5, the cyclic DCD iteration runs through nested loops over the indices $i$ and $j$.
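A sketch of this real-valued cyclic iteration in Python with NumPy, assuming $H$ is real with full column rank (so $H^{H} = H^{T}$) and $T = I_K$. The function name, the starting step $\alpha_0 = 1/2$, the number of step sizes and the sweep limit are hypothetical choices, not values from the patent; $\alpha_0 \le 1/2$ is assumed so that a diagonal update ($i = j$), which scales column $j$ by $1 \mp \alpha$, can never zero a column. The comparator test $|\xi| > (\alpha/2) B_{ii}$ follows the circuit described for Fig. 13.

```python
import numpy as np

def cyclic_dcd_pinv(H, alpha0=0.5, num_bits=16, max_sweeps=1000):
    """Cyclic DCD sketch: column operations drive B = H^T V toward T = I_K.

    Assumptions: real H with full column rank; alpha0 <= 1/2 (hypothetical default).
    """
    K = H.shape[1]
    V = np.array(H, dtype=float)            # V_0 = H
    B = H.T @ V                             # B = H^T V
    T = np.eye(K)
    alpha = alpha0
    for _ in range(num_bits):               # one stage per power-of-2 step size
        for _ in range(max_sweeps):
            changed = False
            for i in range(K):              # nested loops over matrix indices i, j
                for j in range(K):
                    xi = B[i, j] - T[i, j]
                    if abs(xi) > 0.5 * alpha * B[i, i]:
                        step = np.sign(xi) * alpha
                        V[:, j] -= step * V[:, i]   # column operation on V ...
                        B[:, j] -= step * B[:, i]   # ... mirrored on B = H^T V
                        changed = True
            if not changed:                 # no comparator fired in a full sweep
                break
        alpha /= 2                          # next smaller power of 2
    return V, B
```

For a well-conditioned $H$, the returned $V$ should be close to `np.linalg.pinv(H).T`, i.e., $H (H^{T} H)^{-1}$.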
Example 6. Fig. 6 shows a flow chart of the leading DCD iteration for finding an $M \times K$ real-valued matrix $V$ such that $B = H^{H} V$ approaches the target matrix $T$, where $H$ is an $M \times K$ real-valued matrix and $B$ and $T$ are $K \times K$ real-valued matrices. The arg max operation finds the element $B_{ij}$ of $B$ for which $B_{ij} - I_{ij}$ has the largest absolute value; that is, it finds the matrix indices $i$ and $j$ of the largest absolute element-wise difference between $B$ and the target matrix $T$ of the same dimensions. The off-diagonal elements of $B$ are driven to 0 and the diagonal elements to 1, in an order determined by the arg max operation.
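The leading variant replaces the nested loops with an arg max over $|B_{ij} - T_{ij}|$. The sketch below makes the same assumptions as the cyclic sketch (real full-column-rank $H$, $T = I_K$, hypothetical defaults $\alpha_0 = 1/2$, 16 step sizes, an update cap); when the selected entry fails the comparator test, $\alpha$ is halved.

```python
import numpy as np

def leading_dcd_pinv(H, alpha0=0.5, num_bits=16, max_updates=100_000):
    """Leading DCD sketch: always update the entry with the largest |B_ij - T_ij|.

    Assumptions: real H with full column rank; alpha0 <= 1/2 (hypothetical default).
    """
    K = H.shape[1]
    V = np.array(H, dtype=float)            # V_0 = H
    B = H.T @ V                             # B = H^T V
    T = np.eye(K)
    alpha, bits, updates = alpha0, 0, 0
    while bits < num_bits and updates < max_updates:
        D = B - T
        i, j = np.unravel_index(np.argmax(np.abs(D)), D.shape)  # arg max over (i, j)
        xi = D[i, j]
        if abs(xi) > 0.5 * alpha * B[i, i]:
            step = np.sign(xi) * alpha
            V[:, j] -= step * V[:, i]       # column operation on V ...
            B[:, j] -= step * B[:, i]       # ... mirrored on B
            updates += 1
        else:
            alpha /= 2                      # comparator failed: smaller power of 2
            bits += 1
    return V, B
```

When the loop stops because all step sizes are used up, the last failed test guarantees $\max |B_{ij} - T_{ij}| \le (\alpha/2) B_{ii}$ for the final $\alpha$.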
Example 7. Fig. 7 shows a flow chart of the cyclic DCD iteration for finding an $M \times K$ complex-valued matrix $V$ such that $B = H^{H} V$ approaches the target matrix $T$, where $H$ is an $M \times K$ complex-valued matrix and $B$ and $T$ are $K \times K$ complex-valued matrices. Initially $V = H$ and $B = H^{H} V$. Since $H^{H}$ is the Hermitian (conjugate) transpose of $H$, the diagonal elements $B_{ii}$ of $B$ are real, while the off-diagonal elements $B_{ij}$ are complex. In Fig. 7, the cyclic DCD iteration runs through nested loops over the indices $i$, $j$ and $s$, where $s = 1$ or $s = j$ ($s^{2} = -1$) selects the real part or the imaginary part of a complex matrix element, respectively.
Example 8. Fig. 8 shows a flow chart of the leading DCD iteration for finding an $M \times K$ complex-valued matrix $V$ such that $B = H^{H} V$ approaches the target matrix $T$, where $H$ is an $M \times K$ complex-valued matrix and $B$ and $T$ are $K \times K$ complex-valued matrices. Initially $V = H$ and $B = H^{H} V$, so the diagonal elements $B_{ii}$ of $B$ are real and the off-diagonal elements $B_{ij}$ are complex. The arg max operation finds the complex element $B_{ij}$ of $B$ for which the real or imaginary part of $B_{ij} - I_{ij}$ has the largest absolute value; $s = 1$ when the real part of $B_{ij} - I_{ij}$ has the largest absolute value, and $s = j$ ($s^{2} = -1$) when the imaginary part does.
In other words, arg max finds the matrix indices $i$, $j$ and $s$ of the largest absolute value among the real and imaginary parts of the element-wise differences between $B$ and the target matrix $T$ of the same dimensions: $s = 1$ means that the largest absolute value comes from the real part of the difference between elements of $B$ and $T$, and $s = j$ ($s^{2} = -1$) means that it comes from the imaginary part.
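A small sketch of the complex arg max and the matching column update, in Python with NumPy, where `1j` plays the role of $s = j$. The helper names are hypothetical; ties between a real and an imaginary part are resolved in favour of the real part (an arbitrary choice). Following the update rule given for the Fig. 13 circuit, $V_{:j} \leftarrow V_{:j} - \mathrm{sign}(\xi)\, s\, \alpha V_{:i}$, so when $B_{ii}$ is real a step with $s = j$ changes only the imaginary part of $B_{ij}$.

```python
import numpy as np

def complex_arg_max(B, T):
    """Return (i, j, s, xi): the real (s = 1) or imaginary (s = 1j) part of
    B_ij - T_ij with the largest absolute value, and that signed part xi."""
    D = B - T
    re, im = np.abs(D.real), np.abs(D.imag)
    i_re, j_re = np.unravel_index(np.argmax(re), re.shape)
    i_im, j_im = np.unravel_index(np.argmax(im), im.shape)
    if re[i_re, j_re] >= im[i_im, j_im]:           # tie: prefer the real part
        return int(i_re), int(j_re), 1, float(D[i_re, j_re].real)
    return int(i_im), int(j_im), 1j, float(D[i_im, j_im].imag)

def complex_dcd_update(V, B, i, j, s, xi, alpha):
    """In place: V_:j -= sign(xi) s alpha V_:i, and the same on B (complex arrays)."""
    step = np.sign(xi) * s * alpha
    V[:, j] -= step * V[:, i]
    B[:, j] -= step * B[:, i]
```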
Example 9. Fig. 9 shows a flow chart of the cyclic 1-bit DCD iteration for finding an $M \times K$ real-valued matrix $V$ such that $B = H^{H} V$ approaches the target matrix $T$, where $H$ is an $M \times K$ real-valued matrix and $B$ and $T$ are $K \times K$ real-valued matrices. Along the search direction of a coordinate, the step size is a power of 2, $\alpha$, or any multiple of $\alpha / 2^{\lambda - 1}$, which increases parallelism in a hardware implementation. For $\lambda = 1$, this iteration reduces to the existing cyclic DCD iteration. In other cases, for given indices $i$ and $j$, it performs the work of several existing cyclic DCD iterations at once, with multiple step sizes.
This embodiment runs through nested loops over the matrix indices $i$, $j$ and a plurality of step sizes $\gamma$. It is functionally equivalent to the earlier iteration of Fig. 5 but can improve parallelism and throughput in a hardware implementation. In the earlier DCD iteration, one checks whether $|\xi| > (\alpha/2) B_{ii}$ to obtain a tentative value of $\alpha$, and updates $V_{:j}$ and $B_{:j}$ according to whether the inequality holds. Over two rounds of such DCD iterations, one checks whether $|\xi| < (\alpha/4) B_{ii}$, $(\alpha/4) B_{ii} < |\xi| < (3/4)\alpha B_{ii}$, or $|\xi| > (3/4)\alpha B_{ii}$, and moves by 0, $\alpha/2$ or $\alpha$, respectively. Over three rounds, one checks whether $|\xi| / B_{ii}$ lies in $[0, \alpha/8]$, $[\alpha/8, 3\alpha/8]$, $[3\alpha/8, 5\alpha/8]$, $[5\alpha/8, 7\alpha/8]$ or $[7\alpha/8, +\infty)$, and moves by 0, $\alpha/4$, $\alpha/2$, $3\alpha/4$ or $\alpha$, respectively. In Fig. 9 these cases are denoted $\lambda = 1, 2, 3$ with $\Delta\gamma = \alpha, \alpha/2, \alpha/4$. The multiple step sizes that the earlier DCD iteration checks over several rounds are checked in a single round in this variant.
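The single-round step selection above can be sketched as one bank of comparators, one per candidate step $\gamma = k\,\Delta\gamma \le \alpha$, each testing $|\xi| > (\gamma - \Delta\gamma/2) B_{ii}$. The function name `multi_step` is hypothetical. Because the thresholds grow with $\gamma$, the largest firing comparator gives the step; this reproduces the interval tables for $\lambda = 1, 2, 3$. The signed update would then subtract $\mathrm{sign}(\xi)\,\gamma$ times column $i$ from column $j$.

```python
def multi_step(xi, b_ii, alpha, lam):
    """Step size chosen in one round by 2**(lam - 1) parallel comparators.

    Returns the largest gamma = k * d_gamma (gamma <= alpha) with
    |xi| > (gamma - d_gamma / 2) * b_ii, or 0.0 if no comparator fires.
    """
    d_gamma = alpha / 2 ** (lam - 1)          # lam = 1, 2, 3 -> alpha, alpha/2, alpha/4
    best = 0.0
    for k in range(1, 2 ** (lam - 1) + 1):   # one comparator per candidate step
        gamma = k * d_gamma
        if abs(xi) > (gamma - d_gamma / 2) * b_ii:
            best = gamma                      # thresholds increase with gamma
    return best
```

For example, with $\alpha = 1$, $B_{ii} = 1$ and $\lambda = 3$, $|\xi| = 0.3$ lies in $[\alpha/8, 3\alpha/8]$ and selects $\gamma = \alpha/4$.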
Embodiment 10. Fig. 10 shows a flow chart of the leading 1-bit DCD iteration for finding an $M \times K$ real-valued matrix $V$ such that $B = H^{H} V$ approaches the target matrix $T$, where $H$ is an $M \times K$ real-valued matrix and $B$ and $T$ are $K \times K$ real-valued matrices. Along the search direction of a coordinate, the step size is a power of 2, $\alpha$, or any multiple of $\alpha / 2^{\lambda - 1}$, which increases parallelism in a hardware implementation. For $\lambda = 1$, this iteration reduces to the existing leading DCD iteration. In other cases, for given indices $i$ and $j$, it performs the work of several existing leading DCD iterations at once, with multiple step sizes. For each arg max, a plurality of step sizes $\gamma$ are evaluated, and $V_{:j}$ and $B_{:j}$ are updated if $|\xi| > (\gamma - \Delta\gamma/2) B_{ii}$.
Example 11. Fig. 11 shows a flow chart of the cyclic 1-bit DCD iteration for finding an $M \times K$ complex-valued matrix $V$ such that $B = H^{H} V$ approaches the target matrix $T$, where $H$ is an $M \times K$ complex-valued matrix and $B$ and $T$ are $K \times K$ complex-valued matrices. Initially $V = H$ and $B = H^{H} V$; since $H^{H}$ is the Hermitian (conjugate) transpose of $H$, the diagonal elements $B_{ii}$ of $B$ are real and the off-diagonal elements $B_{ij}$ are complex. The iteration comprises nested loops over the matrix indices $i$, $j$ and $s$ and a plurality of step sizes $\gamma$.
Example 12. Fig. 12 shows a flow chart of the leading 1-bit DCD iteration for finding an $M \times K$ complex-valued matrix $V$ such that $B = H^{H} V$ approaches the target matrix $T$, where $H$ is an $M \times K$ complex-valued matrix and $B$ and $T$ are $K \times K$ complex-valued matrices. The arg max operation finds the complex element $B_{ij}$ of $B$ for which the real or imaginary part of $B_{ij} - I_{ij}$ has the largest absolute value; $s = 1$ when the real part of $B_{ij} - I_{ij}$ has the largest absolute value, and $s = j$ ($s^{2} = -1$) when the imaginary part does.
For each arg max, a plurality of step sizes $\gamma$ are evaluated, and $V_{:j}$ and $B_{:j}$ are updated if $|\xi| > (\gamma - \Delta\gamma/2) B_{ii}$.
Embodiment 13. Figs. 13 and 14 show an integrated circuit implementing the DCD iteration for finding an $M \times K$ complex-valued matrix $V$ such that $B = H^{H} V$ approaches the target matrix $T$, where $H$ is an $M \times K$ complex-valued matrix and $B$ and $T$ are $K \times K$ complex-valued matrices, together with three implementations of the index generation module: (a) based on a round-robin scheduler, (b) based on an arg max function, and (c) based on a combination of the round-robin scheduler and the arg max function. The circuit implements the cyclic DCD iteration of Example 7 or the leading DCD iteration of Example 8 for finding the $M \times K$ complex-valued matrix $V$.
The round-robin scheduling module outputs $(i, j, s)$, where $i$ and $j$ are matrix indices and $s = 1$ or $s = j$ ($s^{2} = -1$). If $s = 1$, the logic module outputs the real part of the element difference between $B$ and $T$, $\xi = \mathrm{Re}(B_{ij} - T_{ij})$; if $s = j$, it outputs the imaginary part, $\xi = \mathrm{Im}(B_{ij} - T_{ij})$. The comparator compares the absolute value of $\xi$ with the $i$-th diagonal element $B_{ii}$ of $B$ multiplied by $\alpha/2$, a product formed at the output of the shifter: the shifter takes $B_{ii}$ and shifts it by $\log_2 \alpha - 1$ positions. Because $\alpha$ is chosen as a power of 2, multiplication by $\alpha/2$ is a shift by $\log_2 \alpha - 1$. $B_{ii}$ is positive because $B = H^{H} V$. When $|\xi| > (\alpha/2) B_{ii}$, the comparator output activates the data-path logic, including the memory elements holding $V_{:j}$ and $B_{:j}$, adders and shifters, so that $V_{:j}$ and $B_{:j}$ are updated, e.g., $V_{:j} = V_{:j} - \mathrm{sign}(\xi)\, s\, \alpha V_{:i}$ and $B_{:j} = B_{:j} - \mathrm{sign}(\xi)\, s\, \alpha B_{:i}$, respectively.
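A behavioural sketch of one comparator/shifter/adder step for the real part ($s = 1$), on integer matrices in a hypothetical fixed-point format (for example Q8, where 256 represents 1.0), with $\alpha = 2^{-p}$. Then $(\alpha/2) B_{ii}$ is a right shift by $p + 1$ (the shift by $\log_2 \alpha - 1$ above) and $\alpha B_{ki}$ is a right shift by $p$. Python's `>>` on integers is an arithmetic shift (it rounds toward minus infinity), like a signed hardware shifter; the function name and data layout are illustrative only.

```python
def fixed_point_dcd_step(B, V, i, j, xi, p):
    """One comparator-enabled update with alpha = 2**-p on integer lists of lists.

    xi is the (integer) element difference B_ij - T_ij. Returns True if the
    comparator fired and columns j of B and V were updated in place.
    """
    threshold = B[i][i] >> (p + 1)           # (alpha / 2) * B_ii by a right shift
    if abs(xi) <= threshold:
        return False                         # comparator does not enable the data path
    sign = 1 if xi > 0 else -1
    for row in B:                            # B_:j -= sign(xi) * alpha * B_:i
        row[j] -= sign * (row[i] >> p)
    for row in V:                            # V_:j -= sign(xi) * alpha * V_:i
        row[j] -= sign * (row[i] >> p)
    return True
```

With $B_{ii} = 256$ and $\xi = 40$, the comparator fires for $\alpha = 1/4$ (threshold 32) but not for $\alpha = 1/2$ (threshold 64).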
Fig. 14 illustrates three implementations of the index generation module of Fig. 13: (a) based on round-robin scheduling, e.g., a round-robin counter; (b) based on arg max or an approximate arg max function, as in Fig. 15; (c) based on a combination of round-robin scheduling and arg max or an approximate arg max function; for example, the round-robin scheduling module provides a series of matrix indices, among which arg max selects the index with the largest absolute value of $\mathrm{Re}(B_{ij} - T_{ij})$ or $\mathrm{Im}(B_{ij} - T_{ij})$, i.e., of the real or imaginary part of the element-wise difference between $B_{ij}$ and $T_{ij}$.
Fig. 15 shows an integrated circuit implementing an approximate arg max function. The circuit takes $N$ $M_b$-bit binary numbers as input, each representing an integer, or the significand or exponent of a floating-point number. Each of $M_b$ logical OR gates has an $N$-bit input, and the $j$-th input of the $i$-th OR gate comes from the $i$-th bit of the $j$-th binary number. The outputs of the $M_b$ OR gates are sent to a priority encoder, which accepts $M_b$ input bits and outputs $\log_2 M_b$ bits: the binary representation of the position of the most significant non-zero bit among the $N$ $M_b$-bit binary strings. For example, if that bit is the $i$-th bit of the $j$-th $M_b$-bit string, the $i$-th input of the priority encoder is non-zero and its output is the binary representation of $i$.
Based on this most-significant non-zero bit index, a set of multiplexers selects 1 bit from each of the $N$ $M_b$-bit numbers and sends these bits to another priority encoder, whose output approximates the arg max function. This priority encoder has complexity $O(N \log_2 N)$, the multiplexers have complexity $O(M_b \log_2(M_b) N)$, the logic gates have complexity $O(M_b N)$, and the priority encoder at the output of the logic gates has complexity $O(M_b \log_2 M_b)$. Given an array of floating-point numbers, an approximate arg max can be computed first from their exponents and then from their significands. Given an array of complex numbers, their real and imaginary parts are treated as separate floating-point numbers in the DCD iteration.
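A bit-level Python model of this approximate arg max for unsigned $M_b$-bit integers, as a sketch. The function name is hypothetical, and the second priority encoder is assumed to favour the lowest index (the text does not fix a priority order). The returned number shares its most significant set bit position with the true maximum, so it is always larger than half of the maximum.

```python
def approx_arg_max(values, m_b):
    """Model of the OR-gate / priority-encoder / multiplexer approximate arg max."""
    # OR gate i combines bit i of every input (its j-th input is bit i of number j)
    or_out = [any((v >> i) & 1 for v in values) for i in range(m_b)]
    # first priority encoder: position of the most significant OR output that is 1
    msb = max((i for i in range(m_b) if or_out[i]), default=None)
    if msb is None:
        return 0                     # all inputs are zero: any index is an arg max
    # multiplexers: pick bit `msb` from each of the N inputs
    selected = [(v >> msb) & 1 for v in values]
    # second priority encoder: lowest index with a 1 wins (assumed priority order)
    return selected.index(1)
```

For the inputs `[9, 12]` the model returns index 0, although 12 is larger: both numbers have bit 3 set, which is exactly the approximation the circuit accepts.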
Example 14. Fig. 16 shows an integrated circuit implementing the 1-bit DCD iteration for finding an $M \times K$ complex-valued matrix $V$ such that $B = H^{H} V$ approaches the target matrix $T$, where $H$ is an $M \times K$ complex-valued matrix and $B$ and $T$ are $K \times K$ complex-valued matrices; it implements the cyclic 1-bit DCD iteration of Example 11 or the leading 1-bit DCD iteration of Example 12 for finding the $M \times K$ complex-valued matrix $V$.
The index generation module may be based on round-robin scheduling, an arg max function, an approximate arg max function, or a combination thereof. As depicted in FIG. 16, the performance of the integrated circuit is improved by providing a plurality of comparators and a selection module. Each comparator compares (1) the real or imaginary part of the difference between corresponding complex-valued elements of matrices B and T with (2) the i-th diagonal element B_ii of matrix B multiplied by the power-of-2 factor (γ − Δγ/2). Comparisons for different matrix indices i and j, and/or for different step sizes γ, can be performed simultaneously to increase parallelism and throughput.
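To make the comparator and column update concrete, the following Python sketch models a cyclic (round-robin) 1-bit DCD iteration in software, using the column operations V_:j := V_:j − a·V_:i and B_:j := B_:j − a·B_:i with V_0 = H and T = I_K as in claim 1. It is a hypothetical reference model, not the circuit of FIG. 16: the comparator uses the classic DCD acceptance test |ζ| > (γ/2)·Re(B_ii) in place of the (γ − Δγ/2)·B_ii threshold, because Δγ is not defined in this excerpt; s = 0 and s = 1 are assumed to select the real and imaginary parts; and the name `cyclic_dcd`, the initial step γ_0 = 1/2, and the stopping step 2^-12 are illustrative choices.

```python
def cyclic_dcd(H, gamma0=0.5, min_gamma=2.0 ** -12, max_updates=100_000):
    """Software model of a cyclic (round-robin) 1-bit DCD iteration that drives
    B = H^H V toward T = I with the column operations
    V[:, j] -= a * V[:, i] and B[:, j] -= a * B[:, i], where a = +/-gamma
    (real part) or +/-1j*gamma (imaginary part).

    H is an M x K matrix given as a list of M rows; returns (V, B).
    """
    M, K = len(H), len(H[0])
    V = [[complex(x) for x in row] for row in H]                      # V0 = H
    B = [[sum(complex(H[m][r]).conjugate() * V[m][c] for m in range(M))
          for c in range(K)] for r in range(K)]                       # B = H^H V
    gamma, updates = gamma0, 0
    while gamma >= min_gamma and updates < max_updates:
        hit = False
        for j in range(K):              # round-robin over the indices (i, j, s)
            for i in range(K):
                for s in (0, 1):        # assumed: s = 0 real part, s = 1 imaginary part
                    diff = B[i][j] - (1.0 if i == j else 0.0)         # B_ij - T_ij, T = I
                    zeta = diff.real if s == 0 else diff.imag
                    bii = B[i][i].real
                    # Assumed classic DCD comparator: |zeta| > (gamma / 2) * B_ii.
                    if bii <= 0 or abs(zeta) <= 0.5 * gamma * bii:
                        continue
                    a = (gamma if zeta > 0 else -gamma) * (1 if s == 0 else 1j)
                    # Read column i once (the shifter input) before updating column j.
                    v_i = [V[m][i] for m in range(M)]
                    b_i = [B[k][i] for k in range(K)]
                    for m in range(M):
                        V[m][j] -= a * v_i[m]   # power-of-2 scaling: a shift in hardware
                    for k in range(K):
                        B[k][j] -= a * b_i[k]
                    updates += 1
                    hit = True
        if not hit:
            gamma /= 2   # a full pass without a positive comparison: halve the step
    return V, B
```

With H = [[2, 0], [0, 1]], two accepted moves each scale the first column by 1/2, giving V = [[0.5, 0], [0, 1]], for which H^H V = I holds exactly.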
The integrated circuit of FIG. 13 has only one comparator, which may take many cycles to find a positive comparison result |ζ| > (γ − Δγ/2)·B_ii that activates the update of V_:j and B_:j. In FIG. 16, multiple comparisons are performed simultaneously, reducing the number of cycles that the baseline integrated circuit of FIG. 13 would spend on comparisons. If no comparison satisfies |ζ| > (γ − Δγ/2)·B_ii, the round-robin scheduler supplies the next index (i, j, s) and sends another set of inputs to the comparators. For example, once the round-robin scheduler has issued all matrix indices, the control logic decreases the step size γ or terminates the iteration. If multiple comparisons satisfy |ζ| > (γ − Δγ/2)·B_ii, one of them is selected to activate the datapath logic that updates V_:j and B_:j. The selection criterion may be based on priority logic, e.g., giving higher priority to a larger step size γ or a smaller matrix index, as in the cyclic DCD iteration; or on the absolute value of ζ, as in the leading DCD iteration.
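A bank of comparators evaluating several (i, j, s, γ) candidates in the same cycle, followed by the two selection rules named above for multiple positive results, can be sketched in Python as below. The function names, the (ζ, i, j, s, γ) tuple layout, the tie-breaking order (i, j, s), and the classic (γ/2)·Re(B_ii) threshold used in place of (γ − Δγ/2)·B_ii are assumptions for illustration.

```python
def evaluate_bank(B, T, candidates):
    """Evaluate several tentative moves at once, as a bank of comparators would.

    candidates: iterable of (i, j, s, gamma); s = 0 selects the real part of
    B_ij - T_ij and s = 1 the imaginary part (assumed encoding).
    Returns the positive results as tuples (zeta, i, j, s, gamma).
    """
    positives = []
    for i, j, s, gamma in candidates:
        diff = B[i][j] - T[i][j]
        zeta = diff.real if s == 0 else diff.imag
        bii = B[i][i].real
        # Assumed classic DCD threshold in place of (gamma - dgamma/2) * B_ii.
        if bii > 0 and abs(zeta) > 0.5 * gamma * bii:
            positives.append((zeta, i, j, s, gamma))
    return positives


def select_move(positives, policy="priority"):
    """Choose one positive result to drive the update datapath.

    "priority": larger gamma first, then smaller (i, j, s) index (cyclic DCD style).
    "leading":  largest |zeta| (leading DCD style).
    """
    if not positives:
        return None
    if policy == "priority":
        return min(positives, key=lambda r: (-r[4], r[1], r[2], r[3]))
    if policy == "leading":
        return max(positives, key=lambda r: abs(r[0]))
    raise ValueError(f"unknown policy: {policy!r}")
```

With B = [[1.0, 0.9], [0.2, 0.5]] and T = I, the candidates (0, 1, 0, 1/4) and (1, 0, 0, 1/2) are both positive; the priority rule picks the move with the larger step γ = 1/2, while the leading rule picks the one with the larger |ζ| = 0.9.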
To improve throughput, the integrated circuit of FIG. 16, and even that of FIG. 13, can be pipelined. That is, the selector outputs (ζ, i, j, s, γ) can be stored in memory elements such as flip-flops or latches, forming the boundary between two pipeline stages: in one stage, tentative updates are evaluated, and in the other, updates are committed. To avoid any read-after-write hazard, the j-th columns of matrices V and B that are being updated in one pipeline stage must not be evaluated in the other stage, and the control logic must ensure this. This is easily done by modifying the round-robin scheme or the arg max function to exclude the j-th column under update; the excluded column is included in the evaluation again once its update is complete.
In such a pipeline, positive evaluation results may be rare, so that the update stage has to wait for one, or they may be numerous, so that the evaluation stage has to wait for the update stage. To better decouple the two stages and further improve performance, a buffer, FIFO, or priority queue (whose elements carry a processing priority) can be placed at the boundary between the two stages to store the positive evaluation results (ζ, i, j, s, γ). If this buffer, FIFO, or priority queue supports simultaneous reads and writes, the operations of the evaluation stage and the update stage are free of data dependence unless the buffer, FIFO, or priority queue overflows.
Example 15: Referring to FIG. 17, an integrated circuit implementing the DCD iteration is shown. It finds an M×K complex-valued matrix V such that B = H^H V is close to the target matrix T, where H is an M×K complex-valued matrix and B and T are K×K complex-valued matrices. The circuit comprises an evaluation stage and an update stage, with a buffer, FIFO, or priority queue at the boundary between the two stages that stores the positive evaluation results (ζ, i, j, s, γ).
In this example, the integrated circuit comprises: a buffer, FIFO, or priority queue, implemented with flip-flops, latches, or a small memory, that stores the positive evaluation results (ζ, i, j, s, γ); an evaluation stage consisting of comparators that write into the buffer, FIFO, or priority queue; and an update stage that reads from the buffer, FIFO, or priority queue and updates the memory holding the elements of V and B. The evaluation stage and the update stage operate without data dependence unless there is a read-after-write hazard. To avoid such hazards, the index generation module is modified to exclude the indices of matrix columns that are pending update in the buffer, FIFO, or priority queue. One low-complexity solution is to track the lower and upper bounds of the matrix index j in the buffer, FIFO, or priority queue, so that the matrix index j at the output of the index generation module never falls within the range between these bounds. When the comparators give multiple positive evaluation results, results with different matrix indices (i, j, s) are all written into the buffer, FIFO, or priority queue, whereas for the same matrix index (i, j, s) only the result with the largest step size γ needs to be written at any time.
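The bounds-based hazard check and the keep-the-largest-γ rule can be modeled in Python as follows. This is a hypothetical behavioral sketch: the class and method names, the capacity of 8 entries, and reporting overflow as False are assumptions, and a hardware implementation would keep the lower and upper bounds of j in registers rather than recomputing them from the queue contents.

```python
from collections import deque


class PendingUpdates:
    """FIFO of positive evaluation results (zeta, i, j, s, gamma) between the
    evaluation and update stages, with a low-complexity hazard check based on
    the lower and upper bounds of the pending column indices j."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.fifo = deque()

    def bounds(self):
        """(lo, hi) of the pending column indices j, or None when empty."""
        if not self.fifo:
            return None
        js = [r[2] for r in self.fifo]
        return min(js), max(js)

    def index_allowed(self, j):
        """Index generator check: column j must not lie within [lo, hi]."""
        b = self.bounds()
        return b is None or not (b[0] <= j <= b[1])

    def push(self, result):
        """Write a positive result; for an (i, j, s) already pending, keep only
        the entry with the larger step size gamma. Returns False on overflow."""
        zeta, i, j, s, gamma = result
        for k, r in enumerate(self.fifo):
            if r[1:4] == (i, j, s):
                if gamma > r[4]:
                    self.fifo[k] = result   # replace in place, FIFO order kept
                return True
        if len(self.fifo) >= self.capacity:
            return False
        self.fifo.append(result)
        return True

    def pop(self):
        """Update stage: read the oldest pending result (None if empty)."""
        return self.fifo.popleft() if self.fifo else None
```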
Example 16: Referring to FIGS. 18 and 19, FIG. 18 shows an integrated circuit in the form of a finite state machine that implements an iteration for finding an M×K complex-valued matrix V such that B = H^H V approximates the target matrix T, where H is an M×K complex-valued matrix and B and T are K×K complex-valued matrices. FIG. 19 is a state diagram of the integrated circuit of FIG. 18.
As shown in FIG. 18, to increase hardware efficiency, the multiplication at the comparator input and the multiplication at the adder input can be implemented by the same multiplier module, and the comparison and the addition can likewise be implemented by the same adder/subtractor. The circuit includes multiplexers whose select signals come from a control logic module that steps the circuit through a finite set of states. FIG. 19 shows the state diagram of such an integrated circuit, which passes through at least two states in each iteration: in one state, the circuit picks a set of indices (i, j, s) and evaluates a tentative move with a given step size; if the evaluation is positive, the circuit enters another state in which matrices V and B are updated.
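The two-state controller of FIGS. 18 and 19 can be sketched as a cycle-level Python model in which every multiplication goes through one shared `mul` function and every comparison or addition through one shared `addsub` function. As in a software reference model, the classic (γ/2)·Re(B_ii) threshold, T = I, and all names here are assumptions; each UPDATE is counted as a single state visit, whereas a datapath with one physical multiplier would spend several clock cycles on a column update.

```python
from enum import Enum


class State(Enum):
    EVALUATE = 0   # test one tentative move (i, j, s) with step size gamma
    UPDATE = 1     # commit the move to columns V[:, j] and B[:, j]


def fsm_dcd(H, gamma0=0.5, min_gamma=2.0 ** -12, max_cycles=1_000_000):
    """Model of a two-state DCD controller in which one shared multiplier and
    one shared adder/subtractor serve both states (target T = I)."""
    M, K = len(H), len(H[0])
    V = [[complex(x) for x in row] for row in H]                      # V0 = H
    B = [[sum(complex(H[m][r]).conjugate() * V[m][c] for m in range(M))
          for c in range(K)] for r in range(K)]                       # B = H^H V

    def mul(x, y):      # the single shared multiplier
        return x * y

    def addsub(x, y):   # the single shared adder/subtractor (compare = subtract)
        return x - y

    order = [(i, j, s) for j in range(K) for i in range(K) for s in (0, 1)]
    state, idx, gamma, hit, a = State.EVALUATE, 0, gamma0, False, 0
    visits = {State.EVALUATE: 0, State.UPDATE: 0}
    for _ in range(max_cycles):
        if gamma < min_gamma:
            break
        visits[state] += 1
        i, j, s = order[idx]
        if state is State.EVALUATE:
            diff = addsub(B[i][j], 1.0 if i == j else 0.0)            # B_ij - T_ij
            zeta = diff.real if s == 0 else diff.imag
            bii = B[i][i].real
            # Assumed classic DCD test |zeta| > (gamma/2) * B_ii, done as a subtraction.
            if bii > 0 and addsub(abs(zeta), mul(0.5 * gamma, bii)) > 0:
                a = (gamma if zeta > 0 else -gamma) * (1 if s == 0 else 1j)
                state = State.UPDATE      # stay on (i, j, s) for the commit
                continue
        else:
            v_i = [V[m][i] for m in range(M)]
            b_i = [B[k][i] for k in range(K)]
            for m in range(M):
                V[m][j] = addsub(V[m][j], mul(a, v_i[m]))
            for k in range(K):
                B[k][j] = addsub(B[k][j], mul(a, b_i[k]))
            hit, state = True, State.EVALUATE
        idx += 1                          # round-robin to the next (i, j, s)
        if idx == len(order):
            idx = 0
            if not hit:
                gamma /= 2                # a full pass with no update: halve gamma
            hit = False
    return V, B, visits
```

For H = [[2, 0], [0, 1]], the model reaches V = [[0.5, 0], [0, 1]] after 112 EVALUATE visits and 2 UPDATE visits, which shows how rarely the update state is entered compared with the evaluation state.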
The foregoing is a further detailed description of the invention in conjunction with specific embodiments, and the invention is not to be regarded as limited to these descriptions. Those skilled in the art to which the invention pertains may make several equivalent substitutions or obvious modifications without departing from the spirit of the invention, and all of these shall be regarded as falling within the scope of protection of the invention as defined by the appended claims.

Claims (5)

1. An iterative optimization method for dichotomous coordinate descent (DCD), characterized in that:
given an M×K matrix H and a K×K matrix T, the iterative optimization method finds an M×K matrix V such that H^H V = T, where H^H is the conjugate transpose of matrix H, the iterative optimization method comprising the steps of:
a. initializing V = V_0;
b. changing V by the column operation V_:j := V_:j − a·V_:i, where V_:i and V_:j are the i-th and j-th columns of matrix V, respectively;
c. initializing B = H^H V;
d. changing B by the column operation B_:j := B_:j − a·B_:i, where B_:i and B_:j are the i-th and j-th columns of matrix B, respectively;
e. in each iteration step, selecting i, j such that |B_ij − T_ij| is maximal;
wherein a is a constant, T = I_K is the identity matrix, and V_0 = H;
the value of a is decreased by powers of 2;
the matrices H and T comprise complex values, the iteration comprises steps relating to the real and imaginary parts, and in each iteration step i, j are selected such that the absolute value of the real or imaginary part of B_ij − T_ij is maximal;
the iterative optimization method is used in an integrated circuit device for implementing the DCD iteration, comprising:
I. reading the designated columns B_:i and V_:i of the two matrices B and V according to the same index i;
II. adding or subtracting the designated columns B_:i and V_:i, passed through a shifter, to or from other columns B_:j and V_:j of the matrices, and updating B_:j and V_:j with the results;
wherein the enable signal for this operation is generated by a comparator, one input of which comes from the shifter, that is, one input of the comparator generating the enable signal is generated by a round-robin scheduling module, and in each iteration step i, j are selected such that |B_ij − T_ij| is maximal;
III. taking N M_b-bit binary numbers as input;
IV. outputting log2 N bits;
V. M_b logical OR gates, each having an N-bit input, the j-th input of the i-th logical OR gate coming from the i-th bit of the j-th binary number;
VI. a priority encoder that accepts the outputs of the M_b logical OR gates as input and outputs log2 M_b bits;
VII. multiplexers that use the output of the priority encoder as the select signal to select 1 bit from each of the N M_b-bit binary numbers;
VIII. another priority encoder that receives all outputs of the multiplexers as input and outputs log2 N bits.
2. The iterative optimization method for dichotomous coordinate descent according to claim 1, wherein the other input of the comparator that generates the enable signal is generated by the integrated circuit device.
3. The iterative optimization method for dichotomous coordinate descent according to claim 2, wherein the input generated by the integrated circuit device is based in part on a round-robin scheduling module.
4. The iterative optimization method for dichotomous coordinate descent according to claim 3, wherein the integrated circuit device includes a plurality of comparators, control logic that excludes the B_:j and V_:j to be updated from the inputs of the comparators, and multiple updates of B_:j and V_:j.
5. The iterative optimization method for dichotomous coordinate descent according to claim 4, wherein the signals generated at the outputs of the comparators are all stored, and the operation of updating B_:j and V_:j is performed a certain number of clock cycles after the outputs of the comparators.
CN202311664792.8A 2023-12-06 2023-12-06 Iterative optimization method for binary coordinate reduction Active CN117370717B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311664792.8A CN117370717B (en) 2023-12-06 2023-12-06 Iterative optimization method for binary coordinate reduction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311664792.8A CN117370717B (en) 2023-12-06 2023-12-06 Iterative optimization method for binary coordinate reduction

Publications (2)

Publication Number Publication Date
CN117370717A CN117370717A (en) 2024-01-09
CN117370717B true CN117370717B (en) 2024-03-26

Family

ID=89394856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311664792.8A Active CN117370717B (en) 2023-12-06 2023-12-06 Iterative optimization method for binary coordinate reduction

Country Status (1)

Country Link
CN (1) CN117370717B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723336A (en) * 2020-06-01 2020-09-29 Nanjing University Cholesky decomposition-based arbitrary-order matrix inversion hardware acceleration system adopting loop iteration mode
CN113159310A (en) * 2020-12-21 2021-07-23 Jiangxi University of Science and Technology Intrusion detection method based on residual sparse broad learning system
CN114665938A (en) * 2022-04-21 2022-06-24 Jinan University Multi-user RIS precoding method, device, computer equipment and storage medium
CN116155325A (en) * 2023-01-09 2023-05-23 Sun Yat-sen University Hybrid precoding design method based on element-by-element iteration in massive MIMO systems
CN116719499A (en) * 2023-05-06 2023-09-08 East China Normal University Adaptive pseudo-inverse calculation method applied to 5G least-squares positioning

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
GB0208329D0 (en) * 2002-04-11 2002-05-22 Univ York Data processing particularly in communication systems
KR100580843B1 (en) * 2003-12-22 2006-05-16 Electronics and Telecommunications Research Institute Channel transfer function matrix processing device and processing method therefor in V-BLAST
US7711762B2 (en) * 2004-11-15 2010-05-04 Qualcomm Incorporated Efficient computation for eigenvalue decomposition and singular value decomposition of matrices
US9935615B2 (en) * 2015-09-22 2018-04-03 Intel Corporation RLS-DCD adaptation hardware accelerator for interference cancellation in full-duplex wireless systems
CN109992742A (en) * 2017-12-29 2019-07-09 Huawei Technologies Co., Ltd. Signal processing method and device

Non-Patent Citations (2)

Title
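Inexact Coordinate Descent: Complexity and Preconditioning; Rachael Tappenden et al.; J Optim Theory Appl (2016); pp. 144-176 *
FPGA design implementing x=b/a using the two-dimensional coordinate descent method; Luo Zhen (罗珍) et al.; Journal of Guangdong University of Technology; Vol. 28, No. 01; pp. 32-37 *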

Also Published As

Publication number Publication date
CN117370717A (en) 2024-01-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: BAO LIU (Bao Liu)

Inventor before: Liu Bao