WO2004079585A1 - Matrix operation device - Google Patents

Matrix operation device

Info

Publication number
WO2004079585A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
solution
register
linear
linear equation
Prior art date
Application number
PCT/JP2003/002696
Other languages
English (en)
Japanese (ja)
Inventor
Katsuyoshi Naka
Keiichi Kitagawa
Original Assignee
Matsushita Electric Industrial Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co., Ltd. filed Critical Matsushita Electric Industrial Co., Ltd.
Priority to PCT/JP2003/002696 priority Critical patent/WO2004079585A1/fr
Priority to CN03821223.4A priority patent/CN1682214A/zh
Publication of WO2004079585A1 publication Critical patent/WO2004079585A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/11Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/12Simultaneous equations, e.g. systems of linear equations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B1/00Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/69Spread spectrum techniques
    • H04B1/707Spread spectrum techniques using direct sequence modulation
    • H04B1/7097Interference-related aspects
    • H04B1/7103Interference-related aspects the interference being multiple access interference
    • H04B1/7105Joint detection techniques, e.g. linear detectors
    • H04B1/71052Joint detection techniques, e.g. linear detectors using decorrelation matrix

Definitions

  • the present invention relates to a matrix operation device that can be used to obtain a solution of a system of linear equations expressed in a form including an upper triangular matrix or a lower triangular matrix.
  • the symmetric matrix is Cholesky-decomposed into the form of a product of an upper triangular matrix and a lower triangular matrix, and the solution of a system of linear equations is performed by forward or backward substitution.
  • a matrix operation is executed using, for example, a multiprocessor system 100 having a plurality of processors (DSP 102, 1004) as shown in FIG.
  • matrix operation using a plurality of processors is described, for example, in Japanese Patent Application Laid-Open No. 2000-33992.
  • the processing is delayed because information must be transmitted between the processors.
  • An object of the present invention is to realize a remarkable improvement in the processing capacity of a matrix operation device, a reduction in the size of the device, and a reduction in power consumption.
  • the matrix operation device of the present invention is a cyclic arithmetic processing circuit composed of hardware that obtains the solutions of a system of linear equations in order by forward substitution or back substitution.
  • the circuit elements that perform the multiplication, division, addition, and subtraction needed to find the solution of each linear equation are arranged along the operation procedure (operation flow), and processing is performed in a pipeline while creating a reasonable data flow. Since the processing is performed by hardware, the arithmetic processing can be carried out at the maximum processing capability of the hardware implemented as an LSI, thereby achieving, for example, 10 times the speed of the conventional technology.
  • the number of the multipliers is reduced without difficulty and effectively, thereby promoting the miniaturization and low power consumption of the device.
  • when n simultaneous linear equations are solved sequentially, for example by forward substitution, solving the n-th linear equation requires an operation (for example, a product-sum operation) using the solutions of the first to (n-1)-th linear equations obtained before. Therefore, until the solution of the immediately preceding (n-1)-th linear equation is determined, the solution of the n-th linear equation cannot be obtained.
  • the matrix operation device of the present invention roughly classifies operation terms into a group of terms including a solution of the immediately preceding linear equation and a group of only terms not including the solution.
  • among the operation terms for solving the next linear equation, the operation for the group consisting only of terms that do not include the solution of the immediately preceding linear equation is performed first, and the result is stored temporarily.
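The grouping idea above can be sketched in a few lines of Python. This is a simplified illustration, not the circuit itself: the function names `early_partial_sum` and `finish_row` are hypothetical, and the "early" group here contains every term that does not involve the immediately preceding solution, whereas the device described later splits the terms by shift-register halves.

```python
def early_partial_sum(L, z, n):
    # Terms of row n (0-based) that do NOT involve the immediately
    # preceding solution z[n-1]; they can be summed before z[n-1] exists.
    return sum(L[n][j] * z[j] for j in range(n - 1))


def finish_row(L, r, z, n, early):
    # Once z[n-1] is available, only one more product is needed.
    total = early + L[n][n - 1] * z[n - 1]
    return (r[n] - total) / L[n][n]


# Small lower triangular system L z = r (values chosen for illustration).
L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]
r = [2.0, 7.0, 32.0]

z = [r[0] / L[0][0]]
# The early sum for row 2 needs only z[0], so it is formed before z[1] is known.
early_row2 = early_partial_sum(L, z, 2)
z.append(finish_row(L, r, z, 1, 0.0))
z.append(finish_row(L, r, z, 2, early_row2))
```

In hardware the early sum would be formed while the subtraction and division for the previous row are still in progress; the sequential code only shows that the data dependency permits this.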
  • the matrix operation device of the present invention can be applied to a forward substitution operation when finding a solution of a simultaneous linear equation including a lower triangular matrix.
  • the present invention can be applied to a backward substitution operation when obtaining a solution of a simultaneous linear equation including an upper triangular matrix.
  • when solving a system of linear equations including a symmetric matrix, the present invention can be applied to the operation of the system of linear equations in which the symmetric matrix has been transformed by Cholesky decomposition or modified Cholesky decomposition.
  • the present invention can be applied to an inverse matrix calculator that performs an inverse matrix operation using Cholesky decomposition.
  • the matrix operation device of the present invention can be mounted on a receiving device having a joint detection demodulation function, a receiving device equipped with an AAA (adaptive array antenna) based on the least squares error method, a receiving device equipped with an adaptive equalizer implemented with a transversal filter, and the like.
  • in one aspect, with "L" denoting a known lower triangular matrix and "U" a known upper triangular matrix, the matrix operation device of the present invention calculates the solution of a system of linear equations expressed as L·X = Y or U·X = Y (X is the matrix to be obtained) using forward or backward substitution to obtain the values of all elements of the matrix X, and comprises the following elements:
  • a first register that temporarily stores the solution of the immediately preceding linear equation, which is needed to find the solution of the n-th (n is a natural number of 2 or more) linear equation;
  • a shift register that temporarily accumulates the solutions of the linear equations prior to the immediately preceding one, with taps classified into a first half and a second half;
  • a plurality of switches, one for each set formed either by the first register and a second-half tap or by a first-half tap and the second-half tap at the corresponding position of the shift register, each switch selecting and outputting one of the values of its set;
  • a coefficient generator that generates, in a predetermined order, the general elements of the lower triangular matrix L or the upper triangular matrix U as the coefficients by which the values output from the switches are multiplied;
  • a plurality of multipliers that multiply the value output from each switch by the corresponding coefficient from the coefficient generator; an adder that adds the values output from the multipliers; and a second register that temporarily accumulates the addition result output from the adder and returns it to the adder for the next addition;
  • a linear operation circuit that performs the necessary linear operation on the addition result output from the adder to calculate the solution of the n-th linear equation.
  • the solution of the n-th linear equation calculated by the linear operation circuit is returned to the first register, and the shift register is shifted by one stage before that solution is set in the first register. During the period after the addition result is output from the adder and while the linear operation circuit is operating, the switches are switched so that the values of the second-half taps of the shift register are output, and part of the product-sum operation needed to find the solution of the next (n+1)-th linear equation is executed in advance. The solution thus obtained is then returned to the first register and the operation continues until all elements of the matrix X are determined.
  • the matrix operation device of the present invention is small in size, consumes low power, and can perform ultra-high-speed processing.
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 is a diagram showing processing contents of a matrix operation including a lower triangular matrix
  • FIG. 2 is a circuit diagram showing a configuration of an example of a matrix operation device of the present invention
  • FIG. 3 is a timing chart for explaining the characteristic operation of the matrix operation device of FIG. 2
  • FIG. 4 is a diagram illustrating an example of processing contents when a solution of a linear equation is obtained by the matrix operation device of FIG. 2
  • FIG. 5 is a diagram showing processing contents of a matrix operation including an upper triangular matrix
  • FIG. 6A is a block diagram showing a configuration of a CDMA receiving device including a JD demodulation unit
  • FIG. 6B is a diagram showing a format of transmission data
  • FIG. 7 is a diagram showing the propagation model of the multi-user transmission signal
  • FIG. 8 is a diagram showing the propagation model of FIG. 7 in the form of a matrix.
  • FIG. 9 is a diagram for explaining the generation of the cross-correlation matrix (F).
  • FIG. 10 is a diagram showing a configuration of an adaptive array device to which the matrix operation device of the present invention is applied;
  • FIG. 11 is a diagram showing the configuration of an adaptive equalizer to which the matrix operation device of the present invention is applied
  • FIG. 12 is a diagram for explaining a conventional matrix operation method using a multiprocessor.
  • F is a known matrix of n rows × n columns
  • r is a known matrix of n rows × 1 column
  • d is the matrix to be obtained (n rows × 1 column).
  • the known symmetric matrix F can be expressed as the product of a lower triangular matrix L and its transposed matrix L^T, as in the following equation (2).
  • This process is roughly divided into two stages.
  • equation (3) is transformed as equation (4).
  • the equation for calculating the matrix z can be expressed as the following equation (5).
  • the matrix z is obtained as follows.
  • the element z_1 in the first row is calculated.
  • the element z_2 in the second row is calculated using the calculated z_1 according to the equation (5).
  • z_i is calculated using the calculation results z_1 to z_(i-1).
  • the operation of calculating the elements of the matrix z in order from the first element to the n-th element is called forward substitution.
  • the matrix L^T is an upper triangular matrix because it is the transposed matrix of the lower triangular matrix L. Therefore, the equation for calculating the solution d of the system of linear equations is given by the following equation (6).
  • i = n-1, n-2, ..., 1.
  • the operation of the matrix d is called the back substitution operation because it calculates the elements from the n-th element to the first element in reverse order.
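Equations (1) to (6) did not survive in this text. A reconstruction consistent with the surrounding definitions (F, r, d, L, z) might read as follows. The exact form of each equation, and in particular which step carries the numbers (3) and (4), is an assumption.

```latex
\begin{align}
F\,d &= r \tag{1}\\
F &= L\,L^{T} \tag{2}\\
L\,L^{T} d &= r \tag{3}\\
L\,z &= r, \qquad z = L^{T} d \tag{4}\\
z_1 &= \frac{r_1}{L_{11}}, \qquad
z_i = \frac{1}{L_{ii}}\Bigl(r_i - \sum_{j=1}^{i-1} L_{ij}\,z_j\Bigr),
\quad i = 2, \dots, n \tag{5}\\
d_n &= \frac{z_n}{(L^{T})_{nn}}, \qquad
d_i = \frac{1}{(L^{T})_{ii}}\Bigl(z_i - \sum_{j=i+1}^{n} (L^{T})_{ij}\,d_j\Bigr),
\quad i = n-1, \dots, 1 \tag{6}
\end{align}
```

Equation (5) is the forward substitution and equation (6) the back substitution; note that (L^T)_{ij} = L_{ji}, so both stages reuse the same stored elements of L.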
  • the matrix operation device of the present invention can be widely used for the operation for obtaining the solution of the simultaneous linear equation by such forward substitution and backward substitution.
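As a software reference for the two-stage procedure described above, the following minimal Python sketch solves F d = r by Cholesky decomposition followed by forward and back substitution. It uses plain lists and the standard textbook algorithms; the function names are illustrative and nothing here reflects the hardware structure of the device.

```python
import math


def cholesky(F):
    # Return lower triangular L with F = L * L^T (F symmetric positive definite).
    n = len(F)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(F[i][i] - s)
            else:
                L[i][j] = (F[i][j] - s) / L[j][j]
    return L


def forward_substitution(L, r):
    # Solve L z = r, first element to n-th element.
    z = []
    for i in range(len(r)):
        acc = sum(L[i][j] * z[j] for j in range(i))
        z.append((r[i] - acc) / L[i][i])
    return z


def back_substitution(L, z):
    # Solve L^T d = z, n-th element back to the first; (L^T)[i][j] == L[j][i].
    n = len(z)
    d = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = sum(L[j][i] * d[j] for j in range(i + 1, n))
        d[i] = (z[i] - acc) / L[i][i]
    return d


F = [[4.0, 2.0],
     [2.0, 5.0]]
r = [6.0, 9.0]
L = cholesky(F)
z = forward_substitution(L, r)
d = back_substitution(L, z)
```

Each solution element depends on all previously computed elements, which is the serial dependency the device is designed to hide.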
  • z is a matrix of n rows and 1 column that is the solution of the system of linear equations.
  • the process (SP1) shown in FIG. 1 shows the contents of the matrix operation including the lower triangular matrix.
  • the process (SP2) shows the first to fourth simultaneous linear equations.
  • the process (SP3) shows the specific operation contents for finding the solutions of the first to fourth simultaneous linear equations.
  • the lower triangular matrix is a matrix in which all upper right halves are "0" and matrix elements are arranged in the lower left half.
  • the matrix elements located on the diagonal of the lower triangular matrix are called diagonal elements.
  • diagonal elements are indicated by circles.
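Since FIG. 1 is not reproduced here, the first four equations and their solutions can be written out as a worked example. The right-hand side symbol r follows the known matrix r used elsewhere in the description; treating it as the right-hand side in FIG. 1 is an assumption.

```latex
\begin{align*}
L_{11} z_1 &= r_1\\
L_{21} z_1 + L_{22} z_2 &= r_2\\
L_{31} z_1 + L_{32} z_2 + L_{33} z_3 &= r_3\\
L_{41} z_1 + L_{42} z_2 + L_{43} z_3 + L_{44} z_4 &= r_4
\end{align*}
\begin{align*}
z_1 &= r_1 / L_{11}\\
z_2 &= (r_2 - L_{21} z_1) / L_{22}\\
z_3 &= (r_3 - L_{31} z_1 - L_{32} z_2) / L_{33}\\
z_4 &= (r_4 - L_{41} z_1 - L_{42} z_2 - L_{43} z_3) / L_{44}
\end{align*}
```

Each solution is a product-sum over the general elements and earlier solutions, followed by one subtraction and one division by a diagonal element.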
  • FIG. 2 is a diagram showing a specific configuration of a matrix operation device that executes such a matrix operation at high speed.
  • FIG. 3 is a timing chart for explaining the characteristic operation of the matrix operation device of FIG.
  • the matrix operation device shown in FIG. 2 is a cyclic operation processing circuit composed of hardware that sequentially obtains solutions of simultaneous linear equations by forward substitution or backward substitution.
  • circuit elements for multiplication, division, addition, and subtraction necessary to find the solution of the linear equation are arranged along the operation procedure (operation flow) to create a reasonable data flow. Processing is performed in a pipeline manner.
  • the matrix operation device includes a product-sum operation unit (101, 102, 103, 105, 106, 107, 108, 109, 110) and a linear operation unit (111, 112, 113, 114).
  • the product-sum operation unit (101, 102, 103, 105, 106, 107, 108, 109, 110) performs a predetermined product-sum operation on all solutions found in the past, including the solution of the immediately preceding linear equation, which is required to find the solution of the n-th (n is a natural number of 2 or more) linear equation.
  • the linear operation unit (111, 112, 113, 114) performs arithmetic processing by hardware. That is, a predetermined linear operation is performed on the value output from the product-sum operation unit to obtain the solution of the n-th linear equation.
  • when obtaining the solution of the n-th linear equation, the product-sum operation unit performs the product-sum operation in advance for the terms that do not include the solution of the (n-1)-th linear equation.
  • the multipliers (MUL (1) to MUL (n / 2)) of the multiplier 107 are used in a time division manner.
  • a register (REG) 101 is a register for storing a solution of the immediately preceding linear equation.
  • the shift register 102 is a shift register that sequentially accumulates solutions of linear equations obtained in the past.
  • the reason why the register 101 and the shift register 102 are separated is to allow for the difference between the timing at which the solution of the immediately preceding linear equation is latched (set) in the register 101 and the timing at which the data of the register 101 and of each tap of the shift register 102 are shifted by one stage.
  • register latch clock is simply referred to as latch clock.
  • the timing of the one-stage shift of the data of the register 101 and the data of each tap (REG (1) to REG (n-2)) of the shift register is controlled by the shift clock (SCL).
  • the register latch clock (RC) and shift clock (SCL) are supplied to the register 101 via an OR gate (OR).
  • the shift register 102 has a shape that is folded back halfway.
  • the outputs of a certain delay element (memory element) form a pair.
  • each set of signals is input to each of the switches SW1 to SW (n / 2) provided in the switch unit 105.
  • switches SW1 to SW (n / 2) are switched to terminal b.
  • the first memory unit 103 (having M (1) to M (n-1)) stores the values of the general elements (PX) of the lower triangular matrix.
  • the values of the general elements of the lower triangular matrix are supplied to the multipliers (MUL (1) to MUL (n / 2)) provided in the multiplier unit 107 through a plurality of switches (PW1 to PW (n / 2)) provided in the switch unit 106. Then, the solutions of the linear equations already obtained and the general elements of the lower triangular matrix are multiplied.
  • the adder 108 adds the output values from the respective multipliers.
  • the data indicating the result is sent to the register (REG) 109 via the adder (ADD) 110.
  • the data is temporarily stored in register 109.
  • each switch of the switch unit 105 is switched to the terminal a to perform the same product-sum operation.
  • in the adder 110, the result of this product-sum operation is added to the result of the preceding processing stored in the register 109.
  • the second memory 111 stores the matrix elements of the known matrix r.
  • the subtracter 112 performs an operation of subtracting the product-sum operation result from the elements of the known matrix r. The result is further divided by the value of the diagonal element of the lower triangular matrix in the divider (DIV) 114.
  • the third memory 113 stores the values of the diagonal elements (AX) of the lower triangular matrix.
  • the solution of the linear equation obtained as a result of the division is stored in the fourth memory (memory for storing the solution) 115 and set in the register 101.
  • the same processing is repeatedly performed.
  • FIG. 3 is a timing chart for explaining the operation of the portion of the matrix operation device of FIG. 2 that performs a product-sum operation (an operation for summing the outputs of the multipliers).
  • Each of the switches SW1 to SW (n / 2) of the switch section 105 is alternately and periodically switched between the terminal a side and the terminal b side.
  • the switching of the switch is the same for the switch unit 106.
  • when the switch is switched to the terminal a, data is extracted from the delay elements (storage elements) in the first half of the shift register 102 and from the register 101.
  • when the switch is switched to the terminal b, data is extracted from the delay elements in the latter half of the shift register 102.
  • the shift clock (SCL) and register latch clock (RC) are out of phase. In other words, the phase of the shift clock (SCL) is ahead of that of the register latch clock (RC).
  • FIG. 4 schematically shows the operation when obtaining the solution z_8 of the eighth equation.
  • after the addition by the adder 110 has ended (that is, after the operation in the product-sum operation unit is completed), during the period in which the subtraction by the subtracter 112 or the division by the divider 114 is being performed, the contents of the shift register are updated and the solutions z_1 to z_3 obtained in the past are taken out of the second half of the shift register. Then, the product-sum operation of group A is performed in advance.
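For the eighth equation, the split between the terms computed in advance and the terms that must wait for z_7 can be written out explicitly. The sketch below assumes n = 8, so that the shift register has six taps, with z_1 to z_3 in its second half and z_4 to z_6 in its first half while z_7 sits in the register 101; the label for the second group is illustrative.

```latex
\[
z_8 = \frac{1}{L_{88}}\Bigl(
  r_8
  - \underbrace{\bigl(L_{81} z_1 + L_{82} z_2 + L_{83} z_3\bigr)}_{\text{group A: second-half taps, computed in advance}}
  - \underbrace{\bigl(L_{84} z_4 + L_{85} z_5 + L_{86} z_6 + L_{87} z_7\bigr)}_{\text{first-half taps and register 101}}
\Bigr)
\]
```

Group A needs at most three multipliers and the remaining group needs four, which is why n/2 = 4 multipliers used in time division are sufficient under this assumption.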
  • Fig. 3 shows the details of the operations performed in each period.
  • the data in the second half of the shift register remains "0". Therefore, even if the switch is switched to the terminal b and data is taken out from the second half of the shift register and the product-sum operation is performed, the result remains "0".
  • the register 101 is referred to as a first register, and the register 104 is referred to as a second register.
  • register 109 is referred to as a third register.
  • the adder 108 is called a first adder, and the adder 110 is called a second adder.
  • the switch 105 is called a first switch, and the switch 106 is called a second switch.
  • the matrix operation device of FIG. 2 includes a first register 101 that stores the most recently obtained operation result (z_i), an (n-2)-stage shift register 102 that stores the operation results obtained before that, a first memory 103 in which all the general elements of the known lower triangular matrix (L) except the diagonal elements are stored, and a second register 104 in which 0 is always stored.
  • the first register 101 and the first (n/2-1) stages of the shift register 102 form the first half, and the remaining (n/2-1) stages of the shift register form the second half; a first switch 105 controls which of the two halves is read.
  • the first (n/2) memories of the first memory 103 form the first half, and the remaining (n/2-1) memories together with the second register 104 form the second half; a second switch 106 controls which of the two halves is read.
  • n/2 multipliers 107 multiply the output values of the first switch 105 by the output values of the second switch 106.
  • a first adder 108 adds all the operation results output from the n/2 multipliers 107.
  • a third register 109 stores the operation result of the first adder 108 when that result is obtained by reading the second half.
  • a second adder 110 adds the operation result of the first adder 108, when that result is obtained by reading the first half, to the value stored in the third register 109.
  • a second memory 111 stores the elements of the known matrix of n rows and 1 column, and a subtractor 112 subtracts the operation result of the second adder 110 from the value read from the second memory 111.
  • a third memory 113 stores the diagonal elements of the known lower triangular matrix, and a divider 114 divides the output value of the subtractor 112 by the value read from the third memory 113.
  • a fourth memory 115 stores the operation result output by the divider 114.
  • the solutions z_1 to z_n are sequentially stored in the fourth memory 115.
  • the first memory 103 stores the known general elements of the lower triangular matrix L.
  • the second memory 111 stores all elements of a known matrix r having n rows and 1 column.
  • the third memory 113 stores the diagonal elements (L_11, L_22, ..., L_nn) of the lower triangular matrix L.
  • the first memory 103 includes n-1 memories.
  • the general elements of the lower triangular matrix L are regularly stored in each memory.
  • M (1) stores (0, L_21, L_32, L_43, ..., L_n,n-1).
  • M (2) stores (0, 0, L_31, L_42, L_53, ..., L_n,n-2).
  • Each memory has n addresses.
  • the memory address is incremented in synchronization with the timing at which the shift register 102 shifts, and data is sequentially read from the address.
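The storage rule above amounts to placing the k-th subdiagonal of L in memory M(k), padded with leading zeros so that address i always corresponds to row i. A minimal Python sketch of this layout follows; the function name `build_lower_banks` and the 0-based addressing are illustrative choices, not taken from the source.

```python
def build_lower_banks(L):
    # banks[k - 1] models M(k): address a holds L[a][a - k] (0-based),
    # or 0 where the k-th subdiagonal does not reach row a.
    n = len(L)
    return [[L[a][a - k] if a >= k else 0 for a in range(n)]
            for k in range(1, n)]


# Entries encode their 1-based (row, column) position, e.g. 32 stands for L_32.
L = [[11, 0, 0, 0],
     [21, 22, 0, 0],
     [31, 32, 33, 0],
     [41, 42, 43, 44]]
banks = build_lower_banks(L)
# banks[0] models M(1), banks[1] models M(2), banks[2] models M(3).
```

With this layout, reading every memory at the same address i yields the whole of row i, and M(k) always supplies the coefficient of the solution found k steps earlier, so the same address counter can serve all memories as the shift register advances.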
  • the first register 101 and the shift register unit 102 store the initial value “0”.
  • the first switch 105 performs control so that a value is read from the second half of the shift register 102.
  • the second switch 106 controls the value to be read from the latter half of the first memory 103.
  • the result of the first adder 108 is also 0, and 0 is stored in the third register 109.
  • control is performed so that the first switch 105 is switched to read a value from the first half of the first register 101 and the shift register 102.
  • the read data is input to the multiplier 107.
  • control is performed so that the first half of the first memory 103 and the second register 104 are read out and input to the multiplier 107.
  • the operation result of the first adder 108 is also zero.
  • the operation result of the first adder 108 (sum of the multiplication results in the first half) and the value stored in the third register 109 (sum of the multiplication results in the second half) are input to the second adder 110 and added.
  • the output of the second adder 110 is also 0.
  • the first element r_1 of the known matrix r is read from the second memory 111, and the subtractor 112 subtracts the operation result of the second adder 110 from r_1.
  • the third register 109 is initialized.
  • since the output of the divider 114 is r_1 / L_11, the solution z_1 is obtained at this point.
  • the obtained solution z_1 is stored in the fourth memory 115 and simultaneously in the first register 101.
  • control is performed such that the first switch 105 is switched to read a value from the second half of the shift register 102.
  • control is performed such that the second switch 106 is switched to read a value from the second half of the first memory 103.
  • the second half of the shift register 102 and the second half of the first memory are input to the multiplier 107 to perform multiplication.
  • the multiplication result is input to the first adder 108 and added.
  • the result of the adder is also zero.
  • the addition result is stored in the third register 109.
  • when the solution z_1 is obtained and stored in the first register 101, the first switch 105 is switched to read a value from the first register 101 and the first half of the shift register 102. The read data is input to the multiplier 107.
  • the multiplier 107 multiplies the output of the first switch 105 by the output of the second switch 106.
  • the first register 101 stores the solution z_1.
  • since the read address of the first memory 103 has been incremented, L_21 is read out from M (1).
  • the second adder 110 adds the value stored in the third register 109 to the operation result of the first adder 108.
  • the value r_2 is read out and input to the subtracter 112.
  • the subtractor 112 subtracts the output value z_1·L_21 of the second adder 110 from r_2 read from the second memory 111.
  • the operation result of the subtractor 112 is (r_2 - z_1·L_21).
  • the divider 114 divides (r_2 - z_1·L_21), which is the operation result of the subtractor 112, by L_22 read from the third memory 113.
  • the output of the divider 114 is z_2, which is input to the fourth memory 115 and the first register 101.
  • the solutions that have already been obtained are stored.
  • the second half of the shift register 102 is operated first.
  • the number of multipliers is reduced by half, and time-division processing is performed, such that the first half of the operation is performed when the previous solution is found.
  • the forward substitution operation can be performed with a high-speed and small-scale circuit configuration.
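The forward substitution walkthrough above can be condensed into a small behavioural model. The following Python sketch is a software approximation under stated assumptions: n is even, clocking is reduced to a fixed order of steps, and names such as `forward_substitution_time_division`, `taps`, and `early` are hypothetical. Reference numerals in the comments map variables to the components named in the text.

```python
def forward_substitution_time_division(L, r):
    n = len(r)
    assert n % 2 == 0, "the sketch assumes an even n"
    half = n // 2
    # First memory 103: banks[k - 1] models M(k), holding the k-th subdiagonal.
    banks = [[L[a][a - k] if a >= k else 0.0 for a in range(n)]
             for k in range(1, n)]
    reg = 0.0                 # first register 101 (most recent solution)
    taps = [0.0] * (n - 2)    # shift register 102, taps[0] is the newest tap
    early = 0.0               # third register 109 (second-half partial sum)
    z = []                    # fourth memory 115
    for i in range(n):
        # Terminal "a": register 101 plus first-half taps, `half` products.
        late = reg * banks[0][i]
        late += sum(taps[k] * banks[k + 1][i] for k in range(half - 1))
        # Second adder 110, subtractor 112, divider 114.
        zi = (r[i] - (early + late)) / L[i][i]
        z.append(zi)
        # Shift by one stage before the new solution is latched.
        taps = ([reg] + taps)[:n - 2]
        # Terminal "b": second-half taps for the NEXT row; zi is not needed,
        # so hardware can overlap this with the subtraction and division.
        if i + 1 < n:
            early = sum(taps[k] * banks[k + 1][i + 1]
                        for k in range(half - 1, n - 2))
        reg = zi              # latch the new solution into register 101
    return z


L = [[2.0, 0.0, 0.0, 0.0],
     [1.0, 3.0, 0.0, 0.0],
     [4.0, 5.0, 6.0, 0.0],
     [1.0, 2.0, 3.0, 4.0]]
r = [2.0, 7.0, 32.0, 30.0]
z = forward_substitution_time_division(L, r)
```

In each pass at most n/2 products are formed, which mirrors the halved multiplier count; the model does not attempt to reproduce the pipeline timing of FIG. 3.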
  • the solution d is a matrix with n rows and 1 column.
  • the upper triangular matrix is, for example, a matrix in which the lower left half is all zero and the matrix elements are arranged in the upper right half as shown in FIG.
  • for the back substitution, the matrix operation device of FIG. 2 includes the first register 101, which stores the most recently obtained operation result (d_i), the shift register 102, which stores the operation results (d_n to d_i+1) obtained before that, and the second register 104.
  • the first register 101 and the first (n/2-1) stages of the shift register 102 form the first half, and the remaining (n/2-1) stages of the shift register form the second half; a first switch 105 controls which of the two halves is read.
  • the first (n/2) memories of the first memory 103 form the first half, and a second switch 106 controls which half of the first memory is read.
  • n/2 multipliers 107 multiply the output value of the first switch 105 by the output value of the second switch 106, and the first adder 108 adds all the operation results output from the n/2 multipliers 107.
  • a third register 109 stores the result of the first adder 108 when that result is obtained by reading the second half. When the result of the first adder 108 is obtained by reading the first half, the second adder 110 adds that result to the value stored in the third register 109.
  • the second memory 111 stores the elements of a known matrix of n rows and 1 column, and the subtractor 112 subtracts the operation result of the second adder 110 from the value read from the second memory 111.
  • a third memory 113 stores the diagonal elements of the known upper triangular matrix, a divider 114 divides the output value of the subtractor 112 by the value read from the third memory 113, and a fourth memory 115 stores the operation result output from the divider 114.
  • the solutions d_n to d_1 are sequentially stored in the fourth memory 115.
  • the known element of the upper triangular matrix L is stored in the first memory 103.
  • the second memory 111 stores all elements of a known matrix z (z_1, z_2, ..., z_n) having n rows and 1 column.
  • the third memory 113 stores diagonal elements (L_11, L_22, ..., L_nn) of the upper triangular matrix L.
  • the first memory 103 is composed of n-1 memories, each of which regularly stores general elements of the upper triangular matrix L.
  • M (1) in the first memory 103 stores (L_12, L_23, L_34, ..., L_n-1,n, 0).
  • M (2) stores (L_13, L_24, L_35, ..., L_n-2,n, 0, 0).
  • M (n-2) stores (L_1,n-1, L_2,n, 0, ..., 0), and M (n-1) stores (L_1,n, 0, ..., 0).
  • Each memory has n addresses.
  • the memory address is decremented in synchronization with the timing at which the shift register 102 shifts. As a result, data is sequentially read from the memory.
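For the upper triangular case, the layout mirrors the lower triangular one: M(k) holds the k-th superdiagonal, padded with trailing zeros, and the address runs downward. The Python sketch below builds that layout and performs a plain back substitution that reads its coefficients from the memories at a decreasing address. Function names and 0-based addressing are illustrative, and U stands for the matrix that the text calls the upper triangular matrix L.

```python
def build_upper_banks(U):
    # banks[k - 1] models M(k): address a holds U[a][a + k] (0-based),
    # or 0.0 where the k-th superdiagonal does not reach row a.
    n = len(U)
    return [[U[a][a + k] if a + k < n else 0.0 for a in range(n)]
            for k in range(1, n)]


def back_substitution_from_banks(U, z):
    n = len(z)
    banks = build_upper_banks(U)
    d = [0.0] * n
    # The address is decremented from the last row to the first.
    for a in range(n - 1, -1, -1):
        # M(k) at address a supplies the coefficient of d[a + k].
        acc = sum(banks[k - 1][a] * d[a + k] for k in range(1, n - a))
        d[a] = (z[a] - acc) / U[a][a]
    return d


U = [[2.0, 1.0, 4.0],
     [0.0, 3.0, 5.0],
     [0.0, 0.0, 6.0]]
z = [16.0, 21.0, 18.0]
banks = build_upper_banks(U)
d = back_substitution_from_banks(U, z)
```

Because M(k) again pairs with the solution obtained k steps earlier, the same register, shift register, and switch arrangement used for forward substitution can be reused; only the stored contents and the address direction change.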
  • the first register 101 and the shift register 102 store an initial value of 0.
  • the first switch 105 is controlled so that a value is read from the second half of the shift register 102.
  • the second switch 106 is controlled so that a value is read from the second half of the first memory 103.
  • the result of the first adder 108 is also 0, and 0 is stored in the third register 109.
  • the first switch 105 is switched so that the value is read out from the first half of the first register 101 and the first half of the shift register 102, and input to the multiplier 107.
  • the first half of the first memory 103 and the second register 104 are controlled so as to be read and input to the multiplier 107.
  • the output of the second adder 110 is also 0. When the operation of the second adder 110 is finished, the n-th element z_n of the known matrix z is read from the second memory 111, and the subtracter 112 subtracts the operation result of the second adder 110 from z_n. Also, the third register 109 is initialized.
  • the diagonal element L_nn of the upper triangular matrix is read from the third memory 113.
  • the output value of the subtracter 112 is divided by L_nn.
  • since the output of the divider 114 is z_n / L_nn, the solution d_n is obtained at this point.
  • the obtained solution d_n is stored in the fourth memory 115 and simultaneously in the first register 101.
  • control is performed such that the first switch 105 is switched to read a value from the second half of the shift register 102.
  • control is performed such that the second switch 106 is switched to read a value from the second half of the first memory 103.
  • the second half of the shift register 102 and the second half of the first memory are input to the multiplier 107 to perform multiplication. Even at this time, since the values of the shift register 102 are all 0, the multiplication results are all 0.
  • the multiplication result is input to the first adder 108 and added.
  • the result of the adder is also 0.
  • the addition result is stored in the third register 109.
  • the first switch 105 and the second switch 106 are not switched.
  • the operation for finding the solution d_n enters a standby state.
  • the multiplier 107 multiplies the output of the first switch 105 and the output of the second switch 106.
  • the solution d_n is stored in the first register 101, and since the read address of the first memory 103 is decremented, L_{n-1,n} is read from M(1).
  • the value stored in the third register 109 is added to the operation result of the first adder 108.
  • the output of the second adder 110 is d_n · L_{n-1,n}.
  • the read address of the second memory 111 is decremented, and the value z_{n-1} is read out and input to the subtractor 112.
  • the subtractor 112 subtracts the output value d_n · L_{n-1,n} of the second adder 110 from z_{n-1} read from the second memory 111.
  • the operation result of the subtractor 112 is (z_{n-1} - d_n · L_{n-1,n}).
  • the read address of the third memory 113 is decremented, and the diagonal element L_{n-1,n-1} of the upper triangular matrix L is read and input to the divider 114.
  • the divider 114 divides (z_{n-1} - d_n · L_{n-1,n}), which is the operation result of the subtractor 112, by L_{n-1,n-1} read from the third memory 113.
  • the operation result of the divider 114 is (z_{n-1} - d_n · L_{n-1,n}) / L_{n-1,n-1}. From equation (6), this equals d_{n-1}. The output d_{n-1} of the divider 114 is input to the fourth memory 115 and the first register 101.
  • the second half of the shift register 102 in which the solution that has already been obtained is stored is operated first, and the operation of the first half is performed when the immediately preceding solution is obtained.
  • the number of multipliers is reduced by half, and time-division processing is performed.
  • with the matrix operation device of the present invention, it is possible to perform the backward substitution operation with a high-speed, small-scale circuit configuration.
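The ordering of operations in the backward substitution can be modelled in software as below. This is a sequential sketch, not a description of the circuit: it assumes a dense upper triangular matrix (named `U` here for clarity, whereas the text calls it L), and the function name `back_substitute_staged` is hypothetical. It only shows how each row's product-sum splits into a part that can be prepared early and a single term that must wait for the newest solution.

```python
def back_substitute_staged(U, z):
    """Solve U d = z for an upper-triangular U, forming each row's product-sum in two stages."""
    n = len(U)
    d = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Stage 1: terms that use only d[i+2:], i.e. solutions known before d[i+1].
        # In the hardware description this part overlaps with the computation of d[i+1].
        early = sum(U[i][j] * d[j] for j in range(i + 2, n))
        # Stage 2: the single term that needs the solution found immediately before.
        late = U[i][i + 1] * d[i + 1] if i + 1 < n else 0.0
        # Subtract the full product-sum from z[i], then divide by the diagonal element.
        d[i] = (z[i] - (early + late)) / U[i][i]
    return d

U = [[2.0, 1.0, 1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
z = [9.0, 13.0, 8.0]
d = back_substitute_staged(U, z)
print(d)  # [2.0, 3.0, 2.0]
```

Because stage 1 never reads the newest solution, a hardware implementation can run it while the subtractor and divider are still busy, which is what allows the multipliers to be shared in time division.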
  • the matrix operation device can also be applied to joint detection demodulation, which is a method for demodulating a signal received through wireless communication.
  • Joint detection demodulation (JD demodulation) is a demodulation method suitable for W-CDMA TDD mode communication.
  • JD demodulation enables accurate removal of interference components that cannot be eliminated by autocorrelation alone, thus enabling more accurate demodulation.
  • FIG. 7 shows a propagation model of a signal for performing JD demodulation.
  • d(1) to d(K) indicate the signals transmitted by the K users, respectively, which are to be demodulated.
  • c(1) to c(K) are spreading codes, and h(1) to h(K) are estimated propagation characteristics (delay profile: estimated channel impulse response).
  • b(1) to b(K) are vectors obtained by convolution of the spreading code and propagation characteristics.
  • the reception signal e is obtained by adding the noise n to this.
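As a small illustration of the convolution that produces each vector b(k), the sketch below convolves one spreading code with one estimated channel impulse response. The code and channel values are made up for the example, and the helper name `convolve` is an assumption, not something taken from the patent.

```python
def convolve(c, h):
    """Full linear convolution of a spreading code c with a channel impulse response h."""
    out = [0.0] * (len(c) + len(h) - 1)
    for i, c_i in enumerate(c):
        for j, h_j in enumerate(h):
            # Each chip of the code is spread over the channel taps it excites.
            out[i + j] += c_i * h_j
    return out

c = [1, -1, 1, 1]   # illustrative 4-chip spreading code
h = [1.0, 0.5]      # illustrative 2-tap channel estimate
b = convolve(c, h)
print(b)  # [1.0, -0.5, 0.5, 1.5, 0.5]
```

The resulting vector has length len(c) + len(h) - 1, which is why a transmitted symbol overlaps its neighbours in the received signal.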
  • FIG. 8 is a diagram showing a matrix of the propagation model of FIG.
  • A^H e represents the symbol data after RAKE combining and is denoted r.
  • A^H A + σ^2 I = F.
  • F d = r.
  • the matrix d can be obtained by generating the inverse of the matrix F and multiplying r by it.
  • JD demodulating section 203 of the present embodiment obtains elements of matrix d by performing Cholesky decomposition of matrix F and solving simultaneous linear equations.
  • FIG. 6A is a block diagram illustrating a configuration of a CDMA receiving device including a JD demodulation unit 203.
  • FIG. 6B is a diagram illustrating a format of transmission data.
  • the signal received by antenna 2 is amplified by radio receiving section 10 and input to channel estimating section 201 and despreading section 207.
  • Channel estimation section 201 estimates the channel of each user's signal from the impulse response of a known signal (midamble code) included in the received signal.
  • the midamble code is a known code for channel estimation inserted in the center of one slot as shown in FIG. 6B.
  • the channel estimating section 201 includes a midamble correlation processing section 20, a midamble code generating section 24, and a path selecting section 22.
  • the signal that has passed through radio receiving section 10 is despread by despreading section 207.
  • the channel estimation value obtained by channel estimation section 201 is input to RAKE combining section 202 and JD demodulation section 203.
  • phase compensation is performed based on the channel estimation value, RAKE combining is performed in RAKE combining section 202, and the RAKE combining result r is input to JD demodulation section 203.
  • JD demodulation section 203 includes a cross-correlation matrix (F) generation section 204 that obtains the cross-correlation matrix F from the spreading code supplied by spreading code generation section 30 and the estimated channel impulse response; a Cholesky decomposition unit 205 that performs Cholesky decomposition or modified Cholesky decomposition of the matrix F into the form of a product of a lower triangular matrix and an upper triangular matrix; and a simultaneous equation calculation unit 206 that solves the simultaneous linear equations, expressed in a form including a lower triangular matrix or an upper triangular matrix, using forward substitution or backward substitution.
  • the simultaneous equation operation unit 206 includes the matrix operation device shown in FIG.
  • FIG. 9 is a diagram for explaining the function of the cross-correlation matrix (F) generation unit 204.
  • the spreading code output from the portion 900 storing the spreading code and the channel estimation value parameters (h_1 to h_j) output from the portion 902 storing the channel estimation value are convolved using the adder 903 (903a to 903c, etc.) and the adder 904 to obtain the vector b.
  • This equation is solved for d by the simultaneous equation calculation unit 206.
  • since the matrix F is a symmetric matrix, Cholesky decomposition (including modified Cholesky decomposition) can be applied.
  • L^H is the conjugate transpose of L.
  • the Cholesky decomposition unit 205 performs Cholesky decomposition (or modified Cholesky decomposition).
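The chain of Cholesky decomposition, forward substitution, and backward substitution used to solve F d = r can be sketched as follows. For brevity the sketch assumes a real, symmetric, positive definite F, so that the conjugate transpose reduces to a plain transpose; the function names and the numerical values are illustrative and not taken from the patent.

```python
import math

def cholesky(F):
    """Return lower-triangular L with F = L L^T (F real, symmetric, positive definite)."""
    n = len(F)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(F[i][i] - s)
            else:
                L[i][j] = (F[i][j] - s) / L[j][j]
    return L

def forward_substitute(L, r):
    """Solve L y = r for lower-triangular L."""
    y = [0.0] * len(L)
    for i in range(len(L)):
        y[i] = (r[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    return y

def back_substitute_transposed(L, y):
    """Solve L^T d = y, reading the upper triangular factor as the transpose of L."""
    n = len(L)
    d = [0.0] * n
    for i in range(n - 1, -1, -1):
        d[i] = (y[i] - sum(L[j][i] * d[j] for j in range(i + 1, n))) / L[i][i]
    return d

F = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
r = [8.0, 10.0, 11.0]          # equals F applied to [1, 1, 1]
L = cholesky(F)
d = back_substitute_transposed(L, forward_substitute(L, r))
print(d)  # [1.0, 1.0, 1.0]
```

Solving two triangular systems in this way avoids forming the inverse of F explicitly.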
  • the matrix operation device of the present invention can also be used for a communication device in which an adaptive array based on the minimum mean square error (MMSE) method is mounted.
  • Figure 10 shows a communication device that implements an adaptive array based on the minimum mean square error (MMSE) method.
  • Received signals are input from antenna 301, antenna 302, and antenna 303.
  • the signal received from each antenna is multiplied by the weight generated by the weight generation unit 305, so that the demodulator 304 can demodulate the received signal optimally.
  • the weight w can be obtained from equation (10).
  • since R is a symmetric matrix, Cholesky decomposition or modified Cholesky decomposition can be applied, after which a system of linear equations for the upper triangular matrix and a system of linear equations for the lower triangular matrix are solved. Therefore, by using the matrix operation device of the present invention, an optimal weight can be obtained at high speed.
  • the matrix operation device of the present invention can also be used for a communication device in which an adaptive equalizer is mounted.
  • An adaptive equalizer is a filter that precisely controls the time response of a transmission line and smoothes the amplitude and delay characteristics of the transmission line.
  • Figure 11 shows a communication device equipped with an adaptive equalizer.
  • the received signal is input to transversal filter (FIR) 401 and weight calculation section 402.
  • the weight calculation unit 402 calculates an optimum tap coefficient of the transversal filter 401.
  • the optimum tap coefficient is calculated as follows. Let the optimal tap coefficient be w (a matrix with M rows and 1 column), the autocorrelation matrix of the received signal be R (a matrix with M rows and M columns), and the cross-correlation matrix between the received signal and the desired response corresponding to the known signal be P (a matrix with M rows and 1 column); then the following equation (11) holds.
  • the autocorrelation matrix R of the received signal is given as follows.
  • the cross-correlation matrix P between the desired response d (n) and the received signal is given as follows.
  • Equation (11) is solved for w, the optimum tap coefficient is found, and the optimum filter is generated.
  • since R is a symmetric matrix, Cholesky decomposition and modified Cholesky decomposition can be performed.
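Equation (11) and the expressions for R and P are not reproduced in this text. The sketch below therefore rests on an assumption: it uses the standard Wiener filter relation R w = P with time-averaged estimates of R and P for real-valued signals. All names and sample values are illustrative. It generates a desired response from known tap coefficients and checks that those coefficients satisfy the relation.

```python
def tap_vectors(x, M):
    """Tap-input vectors u(n) = [x[n], x[n-1], ..., x[n-M+1]] for n = M-1 .. len(x)-1."""
    return [[x[n - k] for k in range(M)] for n in range(M - 1, len(x))]

def correlation_estimates(x, desired, M):
    """Time-averaged autocorrelation matrix R (M x M) and cross-correlation vector P (M x 1)."""
    us = tap_vectors(x, M)
    ds = desired[M - 1:]
    N = len(us)
    R = [[sum(u[i] * u[j] for u in us) / N for j in range(M)] for i in range(M)]
    P = [sum(u[i] * d for u, d in zip(us, ds)) / N for i in range(M)]
    return R, P

M = 2
x = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]   # illustrative received samples
w_true = [0.5, 0.25]                                # illustrative tap coefficients
# Desired response produced by filtering x with w_true (the first M-1 samples are unused).
desired = [0.0] * (M - 1) + [
    sum(w * u_k for w, u_k in zip(w_true, u)) for u in tap_vectors(x, M)
]

R, P = correlation_estimates(x, desired, M)
Rw = [sum(R[i][j] * w_true[j] for j in range(M)) for i in range(M)]
print(all(abs(a - b) < 1e-12 for a, b in zip(Rw, P)))  # True
```

In practice w is unknown and is found by solving R w = P, which is where the Cholesky decomposition and the triangular solves come in.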
  • the matrix operation device of the present invention is also effective in the operation of the Capon method, which is a type of arrival direction estimation algorithm, in a communication device in which an adaptive array is mounted.
  • the matrix operation device of the present invention is capable of finding an inverse matrix at high speed.
  • an ultra-high-speed matrix operation can be executed by a cyclic operation processing circuit configured by hardware. This achieves, for example, a tenfold speedup over the related art.
  • the number of the multipliers can be reduced reasonably and effectively, and the miniaturization of the device and the reduction in power consumption can be promoted.
  • the multiplication for the first half is started after the immediately preceding element is obtained. Compared with performing all the multiplications and the product-sum operation simultaneously after the immediately preceding element is found, half the number of multipliers is sufficient, while the processing time remains almost unchanged.
  • the matrix operation device of the present invention is suitable for LSI implementation, and is therefore applicable to mobile communication devices such as mobile phones.
  • the present invention can be applied to a matrix operation device that can be used to obtain a solution of a system of linear equations expressed in a form including an upper triangular matrix or a lower triangular matrix.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Operations Research (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Complex Calculations (AREA)

Abstract

Solutions of simultaneous linear equations expressed in a form including an upper or lower triangular matrix are obtained by forward or backward substitution, at very high speed, using a small device with low power consumption. A matrix operation device implemented in hardware is composed of a product-sum operation unit (101, 102, 103, 105, 106, 107, 108, 109, 110) and a linear operation unit that operates on the result of the product-sum operation. While the linear operation unit is determining the solution of the n-th linear equation, the product-sum operation unit performs in advance the product-sum operation on those terms, among the product-sum terms needed to determine the solution of the next, (n+1)-th linear equation, that do not include the solution of the n-th linear equation, and it uses the multipliers of a multiplication unit (107) in time division.
PCT/JP2003/002696 2003-03-07 2003-03-07 Dispositif d'opération matricielle WO2004079585A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2003/002696 WO2004079585A1 (fr) 2003-03-07 2003-03-07 Dispositif d'opération matricielle
CN03821223.4A CN1682214A (zh) 2003-03-07 2003-03-07 矩阵运算装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2003/002696 WO2004079585A1 (fr) 2003-03-07 2003-03-07 Dispositif d'opération matricielle

Publications (1)

Publication Number Publication Date
WO2004079585A1 true WO2004079585A1 (fr) 2004-09-16

Family

ID=32948271

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2003/002696 WO2004079585A1 (fr) 2003-03-07 2003-03-07 Dispositif d'opération matricielle

Country Status (2)

Country Link
CN (1) CN1682214A (fr)
WO (1) WO2004079585A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111052111A (zh) * 2017-09-14 2020-04-21 三菱电机株式会社 运算电路、运算方法以及程序
CN116560733A (zh) * 2023-07-07 2023-08-08 中国兵器科学研究院 一种空间目标特征在轨实时并行lu分解计算系统及方法

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100449522C (zh) * 2007-07-12 2009-01-07 浙江大学 基于多fpga的矩阵乘法并行计算系统
CN100465876C (zh) * 2007-07-12 2009-03-04 浙江大学 基于单fpga的矩阵乘法器装置
CN102541507B (zh) * 2010-12-31 2015-12-16 联芯科技有限公司 维度可重配的数据处理方法、系统和矩阵乘法处理器
CN106844294B (zh) * 2016-12-29 2019-05-03 华为机器有限公司 卷积运算芯片和通信设备
CN109766515B (zh) * 2018-12-26 2023-04-14 上海思朗科技有限公司 矩阵分解处理装置及方法

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS54132144A (en) * 1978-04-05 1979-10-13 Nippon Telegr & Teleph Corp <Ntt> Multiple process system
JPS61241879A (ja) * 1985-04-18 1986-10-28 Fanuc Ltd 空間積和演算装置
JPS62175868A (ja) * 1986-01-30 1987-08-01 Fujitsu Ltd 行列演算処理方式
US5436929A (en) * 1992-06-26 1995-07-25 France Telecom Decision feedback equalizer device and method for the block transmission of information symbols
JPH0830583A (ja) * 1994-07-13 1996-02-02 Fujitsu Ltd 連立方程式の求解装置
JPH08327721A (ja) * 1995-05-30 1996-12-13 Japan Radio Co Ltd ディコンボルューション回路
JPH0990018A (ja) * 1995-09-20 1997-04-04 Japan Radio Co Ltd デコンボルーション回路
JPH09212489A (ja) * 1996-01-31 1997-08-15 Fujitsu Ltd 対称行列の固有値問題を解く並列処理装置および方法
JP2001243272A (ja) * 2000-03-01 2001-09-07 Univ Waseda 電子回路シュミレータ
JP2003122736A (ja) * 2001-10-11 2003-04-25 Matsushita Electric Ind Co Ltd 行列演算装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
B.G. MERTZIOS AND A.N. VENETSANOPOULOS, [MINIMUM CYCLE FAST IMPLEMENTATION OF TWO-DIMENSIONAL DIGITAL FILTER BASED ON THE DECOMPOSITION], Signal Processing, July 1989, Vol. 17, No. 3, pages 227 to 239 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111052111A (zh) * 2017-09-14 2020-04-21 三菱电机株式会社 运算电路、运算方法以及程序
CN116560733A (zh) * 2023-07-07 2023-08-08 中国兵器科学研究院 一种空间目标特征在轨实时并行lu分解计算系统及方法
CN116560733B (zh) * 2023-07-07 2023-10-24 中国兵器科学研究院 一种空间目标特征在轨实时并行lu分解计算系统及方法

Also Published As

Publication number Publication date
CN1682214A (zh) 2005-10-12

Similar Documents

Publication Publication Date Title
CN103999078B (zh) 具有包含用于fir滤波的矢量卷积函数的指令集的矢量处理器
US6959065B2 (en) Reduction of linear interference canceling scheme
CN101553995B (zh) 联合检测器的定点实现
EP1512218A2 (fr) Procede et appareil d'annulation de midamble parallele
WO2005076493A1 (fr) Interpolation post-desetalement dans des systemes amrc
JP5032506B2 (ja) 積和演算を実行する方法及び装置
WO2004079585A1 (fr) Dispositif d'opération matricielle
Rajagopal et al. A programmable baseband processor design for software defined radios
EP2070205A2 (fr) Implémentation de points fixes d'un détecteur conjoint
JP2003122736A (ja) 行列演算装置
CN102611648B (zh) 一种串行干扰抵消系统及方法
US20040181565A1 (en) Matrix calculation device
US7924948B2 (en) Pre-scaling of initial channel estimates in joint detection
JP4141550B2 (ja) マルチユーザ受信機
JP3503433B2 (ja) スペクトル拡散受信機
US7916841B2 (en) Method and apparatus for joint detection
JP4167112B2 (ja) スペクトル拡散によって送信されたデータの検出のための方法および装置
US20060146759A1 (en) MIMO Kalman equalizer for CDMA wireless communication
JP2003332945A (ja) Cdma復調器回路用共通データパスレーキ受信器
KR20020000391A (ko) 시간 다중화된 파일럿을 이용한 광대역 시디엠에이채널응답 추정장치
US20030128748A1 (en) Path search for CDMA implementation
JPH06216715A (ja) ディジタルフィルタ
BehnaamAazhang Efficient Algorithms and Architectures for Multiuser Channel Estimation and Detection in Wireless Base-Station Receivers
Scheibler et al. Interference Mitigation for WCDMA Using QR Decomposition and a CORDIC-based Reconfigurable Systolic Array
JP2001156682A (ja) マッチドフィルタおよびそれを用いた大規模集積回路と通信システム

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN

WWE Wipo information: entry into national phase

Ref document number: 20038212234

Country of ref document: CN