WO2019135355A1 - Arithmetic circuit - Google Patents
Arithmetic circuit
- Publication number
- WO2019135355A1 (PCT/JP2018/046496)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- circuit
- value
- bit
- bit position
- circuits
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
- G06F7/5443—Sum of products
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
- G06F7/5324—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel partitioned, i.e. using repetitively a smaller parallel parallel multiplier or using an array of such smaller multipliers
Definitions
- the present invention relates to an arithmetic circuit in digital signal processing, and more particularly to an arithmetic circuit that performs product-sum operation.
- The main operation in digital signal processing is a product-sum operation in which data of a digital signal, expressed as fixed-point binary numbers, are multiplied by coefficients, also expressed as fixed-point binary numbers, and the products are summed (see Patent Document 1).
- FIG. 11 shows the configuration of a general product-sum operation circuit.
- each data x [n] is a fixed-point binary number, and the number of digits after the decimal point (bit width after the decimal point) is x_scale.
- Each coefficient c [n] is a fixed point binary number, and the number of digits after the decimal point is c_scale.
- the product-sum operation circuit includes a summing circuit 1001.
- the product-sum operation circuit includes a digit alignment circuit 1002.
- The digit alignment circuit 1002 adjusts the number of decimal places of y to the number of decimal places z_scale of the fixed-point number z to be output by the product-sum operation circuit, by truncating or rounding the lower-order bits of y.
- The number of decimal places z_scale is smaller than the number of decimal places x_scale + c_scale of y. Therefore, a digit alignment circuit 1002 that performs truncation outputs the value obtained by removing the lower (x_scale + c_scale - z_scale) bits of y. A digit alignment circuit 1002 that performs rounding outputs the value obtained by adding the most significant of the bits deleted by the truncation to the value remaining after the truncation.
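The truncation and rounding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: fixed-point values are held as raw signed integers, `y_scale` is a hypothetical name standing for x_scale + c_scale, and the function name `align_digits` does not appear in the patent.

```python
def align_digits(y: int, y_scale: int, z_scale: int, rounding: bool = False) -> int:
    """Reduce the fractional bits of raw fixed-point value y from y_scale to z_scale."""
    drop = y_scale - z_scale          # number of lower bits to delete
    if drop <= 0:
        raise ValueError("z_scale must be smaller than y_scale")
    kept = y >> drop                  # truncation: discard the lower bits
    if rounding:
        # Add the most significant of the deleted bits to the remaining value.
        kept += (y >> (drop - 1)) & 1
    return kept


# 0b101101 with 4 fractional bits, reduced to 1 fractional bit (3 bits deleted).
print(align_digits(0b101101, 4, 1))                  # 5
print(align_digits(0b101101, 4, 1, rounding=True))   # 6
```

Python's `>>` on integers is an arithmetic shift, so the same code also handles negative two's complement values.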
- The reason why the digit alignment circuit 1002 performs the truncation or rounding process on the value y summed by the summing circuit 1001 is described below.
- data and coefficients contain noise components, and the ratio of noise components to signal components is particularly large in the lower bits. Therefore, the ratio of noise components in the lower bits of the multiplication result of data and coefficients also becomes large.
- quantization noise is included in bits lower than the number of decimal places of data or the number of decimal places of coefficients.
- a value output from a certain product-sum operation circuit is usually an input of another product-sum operation circuit using another coefficient value.
- When the product-sum operation is performed in multiple stages, if the number of digits of the output value increases compared to the number of digits of the input value, the number of digits handled in the product-sum operation circuit in the subsequent stage increases.
- When the digit alignment circuit 1002 does not reduce the number of digits, the number of digits after multiplication is not less than the sum of the numbers of digits of the data and the coefficient. Therefore, in a configuration where the product-sum operation circuit of the former stage outputs its value without reducing the number of digits and the product-sum operation circuit of the latter stage takes that value as input, the circuit size and power consumption of the latter-stage circuit are significantly larger than those of the former-stage circuit.
- Even if the product-sum operation circuit in the subsequent stage operates on a value whose number of digits was increased by the former stage, no significant result is obtained from the arithmetic processing of the lower bits, which have a large ratio of noise components. Furthermore, the area of the circuit responsible for the product-sum operation in the subsequent stage and the power it consumes increase significantly with the number of digits, so unnecessary circuit area and power consumption arise.
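As a rough illustration of this growth, the sketch below tracks a lower bound on the word length entering each cascaded stage when no digit alignment is applied. The function name `stage_widths` and the 16-bit widths are hypothetical, and the extra growth caused by the summation itself is ignored.

```python
def stage_widths(x_bits: int, c_bits: int, stages: int) -> list:
    """Lower bound on the word length at each stage boundary without digit alignment."""
    widths = [x_bits]
    for _ in range(stages):
        # A product needs at least (data bits + coefficient bits).
        widths.append(widths[-1] + c_bits)
    return widths


print(stage_widths(16, 16, 3))  # [16, 32, 48, 64]
```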
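- For this reason, the low-order bits with high noise components are deleted by the digit alignment circuit 1002, and the value is limited to the significant bit width before being output to the subsequent stage, so that the circuit area and power consumption of the subsequent stage are reduced.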
- the conventional product-sum operation circuit performs processing for deleting the lower bits having high noise components from the output value.
- the internal multiplication circuit 1000-n itself performs accurate multiplication processing regardless of whether the bit is a high noise component. Therefore, the low order bits accurately calculated by the multiplication circuit 1000-n in the conventional product-sum operation circuit are deleted as high noise components by the digit alignment circuit 1002.
- the multiplication circuit 1000-n has a characteristic that the circuit scale and the power consumption increase significantly with the increase of the number of digits (proportional to the square of the number of digits in the balanced tree type multiplication circuit). For this reason, although the increase in the number of digits leads to an increase in circuit elements and power consumption, the lower bits accurately calculated by the multiplier circuit 1000-n are deleted by the digit alignment circuit 1002 as high noise components.
- the present invention has been made to solve the above-described problems, and it is an object of the present invention to provide an arithmetic circuit capable of reducing the circuit area and the power consumption.
- An arithmetic circuit that calculates and outputs the value z[m] of a product-sum operation includes a LUT generation circuit and M distributed arithmetic circuits. The LUT generation circuit is configured to output a value calculated for each pair when the N coefficients c[n] are divided into pairs of two.
- The M distributed arithmetic circuits are configured to calculate and output, in parallel for each of the M sets, the value z[m] of the product-sum operation, which is the result of multiplying the N data x[m, n] of the data set X[m] by the N coefficients c[n] and adding the products.
- Each distributed arithmetic circuit is composed of a plurality of binomial distributed arithmetic circuits, a first summing circuit, and a digit alignment circuit. The binomial distributed arithmetic circuits are configured to calculate and output, in parallel for each pair, the value of a binomial product-sum operation on the pairs obtained by dividing the N data x[m, n] corresponding to the own circuit into pairs of two. The first summing circuit adds up the values calculated by the plurality of binomial distributed arithmetic circuits.
- The digit alignment circuit is configured to adjust the number of decimal places of the result added by the first summing circuit to a predetermined, smaller number of decimal places, and to output the processing result as the value z[m] of the product-sum operation.
- Each binomial distributed arithmetic circuit includes a plurality of index circuits, one provided for each bit position of the two values of the same pair among the N data x[m, n]. For each bit position, the index circuit acquires the one element value corresponding to the two bit values at that bit position, from among the element values consisting of the value 0, the two values of the same pair among the N coefficients c[n], and the value calculated by the LUT generation circuit from those two coefficient values.
- Each binomial distributed arithmetic circuit further includes a plurality of bit position-specific arithmetic circuits configured to perform bit position-specific operations on the element values acquired by the index circuits, and a second summing circuit configured to output the sum of the values calculated by the bit position-specific arithmetic circuits as the value of the binomial product-sum operation.
- Among the bit position-specific arithmetic circuits, those whose corresponding bit position l is smaller than a predetermined value Lc (Lc is an integer of 2 or more and less than L) invalidate the (Lc - l) bits on the least significant bit side of the element value acquired by the corresponding index circuit.
- the complex number coefficient C is applied to each of data corresponding to the self circuit among the complex number X [m], and the LUT generation circuit configured to calculate the value d_add of the sum of the real part value c_real and the imaginary part value c_imag
- the complex value Y [m] which is the result of multiplication and addition, is M distributed arith
- The arithmetic circuit according to the present invention is not a multiplication circuit that performs accurate multiplication on all bits without distinguishing upper bits from lower bits. Instead, it adopts a distributed operation in which a LUT is searched for each data bit position and the element values are accumulated, which makes it possible to reduce the circuits that process the lower bits designated in advance for each bit position.
- The circuits removed in this way are wasteful circuits that would calculate lower bit values which the digit alignment circuit invalidates because their noise component is high.
- The present invention therefore has the effect of reducing the area and power attributable to these wasteful circuits.
- FIG. 1 is a block diagram showing the configuration of an arithmetic circuit according to a first embodiment of the present invention.
- FIG. 2 is a block diagram showing the configuration of the distributed arithmetic circuit according to the first embodiment of the present invention.
- FIG. 3 is a block diagram showing the configuration of a two-term distributed arithmetic circuit according to the first embodiment of the present invention.
- FIG. 4 is a diagram for explaining the operation of the LUT index circuit in the first embodiment of the present invention.
- FIG. 5 is a diagram for explaining the concept of operation of the binomial distributed arithmetic circuit according to the first embodiment of the present invention.
- FIG. 6 is a diagram for explaining the concept of operation of the binomial distributed arithmetic circuit according to the first embodiment of the present invention.
- FIG. 7 is a block diagram showing the configuration of an arithmetic circuit according to the second embodiment of the present invention.
- FIG. 8 is a block diagram showing the configuration of a distributed arithmetic circuit according to the second embodiment of the present invention.
- FIG. 9 is a diagram for explaining the operation of the real part LUT index circuit and the imaginary part LUT index circuit in the second embodiment of the present invention.
- FIG. 10 is a diagram for explaining the operation concept of the distributed arithmetic circuit according to the second embodiment of the present invention.
- FIG. 11 is a block diagram showing the configuration of a conventional product-sum operation circuit.
- FIG. 1 is a block diagram showing the configuration of an arithmetic circuit according to a first embodiment of the present invention.
- Each of the data x [m, n] and the coefficient c [n] is a two's complement binary number representing a signed fixed-point number.
- the number of decimal places of each data x [m, n] is x_scale, and the number of decimal places of each coefficient c [n] is c_scale.
- the value of the product-sum operation z [m] is a two's complement binary number representing a signed fixed-point number, and the number of decimal places is z_scale.
- the arithmetic circuit of FIG. 1 includes one LUT generation circuit 1 and M (M is an integer of 2 or more) distributed arithmetic circuits 2-1 to 2-M.
- The binomial distributed arithmetic circuit 20m-n' forms a LUT whose element values are 0, c[2×n' - 1], c[2×n'], and d[n'], obtains the result of the product-sum operation c[2×n' - 1] × x[m, 2×n' - 1] + c[2×n'] × x[m, 2×n'] by a distributed operation using the LUT, and outputs it as y'[m, n'].
- The result y'[m, n'] of the product-sum operation is a two's complement binary number representing a signed fixed-point number.
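A minimal Python sketch of such a two-term distributed operation is given below, without any bit invalidation, so it reproduces the exact product-sum. It assumes that d[n'] is the sum of the two coefficients, that bit position l = 1 is the LSB and l = L is the sign bit, and that raw fixed-point values are held as integers; the names `make_lut` and `two_term_da` are hypothetical.

```python
def make_lut(c1: int, c2: int) -> list:
    """Four element values 0, c1, c2, d, indexed by the bit pair (b1 + 2*b2)."""
    return [0, c1, c2, c1 + c2]   # assumption: d is the sum of the two coefficients


def two_term_da(x1: int, x2: int, c1: int, c2: int, L: int) -> int:
    """Compute c1*x1 + c2*x2 for L-bit two's complement x1, x2 by LUT look-ups."""
    lut = make_lut(c1, c2)
    acc = 0
    for l in range(1, L + 1):
        b1 = (x1 >> (l - 1)) & 1        # bit of x1 at bit position l
        b2 = (x2 >> (l - 1)) & 1        # bit of x2 at bit position l
        elem = lut[b1 + 2 * b2]         # element value selected for this position
        if l == L:
            elem = -elem                # the sign bit carries a negative weight
        acc += elem << (l - 1)          # weight the element by 2^(l-1) and accumulate
    return acc


print(two_term_da(-3, 5, 7, -2, L=8))   # -31, i.e. 7*(-3) + (-2)*5
```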
- The above description of the distributed arithmetic circuit 2-m applies to the case where N is an even number. In the case where N is an odd number, as shown in the corresponding figure, an auxiliary multiplication circuit 23m is added which calculates c[N] × x[m, N] and outputs the result as y'[m, N' + 1].
- The digit alignment circuit 22m matches the number of decimal places of y[m] to the number of decimal places z_scale of the fixed-point number to be output by the arithmetic circuit, by truncating or rounding off the lower bits of the result y[m] output from the distributed operation result summing circuit 21m, and outputs the processing result as z[m]. z_scale is a value smaller than the number of decimal places y_scale of y[m].
- A digit alignment circuit 22m that performs truncation outputs the value obtained by deleting the lower (y_scale - z_scale) bits of y[m]. A digit alignment circuit 22m that performs rounding outputs the value obtained by adding the most significant of the bits deleted by the truncation to the value remaining after the truncation.
- The binomial distributed arithmetic circuit 20m-n' shown in FIG. 3 includes L LUT index circuits 200m-n'-l (selection circuits), L bit position-specific arithmetic circuits 201m-n'-l, and a summing circuit 202m-n'.
- The LUT index circuit 200m-n'-l uses the bits x[m, 2×n' - 1][l] and x[m, 2×n'][l] at the bit position l corresponding to its own circuit among the data x[m, 2×n' - 1] and x[m, 2×n'].
- The element value LUT#m-n'-l is a two's complement binary number that represents a signed fixed-point number.
- FIG. 4 shows the relationship between each value of the bits x[m, 2×n' - 1][l] and x[m, 2×n'][l] and the element value LUT#m-n'-l of the LUT selected at that time.
- This relationship is the same as the relationship between the address and the stored value in the LUT when a general two-term product-sum operation is performed using a distributed operation.
- The element value LUT#m-n'-l is (c[2×n' - 1] × x[m, 2×n' - 1][l] + c[2×n'] × x[m, 2×n'][l]).
- Each element value LUT#m-n'-l is subjected to a bit position-specific operation by the bit position-specific arithmetic circuit 201m-n'-l.
- the concept of the operation of the binomial distributed arithmetic circuit 20m-n ' will be described with reference to FIG.
- This left shift operation is equivalent to multiplying the element value LUT#m-n'-l by 2^(l - Lc), and the (l - Lc) bit values added to the LSB side by the left shift operation are 0. FIG. 5 shows, as an example in which the bit position l is larger than Lc, the bit position-specific operation result BR#m-n'-(L-1).
- For the bit position L, the result of performing a left shift operation by (L - Lc) bits after performing sign inversion on the element value LUT#m-n'-L is output as the bit position-specific operation result BR#m-n'-L.
- For a bit position l smaller than Lc, the result of performing a right shift operation by (Lc - l) bits on the element value LUT#m-n'-l selected by the LUT index circuit 200m-n'-l is output as the bit position-specific operation result BR#m-n'-l.
- This right shift operation corresponds to dividing the element value LUT#m-n'-l by 2^(Lc - l); the (Lc - l) bits on the LSB side before the right shift operation are not held by the circuit after the shift and become invalid.
- That is, the right shift operation is equivalent to truncating the (Lc - l) bits on the LSB side of the element value LUT#m-n'-l. FIG. 5 shows, as examples in which the bit position l is smaller than Lc, the bit position-specific operation results BR#m-n'-(Lc-1) and BR#m-n'-1.
- The (Lc - l) bit values on the LSB side become invalid.
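The bit position-specific operations can be sketched in Python as follows. This is an illustrative model, not the circuit itself: it assumes bit position l = 1 is the LSB and l = L is the sign bit, that the fourth LUT element is the sum of the two coefficients, and that values are raw integers. The names `bit_position_op` and `truncated_da` are hypothetical. Because every element is scaled relative to bit position Lc, the accumulated value approximates the exact product-sum divided by 2^(Lc - 1).

```python
def bit_position_op(elem: int, l: int, L: int, Lc: int) -> int:
    """Scale the selected element value relative to bit position Lc."""
    if l == L:
        elem = -elem                  # sign inversion at the top bit position
    if l >= Lc:
        return elem << (l - Lc)       # left shift; the appended LSB-side bits are 0
    return elem >> (Lc - l)           # right shift; the (Lc - l) LSB-side bits are dropped


def truncated_da(x1: int, x2: int, c1: int, c2: int, L: int, Lc: int) -> int:
    lut = [0, c1, c2, c1 + c2]        # assumption: fourth element is c1 + c2
    acc = 0
    for l in range(1, L + 1):
        b1 = (x1 >> (l - 1)) & 1
        b2 = (x2 >> (l - 1)) & 1
        acc += bit_position_op(lut[b1 + 2 * b2], l, L, Lc)
    return acc


print(truncated_da(-3, 5, 7, -2, L=8, Lc=1))  # -31: nothing is invalidated
print(truncated_da(-3, 5, 7, -2, L=8, Lc=3))  # -8: close to -31 / 4 = -7.75
```

In this model each of the (Lc - 1) truncated positions loses less than one unit of the output's least significant bit, so the total deviation from the exact scaled value stays below Lc - 1 units.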
- For the LUT index circuit 200m-n'-l to select one of the four element values of the LUT, one bit value is selected from four bit values for each bit position of the element values.
- A 4:1 selector circuit is used to select the value.
- For the (Lc - l) bits on the LSB side that are invalidated by the bit position-specific arithmetic circuit 201m-n'-l, the above-mentioned 4:1 selector circuits can be omitted, which reduces the circuit scale of the LUT index circuit 200m-n'-l.
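To get a feel for the saving, the sketch below counts the 4:1 selectors remaining in each LUT index circuit. It assumes one selector per bit of the element value and an element width of `width` bits; both the width parameter and the name `selector_counts` are hypothetical and only serve the illustration.

```python
def selector_counts(width: int, L: int, Lc: int) -> list:
    """4:1 selectors needed per bit position l = 1..L after omitting invalidated bits."""
    return [width - max(Lc - l, 0) for l in range(1, L + 1)]


counts = selector_counts(width=10, L=8, Lc=3)
print(counts)                 # [8, 9, 10, 10, 10, 10, 10, 10]
print(10 * 8 - sum(counts))   # 3 selectors omitted, i.e. Lc * (Lc - 1) / 2
```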
- Each bit position operation result BR # m-n'-l is a two's complement binary number representing a signed fixed point number. Therefore, in the bit width adjustment described above, it is necessary to add a bit having the same value as that of the sign bit S # l to the MSB side.
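Sign extension of this kind can be illustrated on raw bit patterns. The helper names `sign_extend` and `to_signed` below are hypothetical; the sketch only shows that copying the sign bit into the added MSB-side bits leaves the represented value unchanged.

```python
def sign_extend(bits: int, width: int, new_width: int) -> int:
    """Widen a two's complement bit pattern by copying the sign bit to the MSB side."""
    sign = (bits >> (width - 1)) & 1
    if sign:
        # Fill the added upper bits with 1s for negative values.
        bits |= ((1 << (new_width - width)) - 1) << width
    return bits


def to_signed(bits: int, width: int) -> int:
    """Interpret a bit pattern of the given width as a two's complement integer."""
    return bits - (1 << width) if (bits >> (width - 1)) & 1 else bits


wide = sign_extend(0b1011, 4, 8)
print(bin(wide))                                # 0b11111011
print(to_signed(0b1011, 4), to_signed(wide, 8)) # -5 -5
```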
- bit position classified operation circuit 201m-n'-l applies to element value LUT # m-n'-l.
- the value y '[m, n'] output from each of the two-term distributed arithmetic circuits 20m-n ' is the result of the product-sum operation of the data x [m, n] and the coefficient c [n].
- When the result of the product-sum operation is obtained by the conventional technique (an operation combining multiplication and addition), its number of decimal places is the sum of the number of decimal places x_scale of the data x[m, n] and the number of decimal places c_scale of the coefficient c[n]; that is, the number of decimal places of the result of the product-sum operation is x_scale + c_scale.
- The arithmetic circuit of this embodiment is not a multiplier circuit that performs accurate multiplication on all bits. Instead, it acquires the element value LUT#m-n'-l from the LUT for each bit position l of the data, invalidates the lower bits of the element value designated in advance according to the bit position l (the (Lc - l) lower bits in this embodiment), and performs a distributed operation that accumulates the results.
- Compared with a conventional arithmetic circuit that does not perform this invalidation, the present embodiment needs no accumulation processing for the invalidated bits, and the circuit area and power consumption are reduced accordingly. Further, the invalidated low-order bits contain a large noise component and are subjected to truncation or rounding by the digit alignment circuit 22m, just as in the conventional arithmetic circuit, so invalidating them in this embodiment does not degrade the accuracy of the value output by the arithmetic circuit.
- the arithmetic circuit according to the present embodiment is not a multiplier circuit that performs accurate multiplication on all bits without discriminating between high-order bits and low-order bits.
- FIG. 7 is a block diagram showing the configuration of an arithmetic circuit according to a second embodiment of the present invention.
- Each of the M complex values Z[m] corresponds to (C × X[m]). That is, the real part value z_real[m] corresponds to c_real × x_real[m] - c_imag × x_imag[m].
- The imaginary part value z_imag[m] corresponds to c_imag × x_real[m] + c_real × x_imag[m].
- The real part value z_real[m] and the imaginary part value z_imag[m] are values limited to the significant bit width by removing the lower bits, which are noise components, from the multiplication result of the complex number X[m] and the complex coefficient C.
- The real part value z_real[m] does not always completely match c_real × x_real[m] - c_imag × x_imag[m].
- The imaginary part value z_imag[m] does not always completely match c_imag × x_real[m] + c_real × x_imag[m].
- the real part value z_real [m] and the imaginary part value z_imag [m] are binary numbers of 2's complement that represent signed fixed-point numbers, and the number of decimal places is z_scale.
- the arithmetic circuit shown in FIG. 7 includes one LUT generation circuit 1a, M (M is an integer of 2 or more) distributed arithmetic circuits 2a-1 to 2a-M, and M digit alignment circuits 3a-1 to 3a-M.
- The LUT generation circuit 1a receives the real part value c_real and the imaginary part value c_imag of the complex coefficient C, calculates the value d_sub corresponding to the difference c_real - c_imag and the value d_add corresponding to the sum c_real + c_imag, and outputs c_real and c_imag to the distributed arithmetic circuits 2a-1 to 2a-M together with the values d_sub and d_add.
- The distributed arithmetic circuit 2a-m forms a real part LUT whose element values are 0, c_real, -c_imag, and d_sub, and an imaginary part LUT whose element values are 0, c_imag, c_real, and d_add.
- It obtains the result of the real part product-sum operation c_real × x_real[m] - c_imag × x_imag[m] by a distributed operation using the real part LUT, and outputs it as y_real[m].
- It obtains the result of the imaginary part product-sum operation c_imag × x_real[m] + c_real × x_imag[m] by a distributed operation using the imaginary part LUT, and outputs it as y_imag[m].
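The two look-up tables can be exercised with a short Python sketch. It omits the invalidation of lower bits so that the result can be compared against an exact complex multiplication, and it assumes that bit position l = 1 is the LSB, l = L is the sign bit, and that the LUT index is formed as (real bit + 2 × imaginary bit). The name `complex_da` is hypothetical.

```python
def complex_da(c_real: int, c_imag: int, x_real: int, x_imag: int, L: int):
    """Multiply C by X for L-bit two's complement x_real, x_imag via two LUTs."""
    lut_r = [0, c_real, -c_imag, c_real - c_imag]   # 0, c_real, -c_imag, d_sub
    lut_i = [0, c_imag, c_real, c_real + c_imag]    # 0, c_imag, c_real, d_add
    y_real = y_imag = 0
    for l in range(1, L + 1):
        idx = ((x_real >> (l - 1)) & 1) + 2 * ((x_imag >> (l - 1)) & 1)
        sign = -1 if l == L else 1                  # the sign bit has negative weight
        y_real += sign * (lut_r[idx] << (l - 1))
        y_imag += sign * (lut_i[idx] << (l - 1))
    return y_real, y_imag


print(complex_da(3, -2, 5, 4, L=8))   # (23, 2), matching (3 - 2j) * (5 + 4j)
```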
- The digit alignment circuit 3a-m matches the number of decimal places of y_real[m] to the number of decimal places z_scale of the fixed-point number to be output by the arithmetic circuit, by truncating or rounding off the lower bits of the real part product-sum result y_real[m] output from the distributed arithmetic circuit 2a-m, and outputs the processing result as z_real[m].
- Likewise, the digit alignment circuit 3a-m matches the number of decimal places of y_imag[m] to z_scale, by truncating or rounding off the lower bits of the imaginary part product-sum result y_imag[m] output from the distributed arithmetic circuit 2a-m, and outputs the processing result as z_imag[m].
- z_scale is a value smaller than the number of decimal places y_scale of y_real[m] and y_imag[m].
- A digit alignment circuit 3a-m that performs truncation removes the lower (y_scale - z_scale) bits from y_real[m] and y_imag[m] and outputs the resulting values as z_real[m] and z_imag[m], respectively. A digit alignment circuit 3a-m that performs rounding outputs the value obtained by adding the most significant of the bits deleted by the truncation to the value remaining after the truncation.
- The distributed arithmetic circuit 2a-m shown in FIG. 8 includes, among other elements, L real part LUT index circuits 203m-l (real part selection circuits), a sign inverting circuit 204, and L real part bit position-specific arithmetic circuits 205m-l.
- The real part LUT index circuit 203m-l selects one of the four element values of the real part LUT, that is, the values 0, c_real, -c_imag, and d_sub, based on the bits x_real[m][l] and x_imag[m][l] at the bit position l corresponding to its own circuit among the data x_real[m] and x_imag[m], and acquires the selected element value as LUTr#m-l.
- The element value LUTr#m-l is a two's complement binary number representing a signed fixed-point number.
- The imaginary part LUT index circuit 207m-l selects one of the four element values of the imaginary part LUT, that is, the values 0, c_imag, c_real, and d_add, based on the bits x_real[m][l] and x_imag[m][l] at the bit position l corresponding to its own circuit among the data x_real[m] and x_imag[m], and acquires the selected element value as LUTi#m-l.
- The element value LUTi#m-l is a two's complement binary number representing a signed fixed-point number.
- FIG. 9 shows the relationship between the values of the bits x_real [m] [l] and x_imag [m] [l] and the element values of the real part LUT and the imaginary part LUT selected at that time.
- The relationship between each value of the bits x_real[m][l] and x_imag[m][l] and the element values of the real part LUT and the imaginary part LUT is the same as the relationship between the address and the stored value in the LUT when a general two-term product-sum operation is performed by a distributed operation.
- The LUT element value LUTr#m-l selected for each bit position l (l = 1, ..., L) by the real part LUT index circuit 203m-l is subjected to a bit position-specific operation by the real part bit position-specific arithmetic circuit 205m-l.
- The LUT element value LUTi#m-l selected for each bit position l by the imaginary part LUT index circuit 207m-l is subjected to a bit position-specific operation by the imaginary part bit position-specific arithmetic circuit 208m-l.
- FIG. 10 is a conceptual diagram for explaining the operation of the distributed arithmetic circuit 2a-m. Since the bit position-specific operation processing and the summation processing are the same for the real part and the imaginary part, FIG. 10 describes the processing for the real part as an example.
- The left shift operations are equivalent to multiplying the element values LUTr#m-l and LUTi#m-l by 2^(l - Lc), and the (l - Lc) bit values added to the LSB side by the left shift operation are 0. FIG. 10 shows, as an example in which the bit position l is larger than Lc, the bit position-specific operation result BRr#m-(L-1).
- For a bit position l smaller than Lc, the real part bit position-specific arithmetic circuit 205m-l performs a right shift operation by (Lc - l) bits on the element value LUTr#m-l selected by the real part LUT index circuit 203m-l corresponding to its own circuit, and outputs the result as the real part bit position-specific operation result BRr#m-l.
- Likewise, the imaginary part bit position-specific arithmetic circuit 208m-l performs a right shift operation by (Lc - l) bits on the element value LUTi#m-l selected by the imaginary part LUT index circuit 207m-l corresponding to its own circuit, and outputs the result as the imaginary part bit position-specific operation result BRi#m-l.
- For the bit position Lc, no shift operation is performed, and the element value LUTr#m-Lc is output as it is as the real part bit position-specific operation result BRr#m-Lc.
- Similarly, the element value LUTi#m-Lc is output as it is as the imaginary part bit position-specific operation result BRi#m-Lc.
- bit position-specific operation results BRr # m-l and BRi # m-l are two's complement binary numbers that represent signed fixed-point numbers. Therefore, in the above bit width alignment, it is necessary to add a bit having the same value as the sign bit to the MSB side.
- The (Lc - l) bit values on the LSB side become invalid.
- For the LUT index circuits 203m-l and 207m-l to select one of the four element values of the LUT, one bit value is selected from four bit values for each bit position of the element values.
- A 4:1 selector circuit is used to select the bit values.
- When the bit position-specific arithmetic circuits 205m-l and 208m-l perform the right shift operation by (Lc - l) bits on the element values LUTr#m-l and LUTi#m-l, it is also possible to add the value of the MSB-side 1 bit of the invalidated LSB-side (Lc - l) bits to the result of the right shift operation and use the sum as the bit position-specific operation results BRr#m-l and BRi#m-l. This process is equivalent to rounding off the LSB-side (Lc - l) bits of the element values LUTr#m-l and LUTi#m-l.
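The rounding variant of the right shift can be compared with plain truncation in a small Python sketch. The function name `shift_right_round` is hypothetical, and element values are modeled as signed integers.

```python
def shift_right_round(elem: int, n: int) -> int:
    """Right shift by n bits, adding the most significant of the discarded bits."""
    if n == 0:
        return elem
    return (elem >> n) + ((elem >> (n - 1)) & 1)


for value in (5, 7, -7):
    # Each line prints: value, truncated result, rounded result.
    print(value, value >> 2, shift_right_round(value, 2))
# 5 1 1
# 7 1 2
# -7 -2 -2
```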
- The values y_real[m] and y_imag[m] output from the respective distributed arithmetic circuits 2a-m are the product-sum operation results of the data x_real[m], x_imag[m] and the complex coefficient C.
- When the result of the product-sum operation is obtained by the conventional technique (an operation combining multiplication and addition), its number of decimal places is the sum of the number of decimal places x_scale of the data x_real[m] and x_imag[m] and the number of decimal places c_scale of the complex coefficient C; that is, the number of decimal places of the result of the product-sum operation is x_scale + c_scale.
- The arithmetic circuit of this embodiment is not a multiplier circuit that performs accurate multiplication on all bits. Instead, it acquires the element values LUTr#m-l and LUTi#m-l from the LUT for each bit position l of the data, invalidates the lower bits of these element values designated in advance according to the bit position l (the (Lc - l) lower bits in this embodiment), and performs a distributed operation that accumulates the results.
- Compared with a conventional arithmetic circuit that does not perform this invalidation, the present embodiment needs no accumulation processing for the invalidated bits, and the circuit area and power consumption are reduced accordingly. Further, the invalidated low-order bits contain a large noise component and are subjected to truncation or rounding by the digit alignment circuit 22m, just as in the conventional arithmetic circuit, so invalidating them in this embodiment does not degrade the accuracy of the value output by the arithmetic circuit.
- The arithmetic circuit according to the present embodiment is not a multiplier circuit that performs accurate multiplication on all bits without discriminating between high-order bits and low-order bits. By adopting a distributed operation that acquires and accumulates the element values LUTr#m-l and LUTi#m-l, and by omitting the processing of the lower bits specified in advance for each bit position l, it can reduce the area and power of the circuit without degrading the calculation accuracy.
- the arithmetic circuit described in the first and second embodiments can be realized by, for example, an FPGA (Field Programmable Gate Array).
- FPGA: Field Programmable Gate Array
- (Lc-1) bits are deleted from the above-mentioned accurate product-sum operation value.
- the value obtained by subtracting Log 2 (Lc) from the above Lr is the value of (Lc-1)
- Lc-1 ⁇ Lr ⁇ Log 2 (Lc).
- Lc is set to 5 or less.
- the present invention can be applied to arithmetic circuits.
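
The invalidation of low-order bits described above can be sketched in Python for a simplified case. The sketch assumes real-valued, unsigned Lc-bit data, two coefficients, and bit positions l numbered from 1 (LSB) to Lc (MSB). The function name `da_truncated` and these conventions are illustrative assumptions and are not taken from the specification, which describes hardware circuits operating on signed complex data.

```python
def da_truncated(x0, x1, c0, c1, Lc):
    """Two-term distributed arithmetic with low-order bits dropped.

    Assumptions (illustrative only): x0 and x1 are unsigned Lc-bit
    integers, c0 and c1 are signed integers, and bit position l runs
    from 1 (LSB) to Lc (MSB).
    """
    # Four-entry LUT indexed by the bit pair (bit of x1, bit of x0).
    lut = [0, c0, c1, c0 + c1]
    acc = 0
    for l in range(1, Lc + 1):
        b0 = (x0 >> (l - 1)) & 1
        b1 = (x1 >> (l - 1)) & 1
        elem = lut[(b1 << 1) | b0]
        # The right shift discards the (Lc - l) LSB-side bits of the
        # element value; Python's >> on ints floors, so the sign is kept.
        acc += elem >> (Lc - l)
    return acc


exact = 5 * 101 + 3 * (-59)           # full-precision product-sum
approx = da_truncated(5, 3, 101, -59, Lc=3)
print(exact / 2 ** (3 - 1), approx)   # approx is scaled by 2**-(Lc - 1)
```

In this sketch each shifted term loses less than one unit, so the accumulated value falls short of the scaled exact product-sum by less than Lc - 1 units, while the accumulator never has to process the discarded bits.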
- 1, 1a: LUT generation circuit; 2-1 to 2-M, 2a-1 to 2a-M: distributed arithmetic circuit; 3a-1 to 3a-M: digit matching circuit; 20m: two-term distributed arithmetic circuit; 21m: distributed arithmetic result summing circuit; 22m: digit matching circuit; 23m: auxiliary multiplication circuit; 200m: LUT index circuit; 201m: bit position arithmetic circuit; 202m: summing circuit; 203m: real part LUT index circuit; 204 ...
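
The rounding variant of the right shift mentioned in the description, in which the most significant of the discarded bits is added back to the shifted value, can be illustrated as follows. This is a minimal sketch that uses Python integers as stand-ins for two's complement values; `shift_round` is a hypothetical helper name, not a name from the specification.

```python
def shift_round(value, s):
    """Right-shift a signed integer by s bits and add back the most
    significant of the s discarded bits (round half up).

    Hypothetical helper for illustration only.
    """
    if s == 0:
        return value  # nothing is discarded, so nothing to round
    dropped_msb = (value >> (s - 1)) & 1  # top bit of the discarded field
    return (value >> s) + dropped_msb


for v in (11, 9, -5):
    print(v, shift_round(v, 2), shift_round(v, 1))
```

Adding the top discarded bit gives the same result as adding half of the discarded weight before shifting, which is why the description treats it as rounding rather than truncation.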
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Algebra (AREA)
- Complex Calculations (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/959,986 US11494165B2 (en) | 2018-01-05 | 2018-12-18 | Arithmetic circuit for performing product-sum arithmetic |
| CN201880085295.3A CN111630509B (zh) | 2018-01-05 | 2018-12-18 | Arithmetic circuit for performing product-sum arithmetic |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2018000452A JP6863907B2 (ja) | 2018-01-05 | 2018-01-05 | Arithmetic circuit |
| JP2018-000452 | 2018-01-05 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019135355A1 (ja) | 2019-07-11 |
Family
ID=67143671
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2018/046496 (Ceased), WO2019135355A1 (ja) | 2018-01-05 | 2018-12-18 | Arithmetic circuit |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US11494165B2 |
| JP (1) | JP6863907B2 |
| CN (1) | CN111630509B |
| WO (1) | WO2019135355A1 |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
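| JP6995629B2 (ja) * | 2018-01-05 | 2022-01-14 | 日本電信電話株式会社 | Arithmetic circuit |
| CN116010762B (zh) * | 2021-10-22 | 2026-03-13 | 意法半导体(格勒诺布尔2)公司 | Integrated circuit including a hardware calculator and corresponding calculation method |
| CN117335810A (zh) * | 2022-06-23 | 2024-01-02 | 加特兰微电子科技(上海)有限公司 | Data compression and decompression method and apparatus |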
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2000132539A (ja) * | 1998-10-28 | 2000-05-12 | Matsushita Electric Ind Co Ltd | Arithmetic device |
| JP2004171263A (ja) * | 2002-11-20 | 2004-06-17 | Sharp Corp | Arithmetic device |
| US20050201457A1 (en) * | 2004-03-10 | 2005-09-15 | Allred Daniel J. | Distributed arithmetic adaptive filter and method |
Family Cites Families (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2630778B2 (ja) * | 1987-07-17 | 1997-07-16 | 三洋電機株式会社 | Method for configuring a low-frequency band digital filter |
| EP1728865A1 (en) * | 1998-09-04 | 2006-12-06 | Aventis Pasteur Limited | Treatment of cervical cancer |
| WO2000062421A1 (en) * | 1999-04-14 | 2000-10-19 | Nokia Networks Oy | Digital filter and method for performing a multiplication based on a look-up table |
| US7043515B2 (en) * | 2002-12-10 | 2006-05-09 | Isic Corporation | Methods and apparatus for modular reduction circuits |
| JP4724413B2 (ja) * | 2004-11-26 | 2011-07-13 | キヤノン株式会社 | Data classification method |
| EP1975906B1 (en) * | 2006-01-13 | 2012-07-04 | Fujitsu Ltd. | Montgomery s algorithm multiplication remainder calculator |
| GB2448744A (en) * | 2007-04-26 | 2008-10-29 | Wolfson Microelectronics Plc | Look-up table indexing scheme with null values used to expand table to have a power of two number of entries in each cycle of coefficients |
| CN101682297B (zh) * | 2007-06-04 | 2012-06-27 | Nxp股份有限公司 | 包含频带选择的数字信号处理电路和方法 |
| JP2011186592A (ja) * | 2010-03-05 | 2011-09-22 | Renesas Electronics Corp | Filter arithmetic unit, filter arithmetic method, and motion compensation processing device |
| KR20120077164A (ko) * | 2010-12-30 | 2012-07-10 | 삼성전자주식회사 | Apparatus and method for complex number computation using a SIMD architecture |
| JP5920226B2 (ja) * | 2011-02-15 | 2016-05-18 | 日本電気株式会社 | Coprocessor for complex arithmetic processing and processor system |
| ES2396673B2 (es) * | 2012-08-09 | 2014-01-24 | Universidade De Santiago De Compostela | Apparatus and method for calculating exponentiation and root extraction operations |
| US9753695B2 (en) * | 2012-09-04 | 2017-09-05 | Analog Devices Global | Datapath circuit for digital signal processors |
| US10019230B2 (en) * | 2014-07-02 | 2018-07-10 | Via Alliance Semiconductor Co., Ltd | Calculation control indicator cache |
| US10303735B2 (en) * | 2015-11-18 | 2019-05-28 | Intel Corporation | Systems, apparatuses, and methods for K nearest neighbor search |
| JP6995629B2 (ja) * | 2018-01-05 | 2022-01-14 | 日本電信電話株式会社 | Arithmetic circuit |
2018
- 2018-01-05 JP JP2018000452A patent/JP6863907B2/ja active Active
- 2018-12-18 WO PCT/JP2018/046496 patent/WO2019135355A1/ja not_active Ceased
- 2018-12-18 US US16/959,986 patent/US11494165B2/en active Active
- 2018-12-18 CN CN201880085295.3A patent/CN111630509B/zh active Active
Non-Patent Citations (1)
| Title |
|---|
| YI, RU ET AL.: "Implementation Consideration of Linear-Phase Delay Digital Filter Using Distributed Arithmetic on FPGA", JOINT RESEARCH PRESENTATION OF TOCHIGI AND GUNMA BRANCHES OF THE INSTITUTE OF ELECTRICAL ENGINEERS OF JAPAN, 29 February 2012 (2012-02-29), pages 18 - 20 , 25-26, XP055619872, [retrieved on 20190319] * |
Also Published As
| Publication number | Publication date |
|---|---|
| US11494165B2 (en) | 2022-11-08 |
| JP6863907B2 (ja) | 2021-04-21 |
| CN111630509A (zh) | 2020-09-04 |
| JP2019121172A (ja) | 2019-07-22 |
| CN111630509B (zh) | 2023-12-08 |
| US20210064340A1 (en) | 2021-03-04 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10768898B2 (en) | | Efficient modulo calculation |
| WO2019135355A1 (ja) | | Arithmetic circuit |
| US5862068A (en) | | Arithmetic circuit for calculating a square-root of a sum of squares |
| WO2022170811A1 (zh) | | Fixed-point multiply-add operation unit and method suitable for mixed-precision neural networks |
| JP3598096B2 (ja) | | Arithmetic method using the Newton-Raphson method |
| JPH04205026A (ja) | | Division circuit |
| EP1049002A2 (en) | | Method and apparatus for efficient calculation of an approximate square of a fixed-precision number |
| WO2019135354A1 (ja) | | Arithmetic circuit |
| KR100329914B1 (ko) | | Division device |
| US20170010862A1 (en) | | Apparatus and method for performing division |
| Iyer et al. | | Generalised algorithm for multiplying binary numbers via vedic mathematics |
| US7447726B2 (en) | | Polynomial and integer multiplication |
| Song et al. | | Design of multiplier circuit based on signed-digit hybrid stochastic computing |
| JP2645422B2 (ja) | | Floating-point arithmetic processing device |
| US20030187900A1 (en) | | Apparatus and method for calculation of divisions and square roots |
| Ram et al. | | Efficient hardware design of parameterized posit multiplier and posit adder |
| JPH056263A (ja) | | Adder and absolute value arithmetic circuit using the adder |
| JPH0793134A (ja) | | Multiplier |
| RU2829089C1 (ru) | | Modulo multiplier |
| Hu et al. | | Comparison of constant coefficient multipliers for CSD and booth recoding |
| RU2751802C1 (ru) | | Modulo multiplier |
| US20040143619A1 (en) | | Sparce-redundant fixed point arithmetic modules |
| KR950015180B1 (ko) | | High-speed adder |
| US20230142818A1 (en) | | Circuits and methods for multiplying large integers over a finite field |
| KR100241066B1 (ko) | | Computation of the expression A+sin(A) in a single instruction cycle |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18898669; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18898669; Country of ref document: EP; Kind code of ref document: A1 |