WO2019064774A1 - Information processing apparatus and information processing method - Google Patents
- Publication number: WO2019064774A1 (PCT/JP2018/024923)
- Authority: WIPO (PCT)
- Prior art keywords: product, information processing, sum operation, remainder, divisor
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
  - G06F17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
  - G06F7/483 — Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
  - G06N3/045 — Combinations of networks
  - G06N3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
  - G06N3/08 — Learning methods
Definitions
- the present disclosure relates to an information processing apparatus and an information processing method.
- Non-Patent Document 1 describes a method of reducing the processing load by binarizing the weight coefficient.
- Non-Patent Document 2 describes a method of converting multiplication into addition by converting an input signal into a log domain.
- in the method of Non-Patent Document 1, since binarization using +1 or −1 is performed, the granularity of quantization becomes coarser as the number of dimensions of the weight coefficient increases. Further, although the method described in Non-Patent Document 2 has a certain effect in avoiding multiplication, it is assumed that there is further room for reducing the processing load.
- the present disclosure proposes a new and improved information processing apparatus and information processing method capable of further reducing the processing load of the inner product operation while guaranteeing the quantization granularity of the weight coefficients.
- a product-sum operation circuit is provided that executes a product-sum operation based on a plurality of input values and a plurality of weight coefficients respectively corresponding to the input values, each quantized by a power expression.
- the quantized exponent of the weight coefficient is represented by a fraction having a predetermined divisor as its denominator, and the product-sum operation circuit calculates the product sum using different addition multipliers based on the remainder determined from the divisor.
- the processor executes a product-sum operation based on a plurality of input values and a plurality of weight coefficients respectively corresponding to the input values, each quantized by a power expression.
- the quantized exponent of the weight coefficient is represented by a fraction having a predetermined divisor as its denominator, and an information processing method is provided in which the product-sum operation is performed using different addition multipliers based on the remainder determined from the divisor.
- FIG. 27 is an enlarged view of the SNR 7–9 dB data in FIG. 26. Subsequent figures show the BER evaluation result when QPSK according to the embodiment is used as the modulation scheme, an enlarged view of the SNR 10–12 dB data of that figure, the BER evaluation result when 16QAM according to the embodiment is used as the modulation scheme, an enlarged view of the SNR 16–18 dB data of that figure, and an example of the hardware configuration according to an embodiment of the present disclosure.
- FIG. 1 is a conceptual diagram for explaining an outline of basic operations in a neural network.
- FIG. 1 shows two layers constituting a neural network, with cells c1_1 to c1_N and a cell c2_1 belonging to the respective layers.
- the input signal to cell c2_1 (hereinafter also referred to as the input vector) is determined based on the input vector and the weight coefficients (hereinafter also referred to as the weight vector) associated with the cells c1_1 to c1_N belonging to the lower layer. More specifically, the value input to cell c2_1 is obtained by adding a bias b to the result of the inner product operation of the input vector and the weight vector for cells c1_1 to c1_N, and then applying an activation function h.
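- the basic cell computation described above can be sketched as follows: the output of cell c2_1 is h(⟨x, w⟩ + b). The ReLU activation and the concrete numbers below are illustrative assumptions, not taken from the disclosure.

```python
# Minimal sketch of the basic operation in FIG. 1: inner product of the
# input vector x and the weight vector w, plus bias b, passed through an
# activation function h (here assumed to be ReLU for illustration).

def relu(v):
    return max(0.0, v)

def cell_output(x, w, b, h=relu):
    """Output of a cell given input vector x, weight vector w, and bias b."""
    assert len(x) == len(w)
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return h(z)

x = [0.5, -1.0, 2.0]   # inputs from cells c1_1..c1_N (N = 3 here)
w = [1.0, 0.5, 0.25]   # corresponding weight coefficients
print(cell_output(x, w, b=0.1))  # 0.5 - 0.5 + 0.5 + 0.1 = 0.6
```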
- FIG. 2 is a schematic diagram for explaining the inner product calculation of the input vector x and the weight vector w.
- FIG. 3 is a diagram for describing a weight vector w which has been binary-quantized in a two-dimensional space.
- the granularity of the weight vector w can be expressed by the rotation angle θ in the plane, and in the case of binary quantization the granularity becomes 90 degrees, as shown in FIG. 3.
- FIG. 4 is a diagram for explaining weight vector w quantized in four values in a two-dimensional space.
- the granularity of the weight vector w, that is, the rotation angle θ, is approximately 15 degrees, and it is possible to guarantee finer granularity than in the case of binary quantization.
- FIG. 5 is a diagram for explaining the variation in granularity of the weight vector w in three-dimensional space.
- since the length of the side in the (1, 1, 0) direction is √2 times the length of the side in the (0, 0, 1) direction, it can be seen that the variation in granularity at the time of quantization becomes large.
- FIG. 6 is a diagram for explaining the variation in granularity of the weight vector w in N-dimensional space.
- FIG. 6 shows the faces defined by (1, 1,..., 1, 0) and (0, 0,..., 0, 1) in the N-dimensional space.
- the length of the side in the (1, 1, …, 1, 0) direction can be expressed as √(N−1) times the length of the side in the (0, 0, …, 0, 1) direction.
- for example, when N = 100, the length of the side in the (1, 1, …, 1, 0) direction is √99 times (≈ 10 times) the length of the side in the (0, 0, …, 0, 1) direction.
- one of the features of the information processing apparatus and the information processing method according to the first embodiment of the present disclosure is that the inner product operation is performed using a weight vector quantized based on the granularity of vector directions on the surface of an N-dimensional hypersphere.
- the information processing apparatus and the information processing method according to the first embodiment of the present disclosure can balance high approximation accuracy with a reduction in processing load by quantizing the weight vector with a granularity that is neither too fine nor too coarse. More specifically, they may perform the inner product operation using a weight vector represented by a power.
- the features of the information processing apparatus and the information processing method according to the first embodiment of the present disclosure will be described in detail.
- FIG. 7 is an example of a functional block diagram of the information processing apparatus 10 according to the present embodiment.
- the information processing apparatus 10 according to the present embodiment includes an input unit 110, an arithmetic unit 120, a storage unit 130, and an output unit 140.
- each of the above components is described below, focusing on its function.
- the input unit 110 has a function of detecting various input operations by the operator.
- the input unit 110 according to the present embodiment may include various devices for detecting an input operation by the operator.
- the input unit 110 can be realized by, for example, various buttons, a keyboard, a touch panel, a mouse, a switch, and the like.
- the operation unit 120 has a function of performing an inner product operation based on a plurality of input values and a plurality of weighting factors respectively corresponding to the input values to calculate an output value.
- the operation unit 120 according to the present embodiment particularly performs an inner product operation related to forward propagation of a neural network.
- one of the features is that the operation unit 120 according to the present embodiment calculates the output value based on weight coefficients quantized based on the granularity of vector directions on the surface of an N-dimensional hypersphere. More specifically, the operation unit 120 according to the present embodiment may calculate the output value based on weight coefficients represented by powers.
- the storage unit 130 has a function of storing a program, data, and the like used in each configuration included in the information processing apparatus 10.
- the storage unit 130 according to the present embodiment stores, for example, various parameters used in a neural network.
- the output unit 140 has a function of outputting various information to the operator.
- the output unit 140 may be configured to include a display device that outputs visual information.
- the display device described above may be realized by, for example, a cathode ray tube (CRT) display device, a liquid crystal display (LCD) device, an organic light emitting diode (OLED) device, or the like.
- the functional configuration example described above is merely an example, and the functional configuration example of the information processing apparatus 10 according to the present embodiment is not limited to such an example.
- the information processing apparatus 10 according to the present embodiment may further include a configuration other than that shown in FIG.
- the information processing apparatus 10 may further include, for example, a communication unit that performs information communication with another information processing terminal.
- the functional configuration of the information processing apparatus 10 according to the present embodiment can be flexibly changed in design.
- the information processing apparatus 10 can maintain high uniformity of granularity by performing quantization with the weight vector w represented by a power.
- one of the features is that the operation unit 120 rearranges the plurality of weight vector components w_i in ascending order of value and normalizes them by the weight vector component w_i having the largest value.
- the weight vector component w_j is expressed by the following equations (2) to (4).
- in the above equation (2), α satisfies 0 < α < 1, s_j ∈ {−1, 1}, and n_j ∈ {0, 1, 2, …}. That is, the operation unit 120 according to the present embodiment performs quantization with n_j as an integer. At this time, the inner product operation performed by the operation unit 120 is expressed by the following equation (5), where K represents a normalization constant. The value of α may be finally determined within the above range even when equation (5) is appropriately transformed for the inner product calculation.
- the formulas shown in the present disclosure are merely examples and can be flexibly deformed.
- the inner product operation by the operation unit 120 according to the present embodiment can be processed with N additions and a number of multiplications on the order of −(1/2)·log(N−1)/log α.
- one of the features is that the weight vector w is approximated by a power expression of α and its components are rearranged in ascending order of value.
- quantization of the weight vector w is performed by converting the exponent of α into an integer according to N.
- for example, when N = 100, most of the exponents n_j are quantized to the same value, so that most of the differences n_{j−1} − n_j become 0 and the number of multiplications can be significantly reduced.
- in this example, n_{j−1} − n_j takes a nonzero value only four times. Therefore, the number of multiplications for the inner product calculation is only four, and the rest are all additions, so the processing load can be effectively reduced.
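- the evaluation above can be sketched as follows: with the terms of equation (5) sorted by exponent, a multiplication by a power of α is needed only where the exponent changes, so most steps are pure additions. The value of α and the sample terms below are illustrative assumptions.

```python
# Sketch of the nested evaluation of equation (5): K * sum(s_j * alpha**n_j * x_j)
# computed with one multiplication per exponent gap instead of one per term.

def inner_product_power(alpha, terms, K=1.0):
    """terms: list of (s, n, x) with s in {-1, +1}, integer exponent n,
    and input component x. Returns (result, multiplication count)."""
    assert terms
    # Process from the largest exponent inward (nested / Horner-like form).
    terms = sorted(terms, key=lambda t: t[1], reverse=True)
    acc = 0.0
    mults = 0
    prev_n = terms[0][1]
    for s, n, x in terms:
        gap = prev_n - n
        if gap:                      # multiply only when the exponent changes
            acc *= alpha ** gap
            mults += 1
        acc += s * x                 # all other steps are additions
        prev_n = n
    acc *= alpha ** prev_n           # final scaling by alpha**n_min
    return K * acc, mults

y, m = inner_product_power(0.5, [(1, 3, 2.0), (-1, 1, 4.0), (1, 1, 8.0)])
print(y, m)  # 2.25, with a single multiplication inside the loop
```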
- the information processing apparatus 10 may include a product-sum operation circuit having a table that holds the address information of the input vector x corresponding to the plurality of weight vector components rearranged in ascending order of value.
- FIG. 8 is an example of a circuit block diagram of the product-sum operation circuit 200 provided in the information processing apparatus 10 according to the present embodiment.
- the product-sum operation circuit 200 according to this embodiment includes a storage circuit holding an address table WT that stores the address information of the input vector x corresponding to the weight vector w, a RAM 210, an addition circuit 220, an accumulator 230, a first multiplication circuit 240, and a second multiplication circuit 250 that performs multiplication by the normalization constant.
- the address table WT holds the address information, sign information, and multiplication instruction information of the input vector x corresponding to the plurality of weight vector components rearranged in ascending order of value.
- the above-mentioned address information may include a null pointer, as shown in FIG. 8. In this case, 0 is added to the accumulator 230, and the value of the accumulator 230 is simply multiplied by α.
- the sign information indicates the value corresponding to s_j in equation (5) described above.
- the above-mentioned multiplication instruction information is information for instructing the processing content of the first multiplication circuit 240.
- the multiplication instruction information according to the present embodiment may include, for example, information specifying whether or not multiplication is necessary. FIG. 8 shows an example in which the first multiplication circuit 240 does not perform multiplication when the multiplication instruction information is 0, and multiplies by α when the multiplication instruction information is 1.
- the multiplication instruction information according to the present embodiment is not limited to the above example, and may include information specifying various processing contents.
- the multiplication instruction information according to the present embodiment may include, for example, the number of times of multiplication, information specifying a shift operation, and the like.
- the RAM 210 according to the present embodiment outputs to the addition circuit 220 the input vector component x_j corresponding to the weight vector component w_j, in accordance with the address information input from the address table WT.
- the addition circuit 220 performs addition based on the input vector component x_j input from the RAM 210 and the value output from the first multiplication circuit 240. At this time, the addition circuit 220 according to the present embodiment performs the above addition based on the sign information held in the address table WT.
- the accumulator 230 accumulates the operation result output from the adding circuit 220.
- the accumulator 230 outputs the accumulated value to the first multiplication circuit 240 and the second multiplication circuit 250. Further, a reset signal for resetting the accumulated value to 0 is appropriately input to the accumulator 230.
- the first multiplication circuit 240 multiplies the value accumulated in the accumulator 230 by α. At this time, as described above, the first multiplication circuit 240 executes this multiplication based on the multiplication instruction information held in the address table WT, and outputs the operation result to the addition circuit 220.
- the second multiplication circuit 250 multiplies the value output from the accumulator 230 by the normalization constant K.
- the configuration example of the product-sum operation circuit 200 according to this embodiment has been described above. According to the product-sum operation circuit 200, the number of multiplications in the inner product operation can be effectively reduced, and the processing load can be reduced.
- the address table WT according to the present embodiment may include an offset indicating the relative position between the addresses.
- FIG. 9 is an example of offset notation according to the address information held by the address table WT according to the present embodiment.
- in the address table WT, for the section where the value of n_{j−1} − n_j is continuously 0 in equation (5) above, that is, the section in which no multiplication is performed, the addresses may be sorted in address order and the offsets between the addresses may be held as the address information. According to this address table WT, the amount of information related to the address information can be significantly reduced, and the power consumption can be effectively reduced.
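- the offset notation can be sketched as follows: within a run where no multiplication occurs, the sorted addresses are stored as differences (offsets) between consecutive entries, which need fewer bits than absolute addresses. The sample addresses are illustrative assumptions.

```python
# Sketch of the offset encoding used by the address table WT: absolute
# addresses in a multiplication-free section are replaced by the offsets
# between consecutive (sorted) addresses.

def to_offsets(addresses):
    """Encode a set of addresses as offsets from the previous address."""
    prev = 0
    offsets = []
    for a in sorted(addresses):
        offsets.append(a - prev)
        prev = a
    return offsets

def from_offsets(offsets):
    """Decode offsets back into absolute addresses."""
    addresses, acc = [], 0
    for o in offsets:
        acc += o
        addresses.append(acc)
    return addresses

addrs = [1024, 1031, 1030, 1027]
offs = to_offsets(addrs)                    # [1024, 3, 3, 1]
assert from_offsets(offs) == sorted(addrs)  # round-trips losslessly
```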
- the address table WT according to the present embodiment can take various forms other than the forms shown in FIGS. 8 and 9.
- the address table WT according to the present embodiment need not hold the sign information and the multiplication instruction information as separate fields, and an address compression method other than the above may be adopted.
- the address table WT according to the present embodiment can be flexibly modified according to the configuration of the neural network, the performance of the information processing apparatus 10, and the like.
- in equation (6) above, w_max indicates the maximum value of w_i. For the integerization int, rounding up or rounding down, whichever is closer, may be chosen.
- the address table WT described above can be generated by rearranging the n_i after the final learning.
- w_i quantized by a power expression is rearranged in ascending order of value and normalized, and the result is defined as w_j.
- the weight vector w is expressed by the following equation (7).
- FIG. 10 is a diagram showing a processing image of the information processing method according to the present embodiment.
- in the plane spanned by the axis obtained by projecting the weight vector onto the space of q_1, q_2, …, q_{j−1} and the axis q_j, the quantization granularity Δθ of the weight vector is given by equations (8) and (9) below, as shown in FIG. 11. Here, l in equations (8) and (9) is defined by equation (10).
- FIG. 11 is a diagram for explaining the quantization granularity Δθ according to the present embodiment, showing the weight vector projected onto the first quadrant.
- FIG. 12 is a graph showing the maximum value of the quantization granularity Δθ as a function of α according to the present embodiment. As described above, according to the information processing method of the present embodiment, the quantization granularity is guaranteed in all orthogonal rotation directions in the N-dimensional space.
- FIG. 13 is a diagram for explaining the maximum exponent according to the present embodiment, showing weight vectors projected onto the first quadrant.
- the minimum m that satisfies equation (12) below, with equation (13) added to it, may be used as the maximum exponent that guarantees the quantization granularity Δθ. Therefore, the number of multiplications executed by the information processing apparatus 10 according to the present embodiment can be obtained by equation (14) below.
- FIG. 14 and FIG. 15 are diagrams showing an example of the number of multiplications with respect to the number N of inputs according to the present embodiment.
- the number of multiplications can be significantly reduced in the inner product operation related to the forward propagation of the neural network, making it possible to effectively reduce the power consumption of multiplication. Further, according to an information processing apparatus realizing the information processing method of the present embodiment, the quantization accuracy of the weight vector can be improved, and compared with conventional quantization methods using the same number of bits, an improvement in the recognition accuracy and approximation accuracy of the neural network is expected.
- while the method of the first embodiment is effective when the number of dimensions of the inner product space is relatively large, it is assumed that its load-reduction effect is not sufficient for inner product operations with a relatively small number of dimensions, such as those in a CNN (Convolutional Neural Network).
- an arithmetic circuit is therefore proposed that can effectively reduce the processing load of the inner product operation even when the number of dimensions of the inner product space is relatively small.
- the weight vector component w_i and the input vector component x_i may each be expressed as α^(−n/p). For example, when α = 2, the possible values of α^(−n/p) can be expressed as shown in Table 1 below.
- Table 1 above shows that the larger the value of p, the finer the granularity of the quantization. For this reason, in the second embodiment of the present disclosure, the quantization error is reduced compared with the first embodiment by quantizing the weight vector component w_i and the input vector component x_i by α^(−n/p). Further, according to the calculation method of the second embodiment, processing equivalent to the inner product operation described in the first embodiment can be performed with only shift operations and additions, making it possible to effectively reduce the processing load of the inner product operation.
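- the idea behind Table 1 can be sketched as follows: with α = 2 the quantization points are 2^(−n/p), and a larger p gives a finer grid. The helper names, the round-to-nearest rule in the log domain, and the sample values are illustrative assumptions.

```python
# Sketch of the 2**(-n/p) quantization grid and round-to-nearest quantization
# of a positive coefficient onto it.
import math

def quant_points(p, n_max):
    """The first n_max + 1 quantization points 2**(-n/p)."""
    return [2.0 ** (-n / p) for n in range(n_max + 1)]

def quantize(x, p):
    """Quantize positive x <= 1 to the nearest exponent n of 2**(-n/p)
    (nearest in the log domain)."""
    n = max(0, round(-p * math.log2(x)))
    return n, 2.0 ** (-n / p)

print(quant_points(1, 3))   # [1.0, 0.5, 0.25, 0.125]
print(quantize(0.7, 4))     # n = 2, value = 2**(-0.5) ~ 0.7071
```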
- the inner product operation shown in the above equation (19) can be realized, for example, by the product-sum operation circuit 300 shown in FIG. FIG. 16 is an example of the product-sum operation circuit in the case where the weight vector according to the present embodiment is quantized.
- the product-sum operation circuit 300 includes a shift operator 310, a remainder operator 320, selectors 330 and 340, an accumulator group 350, an adder-subtractor 360, a multiplier group 370, and an adder 380.
- the shift operator 310 performs a shift operation based on the input x_i and n_i. Specifically, the shift operator 310 bit-shifts the input vector component x_i to the right by int(n_i / p).
- the remainder operator 320 computes n_i mod p based on the input n_i and inputs the value of the remainder to the selectors 330 and 340.
- the selectors 330 and 340 select which accumulator in the accumulator group 350 to connect to the circuit, based on the result from the remainder operator 320. At this time, the selectors 330 and 340 according to the present embodiment connect the circuit to the accumulator corresponding to the value of the remainder: for example, if the remainder is 0, they connect the circuit to the accumulator holding y_0, and if the remainder is 1, to the accumulator holding y_1.
- the accumulator group 350 includes a plurality of accumulators, one for each remainder value of n_i mod p. That is, the accumulator group 350 according to the present embodiment holds y_r for each value r of the remainder.
- the adder-subtractor 360 performs addition or subtraction based on the input s_i, the shift operation result, and the value of y_r.
- the value of y_r held in the accumulator selected based on the remainder of n_i mod p is input to the adder-subtractor 360, and y_r in the selected accumulator is updated based on the result of the adder-subtractor 360.
- the multiplier group 370 multiplies each y_r, updated for each remainder by the process described above, by the addition multiplier corresponding to that remainder.
- the multiplier group 370 according to this embodiment includes a plurality of multipliers corresponding to the remainders of n_i mod p. For example, the multiplier group 370 multiplies y_0 input from the accumulator group 350 by 1, and multiplies y_1 by 2^(−1/p).
- the adder 380 adds the values of y_r calculated by the multiplier group 370 for each remainder, and outputs the final calculation result y.
- the product-sum operation circuit 300 has been described above. According to the product-sum operation circuit 300 of the present embodiment, y_r is accumulated in the accumulator corresponding to each remainder of n_i mod p and the multiplications are performed together at the end, so the number of multiplications can be minimized. In the example shown in FIG. 16, the index i is processed sequentially to update y_r, but part or all of the above calculation can also be performed in parallel.
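- the behavior of the product-sum operation circuit 300 can be sketched as follows: each term s_i · x_i · 2^(−n_i/p) is split into a shift by int(n_i/p) and a residue class r = n_i mod p; one accumulator y_r per residue collects the shifted values, and only p final multiplications by 2^(−r/p) remain. This is an illustrative behavioral model with assumed sample data, not the circuit itself; integer x_i values are chosen so the right shifts are exact.

```python
# Behavioral sketch of FIG. 16: per-remainder accumulation followed by p
# final multiplications by the addition multipliers 2**(-r/p).

def mac_circuit300(terms, p):
    """terms: list of (s, n, x) with s in {-1, +1}, non-negative integer
    exponent n, and integer input x (a fixed-point value)."""
    y = [0] * p                          # accumulator group 350
    for s, n, x in terms:
        shifted = x >> (n // p)          # shift operator 310
        r = n % p                        # remainder operator 320
        y[r] += s * shifted              # adder-subtractor 360 via selectors
    # multiplier group 370 and adder 380: one multiplication per residue class
    return sum(y[r] * 2.0 ** (-r / p) for r in range(p))

terms = [(1, 0, 8), (-1, 3, 16), (1, 5, 32)]
direct = sum(s * x * 2.0 ** (-n / 2) for s, n, x in terms)
assert abs(mac_circuit300(terms, p=2) - direct) < 1e-9  # matches the direct sum
```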
- the inner product operation can be expressed by equation (22) below, where y_r is defined by equation (23).
- since r ∈ {0, 1, …, p−1}, y_r can be represented in ordinary fixed-point notation in which negative numbers are represented by two's complement.
- p may be any natural number, but may also be expressed as a power of two. When p = 2^q with q ∈ {0, 1, 2, …}, int((m_i + n_i) / p) and (m_i + n_i) mod p can be computed by extracting bits, so no division is necessary, which has the effect of simplifying the calculation.
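- the bit extraction above can be shown concretely: when p = 2^q, the quotient is the upper bits of m_i + n_i and the remainder is its lower q bits. The value 29 is an illustrative assumption.

```python
# When p = 2**q, division and modulo reduce to bit extraction:
# the upper bits give the shift amount, the lower q bits give the residue.

q = 3
p = 1 << q               # p = 8
t = 29                   # example value of m_i + n_i
quotient = t >> q        # upper bits: int(29 / 8) = 3, no divider needed
remainder = t & (p - 1)  # lower q bits: 29 mod 8 = 5
assert quotient == 29 // 8 and remainder == 29 % 8
```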
- the inner product operation can be realized, for example, by the product-sum operation circuit 400 shown in FIG. FIG. 17 is an example of the product-sum operation circuit in the case where both the weight vector and the input vector according to the present embodiment are quantized.
- the product-sum operation circuit 400 includes a first adder 410, a shift operator 420, a selector 430, an XOR circuit 440, an accumulator group 450, a multiplier group 460, and a second adder 470.
- the first adder 410 adds the inputs m_i and n_i. The addition result can be represented as the bit array [b_{k−1}, …, b_q, b_{q−1}, …, b_0].
- the shift operator 420 performs a right-shift operation, by int((m_i + n_i) / p), on the value 1 represented in fixed point, based on the result from the first adder 410.
- the value of int((m_i + n_i) / p) corresponds to the upper bits [b_{k−1}, …, b_q], so the shift operator 420 may perform the shift operation using the value of these upper bits. Similarly, the value of the remainder corresponds to the lower q bits [b_{q−1}, …, b_0] of the bit array produced by the first adder 410, so this operation can be simplified in the same manner.
- the accumulator group 450 includes a plurality of accumulators, one for each remainder value of (m_i + n_i) mod p.
- the accumulator group 450 is also configured to include a plurality of adder-subtractors (1-bit up/down counters), one per accumulator.
- each of the above adder-subtractors determines the necessity of addition or subtraction based on the Enable signal input from the selector 430, as shown in the lower right of the figure. Specifically, only when the input Enable signal is 1 does each adder-subtractor perform a 1-bit addition or subtraction on the value held by the corresponding accumulator, according to the U/D value input from the XOR circuit 440. According to the accumulator group 450 of the present embodiment, the value of y_r can be updated with a 1-bit adder on the upper bits, so an ordinary subtractor becomes unnecessary and the circuit scale can be reduced.
- the multiplier group 460 multiplies each y_r, updated for each remainder by the process described above, by the value corresponding to that remainder.
- the multiplier group 460 according to the present embodiment includes a plurality of multipliers corresponding to the remainders of (m_i + n_i) mod p. For example, the multiplier group 460 multiplies y_0 input from the accumulator group 450 by 1, and multiplies y_1 by 2^(−1/p).
- the second adder 470 adds the values of y_r calculated by the multiplier group 460 for each remainder, and outputs the final calculation result y.
- the product-sum operation circuit 400 according to the present embodiment has been described above. According to the product-sum operation circuit 400, y_r is accumulated in the accumulator corresponding to each remainder of (m_i + n_i) mod p and the multiplications are performed together at the end, so the number of multiplications can be minimized. In the example shown in FIG. 17, the index i is processed sequentially to update y_r, but part or all of the above calculation can also be performed in parallel.
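- the behavior of the product-sum operation circuit 400 can be sketched as follows: when both the weight and the input are quantized as powers, each term becomes ±2^(−(m_i + n_i)/p), so each accumulator only counts shifted ones (no per-term multiplication by x_i at all). The fixed-point width FRAC, the names, and the sample data are illustrative assumptions.

```python
# Behavioral sketch of FIG. 17: both vectors quantized, per-remainder
# up/down counting of a shifted fixed-point 1, then p final multiplications.

FRAC = 16                                  # assumed fixed-point fraction bits

def mac_circuit400(terms, q):
    """terms: list of (sw, sx, m, n) with signs in {-1, +1} and non-negative
    integer exponents m (weight) and n (input). Uses p = 2**q."""
    p = 1 << q
    y = [0] * p                            # accumulator group 450
    for sw, sx, m, n in terms:
        t = m + n                          # first adder 410
        shifted = (1 << FRAC) >> (t >> q)  # shift operator 420 (upper bits)
        r = t & (p - 1)                    # remainder = lower q bits
        y[r] += shifted if sw == sx else -shifted  # XOR circuit 440 sets U/D
    # multiplier group 460 and second adder 470
    return sum(y[r] * 2.0 ** (-r / p) for r in range(p)) / (1 << FRAC)

terms = [(1, 1, 1, 1), (1, -1, 0, 3), (-1, -1, 2, 2)]
direct = sum(sw * sx * 2.0 ** (-(m + n) / 4) for sw, sx, m, n in terms)
assert abs(mac_circuit400(terms, q=2) - direct) < 1e-6  # matches the direct sum
```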
- the product-sum operation circuit 400 may include a selector and a single adder-subtractor, as in the product-sum operation circuit 300 shown in FIG. 16, instead of the above configuration.
- the configuration of the product-sum operation circuit according to the present embodiment can be appropriately designed so as to reduce the circuit scale, according to the value of p.
- Equation (22) above can be modified as Equation (24) below.
- the inner product operation can also be realized with a single adder-subtractor, as in the product-sum operation circuit 500 shown in FIG. 18.
- FIG. 18 is an example of the product-sum operation circuit in the case where both the weight vector and the input vector according to the present embodiment are quantized.
- the product-sum operation circuit 500 includes an adder 510, a selector 520, a memory circuit group 530, a shift operator 540, an XOR circuit 550, an adder / subtractor 560, and an accumulator 570.
- the adder 510 adds the inputs m_i and n_i.
- the adder 510 may perform the same operation as the first adder 410 shown in FIG.
- the selector 520 selects, from among the plurality of storage circuits included in the storage circuit group 530, the storage circuit to be connected to the circuit, according to the lower q bits [b_{q−1}, …, b_0].
- the storage circuit group 530 includes a plurality of storage circuits respectively corresponding to the values of the remainder (m_i + n_i) mod p. Each storage circuit stores the addition multiplier corresponding to its remainder. Note that each storage circuit in the storage circuit group 530 may be a read-only circuit that holds the addition multiplier as a constant, or a rewritable register. Storing the addition multiplier as a constant in a read-only circuit has the advantage of simplifying the circuit configuration and reducing power consumption.
- the shift operator 540 right-shifts the addition multiplier stored in the connected storage circuit by the value of the upper bits [b_k-1, ..., b_q].
- the XOR circuit 550 outputs 1 or 0 based on the input S_xi and S_wi.
- the XOR circuit 550 may perform the same operation as the XOR circuit 440 shown in FIG. 17.
- the adder/subtractor 560 repeatedly performs addition or subtraction on y held in the accumulator 570, based on the calculation result of the shift operator 540 and the input from the XOR circuit 550.
- the accumulator 570 holds the result y of the inner product operation.
- the inner product operation can thus be realized with the single adder/subtractor 560 and the single accumulator 570, which makes it possible to further reduce the circuit scale.
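A minimal sketch of the single-accumulator variant of FIG. 18, with the p addition multipliers held in a lookup table standing in for the storage circuit group 530 (the names and float arithmetic are illustrative assumptions, not the patented fixed-point circuit):

```python
def dot_single_accumulator(sw, sx, m, n, p):
    """One adder, one accumulator: each term sw_i*sx_i * 2**(-(m_i+n_i)/p)
    is formed by selecting a stored multiplier with the lower bits of
    m_i + n_i and right-shifting it by the upper bits."""
    table = [2.0 ** (-r / p) for r in range(p)]   # storage circuit group 530
    y = 0.0                                       # accumulator 570
    for swi, sxi, mi, ni in zip(sw, sx, m, n):
        e = mi + ni                               # adder 510
        q, r = divmod(e, p)                       # upper bits / lower q bits (selector 520)
        term = table[r] / (2 ** q)                # shift operator 540
        y += term if swi * sxi > 0 else -term     # XOR 550 drives adder/subtractor 560
    return y
```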
- the input vector x and the weight vector w can be expressed by the following equations (25) and (26), respectively.
- let L be the number of consecutive 0 bits counted from the msb (most significant bit) of c.
- c = [c_k-1, ..., c_0] is left-shifted by L bits to obtain the bit array d, and d is treated as a fixed-point number whose msb has the weight 0.5.
- let r_min be the minimum r that satisfies equation (29) below.
- x_i can then be approximated, that is, quantized, as in equation (31) below.
- the obtained coefficients are quantized to the nearest quantization point for each value of p, and inference is performed without relearning.
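Quantization to the nearest point of the form ±2^(-n/p) can be sketched as follows; rounding is done in the exponent domain, which is one plausible reading of "nearest quantization point" (the exact rounding rule used in the experiments is an assumption here):

```python
import math

def quantize_pow(x, p, n_max=31):
    """Quantize x to sign(x) * 2**(-n/p) with integer n in [0, n_max],
    choosing n by rounding -p*log2(|x|) to the nearest integer."""
    if x == 0.0:
        return 0.0
    n = round(-p * math.log2(abs(x)))
    n = min(max(n, 0), n_max)          # clamp to the representable exponent range
    return math.copysign(2.0 ** (-n / p), x)
```

For example, with p = 2 a coefficient of 0.7 lands on 2^(-1/2) ≈ 0.707, whereas with p = 1 it would be forced to 2^(-1) = 0.5, illustrating the finer granularity a larger divisor provides.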
- An experiment was conducted to compare the image recognition rate.
- FIG. 19 is a diagram showing a network structure of ResNet used in a comparative experiment according to the present embodiment.
- the input size input to each layer is shown on the right in the figure, and the kernel size is shown on the left in the figure.
- the created network includes both ResBlock not including the Max Pooling layer and ResBlock including the Max Pooling layer.
- FIGS. 20 and 21 are diagrams showing network configurations of ResBlock not including the Max Pooling layer and ResBlock including the Max Pooling layer, respectively.
- FIG. 22 shows the comparison results of the image recognition rate when inference is performed without relearning using the quantization described above.
- the vertical axis indicates the recognition accuracy
- the horizontal axis indicates the quantization number (N value) of the input vector x.
- the recognition accuracy before quantization is shown as line segment C
- with the quantization method according to the present embodiment, it is possible to effectively reduce the processing load of the inner product calculation while maintaining the high performance of the learning device.
- the quantization method according to the present embodiment may be applied to a convolution operation in a band pass filter used in the communication technology field.
- simulation results when the quantization method according to the present embodiment is applied to a band pass filter will be described.
- FIG. 23 is a diagram showing simulation results concerning frequency characteristics (gain characteristics) when the quantization method according to the present embodiment is applied to a band pass filter.
- the coefficients of an RRC (Root-Raised Cosine) filter (63 taps, roll-off 0.5) are quantized.
- FIG. 24 is a diagram showing simulation results concerning the phase characteristics when the quantization method according to the present embodiment is applied to a band-pass filter. Referring to FIG. 24, even when the quantization method according to the present embodiment is applied, no phase rotation in the passband, that is, no deterioration of the phase characteristics, is observed. As described above, since the quantization method according to the present embodiment does not significantly degrade the frequency characteristics of the band-pass filter, it can be applied in the communication technology field as well.
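Applying the same scheme to FIR coefficients and checking the resulting gain can be sketched as below. The 63-tap RRC coefficients from the simulation are not reproduced, so a tiny illustrative tap set is used, and the quantizer is the exponent-domain rounding assumed above:

```python
import cmath
import math

def quantize_taps(taps, p):
    """Quantize each FIR coefficient to the nearest +/- 2**(-n/p) point
    (rounding in the exponent domain; zero taps stay zero)."""
    out = []
    for h in taps:
        if h == 0.0:
            out.append(0.0)
            continue
        n = max(0, round(-p * math.log2(abs(h))))
        out.append(math.copysign(2.0 ** (-n / p), h))
    return out

def gain(taps, w):
    """Magnitude of the FIR frequency response at normalized angular frequency w."""
    return abs(sum(h * cmath.exp(-1j * w * k) for k, h in enumerate(taps)))
```

Comparing `gain(taps, w)` against `gain(quantize_taps(taps, p), w)` over a frequency grid reproduces the kind of gain-characteristic comparison shown in FIG. 23.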
- FIG. 25 is a block diagram used for BER evaluation according to the present embodiment.
- FIG. 26 is a diagram showing the result of BER evaluation when BPSK is used for the modulation scheme.
- FIG. 28 is a diagram showing the result of BER evaluation when QPSK is used for the modulation scheme.
- FIG. 30 is a diagram showing the result of BER evaluation when 16 QAM is used for the modulation scheme.
- the quantization method according to the present embodiment is effective regardless of the value of p.
- the BER is not affected if p ⁇ 4.
- FIG. 32 is a block diagram illustrating an exemplary hardware configuration of the information processing apparatus 10 according to an embodiment of the present disclosure.
- the information processing apparatus 10 includes, for example, a CPU 871, a ROM 872, a RAM 873, a host bus 874, a bridge 875, an external bus 876, an interface 877, an input device 878, and an output device 879.
- a storage 880, a drive 881, a connection port 882, and a communication device 883. Note that the hardware configuration shown here is an example, and some of the components may be omitted. Components other than those shown here may also be included.
- the CPU 871 functions as, for example, an arithmetic processing unit or a control unit, and controls the overall operation or a part of each component based on various programs recorded in the ROM 872, the RAM 873, the storage 880, or the removable recording medium 901.
- the ROM 872 is a means for storing a program read by the CPU 871, data used for an operation, and the like.
- the RAM 873 temporarily or permanently stores, for example, a program read by the CPU 871 and various parameters appropriately changed when the program is executed.
- the CPU 871, the ROM 872, and the RAM 873 are mutually connected via, for example, a host bus 874 capable of high-speed data transmission.
- host bus 874 is connected to external bus 876, which has a relatively low data transmission speed, via bridge 875, for example.
- the external bus 876 is connected to various components via an interface 877.
- for the input device 878, for example, a mouse, a keyboard, a touch panel, buttons, switches, and levers are used. Furthermore, a remote controller capable of transmitting control signals using infrared or other radio waves may be used as the input device 878.
- the input device 878 also includes a voice input device such as a microphone.
- the output device 879 is a device that can visually or aurally notify the user of acquired information, such as a display device (e.g., a CRT (Cathode Ray Tube), LCD, or organic EL display), an audio output device such as a speaker or headphones, a printer, a mobile phone, or a facsimile. The output device 879 according to the present disclosure also includes various vibration devices capable of outputting tactile stimulation.
- the storage 880 is a device for storing various data.
- as the storage 880, for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, or a magneto-optical storage device is used.
- the drive 881 is a device that reads information recorded on a removable recording medium 901, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, or writes information to the removable recording medium 901.
- the removable recording medium 901 is, for example, DVD media, Blu-ray (registered trademark) media, HD DVD media, various semiconductor storage media, and the like.
- the removable recording medium 901 may be, for example, an IC card equipped with a non-contact IC chip, an electronic device, or the like.
- the connection port 882 is a port for connecting an external device 902, such as a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface), an RS-232C port, or an optical audio terminal.
- the external connection device 902 is, for example, a printer, a portable music player, a digital camera, a digital video camera, an IC recorder, or the like.
- the communication device 883 is a communication device for connecting to a network.
- examples include a communication card for wired or wireless LAN, Bluetooth (registered trademark), or WUSB (Wireless USB), a router for optical communication, an ADSL (Asymmetric Digital Subscriber Line) router, or a modem for various types of communication.
- the information processing apparatus according to an embodiment of the present disclosure includes a product-sum operation circuit that executes a product-sum operation based on a plurality of input values and a plurality of weighting coefficients, quantized by a power expression, respectively corresponding to those input values.
- the exponent of each quantized weighting coefficient is represented by a fraction having a predetermined divisor p as its denominator.
- the product-sum operation circuit performs the product-sum operation using different addition multipliers based on the remainder determined from the divisor p. With such a configuration, it is possible to further reduce the processing load of the inner product operation while guaranteeing the quantization granularity of the weighting coefficients.
- an information processing apparatus including a product-sum operation circuit that executes a product-sum operation based on a plurality of input values and a plurality of weighting coefficients, quantized by a power expression, respectively corresponding to the plurality of input values, wherein the exponent of each quantized weighting coefficient is represented by a fraction having a predetermined divisor as its denominator, and the product-sum operation circuit performs the product-sum operation using different addition multipliers based on the remainder determined from the divisor.
- the product-sum operation circuit includes different accumulators that hold operation results for each of the remainders determined from the divisor.
- the remainder is obtained by a remainder operation using a numerator related to the exponent of the quantized weighting coefficient as a dividend.
- the product-sum operation circuit further includes a selector connecting a circuit to the accumulator corresponding to the remainder.
- the product-sum operation circuit further includes a shift operator that performs a shift operation relating to the input value based on an integer value of a quotient obtained by dividing the numerator by the divisor.
- the input value is quantized by a power expression
- the quantized input value is represented by a fraction having a predetermined divisor as a denominator
- the remainder is the remainder obtained when the dividend is the sum of the numerator of the exponent of the quantized weighting coefficient and the numerator of the exponent of the quantized input value.
- the product-sum operation circuit includes a plurality of adders / subtractors for each accumulator corresponding to the remainder.
- the product-sum operation circuit further includes a selector for inputting a signal instructing the adder / subtractor corresponding to the remainder to execute the operation based on the remainder.
- the information processing apparatus according to (6).
- the product-sum operation circuit further includes a plurality of storage circuits each holding the addition multiplier corresponding to the remainder.
- the product-sum operation circuit further includes a selector connecting the storage circuit corresponding to the remainder based on the remainder.
- the divisor includes a first divisor determined for the input values and a second divisor determined for the weighting coefficients, the first divisor and the second divisor having mutually different values. The information processing apparatus according to any one of (5) to (7).
- the divisor is a natural number. The information processing apparatus according to any one of (1) to (10).
- the divisor is represented by a power
- the information processing apparatus according to any one of (1) to (10).
- (13) Performing a product-sum operation based on a plurality of input values and a plurality of weighting coefficients quantized by a power expression respectively corresponding to the input values; Including The quantized exponent of the weighting factor is represented by a fraction having a predetermined divisor as a denominator, The performing the product-sum operation performs a product-sum operation using different addition multipliers based on the remainder determined from the divisor.
- Information processing method.
Abstract
Description
1. First embodiment
1.1. Background
1.2. Functional configuration example of the information processing apparatus 10
1.3. Quantization of the weight vector
1.4. Configuration example of the product-sum operation circuit
1.5. Quantization during learning
1.6. Effects
2. Second embodiment
2.1. Overview
2.2. Quantization of the weight vector
2.3. Quantization of both the weight vector and the input vector
2.4. Effects
2.5. Application example to communication technology
3. Hardware configuration example
4. Conclusion
<<1.1. Background>>
In recent years, learning methods using neural networks, such as deep learning, have been widely studied. While learning methods using neural networks achieve high accuracy, they impose a heavy computational load, so operation schemes that effectively reduce this load are in demand.
Next, a functional configuration example of the information processing apparatus 10 that realizes the information processing method according to the present embodiment will be described. FIG. 7 is an example of a functional block diagram of the information processing apparatus 10 according to the present embodiment. Referring to FIG. 7, the information processing apparatus 10 includes an input unit 110, an operation unit 120, a storage unit 130, and an output unit 140. The above configuration is described below, focusing on the functions of each component.
The input unit 110 according to the present embodiment has a function of detecting various input operations by an operator. To this end, the input unit 110 may include various devices for detecting the operator's input operations, and may be realized by, for example, buttons, a keyboard, a touch panel, a mouse, or switches.
The operation unit 120 according to the present embodiment has a function of performing an inner product operation based on a plurality of input values and a plurality of weighting coefficients respectively corresponding to those input values, and calculating an output value. In particular, the operation unit 120 performs the inner product operation for the forward propagation of a neural network. One feature of the operation unit 120 according to the present embodiment is that it calculates the output value based on weighting coefficients quantized according to the granularity of vector directions on the surface of an N-dimensional hypersphere. More specifically, the operation unit 120 may calculate the output value based on weighting coefficients expressed as powers. The features of the inner product operation according to the present embodiment will be described in detail separately.
The storage unit 130 has a function of storing programs, data, and the like used by each component of the information processing apparatus 10. For example, the storage unit 130 according to the present embodiment stores various parameters used in the neural network.
The output unit 140 has a function of outputting various information to the operator. To this end, the output unit 140 may include a display device that outputs visual information. The display device may be realized by, for example, a CRT (Cathode Ray Tube) display device, a liquid crystal display (LCD) device, or an OLED (Organic Light Emitting Diode) device.
Next, the quantization of the weight vector according to the present embodiment will be described in detail. As described above, the information processing apparatus 10 according to the present embodiment can keep the granularity highly uniform by quantizing with a weight vector w expressed as powers. Here, one feature of the operation unit 120 according to the present embodiment is that it sorts the plurality of weight vector components w_i in ascending order of value and normalizes them by the weighting coefficient w_i with the largest value. When the sorted and normalized weight vector is denoted w_j, the weight vector w_j is expressed by equations (2) to (4) below.
Next, the product-sum operation circuit that realizes the operation scheme according to the present embodiment will be described. As described above, when the weight vector w is quantized by a power expression and sorted, the input vector x corresponding to the weight vector w must also be sorted accordingly.
The address table WT according to the present embodiment holds address information, sign information, and multiplication instruction information for the input vector x corresponding to the plurality of weight vectors w sorted in ascending order of value. As shown in FIG. 8, the address information may include a null pointer, in which case 0 is added to the accumulator 230, making it possible simply to multiply the value of the accumulator 230 by α. The sign information indicates the value corresponding to S_j in equation (5) above.
The RAM 210 according to the present embodiment outputs the input vector component x_j corresponding to the weight vector component w_j to the addition circuit 220, based on the address information input from the address table WT.
The addition circuit 220 according to the present embodiment performs addition based on the input vector component x_j input from the RAM 210 and the value output from the first multiplication circuit 240. In doing so, the addition circuit 220 uses the sign information held in the address table WT.
The accumulator 230 according to the present embodiment accumulates the operation results output from the addition circuit 220 and outputs the accumulated value to the first multiplication circuit 240 and the second multiplication circuit 250. A reset signal for resetting the accumulated value to 0 is input to the accumulator 230 as appropriate.
The first multiplication circuit 240 according to the present embodiment multiplies the value accumulated in the accumulator 230 by α. As described above, the first multiplication circuit 240 performs this multiplication based on the multiplication instruction information held in the address table WT, and outputs the result to the addition circuit 220.
The second multiplication circuit 250 according to the present embodiment multiplies the value output from the accumulator 230 by the normalization constant K.
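The sorted, α-multiplying accumulation performed by the address table WT, addition circuit 220, accumulator 230, and multiplication circuits 240/250 can be sketched as a Horner-style loop. Here `mult_counts` is a hypothetical encoding of the multiplication instruction information (how many times to multiply the accumulator by α before adding each term), so this is a sketch of the structure, not the patented circuit itself:

```python
def dot_sorted_alpha(x, s, mult_counts, alpha, K):
    """Horner-style evaluation: before adding term j, the accumulator is
    multiplied by alpha mult_counts[j] times, so each earlier term ends up
    scaled by alpha raised to the number of multiplications after it."""
    acc = 0.0                       # accumulator 230
    for xj, sj, c in zip(x, s, mult_counts):
        for _ in range(c):
            acc *= alpha            # first multiplication circuit 240
        acc += sj * xj              # addition circuit 220 with sign information
    return K * acc                  # second multiplication circuit 250 (normalization)
```

A null-pointer entry in the address table corresponds to a term with x_j = 0, which leaves the accumulator unchanged apart from the α multiplications.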
Next, the quantization of the weight vector w during learning according to the present embodiment will be described. In the information processing method according to the present embodiment, the update of the weight vector component w_i during learning can be computed by equation (6) below.
Next, the effects achieved by the quantization of the weight vector according to the present embodiment will be described in detail. As described above, in the information processing method according to the present embodiment, w_i quantized by a power expression is sorted in ascending order of value and normalized, yielding w_j. Here, when the sorted basis vectors are denoted q_j, the weight vector w is expressed by equation (7) below.
<<2.1. Overview>>
Next, the second embodiment of the present disclosure will be described. The first embodiment above described a technique in which the weight vector component w_j is expressed as α^n, achieving high approximation accuracy while reducing the number of multiplications in the inner product operation to the order of log.
First, the technique for quantizing only the weight vector w by α^(-n/p) will be described. Here, let α = 2 and let p be a natural number with p ∈ {1, 2, 3, ...}, and consider the case where the inner product operation is performed by equation (15) below. Note that p corresponds to the divisor in the present disclosure. Further, w_i in equation (15) is expressed by equation (16) below, where s_i ∈ {-1, 1} and n_i ∈ {0, 1, 2, ...}.
The shift operator 310 according to the present embodiment performs a shift operation based on the input vector component x_i and n_i. Specifically, the shift operator 310 bit-shifts the input vector component x_i to the right by int(n_i / p).
The remainder operator 320 according to the present embodiment computes n_i mod p based on the input n_i, and inputs the remainder value to the selectors 330 and 340.
The selectors 330 and 340 according to the present embodiment select, based on the result computed by the remainder operator 320, the accumulator to which the circuit is connected from among the plurality of accumulators in the accumulator group 350. The selectors 330 and 340 operate so that the circuit is connected to the accumulator corresponding to each remainder value: for example, when the remainder is 0, they connect the circuit to the accumulator y_0, and when the remainder is 1, to the accumulator y_1.
The accumulator group 350 according to the present embodiment includes a plurality of accumulators respectively corresponding to the values of the remainder n_i mod p. That is, the accumulator group 350 holds y_r for each remainder value.
The adder/subtractor 360 according to the present embodiment performs addition or subtraction based on the input s_i, the shift operation result, and the value of y_r. As described above, the value of y_r held by the accumulator selected based on the remainder n_i mod p is input to the adder/subtractor 360, and y_r of the selected accumulator is updated based on the operation result.
The multiplier group 370 according to the present embodiment multiplies y_r, updated for each remainder by the processing described above, by the addition multiplier corresponding to that remainder. To this end, the multiplier group 370 includes a plurality of multipliers, one for each remainder of n_i mod p. For example, the multiplier group 370 multiplies y_0 input from the accumulator group 350 by 1, and multiplies y_1 by 2^(-1/p).
The adder 380 according to the present embodiment adds the values of y_r computed by the multiplier group 370 for each remainder, and outputs the final operation result y.
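The weight-only path above (shift operator 310, remainder operator 320, selectors 330/340, accumulator group 350, adder/subtractor 360, multiplier group 370, adder 380) can be sketched in Python. Floats stand in for the fixed-point shifts and the interface is illustrative:

```python
def dot_weight_quantized(x, s, n, p):
    """Inner product with only the weights quantized as s_i * 2**(-n_i/p):
    each x_i is shifted right by int(n_i/p), accumulated into the y_r for
    remainder r = n_i mod p, and the p accumulators are combined at the end."""
    acc = [0.0] * p                                # accumulator group 350
    for xi, si, ni in zip(x, s, n):
        q, r = divmod(ni, p)                       # remainder operator 320
        acc[r] += si * (xi / (2 ** q))             # shift 310 + adder/subtractor 360
    return sum(acc[r] * 2.0 ** (-r / p) for r in range(p))  # multipliers 370 + adder 380
```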
Next, the technique for quantizing both the weight vector w and the input vector x by α^(-n/p) will be described. Here, let α = 2 and let p be a natural number with p ∈ {1, 2, 3, ...}, and consider the inner product operation. The input vector component x_i and the weight vector component w_i are expressed by equations (20) and (21) below, respectively, where s_xi, s_wi ∈ {-1, 1} and n_i, m_i ∈ {0, 1, 2, ...}.
The first adder 410 according to the present embodiment adds the input m_i and n_i. As illustrated, the addition result of m_i and n_i can be expressed as the bit array [b_k-1, ..., b_q, b_q-1, ..., b_0].
The shift operator 420 according to the present embodiment right-shifts 1, expressed in fixed point, by int((m_i + n_i) / p), based on the operation result of the first adder 410. Here, the value of int((m_i + n_i) / p) equals the value of the upper bits [b_k-1, ..., b_q] of the bit array output by the first adder 410, so the shift operator 420 may perform the shift operation using the value of those upper bits.
The selector 430 according to the present embodiment selects, based on the value of the remainder (m_i + n_i) mod p, the adder/subtractor to execute the addition or subtraction from among the plurality of accumulators and adders/subtractors in the accumulator group 450, and inputs an Enable signal = 1. Here, the remainder value corresponds to the lower q bits [b_q-1, ..., b_0] of the bit array output by the first adder 410, so the operation can be simplified in the same way as above.
The XOR circuit 440 according to the present embodiment inputs 1 or 0 to each accumulator of the accumulator group 450 based on the input S_xi and S_wi. Specifically, the XOR circuit 440 inputs 1 when S_wi * S_xi = -1 and 0 when S_wi * S_xi = +1.
The accumulator group 450 according to the present embodiment includes a plurality of accumulators respectively corresponding to the values of the remainder (m_i + n_i) mod p, and includes a plurality of adders/subtractors (1-bit up/down counters) corresponding to those accumulators.
The multiplier group 460 according to the present embodiment multiplies y_r, updated for each remainder by the processing described above, by the value corresponding to that remainder. To this end, the multiplier group 460 includes a plurality of multipliers, one for each remainder of (m_i + n_i) mod p. For example, the multiplier group 460 multiplies y_0 input from the accumulator group 450 by 1, and multiplies y_1 by 2^(-1/p).
The second adder 470 according to the present embodiment adds the values of y_r computed by the multiplier group 460 for each remainder, and outputs the final operation result y.
The adder 510 according to the present embodiment adds the input m_i and n_i, and may operate in the same way as the first adder 410 shown in FIG. 17.
The selector 520 according to the present embodiment selects, based on the value of the lower q bits [b_q-1, ..., b_0], the storage circuit to which the circuit is connected from among the plurality of storage circuits in the storage circuit group 530.
The storage circuit group 530 according to the present embodiment includes a plurality of storage circuits respectively corresponding to the values of the remainder (m_i + n_i) mod p. Each storage circuit stores the addition multiplier corresponding to its remainder. Each storage circuit in the storage circuit group 530 may be a read-only circuit that holds the addition multiplier as a constant, or may be a rewritable register. Storing the addition multiplier as a constant in a read-only circuit has the advantage of simplifying the circuit configuration and reducing power consumption.
The shift operator 540 according to the present embodiment right-shifts the addition multiplier stored in the connected storage circuit by the value of the upper bits [b_k-1, ..., b_q].
The XOR circuit 550 according to the present embodiment outputs 1 or 0 based on the input S_xi and S_wi, and may operate in the same way as the XOR circuit 440 shown in FIG. 17.
The adder/subtractor 560 according to the present embodiment repeatedly performs addition or subtraction on y held in the accumulator 570, based on the operation result of the shift operator 540 and the input from the XOR circuit 550.
The accumulator 570 according to the present embodiment holds the result y of the inner product operation.
Next, the effects achieved by the quantization of the weight vector w and the input vector x according to the present embodiment will be described in detail. Here, for the case where the weight vector component w_i and the input vector component x_i are quantized as ±2^(-n/p), the recognition rates were compared between p = 1, i.e., the quantization technique described in the first embodiment, and p = 2, i.e., the quantization technique of the present embodiment.
Next, applications of the quantization technique according to the present embodiment to other fields will be described. The description above addressed the case where the quantization technique according to the present embodiment is applied to the inner product operation for the forward propagation of a neural network. However, the quantization technique according to the present embodiment is not limited to this example and can be applied to various technologies that perform inner product operations.
<3. Hardware configuration example>
Next, a hardware configuration example of the information processing apparatus 10 according to an embodiment of the present disclosure will be described. FIG. 32 is a block diagram showing a hardware configuration example of the information processing apparatus 10 according to an embodiment of the present disclosure. Referring to FIG. 32, the information processing apparatus 10 includes, for example, a CPU 871, a ROM 872, a RAM 873, a host bus 874, a bridge 875, an external bus 876, an interface 877, an input device 878, an output device 879, a storage 880, a drive 881, a connection port 882, and a communication device 883. The hardware configuration shown here is an example, and some components may be omitted; components other than those shown here may also be included.
The CPU 871 functions, for example, as an arithmetic processing unit or a control unit, and controls all or part of the operation of each component based on various programs recorded in the ROM 872, the RAM 873, the storage 880, or a removable recording medium 901.
The ROM 872 is a means for storing programs read by the CPU 871, data used for operations, and the like. The RAM 873 temporarily or permanently stores, for example, programs read by the CPU 871 and various parameters that change as appropriate when those programs are executed.
The CPU 871, the ROM 872, and the RAM 873 are connected to one another via, for example, a host bus 874 capable of high-speed data transmission. The host bus 874 is in turn connected, for example via a bridge 875, to an external bus 876 with a comparatively low data transmission speed, and the external bus 876 is connected to various components via an interface 877.
For the input device 878, for example, a mouse, a keyboard, a touch panel, buttons, switches, and levers are used. A remote controller capable of transmitting control signals using infrared or other radio waves may also be used as the input device 878. The input device 878 also includes audio input devices such as a microphone.
The output device 879 is a device capable of visually or aurally notifying the user of acquired information, such as a display device (e.g., a CRT (Cathode Ray Tube), LCD, or organic EL display), an audio output device such as a speaker or headphones, a printer, a mobile phone, or a facsimile. The output device 879 according to the present disclosure also includes various vibration devices capable of outputting tactile stimulation.
The storage 880 is a device for storing various data. As the storage 880, for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, or a magneto-optical storage device is used.
The drive 881 is a device that reads information recorded on a removable recording medium 901, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, or writes information to the removable recording medium 901.
The removable recording medium 901 is, for example, DVD media, Blu-ray (registered trademark) media, HD DVD media, or various semiconductor storage media. Of course, the removable recording medium 901 may also be, for example, an IC card equipped with a contactless IC chip, or an electronic device.
The connection port 882 is a port for connecting an external device 902, such as a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface), an RS-232C port, or an optical audio terminal.
The external device 902 is, for example, a printer, a portable music player, a digital camera, a digital video camera, or an IC recorder.
The communication device 883 is a communication device for connecting to a network, such as a communication card for wired or wireless LAN, Bluetooth (registered trademark), or WUSB (Wireless USB), a router for optical communication, an ADSL (Asymmetric Digital Subscriber Line) router, or a modem for various types of communication.
As described above, the information processing apparatus according to an embodiment of the present disclosure includes a product-sum operation circuit that executes a product-sum operation based on a plurality of input values and a plurality of weighting coefficients, quantized by a power expression, respectively corresponding to those input values. The exponent of each quantized weighting coefficient is expressed by a fraction having a predetermined divisor p as its denominator, and the product-sum operation circuit performs the product-sum operation using different addition multipliers based on the remainder determined from the divisor p. With such a configuration, it is possible to further reduce the processing load of the inner product operation while guaranteeing the quantization granularity of the weighting coefficients.
(1)
An information processing apparatus including:
a product-sum operation circuit that executes a product-sum operation based on a plurality of input values and a plurality of weighting coefficients, quantized by a power expression, respectively corresponding to the input values,
wherein the exponent of each quantized weighting coefficient is expressed by a fraction having a predetermined divisor as its denominator, and
the product-sum operation circuit performs the product-sum operation using different addition multipliers based on the remainder determined from the divisor.
(2)
The information processing apparatus according to (1), wherein the product-sum operation circuit includes a plurality of different accumulators each holding an operation result for each remainder determined from the divisor.
(3)
The information processing apparatus according to (2), wherein the remainder is obtained by a remainder operation whose dividend is the numerator of the exponent of the quantized weighting coefficient, and the product-sum operation circuit further includes a selector that connects the circuit to the accumulator corresponding to the remainder.
(4)
The information processing apparatus according to (3), wherein the product-sum operation circuit further includes a shift operator that performs a shift operation on the input value based on the integer value of the quotient obtained by dividing the numerator by the divisor.
(5)
The information processing apparatus according to (1) or (2), wherein the input values are quantized by a power expression, each quantized input value is expressed by a fraction having a predetermined divisor as its denominator, and the remainder is the remainder obtained when the dividend is the sum of the numerator of the exponent of the quantized weighting coefficient and the numerator of the exponent of the quantized input value.
(6)
The information processing apparatus according to (5), wherein the product-sum operation circuit includes a plurality of adders/subtractors, one for each accumulator corresponding to a remainder.
(7)
The information processing apparatus according to (6), wherein the product-sum operation circuit further includes a selector that, based on the remainder, inputs a signal instructing the adder/subtractor corresponding to the remainder to execute an operation.
(8)
The information processing apparatus according to (1), wherein the product-sum operation circuit further includes a plurality of storage circuits each holding the addition multiplier corresponding to a remainder.
(9)
The information processing apparatus according to (8), wherein the product-sum operation circuit further includes a selector that, based on the remainder, connects the storage circuit corresponding to the remainder.
(10)
The information processing apparatus according to any one of (5) to (7), wherein the divisor includes a first divisor determined for the input values and a second divisor determined for the weighting coefficients, the first divisor and the second divisor having mutually different values.
(11)
The information processing apparatus according to any one of (1) to (10), wherein the divisor is a natural number.
(12)
The information processing apparatus according to any one of (1) to (10), wherein the divisor is expressed as a power.
(13)
An information processing method including executing a product-sum operation based on a plurality of input values and a plurality of weighting coefficients, quantized by a power expression, respectively corresponding to the input values, wherein the exponent of each quantized weighting coefficient is expressed by a fraction having a predetermined divisor as its denominator, and the executing of the product-sum operation performs the product-sum operation using different addition multipliers based on the remainder determined from the divisor.
110 Input unit
120 Operation unit
130 Storage unit
140 Output unit
200, 300, 400, 500 Product-sum operation circuit
Claims (13)
- An information processing apparatus comprising:
a product-sum operation circuit that executes a product-sum operation based on a plurality of input values and a plurality of weighting coefficients, quantized by a power expression, respectively corresponding to the input values,
wherein the exponent of each quantized weighting coefficient is expressed by a fraction having a predetermined divisor as its denominator, and
the product-sum operation circuit performs the product-sum operation using different addition multipliers based on the remainder determined from the divisor.
- The information processing apparatus according to claim 1, wherein the product-sum operation circuit includes a plurality of different accumulators each holding an operation result for each remainder determined from the divisor.
- The information processing apparatus according to claim 2, wherein the remainder is obtained by a remainder operation whose dividend is the numerator of the exponent of the quantized weighting coefficient, and the product-sum operation circuit further includes a selector that connects the circuit to the accumulator corresponding to the remainder.
- The information processing apparatus according to claim 3, wherein the product-sum operation circuit further includes a shift operator that performs a shift operation on the input value based on the integer value of the quotient obtained by dividing the numerator by the divisor.
- The information processing apparatus according to claim 1, wherein the input values are quantized by a power expression, each quantized input value is expressed by a fraction having a predetermined divisor as its denominator, and the remainder is the remainder obtained when the dividend is the sum of the numerator of the exponent of the quantized weighting coefficient and the numerator of the exponent of the quantized input value.
- The information processing apparatus according to claim 5, wherein the product-sum operation circuit includes a plurality of adders/subtractors, one for each accumulator corresponding to a remainder.
- The information processing apparatus according to claim 6, wherein the product-sum operation circuit further includes a selector that, based on the remainder, inputs a signal instructing the adder/subtractor corresponding to the remainder to execute an operation.
- The information processing apparatus according to claim 1, wherein the product-sum operation circuit further includes a plurality of storage circuits each holding the addition multiplier corresponding to a remainder.
- The information processing apparatus according to claim 8, wherein the product-sum operation circuit further includes a selector that, based on the remainder, connects the storage circuit corresponding to the remainder.
- The information processing apparatus according to claim 5, wherein the divisor includes a first divisor determined for the input values and a second divisor determined for the weighting coefficients, the first divisor and the second divisor having mutually different values.
- The information processing apparatus according to claim 1, wherein the divisor is a natural number.
- The information processing apparatus according to claim 1, wherein the divisor is expressed as a power.
- An information processing method comprising a processor executing a product-sum operation based on a plurality of input values and a plurality of weighting coefficients, quantized by a power expression, respectively corresponding to the input values, wherein the exponent of each quantized weighting coefficient is expressed by a fraction having a predetermined divisor as its denominator, and the executing of the product-sum operation performs the product-sum operation using different addition multipliers based on the remainder determined from the divisor.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18860078.7A EP3543873B1 (en) | 2017-09-29 | 2018-06-29 | Information processing device and information processing method |
CA3044660A CA3044660C (en) | 2017-09-29 | 2018-06-29 | Information processing device and information processing method |
CN201880004505.1A CN110036384B (zh) | 2017-09-29 | 2018-06-29 | 信息处理设备和信息处理方法 |
JP2019504151A JP6504331B1 (ja) | 2017-09-29 | 2018-06-29 | 情報処理装置、および情報処理方法 |
US16/463,193 US11086969B2 (en) | 2017-09-29 | 2018-06-29 | Information processing device and information processing method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2017189889 | 2017-09-29 | ||
JP2017-189889 | 2017-09-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019064774A1 true WO2019064774A1 (ja) | 2019-04-04 |
Family
ID=65903401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2018/024923 WO2019064774A1 (ja) | 2017-09-29 | 2018-06-29 | 情報処理装置、および情報処理方法 |
Country Status (6)
Country | Link |
---|---|
US (1) | US11086969B2 (ja) |
EP (1) | EP3543873B1 (ja) |
JP (2) | JP6504331B1 (ja) |
CN (1) | CN110036384B (ja) |
CA (1) | CA3044660C (ja) |
WO (1) | WO2019064774A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019168088A1 (ja) * | 2018-03-02 | 2019-09-06 | 日本電気株式会社 | 推論装置、畳み込み演算実行方法及びプログラム |
WO2021039164A1 (ja) * | 2019-08-26 | 2021-03-04 | ソニー株式会社 | 情報処理装置、情報処理システム及び情報処理方法 |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210056446A1 (en) * | 2019-08-23 | 2021-02-25 | Nvidia Corporation | Inference accelerator using logarithmic-based arithmetic |
US12033060B2 (en) | 2019-08-23 | 2024-07-09 | Nvidia Corporation | Asynchronous accumulator using logarithmic-based arithmetic |
US11886980B2 (en) | 2019-08-23 | 2024-01-30 | Nvidia Corporation | Neural network accelerator using logarithmic-based arithmetic |
JP7354736B2 (ja) * | 2019-09-30 | 2023-10-03 | 富士通株式会社 | 情報処理装置、情報処理方法、情報処理プログラム |
WO2021240633A1 (ja) * | 2020-05-26 | 2021-12-02 | 日本電気株式会社 | 情報処理回路および情報処理回路の設計方法 |
CN111696528B (zh) * | 2020-06-20 | 2021-04-23 | 龙马智芯(珠海横琴)科技有限公司 | 一种语音质检方法、装置、质检设备及可读存储介质 |
CN112082628B (zh) * | 2020-09-11 | 2021-12-14 | 锐马(福建)电气制造有限公司 | 一种家畜养殖物联网数据采集系统 |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0451384A (ja) * | 1990-06-19 | 1992-02-19 | Canon Inc | ニューラルネットワークの構築方法 |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4718032A (en) * | 1985-02-14 | 1988-01-05 | Prime Computer, Inc. | Method and apparatus for effecting range transformation in a digital circuitry |
DE69130656T2 (de) | 1990-06-14 | 1999-06-17 | Canon K.K., Tokio/Tokyo | Neuronale Netzwerke |
US5473730A (en) * | 1993-11-09 | 1995-12-05 | At&T Ipm Corp. | High efficiency learning network |
JP5262248B2 (ja) | 2008-03-31 | 2013-08-14 | 富士通株式会社 | 積和演算回路 |
JP4529098B2 (ja) * | 2008-07-29 | 2010-08-25 | ソニー株式会社 | 演算処理装置および方法、並びにプログラム |
JP2012058850A (ja) * | 2010-09-06 | 2012-03-22 | Sony Corp | 画像処理装置および方法、並びにプログラム |
US8909687B2 (en) * | 2012-01-19 | 2014-12-09 | Mediatek Singapore Pte. Ltd. | Efficient FIR filters |
CN102681815B (zh) * | 2012-05-11 | 2016-03-16 | 深圳市清友能源技术有限公司 | 用加法器树状结构的有符号乘累加算法的方法 |
US20210048982A1 (en) * | 2019-08-13 | 2021-02-18 | International Business Machines Corporation | Partial product floating-point multiplication circuitry operand summation |
-
2018
- 2018-06-29 JP JP2019504151A patent/JP6504331B1/ja active Active
- 2018-06-29 WO PCT/JP2018/024923 patent/WO2019064774A1/ja unknown
- 2018-06-29 CA CA3044660A patent/CA3044660C/en active Active
- 2018-06-29 CN CN201880004505.1A patent/CN110036384B/zh active Active
- 2018-06-29 US US16/463,193 patent/US11086969B2/en active Active
- 2018-06-29 EP EP18860078.7A patent/EP3543873B1/en active Active
-
2019
- 2019-03-26 JP JP2019057948A patent/JP7103289B2/ja active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0451384A (ja) * | 1990-06-19 | 1992-02-19 | Canon Inc | ニューラルネットワークの構築方法 |
Non-Patent Citations (6)
Title |
---|
DAISUKE MIYASHITA ET AL.: "Convolutional Neural Networks using Logarithmic Data Representation", ARXIV, 3 March 2016 (2016-03-03), Retrieved from the Internet <URL:https://arxiv.org/pdf/1603.01025.pdf>
HUBARA, ITAY ET AL.: "Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations", 22 September 2016 (2016-09-22), XP080813052, Retrieved from the Internet <URL:https://arxiv.org/abs/1609.07061v1> [retrieved on 20180822] * |
MATTHIEU COURBARIAUX ET AL.: "BinaryConnect: Training Deep Neural Networks with binary weights during propagations", ARXIV, 11 November 2015 (2015-11-11), Retrieved from the Internet <URL:https://arxiv.org/pdf/1511.00363.pdf> |
MIYASHITA, DAISUKE ET AL.: "Convolutional Neural Networks using Logarithmic Data Representation", 17 March 2016 (2016-03-17), XP080686928, Retrieved from the Internet <URL:https://arxiv.org/abs/1603.01025v2> [retrieved on 20180822] * |
See also references of EP3543873A4 |
TANG, CHUAN ZHANG ET AL.: "Multilayer Feedforward Neural Networks with Single Powers-of-Two Weights", IEEE TRANSACTIONS ON SIGNAL PROCESSING, vol. 41, no. 8, August 1993 (1993-08-01), pages 2724 - 2727, XP055503286, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/abstract/document/229903> [retrieved on 20180822] * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019168088A1 (ja) * | 2018-03-02 | 2019-09-06 | 日本電気株式会社 | 推論装置、畳み込み演算実行方法及びプログラム |
JPWO2019168088A1 (ja) * | 2018-03-02 | 2021-02-12 | 日本電気株式会社 | 推論装置、畳み込み演算実行方法及びプログラム |
JP7060079B2 (ja) | 2018-03-02 | 2022-04-26 | 日本電気株式会社 | 推論装置、畳み込み演算実行方法及びプログラム |
US11960565B2 (en) | 2018-03-02 | 2024-04-16 | Nec Corporation | Add-mulitply-add convolution computation for a convolutional neural network |
WO2021039164A1 (ja) * | 2019-08-26 | 2021-03-04 | ソニー株式会社 | 情報処理装置、情報処理システム及び情報処理方法 |
EP4024198A4 (en) * | 2019-08-26 | 2022-10-12 | Sony Group Corporation | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING SYSTEM AND INFORMATION PROCESSING METHOD |
Also Published As
Publication number | Publication date |
---|---|
CN110036384B (zh) | 2021-01-05 |
JPWO2019064774A1 (ja) | 2019-11-14 |
JP7103289B2 (ja) | 2022-07-20 |
JP2019091512A (ja) | 2019-06-13 |
JP6504331B1 (ja) | 2019-04-24 |
US20200073912A1 (en) | 2020-03-05 |
EP3543873A4 (en) | 2020-02-26 |
CA3044660A1 (en) | 2019-04-04 |
CA3044660C (en) | 2020-06-09 |
EP3543873A1 (en) | 2019-09-25 |
EP3543873B1 (en) | 2022-04-20 |
CN110036384A (zh) | 2019-07-19 |
US11086969B2 (en) | 2021-08-10 |
Similar Documents
Publication | Title |
---|---|
JP6504331B1 (ja) | Information processing device and information processing method |
EP3657399A1 (en) | Weight pruning and quantization method for a neural network and accelerating device therefor |
CN110363279B (zh) | Image processing method and apparatus based on a convolutional neural network model |
CN110852434B (zh) | CNN quantization method, forward computation method, and hardware device based on low-precision floating-point numbers |
CN114341892A (zh) | Machine learning hardware with reduced-precision parameter components for efficient parameter updates |
CN112771547A (zh) | End-to-end learning in communication systems |
JP6958652B2 (ja) | Information processing device and information processing method |
WO2020071441A1 (ja) | Secure sigmoid function computation system, secure logistic regression computation system, secure sigmoid function computation device, secure logistic regression computation device, secure sigmoid function computation method, secure logistic regression computation method, and program |
Langroudi et al. | Alps: Adaptive quantization of deep neural networks with generalized posits |
US20150113027A1 | Method for determining a logarithmic functional unit |
WO2021039164A1 (ja) | Information processing device, information processing system, and information processing method |
CN109697507B (zh) | Processing method and apparatus |
JPH11212768A (ja) | Logarithmic value calculation circuit |
JP6734938B2 (ja) | Neural network circuit |
KR20230076641A (ko) | Apparatus and method for floating-point operations |
US7698356B2 | Smart evaluation in computer algebra |
US20090319589A1 | Using fractional exponents to reduce the computational complexity of numerical operations |
US11550545B2 | Low-power, low-memory multiply and accumulate (MAC) unit |
US20210081783A1 | Information processing apparatus, method of processing information, and non-transitory computer-readable storage medium for storing information processing program |
US20240061646A1 | Information processing apparatus, information processing method, and information processing program |
CN115526800A (zh) | Data processing method, device, equipment, and medium |
Legal Events
Code | Title | Description |
---|---|---|
ENP | Entry into the national phase | Ref document number: 2019504151; Country of ref document: JP; Kind code of ref document: A |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18860078; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 3044660; Country of ref document: CA |
ENP | Entry into the national phase | Ref document number: 2018860078; Country of ref document: EP; Effective date: 20190619 |
NENP | Non-entry into the national phase | Ref country code: DE |