WO2020084692A1 - Computation processing device and computation processing device control method - Google Patents

Computation processing device and computation processing device control method

Info

Publication number
WO2020084692A1
Authority
WO
WIPO (PCT)
Prior art keywords
random number
rounding
unit
digit
calculation
Application number
PCT/JP2018/039370
Other languages
French (fr)
Japanese (ja)
Inventor
洋征 和田
Original Assignee
富士通株式会社
Application filed by 富士通株式会社
Priority to PCT/JP2018/039370 (WO2020084692A1)
Priority to JP2020551748A (JP6984762B2)
Publication of WO2020084692A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/499: Denomination or exception handling, e.g. rounding or overflow

Definitions

  • The present invention relates to an arithmetic processing device and a control method for the arithmetic processing device.
  • Deep learning, which has become increasingly important in recent years, requires a huge amount of calculation and memory.
  • As the calculation amount and the memory consumption increase, the calculation load and the memory load increase, and the learning time becomes long. Therefore, to reduce the amount of calculation and memory consumption and shorten the learning time, it is desirable to perform calculations with the lowest possible precision while maintaining the learning and inference capabilities. Such methods often use fixed-point operations.
  • With low precision, the rounded value of the operation result tends to be biased toward a certain value. If the rounded values are biased, learning has difficulty progressing from that point.
  • multiply-accumulate operation is often used for matrix elements.
  • The cumulative result is generally held with a precision considerably higher than that of the inputs to the multiplication.
  • When A and B, which are the inputs to the multiplication in the above equation, are 16 bits wide,
  • the multiplication result A × B is 32 bits wide.
  • The accumulation register that stores the accumulated multiplication results is preferably wide enough to hold the accumulation of 32-bit values, and may be, for example, 40 bits wide.
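The bit widths described above can be illustrated with a minimal Python sketch. The helper names `to_signed` and `mac_fixed`, the signed two's-complement interpretation, and the wrap-around behaviour of the 40-bit accumulator are assumptions made for illustration; the source only states the widths.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac_fixed(pairs, acc_bits=40):
    """Accumulate products of signed 16-bit inputs in an acc_bits-wide register."""
    acc = 0
    for a, b in pairs:
        # Each product of two signed 16-bit values fits in 32 bits.
        product = to_signed(a, 16) * to_signed(b, 16)
        # The accumulator is wider than one product, so many products can be
        # summed before it wraps around (wrap-around is an assumption here).
        acc = to_signed(acc + product, acc_bits)
    return acc
```

Accumulating 256 products of the largest positive inputs already exceeds the 32-bit range but still fits in 40 bits, which is why the accumulation register is wider than a single product.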
  • rounding is performed on the multiplication result.
  • The bit width of the inputs to the multiplication is much narrower than the bit width of the accumulation register for the multiplication results. Therefore, applying the low-precision rounding intended for the multiplication result to the value of the accumulation register may produce an incorrect rounded value.
  • If the rounding circuit determines the rounding using a value obtained from a random number generator, a large-scale modification of the rounding circuit is required. In addition, the rounding process by the rounding circuit lies on a critical path of the floating-point arithmetic.
  • the disclosed technique has been made in view of the above, and an object thereof is to provide an arithmetic processing device that executes appropriate probabilistic rounding with a simple configuration, and a control method for the arithmetic processing device.
  • the random number generation unit generates a random number.
  • the random number moving unit moves the position of the random number based on the position where the rounding target number is arranged and the decimal point position information of the output data so that the beginning of the random number coincides with the rounding position of the rounding target number.
  • the adding unit adds the random number moved by the random number moving unit and the rounding target number arranged at the predetermined position.
  • the output unit outputs, as the output data, data in a predetermined range including a significant digit of a predetermined digit from the rounded position in the addition result of the addition unit.
  • the present invention can perform appropriate probabilistic rounding with a simple configuration.
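The four units described above (random number generation, random number moving, adding, and output) can be sketched in software as follows. This is a minimal illustration, not the claimed circuit: the function name `stochastic_round`, the parameter `drop_bits` (the number of bits below the retained range), and the use of Python's `random` module in place of a hardware generator are all assumptions.

```python
import random


def stochastic_round(acc, drop_bits, n=12, rng=random):
    """Probabilistically round away the low drop_bits bits of integer acc."""
    # Random number generation unit: an n-bit uniform random number.
    r = rng.getrandbits(n)
    # Random number moving unit: align the top bit of r with the rounding
    # position, i.e. the highest of the bits that will be discarded.
    shift = drop_bits - n
    r_aligned = r << shift if shift >= 0 else r >> -shift
    # Adding unit: a carry into the retained digits occurs with a probability
    # that grows with the value of the discarded bits.
    total = acc + r_aligned
    # Output unit: keep only the digits above the rounding position.
    return total >> drop_bits
```

For example, `stochastic_round(0b10111000, 4)` returns either `0b1011` or `0b1100`, each about half of the time, because the discarded bits `1000` are exactly half of one unit in the last retained place.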
  • FIG. 1 is a diagram showing the overall configuration of an information processing apparatus.
  • FIG. 2 is a circuit diagram of the product-sum calculator.
  • FIG. 3 is a diagram showing an outline of the calculation of the probabilistic rounding process by the product-sum calculator according to the embodiment.
  • FIG. 4 is a diagram for explaining digit shift of random numbers.
  • FIG. 5 is a diagram for explaining alignment of added values by the normalization shifter.
  • FIG. 6 is a diagram illustrating a specific example of the probabilistic rounding process.
  • FIG. 7 is a flowchart of the entire processing executed by the product-sum calculation unit.
  • FIG. 8 is a flowchart of the probabilistic rounding process by the product-sum calculator according to the embodiment.
  • the information processing device 50 includes a PCI (Peripheral Component Interconnect) card 1 and a host computer 2.
  • the PCI card 1 and the host computer 2 are connected by a PCI bus and exchange data with each other.
  • The host computer 2, for example, performs overall management when executing deep learning.
  • the host computer 2 instructs the PCI card 1 to execute a predetermined calculation in deep learning such as a convolution calculation.
  • the PCI card 1 receives a command from the host computer 2, executes a calculation, and outputs the calculation result to the host computer 2. As shown in FIG. 1, the PCI card 1 has a plurality of processing units 10, an overall command control unit 11, a memory controller 12, a memory 13 and a PCI control unit 14. The PCI card 1 corresponds to an example of “arithmetic processing device”.
  • the PCI control unit 14 receives from the host computer 2 an input of an operation instruction for instructing execution of operation and operation data used in the operation. Then, the PCI control unit 14 outputs the acquired operation command and operation data to the memory controller 12.
  • The PCI control unit 14 receives from the memory controller 12 the calculation result for the designated calculation. Then, the PCI control unit 14 outputs the calculation result to the host computer 2. Specifically, the PCI control unit 14 instructs the memory controller 12 to read the calculation result from the memory 13, and causes the read data to be output to the host computer 2 via the PCI control unit 14 itself.
  • the memory controller 12 receives, from the PCI control unit 14, input of operation instructions and operation data used in the operation. Then, the memory controller 12 stores the acquired operation instruction and operation data in the memory 13.
  • the memory controller 12 receives from the overall instruction control unit 11 an instruction to store the operation data used when executing the operation in the vector register 111. Then, the memory controller 12 stores the designated operation data in the designated vector register 111.
  • the memory controller 12 bypasses the product-sum calculation unit 100 and outputs the calculation data to the multiplexer 103.
  • the overall instruction control unit 11 receives a notification of the completion of the operation from the overall instruction control unit 11 and instructs the memory controller 12 with a predetermined instruction to serially output the operation result in the vector register 111.
  • the processing units 10 arranged in line are pulled out or stored in the memory 13.
  • the overall command control unit 11 performs overall management of the operations instructed to be executed by the host computer 2.
  • the overall command control unit 11 receives an instruction from the host computer 2 via the PCI control unit 14, and sequentially reads and executes the overall command sequence stored in the memory 13.
  • As the overall instruction an instruction for transferring an operation instruction sequence from the memory 13 to the operation instruction buffer 102, an instruction for storing operation data from the memory 13 in the vector register 111, and an operation instruction control for the operation instruction sequence stored in the operation instruction buffer 102
  • the overall instruction control unit 11 causes the processing unit 10 to execute the arithmetic instruction sequence.
  • the overall instruction control unit 11 instructs the memory controller 12 to acquire the operation data used when executing the operation.
  • the overall command control unit 11 instructs the memory controller 12 to store the calculation result.
  • the overall instruction control unit 11 notifies the memory controller 12 of the completion of the operation.
  • the processing unit 10 includes a product-sum calculation unit 100, a calculation instruction control unit 101, a calculation instruction buffer 102, and a multiplexer 103.
  • the arithmetic instruction control unit 101 manages and controls the execution processing of arithmetic instructions.
  • the arithmetic instruction control unit 101 receives an instruction to execute an arithmetic instruction sequence from the overall instruction control unit 11.
  • an instruction that can be executed by the processing unit 10 is called an arithmetic instruction in contrast with the whole instruction.
  • The arithmetic instructions include arithmetic instructions in a narrow sense, which cause the product-sum operation unit to perform an operation, as well as general-purpose register (not illustrated) operation instructions, branch instructions, repeat instructions, and instructions to stop the execution of an instruction sequence.
  • the arithmetic instruction control unit 101 sequentially acquires the arithmetic instructions stored in the arithmetic instruction buffer 102. Next, the arithmetic instruction control unit 101 instructs the vector register 111 to output the arithmetic data designated by the acquired arithmetic instruction. Further, the arithmetic instruction control unit 101 outputs an instruction to execute an operation to the product-sum arithmetic unit 112 to the product-sum arithmetic unit 112 according to the acquired arithmetic instruction. After that, the operation instruction control unit 101 loops the operation using the operation result in the product-sum operation unit 112.
  • The operation instruction control unit 101 instructs the product-sum operation unit 112 to execute the probabilistic rounding process, and outputs to it the decimal point position information, which indicates which bit of the accumulation register is to be the decimal point position of the output.
  • the decimal point position information represents the decimal point position calculated from the learning result so as to keep the weighting parameter of each layer in the neural network as effective as possible within the range of the bit width that can be calculated. This value is a value determined in the process of executing the deep learning program, and is a variable value for the information processing device 50.
  • the stochastic rounding operation result stored in the vector register 111 is stored in the memory 13 via the chain of processing units and the memory controller 12.
  • The operation instruction control unit 101 may, for example, issue an instruction such as VECTOR.h.accstrnd ELE #, QNUM, DST #.
  • QNUM represents decimal point position information.
  • ELE # is a number indicating which element is to be subjected to stochastic rounding when the product-sum calculator 112 has a register having a plurality of elements.
  • DST # represents the number of the register that stores the result of the probabilistic rounding.
  • The bit range corresponding to QNUM in the element of the fixed-point accumulation register specified by ELE # is subjected to probabilistic rounding based on the value below the specified range, and the result is stored as a fixed-point value in the register designated by DST #.
  • the arithmetic instruction buffer 102 is a storage area for storing an arithmetic instruction sequence.
  • the arithmetic instruction buffer 102 stores the arithmetic instruction sequence input from the memory controller 12 in the input order from the designated address. After that, in response to a request to acquire the arithmetic instruction from the arithmetic instruction controller 101, the arithmetic instruction buffer 102 outputs the arithmetic instruction of the requested address to the arithmetic instruction controller 101.
  • the product-sum calculation unit 100 has a vector register 111 and a product-sum calculation unit 112. However, the vector register 111 included in the product-sum calculation unit 100 corresponds to a part of the entire vector register mounted in the processing unit 10.
  • the vector register 111 receives an input of operation data used when executing an operation from the memory controller 12, and stores the input operation data. After that, the vector register 111 receives the instruction from the arithmetic instruction control unit 101 and outputs the arithmetic data used in the arithmetic to the product-sum arithmetic unit 112. In addition, in the case of the product-sum accumulation operation, the vector register 111 receives an input of the operation result subjected to the probabilistic rounding process from the product-sum operation unit 112 after the operation loop process by the product-sum operation unit 112 is completed. When the memory controller 12 receives an instruction to output to the memory 13, the vector register 111 outputs to the multiplexer 103 the operation result of the product-sum operation unit 112 that has been subjected to the stochastic rounding processing.
  • the product-sum calculator 112 receives an instruction to execute a calculation from the calculation instruction control unit 101. Then, the product-sum calculation unit 112 executes the product-sum calculation using the calculation data input from the vector register 111. After that, the product-sum calculator 112 outputs the calculation result to the vector register 111. When the accumulation is instructed by the instruction, the product-sum calculation unit 112 holds the accumulation calculation result in a register (accumulator) in the calculation unit and uses it in the subsequent accumulation calculation instruction. The product-sum calculator 112 repeats the product-sum calculation on the value input from the vector register 111 until the product-sum accumulation calculation is completed.
  • the product-sum operation unit 112 receives an instruction to execute the probabilistic rounding process from the operation instruction control unit 101.
  • the product-sum calculation unit 112 receives input of decimal point position information from the calculation instruction control unit 101.
  • The product-sum calculation unit 112 takes the calculation result that is held in its internal register after the loop processing has completed as the probabilistic rounding target number, and executes the probabilistic rounding process on it using the circuit that performs the product-sum calculation.
  • the product-sum calculator 112 outputs the calculation result that has been subjected to the probabilistic rounding process to the vector register 111 and stores it.
  • FIG. 2 is a circuit diagram of the product-sum calculator.
  • the product-sum calculation unit 112 includes a random number generation circuit 121, a power generation unit 122, multiplexers 123 and 124, a multiplier 125, an exponent code calculation unit 126, a digit shifter 127, a multiplexer 128, and an adder 129. Further, the product-sum calculator 112 includes a fixed-point register 130, a precision loss prediction unit 131, a shift amount calculation unit 132, a multiplexer 133, a normalization shifter 134, and a rounding circuit 135.
  • the product-sum calculator 112 performs two processes: an actual calculation process requested by the host computer 2, such as convolution, and a process of calculating a probabilistic rounded value of the calculation result. Therefore, the actual calculation process is called an actual calculation, and the process of calculating the probabilistic rounding value of the calculation result is called the probabilistic rounding process.
  • the multiplexer 124 receives the input of one calculation data to be multiplied.
  • the multiplexer 123 also receives the other operation data to be multiplied.
  • the multiplexers 123 and 124 output the operation data to the multiplier 125.
  • the multiplier 125 multiplies the two calculation data input from the multiplexers 123 and 124. Then, the multiplier 125 outputs the multiplication result to the adder 129.
  • The exponent sign operation unit 126 receives from the vector register 111 the three pieces of operation data used for the actual calculation. When the operation instruction is a floating-point product-sum operation, the exponent sign operation unit 126 calculates the shift amount for aligning the digits of the mantissas of the product and the addend. In the case of a fixed-point arithmetic instruction, the exponent sign operation unit 126 uses a constant built into the hardware as the shift amount of the digit shifter 127, so that the digits of the addend match the digits of the multiplication result.
  • The exponent sign operation unit 126 calculates the sign of the calculation result. Then, the exponent sign operation unit 126 outputs to the digit shifter 127 the remaining operation data (the addend), that is, the operation data other than that input to the multiplexers 123 and 124, together with the shift amount for aligning the digits of the addend with those of the multiplication result of the multiplier 125.
  • When the arithmetic instruction is a floating-point product-sum operation, the exponent sign operation unit 126 receives the predicted precision loss amount from the precision loss prediction unit 131. Then, the exponent sign operation unit 126 calculates, from the obtained precision loss amount, the left shift amount used for normalizing the mantissa of the calculation result. Normalizing the mantissa means shifting the bit positions of the entire mantissa so that its most significant digit is 1. After that, the exponent sign operation unit 126 outputs the calculated left shift amount to the multiplexer 133.
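Mantissa normalization as described above can be sketched as follows. This is an illustration only: the function name `normalize` and the handling of a zero mantissa are assumptions, and the sketch counts leading zeros directly, whereas the circuit in the text predicts the shift amount from an intermediate addition result.

```python
def normalize(mantissa, width):
    """Shift a non-negative mantissa (< 2**width) left until bit width-1 is 1.

    Returns (normalized_mantissa, left_shift).
    """
    if mantissa == 0:
        # Assumption: a zero mantissa is returned unchanged with no shift.
        return 0, 0
    # The number of leading zeros equals the required left shift amount.
    left_shift = width - mantissa.bit_length()
    return mantissa << left_shift, left_shift
```

For an 8-bit mantissa `0b00010110`, the sketch returns `(0b10110000, 3)`: three leading zeros are removed by a left shift of 3.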
  • Depending on the circuit configuration, the precision loss amount obtained by the precision loss prediction unit 131 may differ from the true precision loss amount by an error within a predetermined range. Whether the predicted precision loss amount contains an error can be determined by checking whether the mantissa is correctly normalized after the normalization shifter 134 has shifted it by the designated amount. When an error is found, the normalization shifter 134 performs an additional shift for adjustment and notifies the exponent sign operation unit 126 that the prediction was in error. On receiving the error notification from the normalization shifter 134, the exponent sign operation unit 126 calculates the exponent taking the additional shift for error adjustment into account.
  • the exponent code calculator 126 calculates the exponent when there is no additional shift.
  • the exponent sign operation unit 126 receives a notification from the rounding circuit 135 and adjusts the exponent when a carry occurs due to rounding of the mantissa.
  • the exponent sign operation unit 126 outputs the finally obtained sign and exponent, and concatenates this with the rounding result of the mantissa output from the rounding circuit 135 described later, thereby completing the floating-point operation result.
  • the calculation result is output to the vector register 111.
  • the digit shifter 127 shifts the operation data input from the exponent code operation unit 126 by the shift amount specified by the exponent code operation unit 126. Then, the arithmetic data having undergone digit alignment is output to the multiplexer 128.
  • the multiplexer 128 selects the input from the digit shifter 127 in the case of a floating point actual operation.
  • In the fixed-point cumulative calculation, while the accumulation is in progress, the numbers to be added need no digit alignment, so the multiplexer 128 selects the input from the fixed-point register 130.
  • In the first step of the fixed-point cumulative calculation, the number input from the vector register 111 to the product-sum calculation unit 112 is used as the addend, so the multiplexer 128 selects the input from the digit shifter 127. Then, the multiplexer 128 outputs the operation data input from the digit shifter 127 to the adder 129.
  • the adder 129 receives an input of the multiplication result of two pieces of operation data from the multiplier 125.
  • The adder 129 also receives from the multiplexer 128 the remaining operation data (the addend), whose digits have been aligned. Then, the adder 129 adds the multiplication result of the two pieces of operation data and the digit-aligned addend, and outputs the addition result to the normalization shifter 134 and the precision loss prediction unit 131.
  • The adder 129 performs addition in two stages, carry-save addition and carry-propagate addition, representing the intermediate value as two numbers: a sum signal and a carry signal. The two stages together are simply called addition. Strictly speaking, it is the result of the carry-save addition that the adder 129 outputs to the precision loss prediction unit 131.
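The two-stage addition mentioned above can be sketched for non-negative integers as follows. The function names and the choice of three addends are assumptions for illustration; the source only states that a carry-save stage producing a sum signal and a carry signal is followed by a carry-propagate stage.

```python
def carry_save_add(a, b, c):
    """First stage: reduce three addends to a sum word and a carry word.

    No carry ripples between bit positions in this stage.
    """
    sum_word = a ^ b ^ c                             # per-bit sum without carries
    carry_word = ((a & b) | (b & c) | (a & c)) << 1  # per-bit carries, moved up one place
    return sum_word, carry_word


def carry_propagate_add(sum_word, carry_word):
    """Second stage: an ordinary addition that resolves the saved carries."""
    return sum_word + carry_word
```

Applying both stages gives the same value as adding the three numbers directly, for example `carry_propagate_add(*carry_save_add(5, 6, 7))` equals `18`.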
  • The precision loss prediction unit 131 receives the intermediate addition result from the adder 129. Then, the precision loss prediction unit 131 predicts the precision loss amount from the acquired addition result, and outputs the predicted precision loss amount to the exponent sign operation unit 126.
  • the multiplexer 133 selects the input from the exponent code calculator 126 in the case of actual calculation. Then, the multiplexer 133 outputs the left shift amount input from the exponent code calculation unit 126 to the normalization shifter 134.
  • the normalization shifter 134 receives the addition result input from the adder 129. Further, the normalization shifter 134 receives the input of the left shift amount from the multiplexer 133. Then, the normalization shifter 134 shifts the addition result to the left according to the input left shift amount and adjusts the output position. After that, the normalization shifter 134 outputs the left-shifted operation result to the rounding circuit 135.
  • When the predicted precision loss amount contains an error, the normalization shifter 134 performs an additional shift for error adjustment and sends an error notification to the exponent sign operation unit 126.
  • the rounding circuit 135 receives the input of the calculation result from the normalization shifter 134. Then, the rounding circuit 135 executes rounding with a predetermined number of digits. After that, the rounding circuit 135 outputs the calculation result rounded to a predetermined digit to the vector register 111.
  • In the case of probabilistic rounding processing, the random number generation circuit 121 generates an n-bit uniform random number. Then, the random number generation circuit 121 outputs the generated random number to the multiplexer 123.
  • the random number generated by the random number generation circuit 121 does not have to be a truly uniform random number, but may be a pseudo random number within a practically usable range.
  • the random number generation circuit 121 uses, for example, an LFSR (Linear Feedback Shift Register) arranged in the product-sum calculator 112.
  • the internal state of the LFSR is updated every time the probabilistic rounding instruction is executed, and the LFSR outputs a new random number value when the next probabilistic rounding instruction is issued.
  • the random number generation circuit 121 is not limited to the LFSR, and may be a pseudo random number generation circuit having higher randomness, or a circuit that acquires a random bit from the fluctuation of the environment.
  • the random number generation circuit 121 may be a circuit attached to each product-sum calculation unit 112 or a circuit shared by a plurality of product-sum calculation units 112.
  • the random number generation circuit 121 corresponds to an example of “random number generation unit”.
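A pseudo random number source of the LFSR kind can be sketched as follows. The source does not give the register width, the feedback polynomial, or how n bits are drawn from the register; the 16-bit width, the taps at bits 16, 14, 13 and 11 (a commonly cited maximal-length configuration), the seed `0xACE1`, and the class and method names are all assumptions.

```python
class Lfsr16:
    """16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (assumed configuration)."""

    def __init__(self, seed=0xACE1):
        if seed & 0xFFFF == 0:
            raise ValueError("seed must be non-zero, otherwise the state never changes")
        self.state = seed & 0xFFFF

    def step(self):
        """Advance the internal state by one shift and return the new state."""
        s = self.state
        feedback = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (feedback << 15)
        return self.state

    def random_bits(self, n):
        """Collect one low-order bit per step into an n-bit value (one possible scheme)."""
        value = 0
        for _ in range(n):
            value = (value << 1) | (self.step() & 1)
        return value
```

With these taps the register cycles through all 65535 non-zero states before repeating, so successive calls to `random_bits(12)` yield a long sequence of 12-bit values that is adequate for a sketch, though not a truly uniform random source.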
  • The values of the bits below the rounding position in the probabilistic rounding target number are finally discarded.
  • The expected value after probabilistic rounding can be made equal to the probabilistic rounding target number.
  • If the relationship n ≥ m (n being the number of random bits and m the number of bits below the rounding position) is to be maintained even when m is large, the number of bits to be calculated increases, and the circuits for digit alignment and addition grow correspondingly.
  • If the circuit becomes too large, the existing hardware of the product-sum calculation unit 112 can no longer accommodate an arithmetic circuit with the required number of bits. In that case, adding an arithmetic circuit for probabilistic rounding to cover the shortfall is not preferable, because the circuit amount increases.
  • Alternatively, allowing n < m, n can be kept within the number of bits that the existing product-sum calculator 112 can accommodate.
  • In this case, the bits more than n bits below the rounding position are not added to the random number and do not contribute to the result after rounding.
  • The expected value after rounding then deviates from the value before rounding by that amount. However, the deviation of the expected value is roughly halved each time n is increased by 1. Therefore, once n is sufficiently large, the deviation caused by a large m becomes small enough not to be a practical problem. It is therefore preferable to choose n according to practical requirements and the amount of existing arithmetic circuitry.
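The effect of n on the deviation of the expected value can be checked with exact arithmetic. In the sketch below, m is taken to be the number of bits below the rounding position and n the number of random bits; this reading of the symbols and the function names `carry_probability` and `max_bias` are assumptions, and the random number is assumed to be ideally uniform.

```python
from fractions import Fraction


def carry_probability(low_bits, m, n):
    """Exact probability of a carry into the retained digits.

    low_bits is the m-bit value below the rounding position; the n-bit
    random number is aligned with its top bit at the rounding position.
    """
    if n >= m:
        # Every discarded bit meets a random bit: the probability is exact.
        return Fraction(low_bits, 1 << m)
    # Only the top n of the m discarded bits meet a random bit.
    return Fraction(low_bits >> (m - n), 1 << n)


def max_bias(m, n):
    """Largest gap between the ideal and the actual carry probability."""
    return max(
        Fraction(f, 1 << m) - carry_probability(f, m, n)
        for f in range(1 << m)
    )
```

For m = 16, `max_bias(16, 12)` is 15/65536 and `max_bias(16, 13)` is 7/65536, roughly half as large, while `max_bias(16, 16)` is 0. In general the bias stays below 2^-n, which is about 0.000244 for n = 12.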
  • the multiplexer 123 selects the n-bit uniform random number input from the random number generation circuit 121. Then, the multiplexer 123 outputs the n-bit uniform random number to the multiplier 125.
  • the power generation unit 122 receives an input of a probabilistic rounding instruction from the arithmetic instruction control unit 101. At the same time, the power generation unit 122 acquires the decimal point position information of the operand included in the probabilistic rounding instruction. Then, the power generation unit 122 uses the decimal point position information to generate a power of 2 according to the position to be rounded.
  • Here, the position to be rounded is the digit immediately below the least significant digit that is kept as a significant digit. Then, the power generation unit 122 outputs the generated power of 2 to the multiplexer 124.
  • the multiplexer 124 selects the power of 2 input from the power generation unit 122. Then, the multiplexer 124 outputs the power of 2 to the multiplier 125.
  • the multiplier 125 receives an n-bit uniform random number input from the multiplexer 123. Further, the multiplier 125 receives an input of a power of 2 corresponding to the position to be rounded from the multiplexer 124. Then, the multiplier 125 multiplies the obtained random number by the obtained power of 2 to match the leading digit of the random number with the position to be rounded in the probabilistic rounding target number.
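Reusing a multiplier as a shifter, as described above, relies on the fact that multiplying by a power of 2 moves every bit of the random number by the same amount. The following sketch illustrates this under stated assumptions: `rounding_pos` is a hypothetical bit index of the rounding position (the source derives it from the decimal point position information, but the exact mapping is not reproduced here), and dropping bits when the exponent is negative is an assumed behaviour.

```python
def power_of_two(exponent):
    """Stand-in for the power generation unit: returns 2**exponent."""
    return 1 << exponent


def move_random(random_value, n, rounding_pos):
    """Align the top bit of an n-bit random value with bit rounding_pos."""
    exponent = rounding_pos - (n - 1)
    if exponent < 0:
        # Assumption: random bits that would fall below bit 0 are dropped.
        return random_value >> -exponent
    # Multiplying by a power of 2 is equivalent to a left shift.
    return random_value * power_of_two(exponent)
```

For example, `move_random(0b101, 3, 7)` gives `0b10100000`, with the leading bit of the random value placed at bit 7.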
  • FIG. 3 is a diagram showing an outline of the calculation of the probabilistic rounding process by the product-sum calculator according to the embodiment.
  • the position P in the probabilistic rounding target number 200 represents a rounding position.
  • the range L is a range used as a calculation result.
  • The random number generation circuit 121 generates a random number 201, which is an n-bit uniform random number. Then, the multiplier 125 multiplies the random number 201 by a power of 2, which shifts the random number 201 to the left so that its leading digit matches the rounding position P of the probabilistic rounding target number 200. At this time, the random number 201 occupies the n bits from the rounding position P downward.
  • FIG. 4 is a diagram for explaining digit shift of random numbers.
  • the random number 201 is a 12-bit random number and a decimal point is a 3-bit random number.
  • the decimal point position of the fixed-point decimal accumulator 211 is immediately to the right of the least significant bit.
  • The multiplier 125 shifts the random number 201 by multiplying it by 2^E, which is a power of 2.
  • When QNUM = 0, all digits of the random number lie below the least significant bit of the fixed-point decimal accumulator 211.
  • the position D represents the decimal point position.
  • A frame 210 represents the shift of the random number 201 by the multiplier 125, which performs a single-precision multiplication of 24 bits × 24 bits.
  • The least significant digit of the multiplier 125 in this case is 2^-8 when considered as a 16-bit fixed-point number.
  • The multiplier 125 performs a process of multiplying by 2^-9, but the multiplier 125 has no circuit for that digit.
  • The multiplier 125 multiplies the random number 201 by a power of 2 obtained from QNUM, which takes a value between 0 and 24, and thereby aligns the leading position of the random number 201 with the digit one below the least significant of the significant digits in the fixed-point decimal accumulator 211.
  • The multiplier 125 outputs to the adder 129, as the multiplication result, a random number whose leading digit is aligned with the digit one below the least significant of the significant digits.
  • the multiplier 125 corresponds to an example of the “random number moving unit”.
  • the fixed-point register 130 is a cumulative register (accumulator) used in product-sum cumulative calculation.
  • The fixed-point register 130 stores the probabilistic rounding target number, that is, the number to be probabilistically rounded.
  • The probabilistic rounding target number is a calculation result obtained in the actual calculation. Since multiplication results are added to the value of the accumulation register in the accumulation operation, the fixed-point register 130 has a bit width sufficient to hold the accumulated value. In the case of probabilistic rounding processing, the fixed-point register 130 outputs the probabilistic rounding target number to the multiplexer 128.
  • The multiplexer 128 selects the probabilistic rounding target number input from the fixed-point register 130. Then, the multiplexer 128 outputs the probabilistic rounding target number to the adder 129.
  • The adder 129 receives from the multiplier 125 the random number whose leading digit is aligned with the digit one below the least significant of the significant digits. The adder 129 also receives the probabilistic rounding target number from the multiplexer 128. Then, the adder 129 adds the random number to the probabilistic rounding target number. As a result, probabilistic rounding is performed according to the digits of the probabilistic rounding target number at and below the leading digit of the random number. That is, the adder 129 performs probabilistic rounding by adding a random number to the probabilistic rounding target number. After that, the adder 129 outputs the addition result to the normalization shifter 134.
  • the adder 129 corresponds to an example of “adding unit”.
  • the probabilistic rounding target number 200 is provided from the fixed-point register 130.
  • the range L represents a range of numerical values which is desired to be used as a result of the rounding process.
  • the adder 129 adds the random number 201 whose start position has been shifted to the rounding position P by the multiplier 125 to the probabilistic rounding target number 200.
  • the adder 129 appends bits having a value of 0 to the stochastic rounding target number 200 so that the least significant digits match, and then performs the addition.
  • the added value 202 in which the carry M1 is stochastically generated according to the value equal to or smaller than the rounding position P is obtained.
  • the maximum deviation of the expected value after rounding is about 0.00025.
  • the number of bits of the random number is not limited to this, and it is preferable that an appropriate number of bits of 1 or more is selected according to the requirements expected for the operation and the balance between the existing circuit and the allowable additional circuit amount.
  • the shift amount calculation unit 132 receives input of decimal point position information from the arithmetic instruction control unit 101. Then, the shift amount calculation unit 132 calculates the shift amount according to the decimal point position information. Specifically, the shift amount calculation unit 132 uses QNUM to calculate the shift amount needed to move the valid data from the position where it is placed in the data output from the adder 129 to the valid data position in the output of the normalization shifter. After that, the shift amount calculation unit 132 outputs the calculated shift amount to the multiplexer 133.
  • the multiplexer 133 selects the shift amount input from the shift amount calculation unit 132. Then, the multiplexer 133 outputs the shift amount to the normalization shifter 134.
  • the normalization shifter 134 receives from the adder 129 the input of the probabilistic rounding target number subjected to the probabilistic rounding process.
  • the normalization shifter 134 also receives an input of the shift amount from the multiplexer 133. Then, the normalization shifter 134 shifts the probabilistic rounding target number according to the shift amount. Specifically, the normalization shifter 134 performs a shift that moves the digit right above the rounding position of the input target number to the leftmost digit of the valid number output from the product-sum calculator 112.
  • the normalization shifter 134 outputs the left-shifted probabilistic rounding target number to the rounding circuit 135.
  • the normalization shifter 134 is an example of the “moving unit”.
  • the normalization shifter 134 shifts the added value 202 so that the least significant digit of the range L used as the calculation result in the added value 202 matches the least significant digit of the output data, and thereby obtains the shift value 203.
  • FIG. 5 is a diagram for explaining alignment of added values by the normalization shifter.
  • the 40-bit addition value 202 is output from the adder 129.
  • the value output from the adder 129 corresponds to the position from bit 16 to bit 55 of the intermediate bus before the output of the normalization shifter 134. Of this value, 16 bits of data are output. Bits 48 to 63 of the operation result bus, on which the normalization shifter 134 outputs data, are output as data.
  • the normalization shifter 134 shifts the added value 202 to the left. As a result, the normalization shifter 134 moves the 16 bits of the added value 202 that are used to the output position 214, which corresponds to bits 48 to 63 of the operation result bus.
  • the normalization shifter 134 moves the data 212 from bit 40 to bit 55 of the intermediate result bus to the output position 214 of bit 48 to bit 63 of the operation result bus.
  • the left shift amount is 8. That is, in the example of FIG. 5, the normalization shifter 134 performs the left shift by using the left shift amount obtained as “32-QNUM” by the shift amount calculation unit 132.
  • the rounding circuit 135 receives, from the normalization shifter 134, an input of the left-shifted stochastic rounding target number. Then, in the case of the probabilistic rounding processing, the rounding circuit 135 truncates the digits below a predetermined digit of the input stochastic rounding target number. Then, the rounding circuit 135 outputs the stochastic rounding target number in which the digits below the predetermined digit have been truncated. The output from the rounding circuit 135 is sent to the vector register 111 as output data.
  • the rounding circuit 135 is an example of the “output unit”.
  • the rounding circuit 135 performs the cutoff M2 on the digits below the range L used in the shift value 203. The range 204, which consists of the digits above the range L to be used, is not included in the output data 205 and is discarded. Then, the output data 205 is sent to the vector register 111.
  • the rounding circuit 135 is an ordinary floating-point arithmetic circuit. In normal rounding of floating-point arithmetic, the rounding circuit 135 evaluates the rounding bit or sticky bit obtained from the lower digits, the value of the least significant digit at the rounding position, the sign of the operation result, and the specified rounding mode, and performs one of the following two processes.
  • the first process is a process in which the rounding circuit 135 outputs a value obtained by cutting the input value below the rounding position as it is.
  • the second process is a process in which the rounding circuit 135 adds 1 to a value obtained by cutting the input value below the rounding position and outputs the value.
  • the rounding circuit 135 is designated to always select the above-described first process.
  • FIG. 6 is a diagram illustrating a specific example of the probabilistic rounding process.
  • with reference to FIG. 6, a case where processing is performed using a 64-bit wide bus will be described. The position of each piece of data is written as [x: y], where x represents the most significant bit and y represents the least significant bit.
  • the stochastic rounding target number 300 output from the fixed-point register 130 is located at [55:16].
  • bits 15 and below correspond to the digits below the decimal point.
  • bits 31 and below of the probabilistic rounding target number 300 are rounded, and [47:32] is a range to be used.
  • the random number 301 generated by the random number generation circuit 121 is logically placed below the decimal point, at bit 15 and below, in the location surrounded by the broken line.
  • the multiplier 125 multiplies the random number by 2^7 to shift the random number 301 to the position [31:20].
  • the adder 129 appends bits having a value of 0 to the random number 301 located at [31:20] so that its least significant bit coincides with that of the stochastic rounding target number 300, adds it to the stochastic rounding target number 300, and thereby calculates the addition value 302.
  • the added value 302 is located at [55:16].
  • the output range 303 is [63:48].
  • the normalization shifter 134 shifts the stochastic rounding target number 300 so that the data of the range [47:32] used in the stochastic rounding target number 300 is located at [63:48].
  • the rounding circuit 135 truncates and discards bits 49 or less in the stochastic rounding target number 300 after the shift. Furthermore, bits 64 and above in the stochastic rounding target number 300 after the shift are not output and are discarded. As a result, the remaining 16-bit output data 304 is output as the calculation result.
  • FIG. 7 is a flowchart of the entire processing executed by the product-sum calculation unit.
  • the product-sum calculation unit 100 executes the product-sum calculation in the actual calculation using the product-sum calculator 112 (step S1).
  • the product-sum calculation unit 100 determines whether the product-sum accumulation calculation is completed (step S2). When the product-sum accumulation operation is not completed (step S2: No), the product-sum calculation unit 100 returns to step S1 and repeats the product-sum operation.
  • when the product-sum accumulation operation is completed (step S2: Yes), the product-sum calculation unit 100 uses the product-sum calculator 112 to perform the probabilistic rounding process (step S3).
  • the calculation result stored in the vector register 111 is output to the memory 13 via the chain of processing units and the memory controller 12 (step S4).
  • FIG. 8 is a flowchart of the probabilistic rounding process by the product-sum calculator 112 according to the embodiment.
  • the process shown in the flowchart of FIG. 8 is an example of the process performed in step S3 of FIG.
  • the random number generation circuit 121 acquires an n-digit uniform random number (step S101). Then, the random number generation circuit 121 outputs the generated random number to the multiplier 125 via the multiplexer 123.
  • the power generation unit 122 generates a power of 2 according to QNUM (step S102). Then, the power generation unit 122 outputs the generated power of 2 to the multiplier 125 via the multiplexer 124.
  • the multiplier 125 receives a random number input from the random number generation circuit 121.
  • the multiplier 125 also receives from the multiplexer 124 an input of a power of 2 according to QNUM. Then, the multiplier 125 multiplies the random number by a power of 2, and shifts the random number so that the beginning of the random number is located at the rounding position (step S103). After that, the multiplier 125 outputs the multiplication result to the adder 129.
  • the adder 129 acquires the probabilistic rounding target number stored in the fixed-point register 130 via the multiplexer 128 (step S104).
  • the adder 129 receives the input of the multiplication result from the multiplier 125. Then, the adder 129 executes the stochastic rounding by adding the multiplication result of the multiplier 125 to the probabilistic rounding target number (step S105). Then, the adder 129 outputs the addition result, which represents the probabilistic rounding target number subjected to the probabilistic rounding process, to the normalization shifter 134.
  • the shift amount calculation unit 132 calculates the shift amount according to the decimal point position information acquired from the arithmetic instruction control unit 101 (step S106). Then, the shift amount calculation unit 132 outputs the calculated shift amount to the normalization shifter 134 via the multiplexer 133.
  • the normalization shifter 134 receives, from the adder 129, an input of the addition result representing the probabilistic rounding target number subjected to the probabilistic rounding process. Further, the normalization shifter 134 receives the input of the shift amount from the shift amount calculation unit 132. Then, the normalization shifter 134 shifts the addition result to the left by the shift amount (step S107). Then, the normalization shifter 134 outputs the left-shifted value to the rounding circuit 135.
  • the rounding circuit 135 receives the input of the left-shifted value from the normalization shifter 134. Then, the rounding circuit 135 cuts off a predetermined digit or less of the left-shifted value and discards bits below the output range (step S108).
  • the rounding circuit 135 outputs a predetermined number of bits from the lower order as a result (step S109).
  • the product-sum calculator 112 has a random number generation circuit 121, a power generation unit 122, a shift amount calculation unit 132, and multiplexers 123, 124, and 133 added to a circuit that performs floating-point calculation. With these additions, it executes the probabilistic rounding process using fixed point.
  • the product-sum calculator 112 according to the present embodiment can execute fixed-point stochastic rounding by adding a small amount of circuitry to the circuit used for the floating-point product-sum calculation. Further, the product-sum calculator 112 according to the present embodiment executes stochastic rounding on the cumulative calculation result of multiplication. Therefore, it is possible to execute appropriate probabilistic rounding with a simple configuration.
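
The flow of steps S101 to S109 can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the circuit described here: the function name `stochastic_round_accumulator`, the constants, and the use of Python's `random` module are hypothetical, the accumulated value is assumed to be non-negative, and bit positions are counted from the least significant bit of the accumulated value rather than from a bus position.

```python
import random

ACC_BITS = 40    # width of the accumulated value (fixed-point register)
OUT_BITS = 16    # width of the rounded output
RAND_BITS = 12   # width n of the uniform random number (illustrative choice)


def stochastic_round_accumulator(acc: int, round_pos: int, rng: random.Random) -> int:
    """Return OUT_BITS bits of `acc` starting at bit `round_pos`, stochastically rounded.

    Bits below `round_pos` are the ones rounded away. `acc` is assumed to be
    a non-negative value that fits in ACC_BITS bits (a simplifying assumption).
    """
    if not 0 <= acc < (1 << ACC_BITS):
        raise ValueError("acc must fit in ACC_BITS bits")
    # S101: draw an n-bit uniform random number.
    r = rng.getrandbits(RAND_BITS)
    # S102-S103: scale by a power of 2 so that the top bit of the random
    # number sits one position below the rounding position.
    shift = round_pos - RAND_BITS
    aligned = r << shift if shift >= 0 else r >> -shift
    # S104-S105: add; a carry into bit `round_pos` occurs with a probability
    # that grows with the value of the low bits being discarded.
    total = acc + aligned
    # S107-S108: drop everything below the rounding position (truncation only).
    kept = total >> round_pos
    # Bits above the output range are discarded as well.
    return kept & ((1 << OUT_BITS) - 1)


rng = random.Random(2018)
acc = (0x1234 << 16) | 0x8000   # kept field 0x1234, discarded fraction 0.5
samples = [stochastic_round_accumulator(acc, 16, rng) for _ in range(1000)]
print(sorted(set(samples)))     # only 0x1234 and 0x1235 can appear
```

In this model the final stage only truncates, which mirrors the point that the rounding circuit always selects its first process and the randomness enters solely through the addition.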

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

A random number generation circuit (121) generates a random number. On the basis of the location in which a number to be rounded is positioned and decimal point location information for output data, a multiplier (125) shifts the location of the random number such that the beginning of the random number matches the rounding location of the number to be rounded. An adder (129) adds the random number that was shifted by the multiplier (125) and the number to be rounded, which was positioned in a prescribed location. A rounding circuit (135) outputs, as output data, data of a prescribed range that includes a prescribed number of digits of significant figures from the rounding location in the addition result from the adder (129).

Description

Arithmetic processing device and method for controlling arithmetic processing device
The present invention relates to an arithmetic processing device and a method for controlling the arithmetic processing device.
Deep learning, which has been growing in importance in recent years, requires an enormous amount of computation and memory. As the amount of computation and memory consumption increases, the computational load and memory load increase and the learning time becomes longer. Therefore, to reduce the amount of computation and memory consumption and shorten the learning time, it is desirable to use a technique that performs calculations with the lowest possible precision while maintaining learning and inference capability. Such techniques often use fixed-point arithmetic.
However, one problem with low-precision calculation is that the rounded values of calculation results tend to be biased toward certain fixed values. Once the rounded values are biased, learning becomes difficult to advance from that point.
To eliminate such bias in values at low precision, there is a conventional technique that introduces probabilistic rounding. In this technique, with a probability that depends on the value of the digits below the digit to be rounded in the calculation result before rounding, the rounded value is chosen to be either the value truncated at the rounding digit or that value plus 1, so that the expected value of the rounding result equals the value before rounding. For example, when the calculation result 1.8 is rounded to an integer using probabilistic rounding, the rounding result is 2 with a probability of 80% and 1 with a probability of 20%. As a result, the expected value of the rounding result is 1.8, the same as the value before rounding.
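
As a worked check of this example, the expected value can be written out explicitly. The symbols $x$, $n$, $f$ and $R(x)$ are introduced here only for illustration: $x = n + f$ is the value before rounding, with integer part $n$ and discarded fraction $0 \le f < 1$, and $R(x)$ is the rounding result.

```latex
\[
E[R(x)] = (1 - f)\,n + f\,(n + 1) = n + f = x,
\qquad
E[R(1.8)] = 0.2 \times 1 + 0.8 \times 2 = 1.8 .
\]
```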
In deep learning, product-sum accumulation operations are frequently applied to the elements of matrices. A product-sum accumulation operation is an operation of the form C' = C + A × B, which accumulates the product of the next elements onto the calculation result obtained so far. While the accumulation continues, the accumulated result is generally held as a value with considerably higher precision than the inputs to the multiplication. For example, when A and B, the inputs to the multiplication in the above expression, are 16 bits wide, the multiplication result A × B is 32 bits wide. The accumulation register that stores the accumulated multiplication results preferably has a size large enough to store the accumulation of 32-bit values and may be, for example, 40 bits wide.
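
The bit-width argument can be illustrated with a small sketch. It assumes signed two's-complement values and wraparound on overflow, neither of which is stated in the text, and the helper name `mac_step` is hypothetical.

```python
def mac_step(acc: int, a: int, b: int, acc_bits: int = 40) -> int:
    """One step C' = C + A*B on a signed two's-complement accumulator of acc_bits bits."""
    total = (acc + a * b) & ((1 << acc_bits) - 1)   # wrap to acc_bits
    if total >> (acc_bits - 1):                      # sign bit set: negative value
        total -= 1 << acc_bits
    return total


INT16_MIN = -(1 << 15)   # most negative 16-bit input
acc = 0
for a, b in [(INT16_MIN, INT16_MIN), (INT16_MIN, INT16_MIN)]:
    acc = mac_step(acc, a, b)

print(acc)                    # 2147483648, i.e. 2**31
print(acc > (1 << 31) - 1)    # True: no longer fits in a signed 32-bit value
```

Accumulating just two extreme 16-bit products already exceeds the signed 32-bit range, which is why an accumulator wider than the product, such as 40 bits, is a natural choice.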
For these reasons, rounding the multiplication result may cause a large loss of precision. Therefore, in hardware that executes deep learning, it is desirable to apply probabilistic rounding to the result of the product-sum accumulation operation, which has more bits than the inputs to the multiplication, and to output the resulting low-precision value.
When rounding, there is a technique in which some number is added to the lower digits that will be discarded by rounding, so that a carry is generated when desired and the desired rounding result is obtained in the digits above the rounding position. This technique is also effective for probabilistic rounding. Accordingly, a conventional technique has been proposed that performs probabilistic rounding by using the adder circuit in a product-sum calculator to perform addition on the lower digits. There is also a conventional technique that adds the output of a random noise circuit to the fractional part of data to perform rounding. Furthermore, as a device that rounds an accumulation register, there is a conventional technique in which a rounding circuit receives a value from a random number generator and uses it to make the rounding decision.
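
The add-then-truncate idea can be sketched as below. This is a minimal illustration assuming non-negative integers and a uniform random source; the function names are hypothetical and are not taken from any of the cited techniques.

```python
import random


def add_noise_then_truncate(value: int, drop_bits: int, rng: random.Random) -> int:
    """Stochastically round away the low `drop_bits` bits of a non-negative integer."""
    # Uniform noise covering exactly the digits that will be discarded.
    noise = rng.getrandbits(drop_bits) if drop_bits > 0 else 0
    # The addition carries into the kept digits with probability
    # (discarded value) / 2**drop_bits; the shift then simply truncates.
    return (value + noise) >> drop_bits


def exact_expectation(value: int, drop_bits: int) -> float:
    """Average the result over every possible noise value (no sampling)."""
    outcomes = [(value + n) >> drop_bits for n in range(1 << drop_bits)]
    return sum(outcomes) / len(outcomes)


x = 29  # 1.8125 with 4 fractional bits
print(add_noise_then_truncate(x, 4, random.Random(0)))  # 1 or 2
print(exact_expectation(x, 4))                          # 1.8125
```

Averaged over all 16 noise values, the result equals the value before rounding, which is the property that makes this technique suitable for probabilistic rounding.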
U.S. Patent Application Publication No. 2017/0220341
U.S. Patent Application Publication No. 2017/0102920
Japanese Laid-open Patent Publication No. 03-63722
Japanese National Publication of International Patent Application No. 2004-506365
However, in the conventional technique that uses the adder circuit in the product-sum calculator for probabilistic rounding, the rounding is applied to the multiplication result. As described above, the bit width of the values input to the multiplication is generally much narrower than the bit width of the accumulation register for the multiplication results. Consequently, rounding the value of the accumulation register to low precision by rounding the multiplication result may produce an inappropriate rounded value. In the conventional technique that has the rounding circuit decide the rounding using a value obtained from a random number generator, a large-scale modification is made to the rounding circuit. The rounding process of the rounding circuit lies on the critical path of floating-point arithmetic. A large modification of the rounding circuit for probabilistic rounding may therefore worsen the delay of the critical path of existing operations. In other words, it is difficult to make large-scale modifications to a floating-point product-sum calculator.
The disclosed technique has been made in view of the above, and an object thereof is to provide an arithmetic processing device that executes appropriate probabilistic rounding with a simple configuration, and a method for controlling the arithmetic processing device.
In one aspect of the arithmetic processing device and the method for controlling the arithmetic processing device disclosed in the present application, a random number generation unit generates a random number. A random number moving unit moves the position of the random number, based on the position at which a rounding target number is placed and decimal point position information of output data, so that the beginning of the random number coincides with the rounding position of the rounding target number. An adding unit adds the random number moved by the random number moving unit to the rounding target number placed at the predetermined position. An output unit outputs, as the output data, data in a predetermined range of the addition result of the adding unit that includes a predetermined number of significant digits from the rounding position.
In one aspect, the present invention can execute appropriate probabilistic rounding with a simple configuration.
FIG. 1 is an overall configuration diagram of an information processing device.
FIG. 2 is a circuit diagram of the product-sum calculator.
FIG. 3 is a diagram showing an outline of the calculation of the probabilistic rounding process by the product-sum calculator according to the embodiment.
FIG. 4 is a diagram for explaining the digit-alignment shift of the random number.
FIG. 5 is a diagram for explaining alignment of added values by the normalization shifter.
FIG. 6 is a diagram illustrating a specific example of the probabilistic rounding process.
FIG. 7 is a flowchart of the entire processing executed by the product-sum calculation unit.
FIG. 8 is a flowchart of the probabilistic rounding process by the product-sum calculator according to the embodiment.
Embodiments of the arithmetic processing device and the method for controlling the arithmetic processing device disclosed in the present application will be described below in detail with reference to the drawings. The arithmetic processing device and the method for controlling the arithmetic processing device disclosed in the present application are not limited by the following embodiments.
FIG. 1 is an overall configuration diagram of the information processing device. The information processing device 50 includes a PCI (Peripheral Component Interconnect) card 1 and a host computer 2. The PCI card 1 and the host computer 2 are connected by a PCI bus and exchange data with each other.
The host computer 2 performs, for example, overall management when deep learning is executed. When executing deep learning, the host computer 2 instructs the PCI card 1 to execute a predetermined operation in deep learning, such as a convolution operation.
The PCI card 1 executes an operation in response to an instruction from the host computer 2 and outputs the operation result to the host computer 2. As shown in FIG. 1, the PCI card 1 has a plurality of processing units 10, an overall instruction control unit 11, a memory controller 12, a memory 13, and a PCI control unit 14. The PCI card 1 corresponds to an example of the “arithmetic processing device”.
The PCI control unit 14 receives from the host computer 2 an arithmetic instruction that directs execution of an operation and the operation data used in the operation. The PCI control unit 14 then outputs the acquired arithmetic instruction and operation data to the memory controller 12.
The PCI control unit 14 also receives from the memory controller 12 the operation result for the instructed operation and outputs the operation result to the host computer 2. Specifically, the PCI control unit 14 instructs the memory controller 12 to read the operation result from the memory 13 and has the read data output to the host computer 2 via the PCI control unit 14 itself.
The memory controller 12 receives from the PCI control unit 14 the arithmetic instruction and the operation data used in the operation. The memory controller 12 then stores the acquired arithmetic instruction and operation data in the memory 13.
The memory controller 12 also receives from the overall instruction control unit 11 an instruction to store, in the vector register 111, the operation data used when executing an operation. The memory controller 12 then stores the designated operation data in the designated vector register 111. Here, when transmitting data to a subsequent processing unit 10 among the processing units 10 arranged in series, the memory controller 12 bypasses the product-sum calculation unit 100 and outputs the operation data to the multiplexer 103.
When the operation is completed, the overall instruction control unit 11 receives a notification of the completion of the operation and, by instructing the memory controller 12 with a predetermined instruction, has the operation result in the vector register 111 pulled out of the chain of processing units 10 arranged in series and stored in the memory 13.
The overall instruction control unit 11 performs overall management of the operations whose execution is instructed by the host computer 2. The overall instruction control unit 11 receives an instruction from the host computer 2 via the PCI control unit 14, and reads and executes, one after another, the overall instruction sequence stored in the memory 13. The overall instructions include an instruction to transfer an arithmetic instruction sequence from the memory 13 to the arithmetic instruction buffer 102, an instruction to store operation data from the memory 13 in the vector register 111, an instruction to make the arithmetic instruction control unit 101 start executing the arithmetic instruction sequence stored in the arithmetic instruction buffer 102, an instruction to store the operation result held in the vector register 111 in the memory 13, and an instruction to end execution of the instruction sequence.
The overall instruction control unit 11 causes the processing unit 10 to execute the arithmetic instruction sequence. When causing the processing unit to execute an operation, the overall instruction control unit 11 instructs the memory controller 12 to acquire the operation data used when executing the operation. When the operation in the processing unit 10 is completed, the overall instruction control unit 11 instructs the memory controller 12 to store the operation result. Further, when all processing of the operation whose execution was instructed is completed, the overall instruction control unit 11 notifies the memory controller 12 of the completion of the operation.
 次に、処理ユニット10について説明する。処理ユニット10は、図1に示すように1つのPCIカード1に複数搭載される。各処理ユニット10は、並列及び直列に複数接続される。処理ユニット10の数は、ある太陽においては128個である。処理ユニット10は、積和演算部100、演算命令制御部101、演算命令バッファ102及びマルチプレクサ103を有する。 Next, the processing unit 10 will be described. A plurality of processing units 10 are mounted on one PCI card 1 as shown in FIG. A plurality of processing units 10 are connected in parallel and in series. The number of processing units 10 is 128 in one sun. The processing unit 10 includes a product-sum calculation unit 100, a calculation instruction control unit 101, a calculation instruction buffer 102, and a multiplexer 103.
 演算命令制御部101は、演算命令の実行処理を管理制御する。演算命令制御部101は、演算命令列の実行の指示を全体命令制御部11から受ける。ここで、処理ユニット10で実行できる命令を、全体命令と対比させて演算命令と呼んでいるが、命令には、積和演算部に演算を行わせる狭義の演算命令のほか、汎用レジスタ(図示しない)の操作命令、分岐命令、繰り返し命令、命令列の実行を停止する命令などが含まれる。 The arithmetic instruction control unit 101 manages and controls the execution processing of arithmetic instructions. The arithmetic instruction control unit 101 receives an instruction to execute an arithmetic instruction sequence from the overall instruction control unit 11. Here, an instruction that can be executed by the processing unit 10 is called an arithmetic instruction in contrast with the whole instruction. The instruction includes an arithmetic instruction in a narrow sense that causes the product-sum operation unit to perform an operation, and a general-purpose register (illustrated No) operation instructions, branch instructions, repeat instructions, and instructions to stop the execution of instruction sequences.
 The arithmetic instruction control unit 101 then sequentially fetches the arithmetic instructions stored in the arithmetic instruction buffer 102. Next, it instructs the vector register 111 to output the operation data specified by the fetched arithmetic instruction. In accordance with the fetched arithmetic instruction, it also outputs an instruction to execute the operation to the product-sum calculator 112. After that, the arithmetic instruction control unit 101 causes the product-sum calculator 112 to loop, repeating the operation with the previous operation result.
 When the operation is complete, the arithmetic instruction control unit 101 outputs to the product-sum calculator 112 an instruction to execute the probabilistic rounding process together with decimal point position information, which indicates which bit of the accumulation register is to be the decimal point position of the output. The decimal point position information represents a decimal point position calculated from the learning results so that the weight parameters of each layer of the neural network are kept as meaningful as possible within the available bit width. This value is determined in the course of executing the deep learning program and is therefore a variable value for the information processing device 50. After that, on an instruction from the memory controller 12, the probabilistically rounded operation result stored in the vector register 111 is stored in the memory 13 via the chain of processing units and the memory controller 12.
 The arithmetic instruction control unit 101 issues, for example, an instruction such as VECTOR.h.accstrnd ELE#,QNUM,DST#. Here, QNUM represents the decimal point position information. ELE# is a number indicating which element is the target of probabilistic rounding when the product-sum calculator 112 contains a register with a plurality of elements. DST# is the number of the register that stores the result of the probabilistic rounding. When this instruction is executed, the bit range selected by QNUM in the element of the fixed-point accumulation register specified by ELE# is probabilistically rounded on the basis of the bits below that range, and the result is stored as a fixed-point value in the register specified by DST#.
 The arithmetic instruction buffer 102 is a storage area that stores an arithmetic instruction sequence. The arithmetic instruction buffer 102 stores the arithmetic instruction sequence input from the memory controller 12 in input order, starting at the designated address. Thereafter, on receiving a fetch request from the arithmetic instruction control unit 101, the arithmetic instruction buffer 102 outputs the arithmetic instruction at the requested address to the arithmetic instruction control unit 101.
 The product-sum calculation unit 100 includes a vector register 111 and a product-sum calculator 112. Note that the vector register 111 included in the product-sum calculation unit 100 is only a part of the whole vector register mounted in the processing unit 10.
 The vector register 111 receives from the memory controller 12 the operation data used in an operation and stores it. Thereafter, on an instruction from the arithmetic instruction control unit 101, the vector register 111 outputs the operation data used in the operation to the product-sum calculator 112. In the case of a product-sum accumulation operation, after the product-sum calculator 112 has completed the loop processing of the operation, the vector register 111 receives the probabilistically rounded operation result from the product-sum calculator 112. Then, on receiving from the memory controller 12 an instruction to output to the memory 13, the vector register 111 outputs the held, probabilistically rounded operation result of the product-sum calculator 112 to the multiplexer 103.
 The product-sum calculator 112 receives an instruction to execute an operation from the arithmetic instruction control unit 101. It then executes a product-sum operation using the operation data input from the vector register 111 and outputs the operation result to the vector register 111. When an instruction specifies accumulation, the product-sum calculator 112 holds the accumulated result in a register (accumulator) inside the calculator and uses it in the subsequent accumulation instruction. The product-sum calculator 112 repeats the product-sum operation on the values input from the vector register 111 until the product-sum accumulation operation is complete.
 Thereafter, when the loop processing of the product-sum accumulation operation ends, the product-sum calculator 112 receives an instruction to execute the probabilistic rounding process from the arithmetic instruction control unit 101, along with the decimal point position information. The product-sum calculator 112 then treats the operation result held in its internal register after the loop processing as the probabilistic rounding target number, and executes the probabilistic rounding process on that number using the same circuits that perform the product-sum operation. After that, the product-sum calculator 112 outputs the probabilistically rounded operation result to the vector register 111, where it is stored.
 The probabilistic rounding function of the product-sum calculator 112 will now be described in detail with reference to FIG. 2. FIG. 2 is a circuit diagram of the product-sum calculator.
 The product-sum calculator 112 includes a random number generation circuit 121, a power generation unit 122, multiplexers 123 and 124, a multiplier 125, an exponent sign calculation unit 126, a digit alignment shifter 127, a multiplexer 128, and an adder 129. The product-sum calculator 112 further includes a fixed-point register 130, a cancellation amount prediction unit 131, a shift amount calculation unit 132, a multiplexer 133, a normalization shifter 134, and a rounding circuit 135. The product-sum calculator 112 performs two kinds of processing: the actual arithmetic requested by the host computer 2, such as convolution, and the calculation of a probabilistically rounded value of the operation result. Hereinafter, the former is called the actual operation and the latter is called the probabilistic rounding process.
 The operation of each part in the actual operation will be described first. In the actual operation, the multiplexer 124 receives one of the two pieces of operation data to be multiplied, and the multiplexer 123 receives the other. The multiplexers 123 and 124 output this operation data to the multiplier 125. The multiplier 125 multiplies the two pieces of operation data input from the multiplexers 123 and 124 and outputs the multiplication result to the adder 129.
 Meanwhile, the exponent sign calculation unit 126 receives from the vector register 111 the three pieces of operation data used in the actual operation. When the arithmetic instruction is a floating-point product-sum operation, the exponent sign calculation unit 126 calculates the shift amount for aligning the mantissas of the product and the addend. When the instruction is a fixed-point arithmetic instruction, the exponent sign calculation unit 126 uses, as the shift amount of the digit alignment shifter 127, a constant built into the hardware in advance so that the digits of the addend line up with the digits of the multiplication result. The exponent sign calculation unit 126 also calculates the sign of the operation result. It then outputs to the digit alignment shifter 127 the remaining operation data (the addend), that is, the operation data other than those input to the multiplexers 123 and 124, together with the shift amount for aligning the addend with the multiplication result of the multiplier 125.
 When the arithmetic instruction is a floating-point product-sum operation, the exponent sign calculation unit 126 also receives the predicted cancellation amount from the cancellation amount prediction unit 131. From this cancellation amount, the exponent sign calculation unit 126 calculates the left shift amount used to normalize the mantissa of the operation result. Normalizing the mantissa means adjusting (shifting) the bit position of the whole mantissa so that its most significant digit becomes 1. The exponent sign calculation unit 126 then outputs the calculated left shift amount to the multiplexer 133.
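 The mantissa normalization described above can be illustrated with a minimal Python sketch. The function name normalize_mantissa and the register width parameter are assumptions made for illustration; the patent does not specify how the shift is computed in hardware.

```python
def normalize_mantissa(mantissa, width):
    """Left-shift a nonzero mantissa so that bit (width - 1) becomes 1.

    Returns (normalized_mantissa, left_shift_amount). The name and the
    `width` parameter are hypothetical; they model a mantissa register
    of `width` bits.
    """
    if not 0 < mantissa < (1 << width):
        raise ValueError("mantissa must be nonzero and fit in `width` bits")
    # Number of leading zero bits equals the required left shift.
    shift = width - mantissa.bit_length()
    return mantissa << shift, shift
```

 For example, an 8-bit mantissa 0b00010110 has three leading zeros, so the sketch shifts it left by 3.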
 Depending on the circuit configuration, the cancellation amount obtained by the cancellation amount prediction unit 131 may differ from the true cancellation amount by an error within a fixed range. Whether the prediction contained an error can be determined by checking whether the mantissa is correctly normalized after the normalization shifter 134 has shifted it by the specified amount. When the normalization shifter 134 finds that the cancellation amount was in error, it performs an additional shift to compensate and notifies the exponent sign calculation unit 126 that the prediction contained an error. When the exponent sign calculation unit 126 receives this error notification from the normalization shifter 134, it calculates the exponent taking the additional shift into account. When it receives no error notification, it calculates the exponent for the case without an additional shift. In addition, when rounding of the mantissa produces a carry, the exponent sign calculation unit 126 receives a notification from the rounding circuit 135 and adjusts the exponent. The exponent sign calculation unit 126 outputs the sign and exponent finally obtained, and these are concatenated with the rounded mantissa output from the rounding circuit 135, described later, to complete the floating-point operation result. The operation result is output to the vector register 111.
 The digit alignment shifter 127 shifts the operation data input from the exponent sign calculation unit 126 by the shift amount specified by the exponent sign calculation unit 126. It then outputs the aligned operation data to the multiplexer 128.
 For a floating-point actual operation, the multiplexer 128 selects the input from the digit alignment shifter 127. For a fixed-point accumulation operation that is partway through the accumulation, the number to be added needs no alignment, so the multiplexer 128 selects the input from the fixed-point register 130. For the first step of a fixed-point accumulation operation, the number input from the vector register 111 to the product-sum calculator 112 is used as the addend, so the multiplexer 128 selects the input from the digit alignment shifter 127. The multiplexer 128 then outputs the selected operation data to the adder 129.
 The adder 129 receives the multiplication result of the two pieces of operation data from the multiplier 125. The adder 129 also receives the aligned remaining operation data (the addend) from the multiplexer 128. The adder 129 adds the multiplication result and the aligned addend, and outputs the addition result to the normalization shifter 134 and the cancellation amount prediction unit 131. In practice, the adder 129 performs the addition in two stages, carry-save addition and carry-propagate addition, using two numbers, a sum signal and a carry signal; in this embodiment the two stages together are simply called addition. Strictly speaking, what the adder 129 outputs to the cancellation amount prediction unit 131 is the result of the carry-save addition.
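 The two-stage addition mentioned above can be sketched in Python as follows. This is only an illustration of the general technique for non-negative integers; the three-operand form and the function names carry_save_add and carry_propagate_add are assumptions, not the circuit disclosed in the patent.

```python
def carry_save_add(a, b, c):
    """Stage 1: reduce three non-negative operands to a (sum, carry) pair.

    Each bit position is handled independently, so no carry ripples
    through the word in this stage.
    """
    partial_sum = a ^ b ^ c                      # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # carries, moved up one digit
    return partial_sum, carry


def carry_propagate_add(partial_sum, carry):
    """Stage 2: resolve the (sum, carry) pair into a single number."""
    return partial_sum + carry
```

 The intermediate (sum, carry) pair is available before the slower carry propagation finishes, which is consistent with the text's statement that the prediction unit is fed from the carry-save stage.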
 The cancellation amount prediction unit 131 receives the intermediate addition result from the adder 129. It predicts the cancellation amount from this intermediate result and then outputs the predicted cancellation amount to the exponent sign calculation unit 126.
 In the actual operation, the multiplexer 133 selects the input from the exponent sign calculation unit 126. The multiplexer 133 then outputs the left shift amount input from the exponent sign calculation unit 126 to the normalization shifter 134.
 The normalization shifter 134 receives the addition result from the adder 129 and the left shift amount from the multiplexer 133. It shifts the addition result left by the input left shift amount to align the position to be output, and then outputs the left-shifted operation result to the rounding circuit 135. If this alignment reveals that the predicted cancellation amount was in error, the normalization shifter 134 performs an additional shift to compensate and sends an error notification to the exponent sign calculation unit 126.
 The rounding circuit 135 receives the operation result from the normalization shifter 134 and rounds it at a predetermined digit. It then outputs the operation result rounded at the predetermined digit to the vector register 111.
 Next, the operation of each part in the probabilistic rounding process will be described. In the probabilistic rounding process, the random number generation circuit 121 generates an n-bit uniform random number and outputs it to the multiplexer 123. The random number generated by the random number generation circuit 121 need not be truly uniform; a pseudo-random number that is adequate for practical use suffices.
 The random number generation circuit 121 uses, for example, an LFSR (Linear Feedback Shift Register) arranged in the product-sum calculator 112. When an LFSR is used, in this embodiment the internal state of the LFSR is updated every time a probabilistic rounding instruction is executed, so that the LFSR outputs a new random value when the next probabilistic rounding instruction is issued.
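 A software model of such an LFSR might look like the following minimal sketch. The patent does not give the register width, the feedback taps, or how the n output bits are extracted; the 16-bit Fibonacci LFSR with taps at bits 16, 14, 13 and 11, the seed 0xACE1, the class name Lfsr16, and the choice to clock the register once per output bit are all assumptions.

```python
class Lfsr16:
    """Hypothetical 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""

    def __init__(self, seed=0xACE1):
        if not 0 < seed < (1 << 16):
            raise ValueError("seed must be a nonzero 16-bit value")
        self.state = seed

    def step(self):
        """Advance the register by one clock and return the feedback bit."""
        s = self.state
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (bit << 15)
        return bit

    def next_random(self, n=12):
        """Return an n-bit value, updating the state as a side effect.

        Clocking once per output bit is an assumption of this sketch.
        """
        value = 0
        for _ in range(n):
            value = (value << 1) | self.step()
        return value
```

 Each call to next_random models one probabilistic rounding instruction: the internal state changes, so the following call yields a different value.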
 However, the random number generation circuit 121 is not limited to an LFSR; it may be a pseudo-random number generation circuit with better randomness, or a circuit that obtains random bits from environmental fluctuations. The random number generation circuit 121 may be provided for each product-sum calculator 112 or shared by a plurality of product-sum calculators 112. The random number generation circuit 121 is an example of the "random number generation unit".
 How to choose the number of bits of the random number will now be described. In probabilistic rounding, the bits of the probabilistic rounding target number below the rounding position are ultimately discarded. Let n be the number of bits of the random number and m the number of discarded bits. If the random number is uniform and n is equal to or greater than m, the expected value after probabilistic rounding equals the probabilistic rounding target number. However, maintaining n ≥ m even when m is large increases the number of bits to be processed, and the circuits for alignment and addition grow accordingly. If the circuits grow too large, the existing hardware of the product-sum calculator 112 can no longer accommodate an arithmetic circuit of the required bit width. Adding arithmetic circuitry solely for probabilistic rounding to cover the shortfall is undesirable because it increases the circuit size.
 One alternative is to drop the requirement n ≥ m, allow n to be smaller than m, and limit n to the number of bits that the existing product-sum calculator 112 can accommodate. In that case, of the m discarded bits, those lying below the n bits that follow the rounding position are not added to the random number and do not contribute to the rounded result. The expected value after rounding deviates from the value before rounding by a corresponding amount. However, each time n is increased by 1 the deviation of the expected value shrinks to about half, so once n is large enough the deviation caused by a large m becomes sufficiently small and poses no practical problem. It is therefore preferable to choose a reasonable value of n according to the practical requirements and the size of the existing arithmetic circuits.
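 The relationship between n, m and the expected value can be checked numerically. The following minimal sketch enumerates every n-bit random value and computes the exact expected result with Python's fractions module. The function name expected_rounded and the integer model (a non-negative value whose low m bits are discarded) are assumptions made for illustration.

```python
from fractions import Fraction


def expected_rounded(value, m, n):
    """Exact expected result of discarding the low m bits of `value`
    after adding a uniform n-bit random number whose leading bit sits
    just below the rounding position. The result is in units of 2**m.
    """
    total = 0
    for r in range(1 << n):
        if n <= m:
            # The random number covers only the top n of the m discarded bits.
            total += (value + (r << (m - n))) >> m
        else:
            # The random number extends n - m bits below the value's LSB.
            total += ((value << (n - m)) + r) >> n
    return Fraction(total, 1 << n)
```

 With value = 181 and m = 4 the ideal result is 181/16. The sketch returns exactly that for n = 4 and n = 6, and 45/4 for n = 2, a deviation of 1/16. In this model the deviation is always non-negative and smaller than 2^-n, which agrees with the statement that it roughly halves each time n grows by 1.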
 In the probabilistic rounding process, the multiplexer 123 selects the n-bit uniform random number input from the random number generation circuit 121 and outputs it to the multiplier 125.
 The power generation unit 122 receives the probabilistic rounding instruction from the arithmetic instruction control unit 101. At the same time, it obtains the decimal point position information given as an operand of the probabilistic rounding instruction. Using the decimal point position information, the power generation unit 122 generates a power of 2 corresponding to the position to be rounded. Here, the position to be rounded is the digit whose next higher digit is a digit of the significant figures. The power generation unit 122 then outputs the generated power of 2 to the multiplexer 124.
 In the probabilistic rounding process, the multiplexer 124 selects the power of 2 input from the power generation unit 122 and outputs it to the multiplier 125.
 The multiplier 125 receives the n-bit uniform random number from the multiplexer 123 and the power of 2 corresponding to the position to be rounded from the multiplexer 124. The multiplier 125 multiplies the random number by the power of 2, thereby aligning the leading digit of the random number with the position to be rounded in the probabilistic rounding target number.
 An outline of the probabilistic rounding calculation performed by the product-sum calculator 112 according to this embodiment will now be given with reference to FIG. 3. FIG. 3 is a diagram outlining the calculation of the probabilistic rounding process by the product-sum calculator according to the embodiment. The position P in the probabilistic rounding target number 200 represents the rounding position. The range L is the range used as the operation result.
 First, the random number generation circuit 121 generates a random number 201, which is an n-bit uniform random number. The multiplier 125 then multiplies the random number 201 by a power of 2, which shifts the random number 201 left so that its leading digit coincides with the rounding position P of the probabilistic rounding target number 200. At this point, the random number 201 occupies the n digits extending downward from the rounding position.
 The alignment shift of the random number will now be described in more detail with reference to FIG. 4. FIG. 4 is a diagram for explaining the alignment shift of the random number. The description assumes that the random number 201 is a 12-bit random number with 3 bits below its decimal point, and that the decimal point of the fixed-point accumulator 211 (the probabilistic rounding target number) lies immediately to the right of its least significant bit.
 The multiplier 125 shifts the random number by multiplying the random number 201 by 2^E, a power of 2. Here, the decimal point position information is denoted QNUM, and E = QNUM - 9. When QNUM = 0, all digits of the random number lie below the least significant bit of the fixed-point accumulator 211. When QNUM = 12, the least significant digit of the random number is aligned with the least significant digit of the fixed-point accumulator 211. In this case, the position D represents the decimal point position.
 In FIG. 4, the frame 210 represents the shifting of the random number 201 by the multiplier 125, which performs single-precision 24-bit × 24-bit multiplication. In FIG. 4, the 12-bit random number 201A shows the position for QNUM = 0. In this case, the least significant digit of the multiplier 125 corresponds to 2^-8 when viewed as a 16-bit fixed-point number. When QNUM = 0, E = -9, so the multiplier 125 would have to multiply by 2^-9, but the multiplier 125 has no circuit for that digit. However, when QNUM = 0 all digits of the random number lie below the least significant bit of the fixed-point accumulator 211 and are truncated by the rounding, so the random number need not be added. Therefore, when QNUM = 0, the multiplier 125 sets the multiplicand 2^E to 0.
 When QNUM = 1, E = -8, and the multiplier 125 multiplies the 12-bit random number 201A by 2^-8, shifting it to the position of the 12-bit random number 201B. When QNUM = 8, E = -1, and the multiplier 125 multiplies the 12-bit random number 201A by 2^-1, shifting it to the position of the 12-bit random number 201C. When QNUM = 24, E = 15, and the multiplier 125 multiplies the 12-bit random number 201A by 2^15, shifting it to the position of the 12-bit random number 201D.
 That is, QNUM = 0 when the data 213, the lowest 16 bits of the fixed-point accumulator 211, is used as the output result, and QNUM = 24 when the data 212, the highest 16 bits of the fixed-point accumulator 211, is used as the output result. In other words, the QNUM given by the instruction determines the digit position in the fixed-point accumulator 211 from which upward the digits are treated as significant. In the case of FIG. 4, the multiplier 125 multiplies the random number 201 by the power of 2 obtained from QNUM, which takes a value from 0 to 24, and thereby aligns the leading digit of the random number 201 with the digit immediately below the lowest significant digit of the fixed-point accumulator 211.
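 The FIG. 4 alignment rule, E = QNUM - 9 with the multiplicand forced to 0 when QNUM = 0, can be sketched as follows. The names alignment_multiplier and align_random are hypothetical, and exact fractions stand in for the fixed-point wiring of the multiplier; the 12-bit width and 3 fractional bits follow the example in the text.

```python
from fractions import Fraction

RANDOM_FRACTION_BITS = 3  # the 12-bit random number has 3 bits below its point


def alignment_multiplier(qnum):
    """Return the multiplicand 2**E with E = QNUM - 9, or 0 when QNUM = 0."""
    if not 0 <= qnum <= 24:
        raise ValueError("QNUM is assumed to lie in 0..24 as in FIG. 4")
    if qnum == 0:
        return Fraction(0)  # every random bit would be truncated anyway
    return Fraction(2) ** (qnum - 9)


def align_random(raw, qnum):
    """Value of the 12-bit pattern `raw` after the alignment multiply."""
    return Fraction(raw, 1 << RANDOM_FRACTION_BITS) * alignment_multiplier(qnum)
```

 In this model the leading bit of the random number (pattern 0x800) lands on accumulator bit QNUM - 1, that is, on the digit just below the lowest significant digit: bit 0 for QNUM = 1 and bit 23 for QNUM = 24.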
 Returning to FIG. 2, the description continues. The multiplier 125 outputs to the adder 129 the multiplication result, that is, the random number whose leading digit has been aligned with the digit immediately below the lowest significant digit. The multiplier 125 is an example of the "random number moving unit".
The fixed-point register 130 is an accumulation register (accumulator) used in the product-sum accumulation operation. The fixed-point register 130 stores the stochastic rounding target number, that is, the number to be stochastically rounded. The stochastic rounding target number is the result calculated in the actual computation. Because the accumulation operation keeps adding multiplication results to the value of the accumulation register, the fixed-point register 130 has a bit width sufficient to hold the accumulated value. In the case of stochastic rounding processing, the fixed-point register 130 outputs the stochastic rounding target number to the multiplexer 128.
In the case of stochastic rounding processing, the multiplexer 128 selects the stochastic rounding target number input from the fixed-point register 130. The multiplexer 128 then outputs the stochastic rounding target number to the adder 129.
The adder 129 receives from the multiplier 125 the random number whose leading digit is aligned with the digit one below the least significant of the significant digits. The adder 129 also receives the stochastic rounding target number from the multiplexer 128. The adder 129 then adds the random number to the stochastic rounding target number. As a result, a carry occurs stochastically according to the value of the digits of the stochastic rounding target number at and below the head of the random number. That is, the adder 129 performs stochastic rounding by adding a random number to the stochastic rounding target number. The adder 129 then outputs the addition result to the normalization shifter 134. The adder 129 is an example of the "adding unit".
The stochastic rounding performed by the adder 129 is now described further with reference to FIG. 3. The stochastic rounding target number 200 is supplied from the fixed-point register 130. The range L represents the range of digits to be used as the result of the rounding process. The adder 129 adds to the stochastic rounding target number 200 the random number 201 whose head has been shifted by the multiplier 125 to the rounding position P. If the least significant digit of the stochastic rounding target number 200 is higher than the least significant digit of the random number 201, the adder 129 appends zero-valued bits to the stochastic rounding target number 200 so that the least significant digits match, and then performs the addition. This addition yields the added value 202, in which the carry M1 has occurred stochastically according to the value at and below the rounding position P.
Here, the carry occurs with a probability that only approximately follows the value of the lower digits. As described above, depending on the choice of the number of random bits n, the expected value of the stochastic rounding result may deviate from the value before rounding, and the amount of deviation depends on n. In this embodiment, n = 12 was chosen based on the number of multiply-add bits that the existing single-precision floating-point product-sum calculator 112 can accommodate and on a practicality assessment using deep-learning convergence simulations. In this case, the deviation of the expected value after rounding is at most about 0.00025. The number of random bits is not limited to this, however; any reasonable number of bits of 1 or more is preferably selected according to the balance between the requirements placed on the operation and the existing circuitry and tolerable amount of additional circuitry.
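The dependence of the expected-value deviation on n can be checked with a small model. The sketch below assumes that the n random bits are uniform and are aligned with the top of the discarded field, and that the deviation is measured in units of the least significant kept digit; these are modelling assumptions, and the function names are hypothetical.

```python
from fractions import Fraction


def carry_probability(frac: int, width: int, n: int) -> Fraction:
    """Exact probability that adding an n-bit uniform random number carries out.

    frac  : the discarded low-order bits, as an integer in [0, 2**width)
    width : number of discarded bits (assumed width >= n)
    n     : number of random bits, aligned to the top of the discarded field

    A carry occurs for exactly floor(frac / 2**(width - n)) of the 2**n
    possible random values, so no enumeration is needed.
    """
    return Fraction(frac >> (width - n), 1 << n)


def expectation_bias(frac: int, width: int, n: int) -> Fraction:
    """Exact discarded fraction minus the expected rounded-up amount."""
    return Fraction(frac, 1 << width) - carry_probability(frac, width, n)


# Worst case over every 16-bit discarded field when n = 12.
worst = max(expectation_bias(f, 16, 12) for f in range(1 << 16))
```

In this model the bias is never negative and stays below 2^-n, which for n = 12 is about 0.000244 and is therefore consistent with the figure of about 0.00025 quoted above.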
Returning to FIG. 2, the description continues. The shift amount calculation unit 132 receives the decimal point position information from the arithmetic instruction control unit 101. The shift amount calculation unit 132 then calculates a shift amount corresponding to the decimal point position information. Specifically, the shift amount calculation unit 132 calculates from QNUM the shift amount used to move the valid data from its position in the data output from the adder 129 to the position of the valid data in the output of the normalization shifter. The shift amount calculation unit 132 then outputs the calculated shift amount to the multiplexer 133.
In the case of stochastic rounding processing, the multiplexer 133 selects the shift amount input from the shift amount calculation unit 132. The multiplexer 133 then outputs the shift amount to the normalization shifter 134.
The normalization shifter 134 receives from the adder 129 the stochastic rounding target number to which stochastic rounding has been applied. The normalization shifter 134 also receives the shift amount from the multiplexer 133. The normalization shifter 134 then shifts the stochastic rounding target number by the shift amount. Specifically, the normalization shifter 134 performs a left shift so that the digit immediately above the rounding position in the input number is moved exactly to the least significant digit of the valid number output from the product-sum calculator 112. The normalization shifter 134 outputs the left-shifted stochastic rounding target number to the rounding circuit 135. The normalization shifter 134 is an example of the "moving unit".
For example, in FIG. 3, the normalization shifter 134 shifts the added value 202 so that the least significant digit of the range L used as the operation result coincides with the least significant digit of the output data, producing the shift value 203.
The alignment of the added value by the normalization shifter 134 is now described with reference to FIG. 5. FIG. 5 is a diagram for explaining the alignment of the added value by the normalization shifter.
FIG. 5 illustrates the following conditions as an example. The adder 129 outputs the 40-bit added value 202. The value output from the adder 129 occupies bits 16 through 55 of the intermediate result bus ahead of the output of the normalization shifter 134. Of this value, 16 bits of data are output. Bits 48 through 63 of the operation result bus, on which the normalization shifter 134 outputs data, are output as the data.
The normalization shifter 134 shifts the added value 202 to the left. In this way, the normalization shifter 134 moves the 16 bits of the added value 202 that are to be used to the output position 214, which corresponds to bits 48 through 63 of the operation result bus.
For example, in FIG. 5, let QNUM be the value representing the decimal point position information, and suppose that when QNUM = 0 the least significant 16-bit data 213 of the fixed-point accumulator 211 is the output target. In this case, the normalization shifter 134 moves the data 213 at bits 16 through 31 of the intermediate result bus to the output position 214 at bits 48 through 63 of the operation result bus. The left shift amount is 32. Likewise, suppose that when QNUM = 24 the most significant 16-bit data 212 of the fixed-point accumulator 211 is the output target. In this case, the normalization shifter 134 moves the data 212 at bits 40 through 55 of the intermediate result bus to the output position 214 at bits 48 through 63 of the operation result bus. The left shift amount is 8. That is, in the example of FIG. 5, the normalization shifter 134 performs the left shift using the left shift amount obtained by the shift amount calculation unit 132 as "32 - QNUM".
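The shift arithmetic of the FIG. 5 example can be written down directly. The following Python sketch uses the "32 - QNUM" formula from the text; the helper names and the 64-bit mask are illustrative assumptions.

```python
from typing import Tuple

BUS_MASK = (1 << 64) - 1  # assumed 64-bit operation result bus


def left_shift_amount(qnum: int) -> int:
    """Left shift used by the normalization shifter in the FIG. 5 example."""
    return 32 - qnum


def source_window(qnum: int) -> Tuple[int, int]:
    """(msb, lsb) of the 16 kept bits on the intermediate result bus."""
    lsb = 16 + qnum
    return lsb + 15, lsb


def normalize(bus_value: int, qnum: int) -> int:
    """Shift a bus value so that the kept 16 bits land on bits [63:48]."""
    return (bus_value << left_shift_amount(qnum)) & BUS_MASK
```

For QNUM = 0 the kept window is bits 31 to 16 and the shift is 32; for QNUM = 24 it is bits 55 to 40 and the shift is 8, matching the two cases described above.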
Returning to FIG. 2, the description continues. The rounding circuit 135 receives the left-shifted stochastic rounding target number from the normalization shifter 134. In the case of stochastic rounding processing, the rounding circuit 135 truncates the digits of the input stochastic rounding target number below a predetermined digit. The rounding circuit 135 then outputs the stochastic rounding target number with the digits below the predetermined digit truncated. The output of the rounding circuit 135 is sent to the vector register 111 as output data. This rounding circuit 135 is an example of the "output unit".
For example, in FIG. 3, the rounding circuit 135 performs the truncation M2 on the digits of the shift value 203 below the range L to be used. The range 204, which consists of the digits of the data output from the rounding circuit 135 above the range L to be used, is not included in the output data 205 and is discarded. The output data 205 is then sent to the vector register 111.
Here, the rounding circuit 135 is an ordinary floating-point arithmetic circuit. In ordinary rounding for floating-point operations, the rounding circuit 135 performs one of the following two processes on the input value, based on the round bit and sticky bit obtained from the lower digits, the value of the least significant digit after rounding, the sign of the operation result, and the specified rounding mode. In the first process, the rounding circuit 135 outputs, as is, the input value truncated below the rounding position. In the second process, the rounding circuit 135 adds 1 to the input value truncated below the rounding position and outputs the result. When stochastic rounding processing is executed, the logic of the rounding circuit 135 is set so that the first process is always selected.
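The two processes of the rounding circuit can be sketched as follows. Round-to-nearest-even on non-negative integers is used here only as one illustrative ordinary rounding mode; the patent does not single out a mode, and the function name and parameters are hypothetical.

```python
def rounding_circuit(value: int, drop_bits: int, stochastic: bool) -> int:
    """Model of the two outcomes: truncate, or truncate and add 1.

    value      : non-negative integer input (sign handling is not modelled)
    drop_bits  : number of low-order bits below the rounding position
    stochastic : True when stochastic rounding processing is being executed
    """
    truncated = value >> drop_bits
    if stochastic or drop_bits == 0:
        # First process: the carry was already decided by the random addition,
        # so the truncated value is output as is.
        return truncated
    # Ordinary rounding, illustrated with round-to-nearest-even.
    round_bit = (value >> (drop_bits - 1)) & 1
    sticky = (value & ((1 << (drop_bits - 1)) - 1)) != 0
    lsb = truncated & 1
    if round_bit and (sticky or lsb):
        return truncated + 1  # second process
    return truncated          # first process
```

Forcing the first process in stochastic mode avoids rounding twice: the random addition in the adder has already produced any carry into the kept digits.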
Next, the stochastic rounding process is described using specific data with reference to FIG. 6. FIG. 6 is a diagram showing a specific example of the stochastic rounding process. FIG. 6 assumes that the processing is performed on a 64-bit-wide bus. The position of each piece of data is written as [x:y], where x is the most significant bit and y is the least significant bit.
In this example, the arithmetic instruction control unit 101 outputs QNUM = 16 as the decimal point position information. The stochastic rounding target number 300 output from the fixed-point register 130 is located at [55:16]. Bit 15 and below correspond to the fractional part. Of the stochastic rounding target number 300, bit 31 and below are rounded off, and [47:32] is the range that is used.
The random number 301 generated by the random number generation circuit 121 is, logically, placed so that its fractional part lies at bit 15 and below, at the location enclosed by the broken line. In practice, the random number 301 is placed, as its initial position on the circuit, at the position for QNUM = 0. Specifically, the position for QNUM = 0 is the position at which the leading bit of the random number 301 falls on the bit immediately below the least significant bit of the stochastic rounding target number 300. The power generation unit 122 obtains 2^7 as the power of 2 corresponding to QNUM = 16. The multiplier 125 multiplies the random number by 2^7, shifting the random number 301 to the position [31:20].
The adder 129 appends zero-valued bits to the random number 301 located at [31:20] so that its least significant bit matches that of the stochastic rounding target number 300, adds it to the stochastic rounding target number 300, and calculates the added value 302. The added value 302 is located at [55:16]. The output range 303 is [63:48].
The normalization shifter 134 therefore shifts the stochastic rounding target number 300 so that the data in its used range [47:32] is located at [63:48].
The rounding circuit 135 then truncates and discards the bits of the shifted stochastic rounding target number 300 below the output range [63:48], that is, bit 47 and below. In addition, bit 64 and above of the shifted stochastic rounding target number 300 are not output and are discarded. As a result, the remaining 16-bit output data 304 is output as the operation result.
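The FIG. 6 data path can be traced with plain integer arithmetic. The sketch below assumes the bit positions given above (target at [55:16], random number at [31:20] for QNUM = 16, output at [63:48]) and models the bus as a 64-bit integer; the function name is hypothetical, and overflow or saturation handling is not modelled.

```python
BUS_MASK = (1 << 64) - 1  # 64-bit bus


def stochastic_round_on_bus(target40: int, rand12: int, qnum: int = 16) -> int:
    """Follow the FIG. 6 data path and return the 16 output bits."""
    # The rounding target occupies bits [55:16] of the bus.
    target = (target40 & ((1 << 40) - 1)) << 16
    # The leading random bit sits one bit below the lowest kept bit
    # (bit 16 + qnum), so the 12 random bits occupy
    # [15 + qnum : 4 + qnum], which is [31:20] for QNUM = 16.
    rnd = (rand12 & 0xFFF) << (4 + qnum)
    added = target + rnd                          # adder 129
    shifted = (added << (32 - qnum)) & BUS_MASK   # normalization shifter 134
    return shifted >> 48                          # rounding circuit 135 keeps [63:48]
```

As a usage example, a target whose kept field is 0x1234 and whose discarded field is exactly one half rounds down for random values below 0x800 and rounds up from 0x800 onward, so each outcome occurs for half of the 4096 possible random values.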
Next, the overall flow of the processing executed by the product-sum calculation unit 100 is described with reference to FIG. 7. FIG. 7 is a flowchart of the overall processing executed by the product-sum calculation unit.
The product-sum calculation unit 100 executes the product-sum operation of the actual computation using the product-sum calculator 112 (step S1).
The product-sum calculation unit 100 then determines whether the product-sum accumulation operation has been completed (step S2). If the product-sum accumulation operation has not been completed (step S2: No), the product-sum calculation unit 100 returns to step S1 and repeats the product-sum operation.
If, on the other hand, the product-sum accumulation operation has been completed (step S2: Yes), the product-sum calculation unit 100 executes the stochastic rounding process using the product-sum calculator 112 (step S3).
Thereafter, in accordance with an instruction from the memory controller 12, the operation result stored in the vector register 111 is output to the memory 13 via the chain of processing units and the memory controller 12 (step S4).
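Steps S1 to S3 amount to accumulating at full width and rounding once at the end. The following Python sketch illustrates that ordering; for simplicity it assumes the random number is as wide as the discarded field (the embodiment uses a fixed n = 12), it omits the output to memory in step S4, and the function name is hypothetical.

```python
import random


def product_sum_then_round(pairs, drop_bits: int, rng: random.Random) -> int:
    """Accumulate all products exactly, then apply one stochastic rounding."""
    acc = 0
    for a, b in pairs:                 # steps S1 and S2: repeat until done
        acc += a * b
    if drop_bits == 0:
        return acc
    r = rng.getrandbits(drop_bits)     # uniform random number
    return (acc + r) >> drop_bits      # step S3: add, then truncate


rng = random.Random(0)
result = product_sum_then_round([(3, 5), (7, 11), (2, 9)], 4, rng)
```

Here the exact sum is 110, or 6.875 in units of 2^4, so the result is either 6 or 7. Rounding only the final accumulated value, rather than each product, keeps intermediate rounding error out of the accumulation.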
Next, the flow of the stochastic rounding process performed by the product-sum calculator according to this embodiment is described with reference to FIG. 8. FIG. 8 is a flowchart of the stochastic rounding process performed by the product-sum calculator 112 according to the embodiment. The process shown in the flowchart of FIG. 8 is an example of the process performed in step S3 of FIG. 7.
The random number generation circuit 121 generates an n-digit uniform random number (step S101). The random number generation circuit 121 then outputs the generated random number to the multiplier 125 via the multiplexer 123.
The power generation unit 122 generates a power of 2 corresponding to QNUM (step S102). The power generation unit 122 then outputs the generated power of 2 to the multiplier 125 via the multiplexer 124.
The multiplier 125 receives the random number from the random number generation circuit 121. The multiplier 125 also receives the power of 2 corresponding to QNUM from the multiplexer 124. The multiplier 125 then multiplies the random number by the power of 2, shifting the random number so that its head is located at the rounding position (step S103). The multiplier 125 then outputs the multiplication result to the adder 129.
The adder 129 acquires, via the multiplexer 128, the stochastic rounding target number stored in the fixed-point register 130 (step S104).
The adder 129 also receives the multiplication result from the multiplier 125. The adder 129 then performs stochastic rounding by adding the multiplication result of the multiplier 125 to the stochastic rounding target number (step S105). The adder 129 then outputs the addition result, which represents the stochastically rounded target number, to the normalization shifter 134.
The shift amount calculation unit 132 calculates a shift amount corresponding to the decimal point position information acquired from the arithmetic instruction control unit 101 (step S106). The shift amount calculation unit 132 then outputs the calculated shift amount to the normalization shifter 134 via the multiplexer 133.
The normalization shifter 134 receives from the adder 129 the addition result representing the stochastically rounded target number. The normalization shifter 134 also receives the shift amount from the shift amount calculation unit 132. The normalization shifter 134 then shifts the addition result to the left by the shift amount (step S107) and outputs the left-shifted value to the rounding circuit 135.
The rounding circuit 135 receives the left-shifted value from the normalization shifter 134. The rounding circuit 135 then truncates the left-shifted value at and below a predetermined digit, discarding the bits below the output range (step S108).
Thereafter, the rounding circuit 135 outputs a predetermined number of bits, counted from the low-order side, as the result (step S109).
The product-sum calculator 112 according to this embodiment executes stochastic rounding processing using fixed-point numbers by adding the random number generation circuit 121, the power generation unit 122, the shift amount calculation unit 132, and the multiplexers 123, 124, and 133 to a circuit that performs floating-point operations.
As described above, the product-sum calculator 112 according to this embodiment can execute fixed-point stochastic rounding by adding a small amount of circuitry to the circuit used for floating-point product-sum operations. The product-sum calculator 112 according to this embodiment also executes stochastic rounding on the accumulated result of multiplications. Appropriate stochastic rounding can therefore be executed with a simple configuration.
1 PCI card
2 Host computer
10 Processing unit
11 Overall instruction control unit
12 Memory controller
13 Memory
14 PCI control unit
50 Information processing device
100 Product-sum calculation unit
101 Arithmetic instruction control unit
102 Arithmetic instruction buffer
103 Multiplexer
111 Vector register
112 Product-sum calculator
121 Random number generation circuit
122 Power generation unit
125 Multiplier
126 Exponent and sign calculation unit
127 Alignment shifter
129 Adder
130 Fixed-point register
131 Digit-loss amount prediction unit
132 Shift amount calculation unit
134 Normalization shifter
135 Rounding circuit
123, 124, 128, 133 Multiplexers

Claims (5)

  1.  An arithmetic processing device comprising:
     a random number generation unit that generates a random number;
     a random number moving unit that, based on a predetermined position at which a rounding target number is placed and on decimal point position information of output data, moves the position of the random number so that the head of the random number coincides with a rounding position of the rounding target number;
     an adding unit that adds the random number moved by the random number moving unit and the rounding target number placed at the predetermined position; and
     an output unit that outputs, as the output data, data in a predetermined range that includes a predetermined number of significant digits from the rounding position in the addition result of the adding unit.
  2.  The arithmetic processing device according to claim 1, further comprising
     a moving unit that moves the addition result of the adding unit so that the predetermined number of significant digits from the rounding position coincides with a predetermined output position having the predetermined number of digits,
     wherein the output unit discards the values outside the output position in the addition result that has been moved so that the predetermined number of significant digits from the rounding position coincides with the output position, and outputs the result.
  3.  The arithmetic processing device according to claim 1, further comprising
     a power generation unit that, based on the position at which the rounding target number is placed and on the decimal point position information of the output data, obtains a power of 2 for making the head of the random number coincide with the rounding position of the rounding target number,
     wherein the random number generation unit generates a random number expressed as a binary number, and
     the random number moving unit moves the position of the random number by multiplying the random number by the power obtained by the power generation unit.
  4.  The arithmetic processing device according to claim 1, wherein
     the random number moving unit performs a floating-point operation by multiplying two pieces of input data, and
     the adding unit performs a floating-point operation by adding input data to the multiplication result of the random number moving unit.
  5.  A control method for an arithmetic processing device, the method comprising:
     generating a random number;
     moving, based on a predetermined position at which a rounding target number is placed and on decimal point position information of output data, the position of the random number so that the head of the random number coincides with a rounding position of the rounding target number;
     adding the moved random number and the rounding target number placed at the predetermined position; and
     outputting, as the output data, data that includes a predetermined number of significant digits from the rounding position of the addition result.
PCT/JP2018/039370 2018-10-23 2018-10-23 Computation processing device and computation processing device control method WO2020084692A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2018/039370 WO2020084692A1 (en) 2018-10-23 2018-10-23 Computation processing device and computation processing device control method
JP2020551748A JP6984762B2 (en) 2018-10-23 2018-10-23 Arithmetic processing unit and control method of arithmetic processing unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/039370 WO2020084692A1 (en) 2018-10-23 2018-10-23 Computation processing device and computation processing device control method

Publications (1)

Publication Number Publication Date
WO2020084692A1 true WO2020084692A1 (en) 2020-04-30

Family

ID=70331786

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/039370 WO2020084692A1 (en) 2018-10-23 2018-10-23 Computation processing device and computation processing device control method

Country Status (2)

Country Link
JP (1) JP6984762B2 (en)
WO (1) WO2020084692A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62260229A (en) * 1986-05-06 1987-11-12 Yamaha Corp Multiplying circuit
JP2009010489A (en) * 2007-06-26 2009-01-15 Fujifilm Corp Image processor and image processing method
US20170102920A1 (en) * 2015-10-08 2017-04-13 Via Alliance Semiconductor Co., Ltd. Neural network unit that performs stochastic rounding

Also Published As

Publication number Publication date
JP6984762B2 (en) 2021-12-22
JPWO2020084692A1 (en) 2021-09-02

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18937770; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2020551748; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 18937770; Country of ref document: EP; Kind code of ref document: A1)