WO2022088157A1 - Floating-point number calculation circuit and floating-point number calculation method - Google Patents

Floating-point number calculation circuit and floating-point number calculation method

Info

Publication number
WO2022088157A1
WO2022088157A1 PCT/CN2020/125676 CN2020125676W WO2022088157A1 WO 2022088157 A1 WO2022088157 A1 WO 2022088157A1 CN 2020125676 W CN2020125676 W CN 2020125676W WO 2022088157 A1 WO2022088157 A1 WO 2022088157A1
Authority
WO
WIPO (PCT)
Prior art keywords
mantissa
floating
point number
split
order
Prior art date
Application number
PCT/CN2020/125676
Other languages
English (en)
French (fr)
Inventor
蒋东龙
董镇江
谢环
李震桁
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2020/125676 priority Critical patent/WO2022088157A1/zh
Priority to EP20959318.5A priority patent/EP4220379A4/en
Priority to CN202080102852.5A priority patent/CN115812194A/zh
Publication of WO2022088157A1 publication Critical patent/WO2022088157A1/zh
Priority to US18/309,269 priority patent/US20230266941A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/485Adding; Subtracting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/01Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
    • G06F5/012Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising in floating-point computations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/487Multiplying; Dividing
    • G06F7/4876Multiplying

Definitions

  • The embodiments of the present application relate to the field of computers, and further relate to the application of artificial intelligence (AI) technology in the field of computers, and in particular to a floating-point number calculation circuit and a floating-point number calculation method.
  • Artificial intelligence (AI) is the theory, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
  • Convolutional neural networks are currently widely used in many types of image processing applications.
  • Such applications use floating-point (FP) 16 data for network training.
  • Insufficient accuracy of FP16 data leads to non-convergence or slow convergence of network training, so higher-precision FP32 data must be used to guarantee the training effect.
  • In addition, in supercomputing applications, even higher-precision FP64 data needs to be used for numerical calculations.
  • In existing data calculation schemes, a multiplier with a larger number of bits is usually used to calculate the data. For example, the multiplier used to calculate FP64 data is often multiplexed to calculate both FP64 data and FP32 data.
  • One existing calculation scheme designs a 54-bit (binary digit, bit) multiplier to directly support the calculation of the mantissa of FP64 data.
  • When this multiplier calculates FP32 data, the 54-bit multiplier is logically divided into two 27-bit parts, which are used to support the calculation of the mantissa parts of two pairs of FP32 data respectively.
  • As for the exponent (exp) part, the exp processing unit of the FP64 path is simply duplicated to process the additional FP32 exp part.
  • However, in terms of area ratio, the area overhead of one FP64 multiplier is approximately equal to that of four FP32 multipliers, yet reusing the FP64 multiplier for FP32 data only achieves twice the computing performance of an FP32 multiplier.
  • Therefore, the performance in terms of timing overhead and hardware design cost when a multiplier with a larger number of bits is used to calculate data is unsatisfactory.
  • The embodiments of the present application provide a floating-point number calculation circuit and a floating-point number calculation method.
  • The floating-point number calculation circuit can split a floating-point number with a larger number of bits into floating-point numbers with a smaller number of bits, so that a multiplier with a smaller number of bits can be used to calculate the floating-point number with the larger number of bits.
  • The floating-point number calculation circuit has low timing overhead and low hardware design cost, and makes reasonable use of the computing performance of the multiplier.
  • A first aspect of the embodiments of the present application provides a floating-point number calculation circuit.
  • The floating-point number calculation circuit includes a memory controller, a splitting circuit, a storage circuit, an exponent processing circuit, and a calculation circuit. The input terminal of the splitting circuit is electrically connected to the output terminal of the memory controller, and the output terminal of the splitting circuit is electrically connected to the input terminal of the storage circuit; the input terminal of the exponent processing circuit is electrically connected to the first output terminal of the storage circuit, and the output terminal of the exponent processing circuit is electrically connected to the first input terminal of the calculation circuit; the second input terminal of the calculation circuit is electrically connected to the second output terminal of the storage circuit. The memory controller is used to obtain a first floating-point number and a second floating-point number. The splitting circuit is used to split the mantissa part of the first floating-point number and the mantissa part of the second floating-point number and to obtain the first shift number of each split mantissa part. The storage circuit is used to store the split mantissa parts, the exponent parts corresponding to the split mantissa parts, and the first shift numbers of the split mantissa parts. The exponent processing circuit is used to add the exponent part of the first floating-point number and the exponent part of the second floating-point number to obtain a first operation result, to add the first shift number of each split mantissa part and the exponent part corresponding to that mantissa part to obtain multiple second operation results, and to obtain the second shift number of each split mantissa part according to the multiple second operation results. The calculation circuit is used to calculate the product of the mantissa parts of the first floating-point number and the second floating-point number according to the split mantissa parts and the second shift numbers of the split mantissa parts.
  • An embodiment of the present application provides a floating-point number calculation circuit.
  • The floating-point number calculation circuit includes a splitting circuit that splits the mantissa part of a first floating-point number and the mantissa part of a second floating-point number.
  • The exponent processing circuit obtains the second shift number of each split mantissa part.
  • The calculation circuit calculates the product of the mantissa parts of the first floating-point number and the second floating-point number according to the split mantissa parts and their second shift numbers.
  • The floating-point number calculation circuit can split a floating-point number with a large number of bits into floating-point numbers with a small number of bits, so that a multiplier with a small number of bits can be used to calculate the floating-point number with a large number of bits.
  • The floating-point number calculation circuit provided by this application has low timing overhead and low hardware design cost, and makes reasonable use of the computing performance of the multiplier.
  • In a possible implementation of the first aspect, the splitting circuit is configured to split the mantissa part of the first floating-point number into a first high-order mantissa and a first low-order mantissa, and to split the mantissa part of the second floating-point number into a second high-order mantissa and a second low-order mantissa.
  • The first shift number is used to indicate the shift difference between the most significant bit of each high-order mantissa and the most significant bit of each low-order mantissa.
  • In this possible implementation, the floating-point number calculation circuit can split the wider mantissa part of the first floating-point number into a narrower first high-order mantissa and first low-order mantissa, and split the wider mantissa part of the second floating-point number into a narrower second high-order mantissa and second low-order mantissa, so that a multiplier with a smaller number of bits is used to calculate the product of the split mantissa parts, which reduces the hardware design cost and makes reasonable use of the computing performance of the multiplier.
  • In a possible implementation of the first aspect, the first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, the second high-order mantissa includes a third mantissa, and the second low-order mantissa includes a fourth mantissa.
  • This possible implementation provides a specific splitting method for the mantissa part of a floating-point number. After the mantissa part of an FP32 floating-point number is split with this method, an FP16-type multiplier can be used to perform the calculation. Similarly, after the mantissa part of an FP64 floating-point number is split with this method, an FP32-type multiplier can be used, and after the mantissa part of an FP128 floating-point number is split with this method, an FP64-type multiplier can be used. This splitting method makes it possible to use a multiplier with a smaller number of bits to calculate the product of mantissa parts with a larger number of bits.
  • In a possible implementation of the first aspect, the first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, a third mantissa, a fourth mantissa, and a fifth mantissa, the second high-order mantissa includes a sixth mantissa, and the second low-order mantissa includes a seventh mantissa, an eighth mantissa, a ninth mantissa, and a tenth mantissa.
  • This possible implementation provides a specific splitting method for the mantissa part of a floating-point number. After the mantissa part of an FP64 floating-point number is split with this method, an FP16-type multiplier can be used to perform the calculation. Similarly, after the mantissa part of an FP128 floating-point number is split with this method, an FP32-type multiplier can be used. This splitting method makes it possible to use a multiplier with a smaller number of bits to calculate the product of mantissa parts with a larger number of bits.
  • In a possible implementation of the first aspect, the exponent processing circuit includes a first adder, a selection circuit, and a second adder. The input terminal of the first adder is electrically connected to the first output terminal of the storage circuit, and the output terminal of the first adder is electrically connected to the first input terminal of the second adder; the second input terminal of the second adder is electrically connected to the output terminal of the selection circuit, and the output terminal of the second adder is electrically connected to the first input terminal of the calculation circuit. The first adder is used to add the first shift number of each split mantissa part and the exponent part corresponding to that mantissa part to obtain multiple second operation results; the selection circuit is used to select the maximum value among the multiple second operation results; the second adder is used to subtract the maximum value among the multiple second operation results from each second operation result to obtain the second shift numbers of the split mantissa parts.
  • This possible implementation provides a specific implementation form in terms of hardware, which improves the implementability of the solution.
  • In a possible implementation of the first aspect, the calculation circuit includes a multiplier, a shift register, and a third adder. The input terminal of the multiplier is electrically connected to the second output terminal of the storage circuit, and the output terminal of the multiplier is electrically connected to the first input terminal of the shift register; the second input terminal of the shift register is electrically connected to the output terminal of the second adder; the output terminal of the shift register is electrically connected to the input terminal of the third adder. The multiplier is used to multiply each mantissa part split out of the first high-order mantissa and the first low-order mantissa by each mantissa part split out of the second high-order mantissa and the second low-order mantissa to obtain multiple multiplication data; the shift register is used to shift the multiple multiplication data according to the second shift numbers of the split mantissa parts; the third adder is used to add the shifted multiplication data to obtain the product of the mantissa parts of the first floating-point number and the second floating-point number.
  • This possible implementation provides a specific implementation form in terms of hardware, which improves the implementability of the solution.
  • A second aspect of the embodiments of the present application provides a floating-point number calculation method. The method includes: obtaining a first floating-point number and a second floating-point number; splitting the mantissa part of the first floating-point number and the mantissa part of the second floating-point number and obtaining the first shift number of each split mantissa part; storing the split mantissa parts, the exponent parts corresponding to the split mantissa parts, and the first shift numbers of the split mantissa parts; adding the exponent part of the first floating-point number and the exponent part of the second floating-point number to obtain a first operation result, adding the first shift number of each split mantissa part and the exponent part corresponding to that mantissa part to obtain multiple second operation results, and obtaining the second shift number of each split mantissa part according to the multiple second operation results; and calculating the product of the mantissa parts of the first floating-point number and the second floating-point number according to the split mantissa parts and the second shift numbers of the split mantissa parts.
  • In the embodiments of the present application, after the mantissa part of the first floating-point number and the mantissa part of the second floating-point number are split, the second shift number of each split mantissa part is obtained.
  • Then, the product of the mantissa parts of the first floating-point number and the second floating-point number is calculated according to the split mantissa parts and the second shift numbers of the split mantissa parts.
  • This method can split a floating-point number with a large number of bits into floating-point numbers with a small number of bits, so that a multiplier with a small number of bits is used to calculate the floating-point number with a large number of bits.
  • With the floating-point number calculation method provided by this application, the computing device has low timing overhead and low hardware design cost, and makes reasonable use of the computing performance of the multiplier included in the computing device.
  • In a possible implementation of the second aspect, splitting the mantissa part of the first floating-point number and the mantissa part of the second floating-point number includes: splitting the mantissa part of the first floating-point number into a first high-order mantissa and a first low-order mantissa, and splitting the mantissa part of the second floating-point number into a second high-order mantissa and a second low-order mantissa, where the first shift number is used to indicate the shift difference between the most significant bit of each high-order mantissa and the most significant bit of each low-order mantissa.
  • In this possible implementation, the floating-point number calculation method provided by this application can split the wider mantissa part of the first floating-point number into a narrower first high-order mantissa and first low-order mantissa, and split the wider mantissa part of the second floating-point number into a narrower second high-order mantissa and second low-order mantissa, so that a multiplier with a smaller number of bits is used to calculate the product of the split mantissa parts, which reduces the hardware design cost and makes reasonable use of the computing performance of the multiplier.
  • In a possible implementation of the second aspect, the first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, the second high-order mantissa includes a third mantissa, and the second low-order mantissa includes a fourth mantissa.
  • This possible implementation provides a specific splitting method for the mantissa part of a floating-point number. After the mantissa part of an FP32 floating-point number is split with this method, an FP16-type multiplier can be used to perform the calculation. Similarly, after the mantissa part of an FP64 floating-point number is split with this method, an FP32-type multiplier can be used, and after the mantissa part of an FP128 floating-point number is split with this method, an FP64-type multiplier can be used. This splitting method makes it possible to use a multiplier with a smaller number of bits to calculate the product of mantissa parts with a larger number of bits.
  • In a possible implementation of the second aspect, the first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, a third mantissa, a fourth mantissa, and a fifth mantissa, the second high-order mantissa includes a sixth mantissa, and the second low-order mantissa includes a seventh mantissa, an eighth mantissa, a ninth mantissa, and a tenth mantissa.
  • This possible implementation provides a specific splitting method for the mantissa part of a floating-point number. After the mantissa part of an FP64 floating-point number is split with this method, an FP16-type multiplier can be used to perform the calculation. Similarly, after the mantissa part of an FP128 floating-point number is split with this method, an FP32-type multiplier can be used. This splitting method makes it possible to use a multiplier with a smaller number of bits to calculate the product of mantissa parts with a larger number of bits.
  • a third aspect of the embodiments of the present application provides a computing device, where the computing device includes a control circuit and a floating-point number computing circuit.
  • the floating-point number calculation circuit calculates data under the control of the control circuit, and the floating-point number calculation circuit is the floating-point number calculation circuit described in the first aspect or any possible implementation manner of the first aspect.
  • Fig. 1 is the processing principle diagram of the convolutional neural network provided by this application.
  • FIG. 2 is a schematic diagram of the composition of a floating point number of an FP32 type provided by an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of a floating-point number calculation circuit provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • FIG. 5 is another schematic structural diagram of a floating-point number calculation circuit provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • Artificial intelligence (AI) is the theory, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
  • FIG. 1 is a processing principle diagram of a convolutional neural network provided by this application.
  • Convolutional neural network CNN has a wide range of application prospects in the fields of image and speech recognition.
  • As shown in FIG. 1, a convolutional neural network needs to perform convolution operations on multiple convolution kernels and one or more feature maps. Specifically, each convolution kernel starts from the first pixel of the feature map and moves pixel by pixel in the row direction; when it reaches the end of the row, it moves down one pixel in the column direction, returns to the starting point in the row direction, and repeats the row-direction movement process until all pixels of the feature map have been traversed.
  • During the movement of the convolution kernel, the parameters in the convolution kernel and the data at the corresponding positions in the feature map are taken as the two inputs of the convolution operation, and the convolution operation is performed (the pairs of values are multiplied and the products are accumulated one by one); after the convolution result is obtained, it is output. A minimal sketch of this step is given below.
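  • The following Python sketch is an illustration only (not part of the patent); the function name and the small example values are chosen arbitrarily. It shows the sliding-window multiply-and-accumulate described above for one convolution kernel over one feature map:

```python
# Minimal illustration of the sliding-window multiply-accumulate described above.
# Plain Python lists are used; a real implementation would use dedicated hardware.
def convolve(feature_map, kernel):
    fh, fw = len(feature_map), len(feature_map[0])
    kh, kw = len(kernel), len(kernel[0])
    result = []
    for row in range(fh - kh + 1):          # move down one pixel after each row sweep
        out_row = []
        for col in range(fw - kw + 1):      # move pixel by pixel in the row direction
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # multiply each kernel parameter by the feature-map value at the
                    # corresponding position, then accumulate the products one by one
                    acc += kernel[i][j] * feature_map[row + i][col + j]
            out_row.append(acc)
        result.append(out_row)
    return result

feature_map = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
print(convolve(feature_map, kernel))  # [[6, 8], [12, 14]]
```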
  • Convolutional neural networks are currently widely used in various types of image processing applications.
  • Image processing applications use floating-point (FP) 16 data for network training.
  • Insufficient accuracy of FP16 data leads to non-convergence or slow convergence of network training, so higher-precision FP32 data must be used to guarantee the training effect.
  • In addition, in some applications, even higher-precision FP64 data and FP128 data need to be used for model training.
  • It should be noted that, in addition to the field of artificial intelligence, the floating-point number calculation circuit involved in the present invention can also be applied to the field of data signal processing, such as image processing systems, radar systems, and communication systems.
  • The circuit and method can optimize the performance of digital signal processing (DSP) devices or other digital devices, for example digital devices used in current communication systems such as Long Term Evolution (LTE), the Universal Mobile Telecommunications System (UMTS), and the Global System for Mobile Communications (GSM).
  • In existing data calculation schemes, a multiplier with a larger number of bits is usually used to calculate the data. For example, the multiplier used to calculate FP64 data is often multiplexed to calculate both FP64 data and FP32 data. Some calculation schemes design a 54-bit multiplier to directly support the calculation of the mantissa of FP64 data. When this multiplier calculates FP32 data, the 54-bit multiplier is logically divided into two 27-bit parts, which are used to support the calculation of the mantissa parts of two pairs of FP32 data respectively. However, in terms of area ratio, the area overhead of one FP64 multiplier is approximately equal to that of four FP32 multipliers, while reusing the FP64 multiplier to calculate FP32 data only achieves twice the computing performance of an FP32 multiplier; moreover, the FP64 multiplier has a long timing overhead and a high hardware design cost. Therefore, the performance in terms of timing overhead and hardware design when a multiplier with a larger number of bits is used to calculate data is unsatisfactory.
  • In view of the above problems of existing data calculation schemes, an embodiment of the present application provides a floating-point number calculation circuit.
  • The splitting circuit included in the floating-point number calculation circuit splits the mantissa part of the first floating-point number and the mantissa part of the second floating-point number and obtains the first shift number of each split mantissa part; the exponent processing circuit adds the first shift number of each split mantissa part and the exponent part corresponding to that mantissa part to obtain multiple second operation results, and obtains the second shift number of each split mantissa part according to the multiple second operation results.
  • The calculation circuit calculates the product of the mantissa parts of the first floating-point number and the second floating-point number according to the split mantissa parts and the second shift numbers of the split mantissa parts.
  • The floating-point number calculation circuit can split a floating-point number with a large number of bits into floating-point numbers with a small number of bits, so that a multiplier with a small number of bits is used to calculate the floating-point number with a large number of bits; it makes reasonable use of the computing performance of the multiplier, has low timing overhead, and has a low hardware design cost.
  • Four floating-point formats are currently common: FP16, FP32, FP64, and FP128. Each floating-point number is composed of three parts, namely the sign bit (sign), the exponent bits (exp), and the mantissa bits (mantissa).
  • The actual value of a floating-point number equals sign × 2^exp × mantissa.
  • FIG. 2 is a schematic diagram of the composition of a floating point number of FP32 type provided by an embodiment of the present application.
  • As shown in FIG. 2, an FP32 floating-point number has a 1-bit sign, an 8-bit exp, and a 24-bit mantissa. The most significant bit of the mantissa is stored implicitly (the hidden bit is 1 if exp is not 0, and 0 otherwise), so the three parts occupy a total of 32 explicitly stored bits.
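  • For reference, the following Python sketch (an illustration, not part of the patent; the function name and example value are chosen here) decomposes a 32-bit pattern into sign, exp, and the 24-bit mantissa including its implicit leading bit, following the FP32 layout described above:

```python
import struct

def decompose_fp32(x):
    """Split a value stored as FP32 into sign, biased exp, and 24-bit mantissa."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31                     # 1-bit sign
    exp = (bits >> 23) & 0xFF             # 8-bit biased exponent
    frac = bits & 0x7FFFFF                # 23 explicitly stored mantissa bits
    hidden = 1 if exp != 0 else 0         # implicit most significant mantissa bit
    mantissa = (hidden << 23) | frac      # 24-bit mantissa including the hidden bit
    return sign, exp, mantissa

sign, exp, mantissa = decompose_fp32(1.5)
print(sign, exp, bin(mantissa))  # 0 127 0b110000000000000000000000
```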
  • FIG. 3 is a schematic structural diagram of a floating-point number calculation circuit provided by an embodiment of the present application.
  • the floating-point number calculation circuit 100 includes: a memory controller 101 , a split circuit 102 , a storage circuit 103 , an exponent processing circuit 104 and a calculation circuit 105 .
  • the input terminal of the split circuit 102 is electrically connected to the output terminal of the memory controller 101 , and the output terminal of the split circuit 102 is electrically connected to the input terminal of the storage circuit 103 .
  • The input terminal of the exponent processing circuit 104 is electrically connected to the first output terminal of the storage circuit 103, and the output terminal of the exponent processing circuit 104 is electrically connected to the first input terminal of the calculation circuit 105.
  • the second input terminal of the calculation circuit 105 is electrically connected to the second output terminal of the storage circuit 103 .
  • the first floating point number and the second floating point number are stored in the memory, and the memory controller 101 is used to obtain the first floating point number and the second floating point number.
  • The memory may be a double data rate (DDR) memory or another type of memory, which is not limited here.
  • the memory controller may be a DDR controller or other types of memory controllers, which are not specifically limited here.
  • the splitting circuit 102 is configured to split the mantissa part of the first floating point number and the mantissa part of the second floating point number, and obtain the first shift number of each split mantissa part.
  • the storage circuit 103 is used for storing each split mantissa part, the exponent part corresponding to each split mantissa part, and the first shift number of each split mantissa part.
  • For example, the splitting circuit 102 can split the mantissa part of the first floating-point number into an A part with a length of 12 bits and a B part with a length of 12 bits, where the A part is 100000000000 and the B part is 000000000001. If the A part is taken as the reference, the B part obtained after splitting needs to be shifted to the right by 12 bits and then added to the A part to recover the mantissa part of the first floating-point number. Therefore, the splitting circuit 102 obtains a first shift number for the split B part of 12 bits to the right, as illustrated by the sketch below.
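  • The following Python sketch is an illustration only (not part of the patent); the 24-bit mantissa value matches the A/B example above. It shows the high/low split and the role of the first shift number:

```python
LOW_BITS = 12  # width of the B (low-order) part

def split_mantissa(mantissa):
    """Split a 24-bit mantissa into a 12-bit high part (A) and a 12-bit low part (B)."""
    a_part = mantissa >> LOW_BITS               # high-order 12 bits, first shift number 0
    b_part = mantissa & ((1 << LOW_BITS) - 1)   # low-order 12 bits, first shift number -12
    return a_part, b_part

mantissa = 0b100000000000_000000000001          # A = 100000000000, B = 000000000001
a_part, b_part = split_mantissa(mantissa)

# Taking A as the reference, weighting B 12 bits to the right of A and adding it back
# reconstructs the original mantissa.
assert (a_part << LOW_BITS) + b_part == mantissa
print(bin(a_part), bin(b_part))  # 0b100000000000 0b1
```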
  • The first floating-point number may be an FP32 floating-point number, an FP64 floating-point number, or an FP128 floating-point number; the type of the first floating-point number is not limited here.
  • the mantissa part of the first floating-point number may be divided into two parts or may be divided into multiple parts, which is not specifically limited here.
  • the number of digits of each mantissa part after splitting can be equal, and the number of digits of each mantissa part after splitting can also be unequal, which is not limited here.
  • the data type of the second floating-point number is similar to the data type of the first floating-point number
  • the splitting method of the mantissa part of the second floating-point number is similar to the splitting method of the mantissa part of the first floating-point number
  • The exponent processing circuit 104 is configured to add the exponent part of the first floating-point number and the exponent part of the second floating-point number to obtain a first operation result; the first operation result is the exponent part of the product of the first floating-point number and the second floating-point number.
  • the exponent processing circuit 104 is further configured to add the first shift numbers of the split mantissa parts and the exponent parts corresponding to the split mantissa parts to obtain a plurality of second operation results, according to the plurality of second operation results. Obtain the second shift number of each mantissa part after splitting.
  • The calculation circuit 105 is configured to calculate the product of the mantissa parts of the first floating-point number and the second floating-point number according to the split mantissa parts and the second shift numbers of the split mantissa parts.
  • FIG. 4 is a schematic diagram of an embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • The splitting circuit can split the mantissa part of the first floating-point number into the first high-order mantissa and the first low-order mantissa, and split the mantissa part of the second floating-point number into the second high-order mantissa and the second low-order mantissa.
  • the first shift number is used to indicate the shift difference between the highest bit of each high-order mantissa and the highest bit of each low-order mantissa.
  • the first high-order mantissa includes the first mantissa
  • the first low-order mantissa includes the second mantissa
  • the second high-order mantissa includes the third mantissa
  • the second low-order mantissa includes the fourth mantissa.
  • the splitting circuit 102 can split the mantissa part of the first floating point number into a first mantissa with a length of 11 bits and a second mantissa with a length of 13 bits.
  • the first mantissa belongs to the first high-order mantissa
  • the second mantissa belongs to the first low-order mantissa.
  • The first shift number is used to indicate the shift difference between the most significant bit of each high-order mantissa and the most significant bit of each low-order mantissa; that is, the shift number of the first mantissa is 0, and the first shift number of the second mantissa is the shift difference of 11 bits between the most significant bit of the second mantissa and the most significant bit of the first mantissa, so the first shift number of the second mantissa is a right shift of 11 bits.
  • the splitting manner of the second high-order mantissa and the first high-order mantissa is similar, and the splitting manner of the second low-order mantissa and the first low-order mantissa is similar, and details are not repeated here.
  • In another embodiment, the first high-order mantissa includes the first mantissa, the first low-order mantissa includes the second mantissa, the third mantissa, the fourth mantissa, and the fifth mantissa, the second high-order mantissa includes the sixth mantissa, and the second low-order mantissa includes the seventh mantissa, the eighth mantissa, the ninth mantissa, and the tenth mantissa.
  • For example, the splitting circuit 102 can split the mantissa part of the first floating-point number into a first mantissa 10001 with a length of 5 bits, a second mantissa 100000000001 with a length of 12 bits, a third mantissa 100000000011 with a length of 12 bits, a fourth mantissa 100000000111 with a length of 12 bits, and a fifth mantissa 100000001111 with a length of 12 bits.
  • the first mantissa belongs to the first high-order mantissa
  • the second mantissa, the third mantissa, the fourth mantissa, and the fifth mantissa belong to the first low-order mantissa.
  • The first shift number is used to indicate the shift difference between the most significant bit of the high-order mantissa and the most significant bit of each low-order mantissa; that is, the shift number of the first mantissa is 0.
  • The first shift number of the second mantissa is the shift difference of 5 bits between the most significant bit of the second mantissa and the most significant bit of the first mantissa, which equals the bit width of the first mantissa, so the first shift number of the second mantissa is a right shift of 5 bits.
  • The first shift number of the third mantissa is the shift difference of 17 bits between the most significant bit of the third mantissa and the most significant bit of the first mantissa, which equals the sum of the bit widths of the first mantissa and the second mantissa, so the first shift number of the third mantissa is a right shift of 17 bits.
  • The first shift number of the fourth mantissa is the shift difference of 29 bits between the most significant bit of the fourth mantissa and the most significant bit of the first mantissa, which equals the sum of the bit widths of the first, second, and third mantissas, so the first shift number of the fourth mantissa is a right shift of 29 bits.
  • The first shift number of the fifth mantissa is the shift difference of 41 bits between the most significant bit of the fifth mantissa and the most significant bit of the first mantissa, which equals the sum of the bit widths of the first, second, third, and fourth mantissas, so the first shift number of the fifth mantissa is a right shift of 41 bits.
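  • The following Python sketch is an illustration only (not part of the patent; the function name is chosen here). It performs the 5/12/12/12/12 split of a 53-bit FP64 mantissa and derives the first shift numbers 0, 5, 17, 29, and 41 described above:

```python
WIDTHS = [5, 12, 12, 12, 12]   # bit widths of the first to fifth mantissas (53 bits in total)

def split_fp64_mantissa(mantissa):
    """Split a 53-bit mantissa into five parts; return (part, first_shift_number) pairs."""
    parts = []
    shift = 0                   # right shift of the current part's most significant bit
    remaining = sum(WIDTHS)
    for width in WIDTHS:
        remaining -= width
        part = (mantissa >> remaining) & ((1 << width) - 1)
        parts.append((part, shift))
        shift += width          # the next part sits 'width' bits further to the right
    return parts

# Example values from the text: 10001, 100000000001, 100000000011, 100000000111, 100000001111
m = int("10001" "100000000001" "100000000011" "100000000111" "100000001111", 2)
for part, shift in split_fp64_mantissa(m):
    print(f"part={part:b} right_shift={shift}")  # right shifts 0, 5, 17, 29, 41
```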
  • It should be noted that the first high-order mantissa and the second high-order mantissa may also be obtained with other splitting methods; for example, the first mantissa may have a length of 9 bits and the second, third, fourth, and fifth mantissas may each have a length of 11 bits, which is not specifically limited here.
  • the splitting manner of the second high-order mantissa and the first high-order mantissa is similar, and the splitting manner of the second low-order mantissa and the first low-order mantissa is similar, and details are not repeated here.
  • FIG. 5 is another schematic structural diagram of a floating-point number calculation circuit provided by an embodiment of the present application.
  • The exponent processing circuit includes a first adder, a selection circuit, and a second adder.
  • the input end of the first adder is electrically connected to the first output end of the storage circuit, and the output end of the first adder is electrically connected to the first input end of the second adder.
  • the second input terminal of the second adder is electrically connected to the output terminal of the selection circuit, and the output terminal of the second adder is electrically connected to the first input terminal of the calculation circuit.
  • the first adder is configured to add the first shift numbers of the split mantissa parts and the exponent parts corresponding to the split mantissa parts to obtain a plurality of second operation results.
  • the selection circuit is used for selecting the maximum value among the plurality of second operation results.
  • the second adder is used for subtracting the maximum value among the plurality of second operation results from each second operation result to obtain the second shift numbers of the split mantissa parts.
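  • As an illustration only (not part of the patent; the function name and the (first_shift, exp_sum) input format are assumptions), the following Python sketch mimics the first adder, the selection circuit, and the second adder: it forms the second operation results, selects their maximum, and derives the second shift numbers:

```python
def second_shift_numbers(parts):
    """parts: list of (first_shift, exp_sum) pairs, one per partial product.
    first_shift is the (negative) first shift number of the partial product;
    exp_sum is the sum of the exponent parts of the two mantissa parts involved."""
    # First adder: add the first shift number to the corresponding exponent parts.
    second_results = [exp_sum + first_shift for first_shift, exp_sum in parts]
    # Selection circuit: pick the maximum of the second operation results.
    max_exp = max(second_results)
    # Second adder: subtract the maximum from each result to get the second shift numbers.
    return [result - max_exp for result in second_results], max_exp

# Example: A_EXP = B_EXP = 127 and the 12-bit high/low split of FIG. 6
parts = [(0, 254), (-12, 254), (-12, 254), (-24, 254)]
shifts, max_exp = second_shift_numbers(parts)
print(max_exp, shifts)  # 254 [0, -12, -12, -24]
```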
  • the calculation circuit may include a multiplier, a shift register and a third adder.
  • the input terminal of the multiplier is electrically connected to the second output terminal of the storage circuit, and the output terminal of the multiplier is electrically connected to the first input terminal of the shift register.
  • the second input terminal of the shift register is electrically connected to the output terminal of the second adder.
  • the output terminal of the shift register is electrically connected to the input terminal of the third adder.
  • The multiplier is used to multiply each mantissa part split from the first high-order mantissa and the first low-order mantissa by each mantissa part split from the second high-order mantissa and the second low-order mantissa, to obtain multiple multiplication data.
  • the shift register is used to perform shift processing on a plurality of multiplication data according to the second shift numbers of the split mantissa parts.
  • The third adder is configured to add the shifted multiplication data to obtain the product of the mantissa parts of the first floating-point number and the second floating-point number.
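  • The following Python sketch is an illustration only (not part of the patent; the function name and example mantissa values are chosen here). It models the multiplier, shift register, and third adder for a 12-bit high/low split: every pair of split mantissa parts is multiplied, each product is aligned according to its shift, and the shifted products are added:

```python
def mantissa_product(a_parts, b_parts, low_bits=12):
    """a_parts / b_parts: (high, low) mantissa parts produced by the splitting circuit.
    Multiplies every pair of parts (multiplier), aligns each partial product according
    to its shift (shift register), and adds them up (third adder)."""
    partials = []
    for i, a in enumerate(a_parts):            # i = 0 -> high part, i = 1 -> low part
        for j, b in enumerate(b_parts):
            product = a * b
            right_shift = low_bits * (i + j)   # how far this partial product sits to the right
            partials.append((product, right_shift))
    max_shift = max(s for _, s in partials)    # weight of the least significant partial product
    # Express every partial product at that common weight, then add them.
    return sum(p << (max_shift - s) for p, s in partials)

a = 0b100000000000_000000000001                # 24-bit mantissa of the first floating-point number
b = 0b110000000000_000000000011                # 24-bit mantissa of the second floating-point number
a_parts = (a >> 12, a & 0xFFF)
b_parts = (b >> 12, b & 0xFFF)
assert mantissa_product(a_parts, b_parts) == a * b
```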
  • FIG. 6 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • As shown in FIG. 6, the mantissa part of the first floating-point number is split into two parts, A_MSB and A_LSB.
  • The mantissa part of the second floating-point number is split into two parts, B_MSB and B_LSB, where A_MSB, A_LSB, B_MSB, and B_LSB are all 12 bits. Then the multiplication of the mantissa part of the first floating-point number A and the mantissa part of the second floating-point number B can be expressed as Formula 1.
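  • Formula 1 itself is not reproduced in this text extraction. Based on the 12-bit split and the shift numbers described below (A_MSB and B_MSB at shift 0, A_LSB and B_LSB at shift −12), a plausible reconstruction is:

    A_mantissa · B_mantissa = A_MSB · B_MSB + (A_MSB · B_LSB + A_LSB · B_MSB) · 2^(−12) + A_LSB · B_LSB · 2^(−24)   (Formula 1, reconstructed)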
  • For A_MSB × B_MSB: the exponent part corresponding to A_MSB is A_EXP, and the exponent part corresponding to B_MSB is B_EXP.
  • The shift number of A_MSB obtained from the splitting circuit is 0, and the shift number of B_MSB is also 0. Therefore, EXP offset (the first adder) adds A_EXP − 0 and B_EXP − 0 to obtain A_EXP + B_EXP, which is the second operation result corresponding to A_MSB × B_MSB.
  • This second operation result represents the exponent corresponding to the partial product A_MSB × B_MSB.
  • For A_MSB × B_LSB: the exponent part corresponding to A_MSB is A_EXP, and the exponent part corresponding to B_LSB is B_EXP.
  • The shift number of A_MSB obtained from the splitting circuit is 0, and the shift number of B_LSB is −12.
  • The shift number −12 can be split into −6 and −6, and the exponent parts are recorded as A_EXP − 6 and B_EXP − 6 respectively.
  • EXP offset (the first adder) adds A_EXP − 6 and B_EXP − 6 to obtain A_EXP + B_EXP − 12, which is the second operation result corresponding to A_MSB × B_LSB.
  • This second operation result represents the exponent corresponding to the partial product A_MSB × B_LSB.
  • For A_LSB × B_MSB: the exponent part corresponding to A_LSB is A_EXP, and the exponent part corresponding to B_MSB is B_EXP.
  • The shift number of A_LSB obtained from the splitting circuit is −12, and the shift number of B_MSB is 0.
  • The shift number −12 can be split into −6 and −6, and the exponent parts are recorded as A_EXP − 6 and B_EXP − 6 respectively.
  • EXP offset (the first adder) adds A_EXP − 6 and B_EXP − 6 to obtain A_EXP + B_EXP − 12, which is the second operation result corresponding to A_LSB × B_MSB.
  • This second operation result represents the exponent corresponding to the partial product A_LSB × B_MSB.
  • For A_LSB × B_LSB: the exponent part corresponding to A_LSB is A_EXP, and the exponent part corresponding to B_LSB is B_EXP.
  • The shift number of A_LSB obtained from the splitting circuit is −12, and the shift number of B_LSB is −12.
  • EXP offset (the first adder) adds A_EXP − 12 and B_EXP − 12 to obtain A_EXP + B_EXP − 24, which is the second operation result corresponding to A_LSB × B_LSB.
  • This second operation result represents the exponent corresponding to the partial product A_LSB × B_LSB.
  • The selection circuit obtains MAX EXP (the maximum value among the multiple second operation results) and then inputs MAX EXP into each delta (second adder). Each delta subtracts MAX EXP from each second operation result to obtain the second shift number of each split mantissa part.
  • Each 13-bit Mul unit calculates A_MSB × B_MSB, A_MSB × B_LSB, A_LSB × B_MSB, and A_LSB × B_LSB respectively to obtain multiple multiplication data.
  • After receiving the second shift number sent by the delta, the shift unit shifts each part of the input multiplication data, and the adder (third adder) adds the shifted multiplication data to obtain the product of the mantissa parts of the first floating-point number and the second floating-point number.
  • It should be noted that the shift number −12 may also be split in other ways, for example into −3 and −9, or −4 and −8, as long as the shift numbers of the two parts total −12; this is not specifically limited here.
  • Likewise, the shift number −24 can be split in different ways, which is not limited here.
  • FIG. 7 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • The embodiment shown in FIG. 6 can be regarded as one calculation module. If multiple calculation modules perform multiplication operations on multiple pairs of floating-point numbers, the selection circuit can select the maximum value (max exp) of the second operation results of all the calculation modules and return this maximum value to each calculation module; each calculation module then obtains the second shift number of each of its split mantissa parts according to the maximum value of all the second operation results.
  • Example 2: If both the first floating-point number A and the second floating-point number B are FP64 floating-point numbers, then when processing the calculation of FP64 floating-point numbers, the mantissa part of the first floating-point number is split into five parts a0, a1, a2, a3, and a4, and the mantissa part of the second floating-point number is split into five parts b0, b1, b2, b3, and b4. Among them, a1, a2, a3, a4, b1, b2, b3, and b4 are all 12 bits wide, and a0 and b0 are 5 bits wide.
  • the multiplication of the mantissa part of the first floating point number A and the mantissa part of the second floating point number B can be expressed as Equation 2.
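  • Equation 2 itself is not reproduced in this text extraction. Based on the 5/12/12/12/12 split and the first shift numbers 0, 5, 17, 29, and 41 described earlier, a plausible reconstruction is:

    A_mantissa · B_mantissa = (a0 + a1·2^(−5) + a2·2^(−17) + a3·2^(−29) + a4·2^(−41)) · (b0 + b1·2^(−5) + b2·2^(−17) + b3·2^(−29) + b4·2^(−41)),

    which expands into 25 partial products ai·bj, each carrying the combined shift of its two factors.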
  • The total length of the mantissa product obtained by calculating A_mantissa × B_mantissa is 106 bits. To complete the calculation of the mantissa part of a pair of FP64 floating-point numbers directly in one calculation module, the adder (third adder) would need to be expanded into an adder supporting 106-bit data, and both the area penalty and the timing penalty would be too high. Therefore, the multiplication of a pair of FP64 mantissas can be split into two parts.
  • FIG. 8 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • As shown in FIG. 8, the floating-point number calculation circuit can combine the 13 partial products of the higher-order bits to form a high-order part (part1), and combine the 12 partial products of the lower-order bits to form a low-order part (part2).
  • The high-order part requires a 60-bit-wide addition tree, and the number of bits actually required to calculate the low-order part is 53 bits.
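  • The following Python sketch is an illustration only (not part of the patent); in particular, grouping the 13 least-shifted partial products into part1 and the remaining 12 into part2 is an assumption about the grouping criterion. It enumerates the 25 partial products of the 5/12/12/12/12 split with their combined right shifts:

```python
from itertools import product

SHIFTS = [0, 5, 17, 29, 41]   # first shift numbers of a0..a4 and of b0..b4

# Each partial product ai * bj carries the combined right shift of its two factors.
partials = sorted((sa + sb, i, j) for (i, sa), (j, sb)
                  in product(list(enumerate(SHIFTS)), repeat=2))

part1 = partials[:13]   # 13 partial products closest to the most significant bits
part2 = partials[13:]   # the remaining 12 partial products

print([f"a{i}*b{j}@{s}" for s, i, j in part1])
print([f"a{i}*b{j}@{s}" for s, i, j in part2])
```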
  • FIG. 9 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • FIG. 9 shows the corresponding positions, in the addition tree, of the calculation results obtained from each partial product of part1 and part2.
  • A 60-bit addition tree can cover the computation of part1.
  • For part2, the lowest bits of the addition tree are not completely covered, but these bits do not need to participate in the calculation.
  • the floating-point number calculation circuit provided in the embodiment of the present application can be applied to a convolutional neural network, and the specific application process is described in detail in the following embodiments.
  • For example, the first floating-point number A and the second floating-point number B are both FP32 floating-point numbers, and the first floating-point number A is data in the feature map.
  • FIG. 10 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • Step 1: Referring to FIG. 10, the second floating-point number B is data in the filter matrix.
  • The DDR controller (memory controller) reads a plurality of first floating-point numbers A and second floating-point numbers B from the DDR (memory), splits the mantissa part of each first floating-point number A into two parts, MSB and LSB, through the high-low split logic (splitting circuit), and stores them in the data RAM (storage circuit).
  • The contents of I, II, ..., X in FIG. 10 are the A_MSB and A_LSB obtained after splitting the mantissa of each first floating-point number A.
  • FIG. 11 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • Step 2: Referring to FIG. 11.
  • The split mantissas in the weight RAM are preloaded into the convolution calculation units, and the EXP (the exponent part corresponding to each split mantissa part), after being processed by the EXP offset (first adder), is also preloaded into the convolution calculation units.
  • FIG. 12 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • Step 3: Referring to FIG. 12, the first segment of mantissa data (part I) is extracted from the data RAM, its EXP part is also processed by the EXP offset, and the data is then placed into the convolution calculation unit, where it is calculated with the preloaded parameters to obtain a result.
  • FIG. 13 is a schematic diagram of another embodiment of a floating-point number calculation circuit provided by an embodiment of the present application.
  • Step 4: Referring to FIG. 13, convolution calculation unit 1 forwards the first segment of data (part I) to calculation unit 2 and obtains the second segment of data (part II) from the data RAM. After calculation unit 1 acquires the part II data, calculation unit 2, having acquired the part I data, completes its operation and generates a result. At each clock, calculation units 2 to N forward the data processed in the previous clock to the next calculation unit, and calculation unit 1 obtains new data from the data RAM each time.
  • Step 5: Repeat step 4 until all the data have been calculated and the results are generated.
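  • The following Python sketch is an illustration only (not part of the patent; the function names and the placeholder process callback are assumptions). It mimics the forwarding behaviour of steps 3 to 5: at each clock, every calculation unit passes its current data segment to the next unit, and unit 1 fetches a new segment from the data RAM:

```python
def run_pipeline(data_ram, num_units, process):
    """data_ram: list of data segments (part I, part II, ...).
    process(unit, segment) stands in for one convolution calculation."""
    units = [None] * num_units                    # segment currently held by units 1..N
    results = []
    total_clocks = len(data_ram) + num_units - 1  # extra clocks to drain the pipeline
    for clock in range(total_clocks):
        # Units N..2 take over the segment processed by the previous unit last clock.
        for u in range(num_units - 1, 0, -1):
            units[u] = units[u - 1]
        # Unit 1 fetches a new segment from the data RAM (while any remain).
        units[0] = data_ram[clock] if clock < len(data_ram) else None
        # Every unit that currently holds a segment processes it this clock.
        for u, segment in enumerate(units):
            if segment is not None:
                results.append(process(u + 1, segment))
    return results

out = run_pipeline(["I", "II", "III"], num_units=2,
                   process=lambda unit, seg: f"unit{unit} computed part {seg}")
print(out)
```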


Abstract

A floating-point number calculation circuit (100) and a floating-point number calculation method. A splitting circuit (102) included in the floating-point number calculation circuit (100) splits the mantissa part of a first floating-point number and the mantissa part of a second floating-point number. An exponent processing circuit (104) obtains the second shift number of each split mantissa part. A calculation circuit (105) calculates the product of the mantissa parts of the first floating-point number and the second floating-point number according to the split mantissa parts and the second shift numbers of the split mantissa parts. The floating-point number calculation circuit (100) can split a floating-point number with a larger number of bits into floating-point numbers with a smaller number of bits, so that a multiplier with a smaller number of bits is used to calculate the floating-point number with the larger number of bits. The floating-point number calculation circuit (100) has low timing overhead and low hardware design cost, and makes reasonable use of the computing performance of the multiplier.

Description

Floating-point number calculation circuit and floating-point number calculation method

Technical Field

The embodiments of this application relate to the field of computers, and further relate to the application of artificial intelligence (AI) technology in the field of computers, and in particular to a floating-point number calculation circuit and a floating-point number calculation method.

Background Art

Artificial intelligence (AI) is the theory, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, and so on.

Convolutional neural networks (convolution neural network, CNN) are currently widely used in many types of image processing applications. When such applications use floating-point (FP) 16 data for network training, the insufficient accuracy of FP16 data leads to non-convergence or slow convergence of network training, so higher-precision FP32 data must be used to guarantee the training effect. In addition, in supercomputing applications, even higher-precision FP64 data needs to be used for numerical calculations.

In existing data calculation schemes, a multiplier with a larger number of bits is usually used to calculate the data. For example, the multiplier used to calculate FP64 data is often multiplexed to calculate both FP64 data and FP32 data. One existing calculation scheme designs a 54-bit (binary digit, bit) multiplier to directly support the calculation of the mantissa of FP64 data. When this multiplier calculates FP32 data, the 54-bit multiplier is logically divided into two 27-bit parts, which are used to support the calculation of the mantissa parts of two pairs of FP32 data respectively. As for the processing of the exponent (exp) part, the exp processing unit of the FP64 path is simply duplicated to process the additional FP32 exp part. However, in terms of area ratio, the area overhead of one FP64 multiplier is approximately equal to that of four FP32 multipliers, yet reusing the FP64 multiplier to calculate FP32 data only achieves twice the computing performance of an FP32 multiplier, and the FP64 multiplier has a long timing overhead and a high hardware design cost. Therefore, the performance in terms of timing overhead and hardware design when a multiplier with a larger number of bits is used to calculate data is unsatisfactory.
Summary of the Invention

The embodiments of this application provide a floating-point number calculation circuit and a floating-point number calculation method. The floating-point number calculation circuit can split a floating-point number with a larger number of bits into floating-point numbers with a smaller number of bits, so that a multiplier with a smaller number of bits is used to calculate the floating-point number with the larger number of bits. The floating-point number calculation circuit has low timing overhead and low hardware design cost, and makes reasonable use of the computing performance of the multiplier.

A first aspect of the embodiments of this application provides a floating-point number calculation circuit. The floating-point number calculation circuit includes a memory controller, a splitting circuit, a storage circuit, an exponent processing circuit, and a calculation circuit. The input terminal of the splitting circuit is electrically connected to the output terminal of the memory controller, and the output terminal of the splitting circuit is electrically connected to the input terminal of the storage circuit; the input terminal of the exponent processing circuit is electrically connected to the first output terminal of the storage circuit, and the output terminal of the exponent processing circuit is electrically connected to the first input terminal of the calculation circuit; the second input terminal of the calculation circuit is electrically connected to the second output terminal of the storage circuit. The memory controller is configured to obtain a first floating-point number and a second floating-point number. The splitting circuit is configured to split the mantissa part of the first floating-point number and the mantissa part of the second floating-point number and to obtain the first shift number of each split mantissa part. The storage circuit is configured to store the split mantissa parts, the exponent parts corresponding to the split mantissa parts, and the first shift numbers of the split mantissa parts. The exponent processing circuit is configured to add the exponent part of the first floating-point number and the exponent part of the second floating-point number to obtain a first operation result, to add the first shift number of each split mantissa part and the exponent part corresponding to that mantissa part to obtain multiple second operation results, and to obtain the second shift number of each split mantissa part according to the multiple second operation results. The calculation circuit is configured to calculate the product of the mantissa parts of the first floating-point number and the second floating-point number according to the split mantissa parts and the second shift numbers of the split mantissa parts.

The embodiments of this application provide a floating-point number calculation circuit. The splitting circuit included in the floating-point number calculation circuit splits the mantissa part of a first floating-point number and the mantissa part of a second floating-point number. The exponent processing circuit obtains the second shift number of each split mantissa part. The calculation circuit calculates the product of the mantissa parts of the first floating-point number and the second floating-point number according to the split mantissa parts and the second shift numbers of the split mantissa parts. The floating-point number calculation circuit can split a floating-point number with a larger number of bits into floating-point numbers with a smaller number of bits, so that a multiplier with a smaller number of bits is used to calculate the floating-point number with the larger number of bits. The floating-point number calculation circuit provided by this application has low timing overhead and low hardware design cost, and makes reasonable use of the computing performance of the multiplier.

In a possible implementation manner of the first aspect, the splitting circuit is configured to split the mantissa part of the first floating-point number into a first high-order mantissa and a first low-order mantissa, and to split the mantissa part of the second floating-point number into a second high-order mantissa and a second low-order mantissa, where the first shift number is used to indicate the shift difference between the most significant bit of each high-order mantissa and the most significant bit of each low-order mantissa.

In this possible implementation, the floating-point number calculation circuit provided by this application can split the wider mantissa part of the first floating-point number into a narrower first high-order mantissa and first low-order mantissa, and split the wider mantissa part of the second floating-point number into a narrower second high-order mantissa and second low-order mantissa, so that a multiplier with a smaller number of bits is used to calculate the product of the split mantissa parts, which reduces the hardware design cost and makes reasonable use of the computing performance of the multiplier.

In a possible implementation manner of the first aspect, the first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, the second high-order mantissa includes a third mantissa, and the second low-order mantissa includes a fourth mantissa.

In this possible implementation, a specific splitting method for the mantissa part of a floating-point number is provided. After the mantissa part of an FP32 floating-point number is split with this method, an FP16-type multiplier can be used to perform the calculation. Similarly, after the mantissa part of an FP64 floating-point number is split with this method, an FP32-type multiplier can be used, and after the mantissa part of an FP128 floating-point number is split with this method, an FP64-type multiplier can be used. This splitting method makes it possible to use a multiplier with a smaller number of bits to calculate the product of mantissa parts with a larger number of bits.

In a possible implementation manner of the first aspect, the first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, a third mantissa, a fourth mantissa, and a fifth mantissa, the second high-order mantissa includes a sixth mantissa, and the second low-order mantissa includes a seventh mantissa, an eighth mantissa, a ninth mantissa, and a tenth mantissa.

In this possible implementation, a specific splitting method for the mantissa part of a floating-point number is provided. After the mantissa part of an FP64 floating-point number is split with this method, an FP16-type multiplier can be used to perform the calculation. Similarly, after the mantissa part of an FP128 floating-point number is split with this method, an FP32-type multiplier can be used. This splitting method makes it possible to use a multiplier with a smaller number of bits to calculate the product of mantissa parts with a larger number of bits.

In a possible implementation manner of the first aspect, the exponent processing circuit includes a first adder, a selection circuit, and a second adder. The input terminal of the first adder is electrically connected to the first output terminal of the storage circuit, and the output terminal of the first adder is electrically connected to the first input terminal of the second adder; the second input terminal of the second adder is electrically connected to the output terminal of the selection circuit, and the output terminal of the second adder is electrically connected to the first input terminal of the calculation circuit. The first adder is configured to add the first shift number of each split mantissa part and the exponent part corresponding to that mantissa part to obtain multiple second operation results; the selection circuit is configured to select the maximum value among the multiple second operation results; the second adder is configured to subtract the maximum value among the multiple second operation results from each second operation result to obtain the second shift numbers of the split mantissa parts.

This possible implementation provides a specific implementation form in terms of hardware, which improves the implementability of the solution.

In a possible implementation manner of the first aspect, the calculation circuit includes a multiplier, a shift register, and a third adder. The input terminal of the multiplier is electrically connected to the second output terminal of the storage circuit, and the output terminal of the multiplier is electrically connected to the first input terminal of the shift register; the second input terminal of the shift register is electrically connected to the output terminal of the second adder; the output terminal of the shift register is electrically connected to the input terminal of the third adder. The multiplier is configured to multiply each mantissa part split out of the first high-order mantissa and the first low-order mantissa by each mantissa part split out of the second high-order mantissa and the second low-order mantissa to obtain multiple multiplication data; the shift register is configured to shift the multiple multiplication data according to the second shift numbers of the split mantissa parts; the third adder is configured to add the shifted multiplication data to obtain the product of the mantissa parts of the first floating-point number and the second floating-point number.

This possible implementation provides a specific implementation form in terms of hardware, which improves the implementability of the solution.

A second aspect of the embodiments of this application provides a floating-point number calculation method. The method includes: obtaining a first floating-point number and a second floating-point number; splitting the mantissa part of the first floating-point number and the mantissa part of the second floating-point number and obtaining the first shift number of each split mantissa part; storing the split mantissa parts, the exponent parts corresponding to the split mantissa parts, and the first shift numbers of the split mantissa parts; adding the exponent part of the first floating-point number and the exponent part of the second floating-point number to obtain a first operation result, adding the first shift number of each split mantissa part and the exponent part corresponding to that mantissa part to obtain multiple second operation results, and obtaining the second shift number of each split mantissa part according to the multiple second operation results; and calculating the product of the mantissa parts of the first floating-point number and the second floating-point number according to the split mantissa parts and the second shift numbers of the split mantissa parts.

In the embodiments of this application, after the mantissa part of the first floating-point number and the mantissa part of the second floating-point number are split, the second shift number of each split mantissa part is obtained. Then, the product of the mantissa parts of the first floating-point number and the second floating-point number is calculated according to the split mantissa parts and the second shift numbers of the split mantissa parts. This method can split a floating-point number with a larger number of bits into floating-point numbers with a smaller number of bits, so that a multiplier with a smaller number of bits is used to calculate the floating-point number with the larger number of bits. With the floating-point number calculation method provided by this application, the computing device has low timing overhead and low hardware design cost, and makes reasonable use of the computing performance of the multiplier included in the computing device.

In a possible implementation manner of the second aspect, splitting the mantissa part of the first floating-point number and the mantissa part of the second floating-point number includes: splitting the mantissa part of the first floating-point number into a first high-order mantissa and a first low-order mantissa, and splitting the mantissa part of the second floating-point number into a second high-order mantissa and a second low-order mantissa, where the first shift number is used to indicate the shift difference between the most significant bit of each high-order mantissa and the most significant bit of each low-order mantissa.

In this possible implementation, the floating-point number calculation method provided by this application can split the wider mantissa part of the first floating-point number into a narrower first high-order mantissa and first low-order mantissa, and split the wider mantissa part of the second floating-point number into a narrower second high-order mantissa and second low-order mantissa, so that a multiplier with a smaller number of bits is used to calculate the product of the split mantissa parts, which reduces the hardware design cost and makes reasonable use of the computing performance of the multiplier.

In a possible implementation manner of the second aspect, the first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, the second high-order mantissa includes a third mantissa, and the second low-order mantissa includes a fourth mantissa.

In this possible implementation, a specific splitting method for the mantissa part of a floating-point number is provided. After the mantissa part of an FP32 floating-point number is split with this method, an FP16-type multiplier can be used to perform the calculation. Similarly, after the mantissa part of an FP64 floating-point number is split with this method, an FP32-type multiplier can be used, and after the mantissa part of an FP128 floating-point number is split with this method, an FP64-type multiplier can be used. This splitting method makes it possible to use a multiplier with a smaller number of bits to calculate the product of mantissa parts with a larger number of bits.

In a possible implementation manner of the second aspect, the first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, a third mantissa, a fourth mantissa, and a fifth mantissa, the second high-order mantissa includes a sixth mantissa, and the second low-order mantissa includes a seventh mantissa, an eighth mantissa, a ninth mantissa, and a tenth mantissa.

In this possible implementation, a specific splitting method for the mantissa part of a floating-point number is provided. After the mantissa part of an FP64 floating-point number is split with this method, an FP16-type multiplier can be used to perform the calculation. Similarly, after the mantissa part of an FP128 floating-point number is split with this method, an FP32-type multiplier can be used. This splitting method makes it possible to use a multiplier with a smaller number of bits to calculate the product of mantissa parts with a larger number of bits.

A third aspect of the embodiments of this application provides a computing device. The computing device includes a control circuit and a floating-point number calculation circuit. The floating-point number calculation circuit calculates data under the control of the control circuit, and the floating-point number calculation circuit is the floating-point number calculation circuit described in the first aspect or in any possible implementation manner of the first aspect.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a schematic diagram of the processing principle of a convolutional neural network according to this application;
FIG. 2 is a schematic diagram of the composition of an FP32 floating-point number according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of a floating-point number calculation circuit according to an embodiment of this application;
FIG. 4 is a schematic diagram of an embodiment of a floating-point number calculation circuit according to an embodiment of this application;
FIG. 5 is another schematic structural diagram of a floating-point number calculation circuit according to an embodiment of this application;
FIG. 6 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application;
FIG. 7 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application;
FIG. 8 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application;
FIG. 9 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application;
FIG. 10 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application;
FIG. 11 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application;
FIG. 12 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application;
FIG. 13 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application.
DETAILED DESCRIPTION OF EMBODIMENTS
To make the objectives, technical solutions, and advantages of this application clearer, the following describes embodiments of this application with reference to the accompanying drawings. It is clear that the described embodiments are merely some rather than all of the embodiments of this application. A person of ordinary skill in the art may know that, as new application scenarios emerge, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects, and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way may be interchanged in appropriate circumstances, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or modules is not necessarily limited to those steps or modules that are clearly listed, and may include other steps or modules that are not clearly listed or that are inherent to the process, method, product, or device. The naming or numbering of steps in this application does not mean that the steps in the method procedure must be performed in the temporal or logical order indicated by the naming or numbering; the named or numbered steps may be performed in a different order, provided that the same or a similar technical effect can be achieved.
Artificial intelligence (AI) is a theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new type of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-machine interaction, recommendation and search, basic AI theory, and the like.
FIG. 1 is a schematic diagram of the processing principle of a convolutional neural network according to this application.
Convolutional neural networks (CNNs) have broad application prospects in fields such as image and speech recognition. As shown in FIG. 1, a convolutional neural network needs to perform convolution operations on a plurality of convolution kernels and one or more feature maps. Specifically, each convolution kernel starts from the first pixel of the feature map and moves pixel by pixel along the row direction; when it reaches the end of the row, it moves down one pixel in the column direction, returns to the start of the row, and repeats the row-direction movement until all pixels of the feature map have been traversed. As the kernel moves, the parameters of the convolution kernel and the data at the corresponding positions of the feature map are used as the two inputs of the convolution operation, the convolution operation is performed (pairwise multiplication followed by accumulation of the products), and the convolution result is then output.
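For illustration only, this sliding-window multiply-accumulate can be modeled in software; the following Python sketch is a simplified analogue (stride 1 and "valid" padding are assumptions for the demonstration) and is not part of the claimed hardware.
```python
def conv2d_valid(feature, kernel):
    """Slide 'kernel' over 'feature' and multiply-accumulate at each position."""
    fh, fw = len(feature), len(feature[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(fh - kh + 1):              # move down one pixel after finishing a row
        row = []
        for c in range(fw - kw + 1):          # move right one pixel at a time
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += feature[r + i][c + j] * kernel[i][j]   # multiply, then accumulate
            row.append(acc)
        out.append(row)
    return out

print(conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]]))  # [[6, 8], [12, 14]]
```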
Convolutional neural networks (convolution neural network, CNN) are currently widely used in many types of image processing applications. When such an application trains a model with floating point (FP) 16 data, the limited precision of FP16 data may cause the training not to converge or to converge slowly, so higher-precision FP32 data is needed to guarantee the training result. In addition, some applications require even higher-precision FP64 data or FP128 data for model training.
It should be noted that, in addition to the artificial intelligence field, the floating-point number calculation circuit in the present invention can also be applied to the data signal processing field, for example, image processing systems, radar systems, and communication systems. The circuit and method can optimize the performance of digital signal processing (DSP) or other digital devices, for example, digital devices in current communication systems such as long term evolution (LTE), the universal mobile telecommunications system (UMTS), and the global system for mobile communications (GSM).
In existing data computation schemes, a multiplier with a larger bit width is usually used to compute the data. For example, a multiplier designed for FP64 data is commonly reused to compute both FP64 data and FP32 data. Some computation schemes build a 54-bit multiplier that directly supports the mantissa computation of FP64 data. When this multiplier computes FP32 data, the 54-bit multiplier is logically divided into two 27-bit parts, each supporting the mantissa computation of one pair of FP32 operands. In terms of area, however, one FP64 multiplier costs roughly as much as four FP32 multipliers, yet reusing the FP64 multiplier for FP32 data in the prior art delivers only twice the computing performance of an FP32 multiplier; in addition, the FP64 multiplier has a long timing overhead and a high hardware design cost. Therefore, computing data with a larger-bit-width multiplier is unsatisfactory in terms of timing overhead and hardware design.
To address the foregoing problems of existing data computation schemes, an embodiment of this application provides a floating-point number calculation circuit. The splitting circuit included in the circuit splits the mantissa part of a first floating-point number and the mantissa part of a second floating-point number and obtains a first shift count of each split mantissa part. The exponent processing circuit adds the first shift count of each split mantissa part to the exponent part corresponding to that split mantissa part to obtain a plurality of second operation results, and obtains a second shift count of each split mantissa part based on the plurality of second operation results. The calculation circuit computes the product of the mantissa parts of the first floating-point number and the second floating-point number based on the split mantissa parts and their second shift counts. This floating-point number calculation circuit can split a floating-point number with a larger bit width into floating-point numbers with smaller bit widths, so that multipliers with smaller bit widths can be used to compute the larger floating-point number, which makes reasonable use of the computing performance of the multipliers, shortens the timing overhead, and lowers the hardware design cost.
The following clearly and completely describes the technical solutions in this application with reference to the accompanying drawings. It is clear that the described embodiments are merely some rather than all of the embodiments of this application. The following specific embodiments may be combined with each other, and identical or similar content is not described again in different embodiments. It should also be noted that the lengths, widths, and heights (or thicknesses) of the components shown in the embodiments of this application are merely examples and do not limit the storage unit of this application.
Four floating-point formats are currently common: FP16, FP32, FP64, and FP128. Each floating-point number consists of three parts: a sign bit (sign), exponent bits (exp), and mantissa bits (mantissa). The actual value of a floating-point number equals sign * 2^exp * mantissa.
FIG. 2 is a schematic diagram of the composition of an FP32 floating-point number according to an embodiment of this application.
As shown in FIG. 2, an FP32 floating-point number has a 1-bit sign, an 8-bit exp, and a 24-bit mantissa, of which 32 bits are explicitly stored. The most significant bit of the mantissa is stored implicitly (the hidden bit is 1 if exp is not 0, and 0 otherwise), so the three parts occupy 32 bits in total.
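As a hedged illustration of this field layout, the following Python sketch unpacks an FP32 bit pattern into sign, exp, and the 24-bit mantissa with the hidden bit restored; the struct-based packing is an assumption made for demonstration and is not part of the circuit.
```python
import struct

def fp32_fields(x):
    bits = struct.unpack('>I', struct.pack('>f', x))[0]   # raw 32-bit pattern of the float
    sign = bits >> 31                                     # 1-bit sign
    exp = (bits >> 23) & 0xFF                             # 8-bit exp field
    frac = bits & 0x7FFFFF                                # 23 explicitly stored mantissa bits
    hidden = 0 if exp == 0 else 1                         # hidden bit per the rule above
    mantissa = (hidden << 23) | frac                      # 24-bit mantissa including the hidden bit
    return sign, exp, mantissa

print(fp32_fields(1.5))   # (0, 127, 12582912), i.e. mantissa 0xC00000
```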
When computing the floating-point product A*B, the exponent part is computed as A_exp+B_exp, and the mantissa part is computed as A_mantissa*B_mantissa. A new floating-point number is then generated from the new exp and mantissa in the standard format.
When computing the floating-point sum A+B, the larger of A_exp and B_exp is first determined. Suppose A_exp is larger than B_exp by n. When the mantissas are added, B_mantissa first needs to be shifted right by n bits and then added to A_mantissa to obtain the new mantissa, after which a new floating-point number is generated according to the standard. When adding multiple floating-point numbers together, the largest exp among them is first determined, each mantissa is shifted according to the difference between the largest exp and that number's exp, and the shifted mantissas are then added.
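A minimal sketch of these two rules, written as an integer model that ignores rounding and normalization (illustrative only):
```python
def fp_mul(exp_a, man_a, exp_b, man_b):
    # Multiplication: add the exponents, multiply the mantissas.
    return exp_a + exp_b, man_a * man_b

def fp_add(exp_a, man_a, exp_b, man_b):
    # Addition: right-shift the mantissa of the smaller-exponent operand by the
    # exponent difference n, then add the mantissas.
    if exp_a >= exp_b:
        return exp_a, man_a + (man_b >> (exp_a - exp_b))
    return exp_b, man_b + (man_a >> (exp_b - exp_a))

print(fp_mul(3, 0b101, 2, 0b11))   # (5, 15)
print(fp_add(3, 0b101, 1, 0b100))  # (3, 6)
```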
FIG. 3 is a schematic structural diagram of a floating-point number calculation circuit according to an embodiment of this application.
Referring to FIG. 3, the floating-point number calculation circuit 100 provided in this application includes a memory controller 101, a splitting circuit 102, a storage circuit 103, an exponent processing circuit 104, and a calculation circuit 105.
In this embodiment of this application, the input end of the splitting circuit 102 is electrically connected to the output end of the memory controller 101, and the output end of the splitting circuit 102 is electrically connected to the input end of the storage circuit 103. The input end of the exponent processing circuit 104 is electrically connected to the first output end of the storage circuit 103, and the output end of the exponent processing circuit 104 is electrically connected to the first input end of the calculation circuit 105. The second input end of the calculation circuit 105 is electrically connected to the second output end of the storage circuit 103.
In this embodiment of this application, the memory stores the first floating-point number and the second floating-point number, and the memory controller 101 is configured to obtain the first floating-point number and the second floating-point number. Optionally, the memory may be a double data rate (DDR) memory or another type of memory, which is not limited here. The memory controller may be a DDR controller or another type of memory controller, which is not limited here.
In this embodiment of this application, the splitting circuit 102 is configured to split the mantissa part of the first floating-point number and the mantissa part of the second floating-point number and to obtain the first shift count of each split mantissa part. The storage circuit 103 is configured to store the split mantissa parts, the exponent parts corresponding to the split mantissa parts, and the first shift counts of the split mantissa parts.
For example, if the first floating-point number is an FP32 floating-point number, assume its mantissa part is 100000000000000000000001. The splitting circuit 102 may split the mantissa part of the first floating-point number into a 12-bit part A and a 12-bit part B, where part A is 100000000000 and part B is 000000000001. Taking part A as the reference, the split part B must be shifted right by 12 bits before being added to part A to reproduce the mantissa part of the first floating-point number; therefore, the splitting circuit 102 obtains a first shift count for part B of 12 bits to the right.
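A small sketch of this split and of the first shift count it produces (values taken from the example above; illustrative only):
```python
mantissa = 0b100000000000000000000001      # 24-bit mantissa of the first floating-point number
part_a = mantissa >> 12                    # part A, upper 12 bits: 0b100000000000
part_b = mantissa & 0xFFF                  # part B, lower 12 bits: 0b000000000001
first_shift_b = 12                         # with part A as the reference, part B sits 12 bits lower
assert (part_a << first_shift_b) + part_b == mantissa
```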
The foregoing splitting is merely an example. Optionally, the first floating-point number may be an FP32 floating-point number, an FP64 floating-point number, or an FP128 floating-point number, which is not limited here. Optionally, the mantissa part of the first floating-point number may be split into two parts or into more parts, which is not limited here. The bit widths of the split mantissa parts may be equal or unequal, which is not limited here.
In this embodiment of this application, the data type of the second floating-point number is similar to that of the first floating-point number, and the way the mantissa part of the second floating-point number is split is similar to that of the first floating-point number; details are not described again here.
In this embodiment of this application, the exponent processing circuit 104 is configured to add the exponent part of the first floating-point number and the exponent part of the second floating-point number to obtain a first operation result, where the first operation result is the exponent-part result of multiplying the first floating-point number by the second floating-point number. The exponent processing circuit 104 is further configured to add the first shift count of each split mantissa part to the exponent part corresponding to that split mantissa part to obtain a plurality of second operation results, and to obtain the second shift count of each split mantissa part based on the plurality of second operation results. The calculation circuit 105 is configured to compute the product of the mantissa parts of the first floating-point number and the second floating-point number based on the split mantissa parts and their second shift counts.
FIG. 4 is a schematic diagram of an embodiment of a floating-point number calculation circuit according to an embodiment of this application.
Referring to FIG. 4, optionally, the splitting circuit may split the mantissa part of the first floating-point number into a first high-order mantissa and a first low-order mantissa, and split the mantissa part of the second floating-point number into a second high-order mantissa and a second low-order mantissa. The first shift count indicates the shift difference between the most significant bit of each high-order mantissa and the most significant bit of each low-order mantissa.
This application provides two specific ways of splitting the first high-order mantissa and the first low-order mantissa, which are described in detail in the following embodiments.
Way 1: the first high-order mantissa includes a first mantissa, the first low-order mantissa includes a second mantissa, the second high-order mantissa includes a third mantissa, and the second low-order mantissa includes a fourth mantissa.
For example, if the first floating-point number is an FP32 floating-point number, assume its mantissa part is 100000000011000000000001. The splitting circuit 102 may split the mantissa part of the first floating-point number into an 11-bit first mantissa and a 13-bit second mantissa, where the first mantissa is 10000000001 and the second mantissa is 1000000000001.
In this embodiment, the first mantissa belongs to the first high-order mantissa, and the second mantissa belongs to the first low-order mantissa. The first shift count indicates the shift difference between the most significant bit of each high-order mantissa and the most significant bit of each low-order mantissa; that is, the shift count of the first mantissa is 0, and the first shift count of the second mantissa is the 11-bit shift difference between the leading bit of the second mantissa and the leading bit of the first mantissa, so the first shift count of the second mantissa is 11 bits to the right.
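A sketch of this 11-bit/13-bit split on the example mantissa (Way 1; illustrative only):
```python
mantissa = 0b100000000011000000000001   # 24-bit FP32 mantissa from the example
first = mantissa >> 13                  # first mantissa, 11 bits: 0b10000000001
second = mantissa & 0x1FFF              # second mantissa, 13 bits: 0b1000000000001
first_shift = 11                        # MSB of the second mantissa is 11 positions below the MSB of the first
assert (first << 13) | second == mantissa
```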
In this embodiment, the second high-order mantissa is split in a manner similar to the first high-order mantissa, and the second low-order mantissa is split in a manner similar to the first low-order mantissa; details are not described again here.
Way 2: the first high-order mantissa includes a first mantissa; the first low-order mantissa includes a second mantissa, a third mantissa, a fourth mantissa, and a fifth mantissa; the second high-order mantissa includes a sixth mantissa; and the second low-order mantissa includes a seventh mantissa, an eighth mantissa, a ninth mantissa, and a tenth mantissa.
For example, if the first floating-point number is an FP64 floating-point number, assume the splitting circuit 102 splits its mantissa part into a 5-bit first mantissa 10001, a 12-bit second mantissa 100000000001, a 12-bit third mantissa 100000000011, a 12-bit fourth mantissa 100000000111, and a 12-bit fifth mantissa 100000001111.
In this embodiment, the first mantissa belongs to the first high-order mantissa, and the second, third, fourth, and fifth mantissas belong to the first low-order mantissa. The first shift count indicates the shift difference between the most significant bit of the high-order mantissa and the most significant bit of each low-order mantissa; that is, the shift count of the first mantissa is 0. The first shift count of the second mantissa is the 5-bit shift difference between the leading bit of the second mantissa and the leading bit of the first mantissa, which equals the bit width of the first mantissa, so the first shift count of the second mantissa is 5 bits to the right. The first shift count of the third mantissa is the 17-bit shift difference between the leading bit of the third mantissa and the leading bit of the first mantissa, which equals the combined bit width of the first and second mantissas, so the first shift count of the third mantissa is 17 bits to the right. The first shift count of the fourth mantissa is the 29-bit shift difference between the leading bit of the fourth mantissa and the leading bit of the first mantissa, which equals the combined bit width of the first, second, and third mantissas, so the first shift count of the fourth mantissa is 29 bits to the right. The first shift count of the fifth mantissa is the 41-bit shift difference between the leading bit of the fifth mantissa and the leading bit of the first mantissa, which equals the combined bit width of the first, second, third, and fourth mantissas, so the first shift count of the fifth mantissa is 41 bits to the right.
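A sketch of this five-way split and the first shift counts it yields (the 53-bit FP64 mantissa and the example pieces above are assumed; illustrative only):
```python
widths = [5, 12, 12, 12, 12]                       # first through fifth mantissa widths
pieces = [0b10001, 0b100000000001, 0b100000000011,
          0b100000000111, 0b100000001111]          # example pieces from the text

mantissa, first_shifts, msb_gap = 0, [], 0
for w, p in zip(widths, pieces):
    mantissa = (mantissa << w) | p                 # recombine into the full 53-bit mantissa
    first_shifts.append(msb_gap)                   # MSB distance from the first mantissa's MSB
    msb_gap += w

print(first_shifts)                                # [0, 5, 17, 29, 41]
```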
In this embodiment, the first high-order mantissa and the second high-order mantissa may also be split in other ways; for example, the first mantissa may be 9 bits long and the second, third, fourth, and fifth mantissas 11 bits each, which is not limited here.
In this embodiment, the second high-order mantissa is split in a manner similar to the first high-order mantissa, and the second low-order mantissa is split in a manner similar to the first low-order mantissa; details are not described again here.
In this embodiment of this application, besides the splitting schemes provided in Way 1 and Way 2 above, the floating-point number calculation circuit may also use other splitting schemes when computing the product of floating-point numbers, which is not limited here.
FIG. 5 is another schematic structural diagram of a floating-point number calculation circuit according to an embodiment of this application.
Referring to FIG. 5, in this embodiment of this application, the exponent processing circuit includes a first adder, a selection circuit, and a second adder.
In this embodiment of this application, the input end of the first adder is electrically connected to the first output end of the storage circuit, and the output end of the first adder is electrically connected to the first input end of the second adder. The second input end of the second adder is electrically connected to the output end of the selection circuit, and the output end of the second adder is electrically connected to the first input end of the calculation circuit.
In this embodiment of this application, the first adder is configured to add the first shift count of each split mantissa part to the exponent part corresponding to that split mantissa part to obtain a plurality of second operation results. The selection circuit is configured to select the maximum value among the plurality of second operation results. The second adder is configured to subtract each second operation result from the maximum value among the plurality of second operation results to obtain the second shift count of each split mantissa part.
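A software analogue of this max-select-and-subtract step; the exponent values used in the call are illustrative assumptions, not values from the embodiments.
```python
def second_shift_counts(exp_parts, first_shifts):
    # First adder: add each split mantissa part's first shift count to its exponent part.
    second_results = [e + s for e, s in zip(exp_parts, first_shifts)]
    # Selection circuit: take the maximum of the second operation results.
    max_result = max(second_results)
    # Second adder: subtract each second operation result from the maximum.
    return [max_result - r for r in second_results]

# Four partial products of an FP32 pair, with first shift offsets 0, -12, -12, -24.
print(second_shift_counts([254, 254, 254, 254], [0, -12, -12, -24]))   # [0, 12, 12, 24]
```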
Optionally, the calculation circuit may include a multiplier, a shift register, and a third adder.
In this embodiment of this application, the input end of the multiplier is electrically connected to the second output end of the storage circuit, and the output end of the multiplier is electrically connected to the first input end of the shift register. The second input end of the shift register is electrically connected to the output end of the second adder. The output end of the shift register is electrically connected to the input end of the third adder.
In this embodiment of this application, the multiplier is configured to multiply each mantissa part split out of the first high-order mantissa and the first low-order mantissa by each mantissa part split out of the second high-order mantissa and the second low-order mantissa to obtain a plurality of multiplication results. The shift register is configured to shift the plurality of multiplication results according to the second shift counts of the split mantissa parts. The third adder is configured to add the shifted multiplication results to obtain the product of the mantissa parts of the first floating-point number and the second floating-point number.
Example 1:
FIG. 6 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application.
Referring to FIG. 6, if the first floating-point number A and the second floating-point number B are both FP32 floating-point numbers, then when processing FP32 computations, the mantissa part of the first floating-point number is split into two parts, A_MSB and A_LSB, and the mantissa part of the second floating-point number is split into two parts, B_MSB and B_LSB, where A_MSB, A_LSB, B_MSB, and B_LSB are all 12 bits. The multiplication of the mantissa part of the first floating-point number A by the mantissa part of the second floating-point number B can then be expressed as Equation 1.
Equation 1:
A_mantissa * B_mantissa
= (A_MSB + A_LSB >> 12 bit) * (B_MSB + B_LSB >> 12 bit)
= A_MSB*B_MSB + A_MSB*B_LSB >> 12 bit + A_LSB*B_MSB >> 12 bit + A_LSB*B_LSB >> 24 bit
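Equation 1 is the usual two-way split of a 24-bit by 24-bit product into four 12-bit by 12-bit partial products. The sketch below checks the identity numerically; it is written with left shifts on the integer view, which is the same identity taken relative to the weight of A_MSB*B_MSB.
```python
import random

for _ in range(1000):
    a = random.getrandbits(23) | (1 << 23)         # 24-bit mantissas with the hidden bit set
    b = random.getrandbits(23) | (1 << 23)
    a_msb, a_lsb = a >> 12, a & 0xFFF              # 12-bit high and low pieces
    b_msb, b_lsb = b >> 12, b & 0xFFF
    product = (a_msb * b_msb << 24) + (a_msb * b_lsb << 12) \
            + (a_lsb * b_msb << 12) + (a_lsb * b_lsb)
    assert product == a * b
```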
As shown in FIG. 6, the exponent part corresponding to A_MSB is A_EXP and the exponent part corresponding to B_MSB is B_EXP. The shift count of A_MSB obtained by the splitting circuit is 0, and the shift count of B_MSB is also 0. Therefore, EXP offset (the first adder) adds the A_EXP-0 recorded for A_MSB and the B_EXP-0 recorded for B_MSB to obtain A_EXP+B_EXP, which is the second operation result corresponding to A_MSB*B_MSB; this second operation result represents the exponent-part result corresponding to the product A_MSB*B_MSB.
The exponent part corresponding to A_MSB is A_EXP, and the exponent part corresponding to B_LSB is B_EXP. The shift count of A_MSB obtained by the splitting circuit is 0, and the shift count of B_LSB is -12. For ease of computation, the shift count -12 can be split into -6 and -6, and the exponent parts are recorded as A_EXP-6 and B_EXP-6, respectively. EXP offset (the first adder) adds the A_EXP-6 recorded for A_MSB and the B_EXP-6 recorded for B_LSB to obtain A_EXP+B_EXP-12, which is the second operation result corresponding to A_MSB*B_LSB; this second operation result represents the exponent-part result corresponding to the product A_MSB*B_LSB.
The exponent part corresponding to A_LSB is A_EXP, and the exponent part corresponding to B_MSB is B_EXP. The shift count of A_LSB obtained by the splitting circuit is -12, and the shift count of B_MSB is 0. For ease of computation, the shift count -12 can be split into -6 and -6, and the exponent parts are recorded as A_EXP-6 and B_EXP-6, respectively. EXP offset (the first adder) adds the A_EXP-6 recorded for A_LSB and the B_EXP-6 recorded for B_MSB to obtain A_EXP+B_EXP-12, which is the second operation result corresponding to A_LSB*B_MSB; this second operation result represents the exponent-part result corresponding to the product A_LSB*B_MSB.
The exponent part corresponding to A_LSB is A_EXP, and the exponent part corresponding to B_LSB is B_EXP. The shift counts of A_LSB and B_LSB obtained by the splitting circuit are both -12. EXP offset (the first adder) adds the A_EXP-12 recorded for A_LSB and the B_EXP-12 recorded for B_LSB to obtain A_EXP+B_EXP-24, which is the second operation result corresponding to A_LSB*B_LSB; this second operation result represents the exponent-part result corresponding to the product A_LSB*B_LSB.
After the plurality of second operation results are computed, the selection circuit obtains MAX EXP (the maximum among the plurality of second operation results) and inputs MAX EXP to each delta (second adder). Each delta subtracts the corresponding second operation result from MAX EXP to obtain the second shift count of each split mantissa part.
The 13-bit Mul units (multipliers) compute A_MSB*B_MSB, A_MSB*B_LSB, A_LSB*B_MSB, and A_LSB*B_LSB, respectively, to obtain a plurality of multiplication results. After receiving the second shift counts sent by the deltas, shift (the shift register) shifts each incoming multiplication result, and adder (the third adder) adds the shifted multiplication results to obtain the product of the mantissa parts of the first floating-point number and the second floating-point number.
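A compact end-to-end sketch of this flow (EXP offset, MAX EXP selection, delta, shift, and accumulation) as a software analogue; in this simplified model the low-order bits shifted out are truncated, whereas the circuit may keep or store them, as discussed later for the FP64 case. The input values in the call are illustrative.
```python
def fp32_mantissa_product(a_man, b_man, a_exp, b_exp):
    a_msb, a_lsb = a_man >> 12, a_man & 0xFFF
    b_msb, b_lsb = b_man >> 12, b_man & 0xFFF
    # Partial products and their exponent offsets (0, -12, -12, -24).
    partials = [(a_msb * b_msb, 0), (a_msb * b_lsb, -12),
                (a_lsb * b_msb, -12), (a_lsb * b_lsb, -24)]
    second_results = [a_exp + b_exp + off for _, off in partials]
    max_exp = max(second_results)                        # MAX EXP from the selection circuit
    acc = 0
    for (p, _), r in zip(partials, second_results):
        acc += p >> (max_exp - r)                        # shift by the second shift count, then add
    return max_exp, acc

print(fp32_mantissa_product(0xC00000, 0x800000, 127, 127))   # (254, 6291456)
```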
In this embodiment, optionally, the shift count -12 may also be split in other ways, for example into -3 and -9, -4 and -8, or various other combinations, as long as the two split shift counts total -12, which is not limited here. Similarly, the shift count -24 may also be split in different ways, which is not limited here.
FIG. 7 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application.
In this embodiment of this application, referring to FIG. 7, the embodiment shown in FIG. 6 above is regarded as one compute module. When a plurality of compute modules multiply a plurality of pairs of floating-point numbers, the selection circuit may select the maximum value (max exp) of all second operation results across the compute modules and return this maximum value to each compute module, and each compute module obtains the second shift count of each split mantissa part based on this maximum value of all second operation results.
Example 2: If the first floating-point number A and the second floating-point number B are both FP64 floating-point numbers, then when processing FP64 computations, the mantissa part of the first floating-point number is split into five parts a0, a1, a2, a3, and a4, and the mantissa part of the second floating-point number is split into five parts b0, b1, b2, b3, and b4, where a1, a2, a3, a4, b1, b2, b3, and b4 are each 12 bits and a0 and b0 are each 5 bits. The multiplication of the mantissa part of the first floating-point number A by the mantissa part of the second floating-point number B can be expressed as Equation 2.
Equation 2:
A_mantissa * B_mantissa
= (a0<<48 bit + a1<<36 bit + a2<<24 bit + a3<<12 bit + a4) *
  (b0<<48 bit + b1<<36 bit + b2<<24 bit + b3<<12 bit + b4)
= a0*b0 << 96 bit
+ (a0*b1 + b0*a1) << 84 bit
+ (a0*b2 + b0*a2 + a1*b1) << 72 bit
+ (a0*b3 + b0*a3 + a1*b2 + b1*a2) << 60 bit
+ (a0*b4 + b0*a4 + a1*b3 + b1*a3 + b2*a2) << 48 bit
+ (a1*b4 + b1*a4 + a2*b3 + b2*a3) << 36 bit
+ (a2*b4 + b2*a4 + a3*b3) << 24 bit
+ (a3*b4 + b3*a4) << 12 bit
+ a4*b4
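Equation 2 can likewise be checked numerically; the sketch below verifies the 25-partial-product expansion for the five-way split (illustrative only).
```python
import random

def split53(man):
    # 53-bit mantissa -> 5-bit a0/b0 piece plus four 12-bit pieces.
    return [(man >> 48) & 0x1F, (man >> 36) & 0xFFF,
            (man >> 24) & 0xFFF, (man >> 12) & 0xFFF, man & 0xFFF]

weights = [48, 36, 24, 12, 0]                      # left-shift weight of each piece

for _ in range(1000):
    a = random.getrandbits(52) | (1 << 52)         # 53-bit mantissas with the hidden bit set
    b = random.getrandbits(52) | (1 << 52)
    pa, pb = split53(a), split53(b)
    product = sum((pa[i] * pb[j]) << (weights[i] + weights[j])
                  for i in range(5) for j in range(5))
    assert product == a * b
```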
The exponent circuit and the calculation circuit compute the product of the mantissa parts of the first and second floating-point numbers in a manner similar to the embodiment shown in Example 1 above; details are not described again here.
In this embodiment, because the mantissa part of an FP64 floating-point number is 53 bits long, the total length of the mantissa obtained after computing A_mantissa*B_mantissa is 106 bits. If the mantissa computation of a pair of FP64 floating-point numbers were to be completed directly within one compute module, adder (the third adder) would have to be widened into an adder supporting 106-bit data, and the area cost and timing cost of the widened adder would both be too high. Therefore, the mantissa multiplication of a pair of FP64 numbers may be split into two parts.
FIG. 8 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application.
Referring to FIG. 8, in this embodiment, optionally, the floating-point number calculation circuit may group the 13 higher-order partial multiplications together as the high-order part (part1) and the 12 lower-order partial multiplications as the low-order part (part2). The high-order part requires a 60-bit-wide adder tree in total, and the number of bits actually required for the low-order part is 53.
FIG. 9 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application.
FIG. 9 shows the positions in the adder tree of the results obtained after computing each portion of part1 and part2. The 60-bit adder tree can cover the computation of part1. When computing part2, the lowest few bits cannot be fully covered by the adder tree, but these bits also do not need to participate in the computation. When handling the data that the adder tree cannot cover, this data may be stored and then used in subsequent computations, or it may simply be truncated, which is not limited here.
The floating-point number calculation circuit provided in the embodiments of this application can be applied to a convolutional neural network; the specific application process is described in detail in the following embodiment.
Assume that the first floating-point number A and the second floating-point number B are both FP32 floating-point numbers, and that the first floating-point number A is data in a feature map.
FIG. 10 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application.
Step 1: Referring to FIG. 10, the second floating-point number B is data in a filter matrix. The DDR controller (memory controller) reads a plurality of first floating-point numbers A and second floating-point numbers B from the DDR (memory). The high/low-order splitting logic (splitting circuit) splits the mantissa part of each first floating-point number A into two parts, MSB and LSB, and stores them into the data RAM (storage circuit); the content of I, II, ..., X in FIG. 10 is the A_MSB and A_LSB obtained by splitting the mantissa of each first floating-point number A, together with the exponent part EXP corresponding to each A_MSB and A_LSB. The splitting logic also splits the mantissa part of each second floating-point number B into two parts, MSB and LSB, and stores them into the weight RAM (storage circuit); the content of 1, 2, ..., N in FIG. 10 is the B_MSB and B_LSB obtained by splitting the mantissa of each second floating-point number B, together with the exponent part EXP corresponding to each B_MSB and B_LSB.
FIG. 11 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application.
Step 2: Referring to FIG. 11, the split mantissas in the weight RAM are preloaded into the convolution compute units, and at the same time EXP (the exponent part corresponding to each split mantissa part), after being processed by EXP offset (the second adder), is likewise preloaded into the convolution compute units.
FIG. 12 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application.
Step 3: Referring to FIG. 12, the first segment of mantissa data (part I) is fetched from the data RAM, and its EXP part, also processed first by exp offset, is placed into the convolution compute unit, where it is computed against the preloaded parameters (part 1) to obtain a result.
FIG. 13 is a schematic diagram of another embodiment of a floating-point number calculation circuit according to an embodiment of this application.
Step 4: Referring to FIG. 13, convolution compute unit 1 forwards the first segment of data (part I) to compute unit 2 and fetches the second segment of data (part II) from the data RAM. Compute unit 1, after obtaining the part II data, and compute unit 2, after obtaining the part I data, complete their operations and generate results. From then on, in each clock cycle, compute units 2 to N forward the data processed in the previous clock cycle to the next compute unit, and compute unit 1 fetches new data from the data RAM each time.
Step 5: Repeat Step 4 until all data has been processed and the results are generated.
The floating-point number calculation circuit and floating-point number calculation method provided in the embodiments of this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the descriptions of the foregoing embodiments are only intended to help understand the method and core idea of this application. Meanwhile, a person of ordinary skill in the art may make changes to the specific implementations and application scope based on the idea of this application. In conclusion, the content of this specification shall not be construed as a limitation on this application.

Claims (11)

  1. A floating-point number calculation circuit, wherein the floating-point number calculation circuit comprises: a memory controller, a splitting circuit, a storage circuit, an exponent processing circuit, and a calculation circuit;
    an input end of the splitting circuit is electrically connected to an output end of the memory controller, and an output end of the splitting circuit is electrically connected to an input end of the storage circuit;
    an input end of the exponent processing circuit is electrically connected to a first output end of the storage circuit, and an output end of the exponent processing circuit is electrically connected to a first input end of the calculation circuit;
    a second input end of the calculation circuit is electrically connected to a second output end of the storage circuit;
    the memory controller is configured to obtain a first floating-point number and a second floating-point number;
    the splitting circuit is configured to split a mantissa part of the first floating-point number and a mantissa part of the second floating-point number and to obtain a first shift count of each split mantissa part;
    the storage circuit is configured to store the split mantissa parts, exponent parts corresponding to the split mantissa parts, and the first shift counts of the split mantissa parts;
    the exponent processing circuit is configured to add an exponent part of the first floating-point number and an exponent part of the second floating-point number to obtain a first operation result, add the first shift count of each split mantissa part to the exponent part corresponding to that split mantissa part to obtain a plurality of second operation results, and obtain a second shift count of each split mantissa part based on the plurality of second operation results; and
    the calculation circuit is configured to compute a product of the mantissa parts of the first floating-point number and the second floating-point number based on the split mantissa parts and the second shift counts of the split mantissa parts.
  2. The floating-point number calculation circuit according to claim 1, wherein
    the splitting circuit is configured to split the mantissa part of the first floating-point number into a first high-order mantissa and a first low-order mantissa and split the mantissa part of the second floating-point number into a second high-order mantissa and a second low-order mantissa, and the first shift count indicates a shift difference between a most significant bit of each high-order mantissa and a most significant bit of each low-order mantissa.
  3. The floating-point number calculation circuit according to claim 2, wherein
    the first high-order mantissa comprises a first mantissa, the first low-order mantissa comprises a second mantissa, the second high-order mantissa comprises a third mantissa, and the second low-order mantissa comprises a fourth mantissa.
  4. The floating-point number calculation circuit according to claim 2, wherein
    the first high-order mantissa comprises a first mantissa; the first low-order mantissa comprises a second mantissa, a third mantissa, a fourth mantissa, and a fifth mantissa; the second high-order mantissa comprises a sixth mantissa; and the second low-order mantissa comprises a seventh mantissa, an eighth mantissa, a ninth mantissa, and a tenth mantissa.
  5. The floating-point number calculation circuit according to claim 3 or 4, wherein
    the exponent processing circuit comprises a first adder, a selection circuit, and a second adder;
    an input end of the first adder is electrically connected to the first output end of the storage circuit, and an output end of the first adder is electrically connected to a first input end of the second adder;
    a second input end of the second adder is electrically connected to an output end of the selection circuit, and an output end of the second adder is electrically connected to the first input end of the calculation circuit;
    the first adder is configured to add the first shift count of each split mantissa part to the exponent part corresponding to that split mantissa part to obtain the plurality of second operation results;
    the selection circuit is configured to select a maximum value among the plurality of second operation results; and
    the second adder is configured to subtract each second operation result from the maximum value among the plurality of second operation results to obtain the second shift count of each split mantissa part.
  6. The floating-point number calculation circuit according to claim 5, wherein
    the calculation circuit comprises a multiplier, a shift register, and a third adder;
    an input end of the multiplier is electrically connected to the second output end of the storage circuit, and an output end of the multiplier is electrically connected to a first input end of the shift register;
    a second input end of the shift register is electrically connected to the output end of the second adder;
    an output end of the shift register is electrically connected to an input end of the third adder;
    the multiplier is configured to multiply each mantissa part split out of the first high-order mantissa and the first low-order mantissa by each mantissa part split out of the second high-order mantissa and the second low-order mantissa to obtain a plurality of multiplication results;
    the shift register is configured to shift the plurality of multiplication results according to the second shift counts of the split mantissa parts; and
    the third adder is configured to add the shifted multiplication results to obtain the product of the mantissa parts of the first floating-point number and the second floating-point number.
  7. A floating-point number calculation method, comprising:
    obtaining a first floating-point number and a second floating-point number;
    splitting a mantissa part of the first floating-point number and a mantissa part of the second floating-point number and obtaining a first shift count of each split mantissa part;
    storing the split mantissa parts, exponent parts corresponding to the split mantissa parts, and the first shift counts of the split mantissa parts;
    adding an exponent part of the first floating-point number and an exponent part of the second floating-point number to obtain a first operation result, adding the first shift count of each split mantissa part to the exponent part corresponding to that split mantissa part to obtain a plurality of second operation results, and obtaining a second shift count of each split mantissa part based on the plurality of second operation results; and
    computing a product of the mantissa parts of the first floating-point number and the second floating-point number based on the split mantissa parts and the second shift counts of the split mantissa parts.
  8. The floating-point number calculation method according to claim 7, wherein the splitting a mantissa part of the first floating-point number and a mantissa part of the second floating-point number comprises:
    splitting the mantissa part of the first floating-point number into a first high-order mantissa and a first low-order mantissa, and splitting the mantissa part of the second floating-point number into a second high-order mantissa and a second low-order mantissa, wherein the first shift count indicates a shift difference between a most significant bit of each high-order mantissa and a most significant bit of each low-order mantissa.
  9. The floating-point number calculation method according to claim 8, wherein
    the first high-order mantissa comprises a first mantissa, the first low-order mantissa comprises a second mantissa, the second high-order mantissa comprises a third mantissa, and the second low-order mantissa comprises a fourth mantissa.
  10. The floating-point number calculation method according to claim 8, wherein
    the first high-order mantissa comprises a first mantissa; the first low-order mantissa comprises a second mantissa, a third mantissa, a fourth mantissa, and a fifth mantissa; the second high-order mantissa comprises a sixth mantissa; and the second low-order mantissa comprises a seventh mantissa, an eighth mantissa, a ninth mantissa, and a tenth mantissa.
  11. A computing apparatus, wherein the computing apparatus comprises a control circuit and a floating-point number calculation circuit; and
    the floating-point number calculation circuit computes data under control of the control circuit, and the floating-point number calculation circuit is the floating-point number calculation circuit according to any one of claims 1 to 6.
PCT/CN2020/125676 2020-10-31 2020-10-31 Floating-point number calculation circuit and floating-point number calculation method WO2022088157A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2020/125676 WO2022088157A1 (zh) 2020-10-31 2020-10-31 Floating-point number calculation circuit and floating-point number calculation method
EP20959318.5A EP4220379A4 (en) 2020-10-31 2020-10-31 FLOATING-POINT NUMBER CALCULATION CIRCUIT AND FLOATING-POINT NUMBER CALCULATION METHOD
CN202080102852.5A CN115812194A (zh) 2020-10-31 2020-10-31 Floating-point number calculation circuit and floating-point number calculation method
US18/309,269 US20230266941A1 (en) 2020-10-31 2023-04-28 Floating Point Number Calculation Circuit and Floating Point Number Calculation Method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/125676 WO2022088157A1 (zh) 2020-10-31 2020-10-31 Floating-point number calculation circuit and floating-point number calculation method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/309,269 Continuation US20230266941A1 (en) 2020-10-31 2023-04-28 Floating Point Number Calculation Circuit and Floating Point Number Calculation Method

Publications (1)

Publication Number Publication Date
WO2022088157A1 true WO2022088157A1 (zh) 2022-05-05

Family

ID=81381701

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125676 WO2022088157A1 (zh) 2020-10-31 2020-10-31 Floating-point number calculation circuit and floating-point number calculation method

Country Status (4)

Country Link
US (1) US20230266941A1 (zh)
EP (1) EP4220379A4 (zh)
CN (1) CN115812194A (zh)
WO (1) WO2022088157A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160248439A1 (en) * 2015-02-25 2016-08-25 Renesas Electronics Corporation Floating-point adder, semiconductor device, and control method for floating-point adder
CN107305485A (zh) * 2016-04-25 2017-10-31 北京中科寒武纪科技有限公司 Device and method for performing addition of multiple floating-point numbers
CN109508173A (zh) * 2017-09-14 2019-03-22 英特尔公司 Floating-point adder circuit with subnormal support
CN109901814A (zh) * 2019-02-14 2019-06-18 上海交通大学 Custom floating-point number and calculation method and hardware structure thereof
CN110221808A (zh) * 2019-06-03 2019-09-10 深圳芯英科技有限公司 Preprocessing method for vector multiply-add operation, multiply-adder, and computer-readable medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8037119B1 (en) * 2006-02-21 2011-10-11 Nvidia Corporation Multipurpose functional unit with single-precision and double-precision operations
US10691413B2 (en) * 2018-05-04 2020-06-23 Microsoft Technology Licensing, Llc Block floating point computations using reduced bit-width vectors
US11169776B2 (en) * 2019-06-28 2021-11-09 Intel Corporation Decomposed floating point multiplication

Also Published As

Publication number Publication date
US20230266941A1 (en) 2023-08-24
EP4220379A4 (en) 2023-11-01
EP4220379A1 (en) 2023-08-02
CN115812194A (zh) 2023-03-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20959318

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2020959318

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2020959318

Country of ref document: EP

Effective date: 20230427

NENP Non-entry into the national phase

Ref country code: DE