WO2022170811A1 - Fixed-point multiply-add operation unit and method suitable for mixed-precision neural network - Google Patents

Fixed-point multiply-add operation unit and method suitable for mixed-precision neural network

Info

Publication number
WO2022170811A1
Authority
WO
WIPO (PCT)
Prior art keywords
multiplier
input data
partial product
precision
data
Prior art date
Application number
PCT/CN2021/131800
Other languages
French (fr)
Chinese (zh)
Inventor
毛伟
余浩
安丰伟
李凯
周俊卓
王宇航
王祥龙
石港
Original Assignee
南方科技大学 (Southern University of Science and Technology)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南方科技大学 (Southern University of Science and Technology)
Publication of WO2022170811A1 publication Critical patent/WO2022170811A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/57: Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the invention relates to the field of digital circuits, and in particular to a fixed-point multiply-add operation unit and method suitable for mixed-precision neural networks.
  • artificial intelligence algorithms are widely used in many commercial fields.
  • the quantization of different layers of the network is one of the important methods to improve the efficiency of network computing.
  • artificial intelligence chips have an increasing demand for mixed-precision computing in the process of data processing in order to meet the characteristics of network design.
  • Conventional processors use multiple processing units of different precisions to handle mixed-precision operations. This approach incurs excessive hardware overhead, leaves redundant resources idle, and introduces long delays when switching between hardware of different precisions, which reduces throughput and prevents application-specific optimization.
  • the technical problem to be solved by the present invention, in view of the above-mentioned defects of the prior art, is to provide a fixed-point multiply-add operation unit and method suitable for mixed-precision neural networks, aiming to solve the prior-art problems of excessive hardware overhead and redundant idle resources caused by the need to use multiple processing units of different precisions to handle mixed-precision operations.
  • an embodiment of the present invention provides a fixed-point multiply-add operation method suitable for mixed-precision neural networks, wherein the method includes:
  • the target sum is truncated, and the data obtained after truncation is used as the result of the dot product of the input data.
  • acquiring the mode signal and the input data, determining the data input position according to the mode signal, and inputting the input data into the multiplier from the data input position includes:
  • the number of called multipliers is greater than 1;
  • the number of called multipliers is 1;
  • a data input location is determined based on the mode signal, and the input data is input into a multiplier from the data input location.
  • acquiring the mode signal, processing the partial products generated by the multiplier according to the mode signal, performing a summation operation, and using the data obtained after the summation operation as the target sum includes:
  • a summation operation is performed on the first partial product generation part and the second partial product generation part, and the data obtained after the summation operation is used as a target sum.
  • the mode signal is determined by the precision of the input data; the processing includes at least one of the following operations:
  • performing a summation operation on the first partial product generation part and the second partial product generation part, and obtaining the target sum based on the summation operation, includes:
  • the output result of the second stage compressor c is input into an adder, and the output result of the adder is used as a target sum.
  • performing a summation operation on the first partial product generation part and the second partial product generation part, and obtaining the target sum based on the summation operation, includes:
  • when the highest bit number of the input data is equal to the highest bit number of the multiplier, the first partial product generation part and the second partial product generation part are input into first-stage compressor a and first-stage compressor b, respectively;
  • the output results of first-stage compressor a and first-stage compressor b are input into the first adder and the second adder respectively, and the sum of the output results of the first adder and the second adder is used as the target sum.
  • performing a summation operation on the first partial product generation part and the second partial product generation part, and obtaining the target sum based on the summation operation, includes:
  • when the highest bit number of the input data is greater than the highest bit number of the multiplier, the multipliers include a first multiplier and a second multiplier, the second multiplier being the low-order operation multiplier; the first multiplier outputs the first partial product generation part, and the second multiplier outputs the second partial product generation part;
  • the sum of the output results of the first adder and the second adder is used as the target sum.
  • truncating the target sum and using the data obtained after truncation as the result of the dot product of the input data includes:
  • a truncation operation is performed on the target sum starting from the 0th bit, and the data obtained after the truncation operation is used as the result of the dot product of the input data.
  • the method further includes:
  • an embodiment of the present invention further provides a fixed-point multiply-add operation unit suitable for a mixed-precision neural network, wherein the operation unit includes:
  • a position determination module for acquiring a mode signal and input data, determining a data input position according to the mode signal, and inputting the input data into a multiplier from the data input position;
  • a partial product processing module for processing the partial product generated by the multiplier according to the mode signal, and performing a summation operation, using the data obtained after the summation operation as a target sum;
  • the result generation module is used for truncating the target sum, and using the data obtained after truncation as the result of the dot product of the input data.
  • by inputting data of different precisions into the multiplier, the present invention masks the partial products of designated regions according to the mode signal, outputs the remaining partial product generation parts, and performs the summation operation on them by the method corresponding to each precision, thereby realizing mixed-precision dot product operations.
  • a single multiplier can thus be used to realize the dot product operations of a mixed-precision neural network, which solves the prior-art problems of excessive hardware overhead and redundant idle resources caused by the need to use multiple processing units of different precisions to handle mixed-precision operations.
  • FIG. 1 is a schematic flowchart of a fixed-point multiply-add operation method suitable for a mixed-precision neural network provided by an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of the partial products generated in a conventional 8bit × 8bit multiplier provided by an embodiment of the present invention.
  • FIG. 3 is the addition tree structure used by a conventional 8bit × 8bit multiplier provided by an embodiment of the present invention.
  • FIG. 4 is a reference diagram for implementing multiplication operations of four groups of input data with a precision of 2bit × 2bit based on a group of 8bit × 8bit multipliers provided by an embodiment of the present invention.
  • FIG. 5 is a reference diagram for implementing multiplication operations of two groups of input data with a precision of 4bit × 4bit based on a group of 8bit × 8bit multipliers provided by an embodiment of the present invention.
  • FIG. 6 is a reference diagram for implementing a multiplication operation of input data with a precision of 1bit × 1bit based on an 8bit × 8bit multiplier according to an embodiment of the present invention.
  • FIG. 7 is a reference diagram for implementing a multiplication operation of input data with a precision of 3bit × 3bit based on an 8bit × 8bit multiplier according to an embodiment of the present invention.
  • FIG. 8 is a reference diagram for implementing a multiplication operation of input data with a precision of 5bit × 5bit based on an 8bit × 8bit multiplier according to an embodiment of the present invention.
  • FIG. 9 is a reference diagram for implementing a multiplication operation of input data with a precision of 6bit × 6bit based on an 8bit × 8bit multiplier according to an embodiment of the present invention.
  • FIG. 10 is a reference diagram for implementing a multiplication operation of input data with a precision of 7bit × 7bit based on an 8bit × 8bit multiplier according to an embodiment of the present invention.
  • FIG. 11 is a reference diagram for realizing multiplication of two 4bit × 8bit mixed-precision input data by dividing and summing the partial product generation parts based on an 8bit × 8bit multiplier according to an embodiment of the present invention.
  • FIG. 12 is a reference diagram for implementing a multiplication operation of input data with a mixed precision of 8bit × 16bit based on two sets of 8bit × 8bit multipliers according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of accumulating the output data of the first multiplier and the second multiplier under mixed precision provided by an embodiment of the present invention.
  • FIG. 15 is a reference diagram for implementing a multiplication operation of input data with a mixed precision of 8bit × 15bit based on two sets of 8bit × 8bit multipliers provided by an embodiment of the present invention.
  • FIG. 16 is a schematic diagram of a partial product including a sign bit in an 8bit × 8bit multiplier according to an embodiment of the present invention.
  • FIG. 17 is a reference diagram of an internal module of an arithmetic unit provided by an embodiment of the present invention.
  • the present invention provides a fixed-point multiply-add operation method suitable for mixed-precision neural networks.
  • the multiplier is controlled according to the mode signal: after masking the partial products of the specified region, it outputs the remaining partial product generation parts, and the summation operation is performed on the output partial product generation parts by the method corresponding to each precision, thereby realizing the mixed-precision dot product operation.
  • a single multiplier can thus be used to realize the dot product operations of a mixed-precision neural network, which solves the prior-art problems of excessive hardware overhead and redundant idle resources caused by the need to use multiple processing units of different precisions to handle mixed-precision operations.
  • the method includes the following:
  • Step S100: Acquire a mode signal and input data, determine a data input position according to the mode signal, and input the input data into the multiplier from the data input position.
  • since this embodiment uses a unified multiplier to perform the dot product operations of the mixed-precision neural network, and the bit width of the multiplier's input is fixed, the precision of the input data may not match the highest bit width of the multiplier.
  • this embodiment therefore needs to acquire a mode signal and input data, determine the data input position according to the mode signal, and then input the input data into the multiplier from the data input position.
  • input data of different precisions are input into the multipliers from different data input positions, thereby implementing the dot product operation of the mixed-precision neural network with a unified multiplier.
  • step S100 specifically includes the following steps:
  • Step S110: Obtain the mode signal and the input data, and determine the number of multipliers to call according to the precision of the input data;
  • Step S120: When the highest precision of the input data is higher than the highest bit of the multiplier, the number of called multipliers is greater than 1;
  • Step S130: Determine the data input position according to the mode signal, split the data with the highest precision in the input data, and input the input data obtained after splitting into the multipliers from the data input position;
  • Step S140: When the highest precision of the input data is lower than or equal to the highest bit of the multiplier, the number of called multipliers is 1;
  • Step S150: Determine the data input position according to the mode signal, and input the input data into the multiplier from the data input position.
  • since this embodiment adopts a unified multiplier whose highest bit width is fixed, the precision of the multiplier may not match the precision of the input data.
  • for example, the multiplier is an 8bit × 8bit multiplier and the precision of the input data is 3bit × 3bit, or the multiplier is an 8bit × 8bit multiplier and the precision of the input data is 8bit × 16bit. Therefore, the number of multipliers to call needs to be determined according to the precision of the input data. It can be understood that if the precision of the input data exceeds the precision of the multiplier, the input data cannot be multiplied by a single multiplier; in this case, multiple multipliers need to be called.
  • in this case the number of called multipliers is greater than 1; the data input position is then determined according to the mode signal, the highest-precision data in the input data is split, and the input data obtained after splitting is input into the multipliers from the data input position.
  • for example, when the input data is mixed-precision 8bit × 16bit and the multiplier is an 8bit × 8bit multiplier, two 8bit × 8bit multipliers need to be called to realize the multiplication of the mixed-precision 8bit × 16bit input data. The 8-bit part of the data can be input directly into the multipliers from the specified data input position, while the 16-bit part of the input data needs to be split before being input into the two multipliers respectively (as shown in Figure 12).
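As an illustrative sketch (the function name and the Python modeling are my own, not taken from the patent), the split of the 16-bit operand into two bytes feeding two 8bit × 8bit multiplies, followed by shift-and-add recombination, can be modeled as:

```python
def mul_8x16(a, b):
    """Model an 8bit x 16bit multiply built from two 8x8 multiplies.

    b is split into a high byte and a low byte; each byte feeds one
    8x8 multiplier, and the high product is shifted left by 8 bits
    before the final summation (unsigned operands assumed here).
    """
    b_lo = b & 0xFF          # low byte -> second (low-order) multiplier
    b_hi = b >> 8            # high byte -> first multiplier
    p_lo = a * b_lo          # one 8x8 product
    p_hi = a * b_hi          # the other 8x8 product
    return (p_hi << 8) + p_lo
```

For example, `mul_8x16(0xA5, 0xBEEF)` matches the direct product `0xA5 * 0xBEEF`.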
  • when only one multiplier is called, the data input position is determined according to the mode signal, and the input data is input into the multiplier from that data input position.
  • for example, when the precision of the input data is 3bit × 3bit and the multiplier is an 8bit × 8bit multiplier, only one 8bit × 8bit multiplier needs to be called to realize the multiplication of the input data. The highest precision of the data does not exceed the highest bit of the multiplier, so the input data can be input directly into the multiplier from the specified data input position for the operation (as shown in Figure 7).
  • the method further includes the following steps:
  • Step S200: Obtain a mode signal, process the partial products generated by the multiplier according to the mode signal, perform a summation operation, and use the data obtained after the summation operation as a target sum.
  • the highest bit number of the multiplier may not be equal to the precision of the input data.
  • the concept of a mode signal is introduced, and the partial product generated by the multiplier is processed by the mode signal, so that only the partial product generation part corresponding to the input data is left.
  • the mode signal is equivalent to a control command that directs the system to process the partial products of different regions generated by the multiplier in different ways.
  • the mode signal is determined by the precision of the input data, and the processing includes at least one of the following two operations: 1. masking the partial products of the preset region generated by the multiplier.
  • for example, when the multiplier is an 8bit × 8bit multiplier, the partial products generated by the 8bit × 8bit multiplier are gated and selected by the mode signal, and the partial products not needed under the mode signal are masked. The masking can be implemented by setting the output of the unneeded partial product generation parts to 0 or to 1 (to complement the high-order bits).
  • Figure 4 shows the multiply-accumulate operations of 4 groups of 2bit × 2bit input data. Blocks of the same shade represent the multiplier input data, the multiplicand input data, or the partial product generation part corresponding to the same group of input data. For these 4 groups of input data, a specific mode signal is generated, and all partial products other than those corresponding to the four groups of input data are masked.
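To make the masking concrete, here is a small Python model (the function name and the bit-level modeling are illustrative assumptions, not taken from the patent's figures) of four 2bit × 2bit products sharing one 8 × 8 partial-product array, with the mode signal emulated by a per-row mask:

```python
def masked_dot_2x2(a_groups, b_groups):
    """Four 2bit x 2bit products packed into one 8x8 partial-product array.

    Each operand word packs four 2-bit values. For every partial-product
    row, only the multiplicand bits belonging to that row's own group are
    kept (the rest are masked to 0, emulating the mode signal), so each
    group's product lands in its own non-overlapping 4-bit field.
    """
    A = sum(a << (2 * k) for k, a in enumerate(a_groups))  # pack multiplicands
    B = sum(b << (2 * k) for k, b in enumerate(b_groups))  # pack multipliers
    total = 0
    for i in range(8):                 # partial-product rows PP0..PP7
        if (B >> i) & 1:
            k = i // 2                 # group that owns this row
            mask = 0b11 << (2 * k)     # keep only that group's bits of A
            total += (A & mask) << i   # masked partial product
    # group k's product occupies bits 4k..4k+3; accumulate the four fields
    return sum((total >> (4 * k)) & 0xF for k in range(4))
```

Calling `masked_dot_2x2([1, 2, 3, 0], [3, 2, 1, 1])` returns the dot product 1·3 + 2·2 + 3·1 + 0·1 = 10.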
  • the partial product generation parts obtained after this processing need to be split into a first partial product generation part and a second partial product generation part. A summation operation is then performed on the first partial product generation part and the second partial product generation part, and the data obtained after the summation operation is used as the target sum.
  • the sum operation performed on the first partial product generation part and the second partial product generation part is mainly divided into the following three cases:
  • in the first case, the first partial product generation part and the second partial product generation part can be input into the first compressor and the second compressor respectively; the output results of the first compressor and the second compressor are then input into the same adder, and the output result of the adder is used as the target sum.
  • the speed of floating-point multiplication is largely determined by the speed of mantissa processing.
  • a large number of partial products are generated in the process of mantissa processing.
  • the multiplier used in this embodiment is an 8bit × 8bit multiplier. As shown in Figure 2 and Figure 3, a conventional 8bit × 8bit multiplier generates a total of 8 progressively shifted partial products. The 8 partial products PP0-PP7 are divided into two parts, and each part passes through one of two first-stage 4:2 compressors (CSA42). The output results of the two first-stage 4:2 compressors are jointly input into a second-stage 4:2 compressor (CSA42), and the output result of the second-stage 4:2 compressor is input into a carry-propagate adder (CPA) to obtain the final sum, that is, the target sum.
  • the first 4 partial product generation parts form one part, that is, the first partial product generation part; the last 4 partial product generation parts form the other part, that is, the second partial product generation part. The first partial product generation part and the second partial product generation part are then input into first-stage compressor a and first-stage compressor b respectively; the output results of first-stage compressor a and first-stage compressor b are jointly input into second-stage compressor c; the output result of second-stage compressor c is input into the adder; and finally the output result of the adder is used as the target sum.
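The two-stage compressor tree can be sketched in Python as follows. This is a behavioral model under stated assumptions: the 4:2 compressor is built from two chained 3:2 carry-save stages (a common construction; the patent's figures may wire it differently), and the function names are mine:

```python
def csa32(x, y, z):
    """3:2 carry-save adder: guarantees x + y + z == sum + carry."""
    return x ^ y ^ z, (x & y | x & z | y & z) << 1

def csa42(a, b, c, d):
    """4:2 compressor modeled as two chained 3:2 stages."""
    s1, c1 = csa32(a, b, c)
    return csa32(s1, c1, d)

def mul_8x8(a, b):
    """8x8 unsigned multiply: 8 shifted partial products, a two-stage
    CSA42 tree, then one final carry-propagate addition (the CPA)."""
    pps = [(a << i) if (b >> i) & 1 else 0 for i in range(8)]
    sa, ca = csa42(*pps[:4])       # first-stage compressor a
    sb, cb = csa42(*pps[4:])       # first-stage compressor b
    s, c = csa42(sa, ca, sb, cb)   # second-stage compressor c
    return s + c                   # carry-propagate adder (CPA)
```

Because every carry-save stage preserves the arithmetic sum of its inputs, `mul_8x8(a, b)` equals `a * b` for any 8-bit unsigned operands.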
  • Figure 6 shows the distribution of the partial product generation part in the multiplier when the input data is 1bit ⁇ 1bit
  • Figure 7 shows the distribution of the partial product generation part in the multiplier when the input data is 3bit ⁇ 3bit
  • Figure 8 shows the distribution of the partial product generation part in the multiplier when the input data is 5bit ⁇ 5bit
  • Figure 9 shows the distribution of the partial product generation part in the multiplier when the input data is 6bit ⁇ 6bit
  • Figure 10 shows the distribution of the partial product generation part in the multiplier when the input data is 7bit ⁇ 7bit.
  • the embodiments corresponding to these figures all satisfy the condition that the precisions of the input data are the same, so the steps of splitting, compressing, and summing the partial product generation parts are similar to those of the embodiment shown in FIG. 5.
  • when the highest bit number of the input data is equal to the highest bit number of the multiplier, this embodiment adopts another method to obtain the target sum corresponding to the input data. In this case the highest precision of the input data does not exceed the highest bit of the multiplier, and only one multiplier needs to be called for the multiplication. This embodiment sums the compressed partial product generation parts separately: the two partial product generation parts obtained after compression are input into different adders for summation.
  • this embodiment adopts a conventional 8bit × 8bit multiplier to realize the multiplication of two 4bit × 8bit mixed-precision input data; the 8 partial product generation parts produced are divided from top to bottom. The first four partial product generation parts form the first partial product generation part and are summed separately: they are input into one compressor for compression and then into one adder for summation. The last four partial product generation parts form the second partial product generation part and are summed separately: they are input into another compressor for compression and then into another adder for summation. The output results of the two adders are then summed.
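A minimal sketch of this split, assuming the two 4-bit operands are packed into the 8-bit multiplier input (the function name and packing convention are illustrative, not taken from the patent):

```python
def dual_4x8(a, b_hi, b_lo):
    """Two 4bit x 8bit products from one 8x8 partial-product array.

    The two 4-bit operands are packed into the 8-bit multiplier word:
    the lower four rows summed alone give a * b_lo, and the upper four
    rows summed alone (realigned by >> 4) give a * b_hi.
    """
    B = (b_hi << 4) | b_lo
    pps = [(a << i) if (B >> i) & 1 else 0 for i in range(8)]
    low = sum(pps[:4])          # first partial product generation part
    high = sum(pps[4:]) >> 4    # second part, realigned to weight 0
    return high, low
```

For example, `dual_4x8(0xC3, 0x9, 0x5)` yields the pair `(0xC3 * 9, 0xC3 * 5)`.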
  • the mixed-precision input data may also be such that the highest bit number of the input data is greater than the highest bit number of the multiplier. It can be understood that when this happens, the multiplication of the input data cannot rely on a single multiplier, and two multipliers must be called for the operation.
  • to detect this, the highest bit number of the input data is obtained and compared with the highest bit number of the multiplier.
  • this embodiment divides the two called multipliers into a first multiplier and a second multiplier, where the second multiplier is the one that performs the low-order operations. The partial product generation part produced by the first multiplier is used as the first partial product generation part, and the partial product generation part produced by the second multiplier is used as the second partial product generation part.
  • the first partial product generation part can be input directly into the first adder (CPA1), while the second partial product generation part must be split and input into the first adder and the second adder (CPA2) respectively.
  • the sum of the output results of the first adder and the second adder is used as the target sum.
  • in this case the data can be input directly into the adders for calculation without first being compressed by a compressor. And since 2 multipliers need to be called, at the system accumulation level the partial product generation part produced by the multiplier that performs the low-order operations must be right-shifted to realize the subsequent correct summation; therefore, on the basis of the conventionally used adder, another adder needs to be called to perform a summation operation on the excess part left over after the right shift.
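The right-shift alignment and the extra addition for the shifted-out "excess part" can be modeled as below (a behavioral sketch with an assumed helper name; the real hardware operates on compressor outputs, not full products):

```python
def combine_wide(p_hi, p_lo):
    """Align and sum the two multipliers' outputs for an 8x16 operation.

    The low-order multiplier's product is right-shifted by 8 so its
    upper bits line up with p_hi for the first adder; the shifted-out
    low byte is the excess part restored by the second addition.
    """
    overlap = p_hi + (p_lo >> 8)           # first adder: aligned high parts
    return (overlap << 8) + (p_lo & 0xFF)  # second adder restores the excess
```

Since `(p_hi + (p_lo >> 8)) << 8` plus the low byte reconstructs `p_hi << 8 + p_lo`, feeding in the two byte-wise products recovers the full 8bit × 16bit result.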
  • Figure 12 shows the multiplication operation of 8bit × 16bit input data based on two sets of 8bit × 8bit multipliers; Figure 14 shows the multiplication operation of 8bit × xbit input data (x = 9 to 15 bits) based on two sets of 8bit × 8bit multipliers; and Figure 15 shows the multiplication operation of 8bit × 15bit input data based on two sets of 8bit × 8bit multipliers. All of the above cases are ones in which the highest bit number of the input data is greater than the highest bit number of the multiplier, so the method described above must be adopted to realize the summation of the partial product generation parts.
  • the method further includes the following steps:
  • Step S300: The target sum is truncated, and the data obtained after truncation is used as the result of the dot product of the input data.
  • in this way, the dot product result consistent with the mode signal and the input data can finally be obtained.
  • step S300 specifically includes the following steps:
  • Step S310: Determine the truncation bit width according to the precision of the input data;
  • Step S320: Perform a truncation operation on the target sum starting from the 0th bit according to the truncation bit width, and use the data obtained after the truncation operation as the result of the dot product of the input data.
  • the truncation bit width is related to the precision of the input data. Specifically, for input data of the same precision, the truncated bit width runs from the 0th bit to the (8-n)th bit, where n is the precision of the input data; for example, when n is 3 the truncated bit width is from the 0th bit to the 5th bit. For mixed-precision 8bit × xbit input data, the truncated bit width runs from the 0th bit to the (16-x)th bit, where x is the highest bit of the input data and takes a value from 9 to 15; for example, for 8bit × 12bit input data the truncated bit width is from the 0th bit to the 4th bit.
  • the truncation operation is performed on the target sum starting from the 0th bit, and the data obtained after the truncation operation is finally used as the result of the dot product operation on the input data.
  • this embodiment can not only support dot product operations of different precisions but also handle both signed and unsigned operations. Therefore, the method further comprises the following steps:
  • Step S1: Determine the partial product generation part corresponding to the highest bit of the input data, and use it as the partial product generation part to be adjusted;
  • Step S2: When the input data is negative (its highest bit is the sign bit), apply invert-and-add-one processing to the partial product generation part to be adjusted.
  • to this end, this embodiment first determines the partial product generation part related to the signed operation. The signed fixed-point multiplier operates on two's-complement inputs, where the complement of a positive number is itself, and the complement of a negative number is obtained by inverting the signed binary representation (including the sign bit) bit by bit and adding one. The partial product generation part corresponding to the most significant bit of the input data is taken as the partial product generation part to be adjusted; it is inverted and incremented by one, thereby realizing the signed operation.
  • Figure 16 shows a schematic diagram of the generation of the partial product generation parts of an 8bit × 8bit multiplier, in which the first 7 partial product generation parts PP0-PP6 are generated in the ordinary way, while the generation of the last partial product generation part (PP7) requires special processing: when the sign bit B7 is 0, indicating a positive number, PP7 is 0; when the sign bit B7 is 1, indicating a negative number, PP7 is A7A6A5A4A3A2A1A0 inverted and incremented by one.
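The invert-and-add-one handling of PP7 falls out of the weight of the sign bit in two's complement: bit 7 of the multiplier carries weight -2^7, so its partial-product row is the negation of the multiplicand. A minimal Python sketch (function name mine; it exploits Python's arithmetic right shift to read the two's-complement bits):

```python
def signed_mul_8x8(a, b):
    """Signed 8x8 multiply with the sign-bit row handled specially.

    Rows PP0..PP6 carry positive weights 2^i. Row PP7 carries weight
    -2^7, so when the sign bit B7 is 1 the row equals -a, i.e. a
    inverted plus one (~a + 1): the invert-and-add-one rule above.
    """
    total = sum((a << i) for i in range(7) if (b >> i) & 1)  # PP0..PP6
    if (b >> 7) & 1:              # sign bit B7 set: multiplier is negative
        total += (~a + 1) << 7    # PP7 = invert-and-add-one of a, at row 7
    return total
```

For operands in [-128, 127] this reproduces the ordinary signed product, e.g. `signed_mul_8x8(-57, -102)` equals 5814.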
  • PP1, PP3, PP5, and PP7 need this processing.
  • PP3 and PP7 need this processing.
  • the 8bit×16bit operation needs fewer such adjustments: the generation of PP7 in the second multiplier, which performs the low-order operation, does not require this method; only the generation of PP7 in the first multiplier needs to be carried out in this way.
  • when the bit width is extended on the left side of a datum for the addition operation, the added bits must be copies of the most significant bit of the original datum (sign extension) so that the value remains unchanged.
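The invert-and-add-one handling of the sign-bit row, together with the sign extension just described, can be checked with a short functional model. A minimal sketch assuming an 8bit×8bit two's-complement multiplier with one partial-product row per multiplier bit; function names and the exact row layout are illustrative, not taken from the patent's circuit:

```python
MASK16 = 0xFFFF

def sext16(x8: int) -> int:
    """Sign-extend an 8-bit pattern to 16 bits (pad the added bits with
    copies of bit 7, keeping the value unchanged)."""
    return x8 | 0xFF00 if x8 & 0x80 else x8

def partial_products_signed_8x8(a: int, b: int):
    """Partial-product rows (mod 2^16) of a signed 8x8 multiply on
    two's-complement bit patterns a and b.

    Rows PP0..PP6 are AND-gated, sign-extended, shifted copies of the
    multiplicand. Row PP7 carries the sign bit's weight of -2^7, so when
    B7 is 1 it is the multiplicand inverted bit by bit plus one.
    """
    a16 = sext16(a)
    rows = []
    for i in range(7):                              # PP0..PP6
        bit = (b >> i) & 1
        rows.append((a16 * bit << i) & MASK16)
    b7 = (b >> 7) & 1                               # PP7: invert, add one
    pp7 = ((~a16 & MASK16) + 1) & MASK16 if b7 else 0
    rows.append((pp7 << 7) & MASK16)
    return rows

def signed(x: int, bits: int) -> int:
    """Interpret a bit pattern as a two's-complement value."""
    return x - (1 << bits) if x & (1 << (bits - 1)) else x

prod = sum(partial_products_signed_8x8(0xF6, 0x85)) & MASK16
assert signed(prod, 16) == (-10) * (-123)           # 0xF6 = -10, 0x85 = -123
```

Summing the rows modulo 2^16 reproduces the signed product for every input pair, including the corner cases A = 0 and A = -128, precisely because each row is sign-extended before shifting.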
  • the present invention also provides a fixed-point multiply-add operation unit suitable for mixed-precision neural networks.
  • the operation unit includes:
  • a position determination module 01, for acquiring a mode signal and input data, determining a data input position according to the mode signal, and feeding the input data into the multiplier from the data input position;
  • a partial product processing module 02, for processing the partial products generated by the multiplier according to the mode signal, performing a summation operation, and taking the data obtained after the summation operation as a target sum;
  • a result generation module 03, configured to truncate the target sum and take the data obtained after truncation as the dot-product result of the input data.
  • a unified multiplier is used for the operation, but the number of multipliers is not fixed: the number of multipliers called by the operation unit changes adaptively with the precision of the input data. It can be understood that when the most significant bit of the input data is less than or equal to the most significant bit of the multiplier, the operation unit may call only one multiplier to operate on the input data; when the most significant bit of the input data is greater than the most significant bit of the multiplier, the operation unit needs to call more than one multiplier.
  • the operation unit can call only one multiplier, and then, according to the mode signal, controls the multiplier to mask the partial products of the specified area and output the partial product generation parts, and performs the summation operation on the output partial product generation parts by the methods corresponding to the different precisions.
  • the operation unit needs to call two multipliers, controls the two multipliers according to the mode signal to mask the partial products of the specified area and then output the partial product generation parts, and performs the summation operation on the output partial product generation parts by the methods corresponding to the different precisions.
  • in summary, the present invention discloses a fixed-point multiply-add operation unit and method suitable for mixed-precision neural networks.
  • the multiplier is controlled according to the mode signal to mask the partial products of the specified area and then output the partial product generation parts, and the summation operation is performed on the output partial product generation parts by the methods corresponding to the different precisions, thereby realizing mixed-precision dot-product operations.
  • a single kind of multiplier can realize the dot-product operation of a mixed-precision neural network, solving the problems in the prior art, such as excessive hardware overhead and redundant idle resources, caused by using multiple processing units of different precisions to handle mixed-precision operations.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

Disclosed are a fixed-point multiply-add operation unit and method suitable for a mixed-precision neural network. A mixed-precision dot-product operation is achieved by feeding input data of different precisions into a multiplier from different positions, controlling the multiplier according to a mode signal to mask the partial products of a specified area and then output the partial product generation parts, and performing a summation operation on the output partial product generation parts by methods corresponding to the different precisions. A single kind of multiplier can realize the dot-product operation of a mixed-precision neural network, solving the problems in the prior art, such as excessive hardware overhead and redundant idle resources, caused by using processing units of multiple different precisions to handle mixed-precision operations.

Description

A fixed-point multiply-add operation unit and method suitable for mixed-precision neural networks
Technical Field
The present invention relates to the field of digital circuits, and in particular to a fixed-point multiply-add operation unit and method suitable for mixed-precision neural networks.
Background Art
Artificial intelligence algorithms are currently used in many commercial fields. To improve the performance of network computing, quantizing different layers of a network is one of the important means of improving computational efficiency. As the computing carriers of these algorithms, artificial intelligence chips face a growing proportion of mixed-precision operations in data processing, driven by the characteristics of network design. Conventional processors handle mixed-precision operations with multiple processing units of different precisions. This approach incurs excessive hardware overhead, leaves idle resources redundant, and adds excessive latency when switching between hardware of different precisions, reducing throughput; it also cannot adjust its configuration to application needs or make maximal use of the hardware to improve energy efficiency and throughput, wasting both run time and chip area.
The prior art therefore still needs improvement and development.
Summary of the Invention
The technical problem to be solved by the present invention is, in view of the above defects of the prior art, to provide a fixed-point multiply-add operation unit and method suitable for mixed-precision neural networks, aiming to solve the problems in the prior art of excessive hardware overhead and redundant idle resources caused by using multiple processing units of different precisions to handle mixed-precision operations.
The technical solution adopted by the present invention to solve the problem is as follows:
In a first aspect, an embodiment of the present invention provides a fixed-point multiply-add operation method suitable for mixed-precision neural networks, wherein the method includes:
acquiring a mode signal and input data, determining a data input position according to the mode signal, and feeding the input data into a multiplier from the data input position;
processing the partial products generated by the multiplier according to the mode signal, performing a summation operation, and taking the data obtained after the summation operation as a target sum;
truncating the target sum, and taking the data obtained after truncation as the dot-product result of the input data.
In one embodiment, acquiring the mode signal and input data, determining the data input position according to the mode signal, and feeding the input data into the multiplier from the data input position includes:
acquiring the mode signal and input data, and determining the number of multipliers to call according to the precision of the input data;
when the highest precision of the input data is higher than the most significant bit of the multiplier, the number of multipliers called is greater than one;
determining the data input position according to the mode signal, splitting the highest-precision data among the input data, and feeding the input data obtained after splitting into the multipliers from the data input position;
when the highest precision of the input data is lower than or equal to the most significant bit of the multiplier, the number of multipliers called is one;
determining the data input position according to the mode signal, and feeding the input data into the multiplier from the data input position.
In one embodiment, acquiring the mode signal, processing the partial products generated by the multiplier according to the mode signal, performing the summation operation, and taking the data obtained after the summation operation as the target sum includes:
acquiring the mode signal, and processing the partial products generated by the multiplier according to the mode signal;
splitting the partial product generation parts obtained after the processing into a first partial product generation part and a second partial product generation part;
performing a summation operation on the first partial product generation part and the second partial product generation part, and taking the data obtained after the summation operation as the target sum.
In one embodiment, the mode signal is determined by the precision of the input data; the processing includes at least one of the following operations:
masking the partial products of a preset area generated by the multiplier;
when the number of multipliers called is greater than one, shifting the partial product generation part output by the multiplier that performs the low-order operation.
In one embodiment, when the input data are of the same precision and the most significant bit of the input data is less than or equal to the most significant bit of the multiplier, performing the summation operation on the first partial product generation part and the second partial product generation part and obtaining the target sum based on the summation operation includes:
feeding the first partial product generation part and the second partial product generation part into first-stage compressor a and first-stage compressor b, respectively;
feeding the output results of first-stage compressor a and first-stage compressor b together into second-stage compressor c;
feeding the output result of second-stage compressor c into an adder, and taking the output result of the adder as the target sum.
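The two-stage compressor flow described here is, functionally, a carry-save reduction: each compressor stage reduces addend rows without propagating carries, and only the final adder performs a carry-propagate addition. A minimal sketch of a 3:2 compressor stage under that assumption; the chained-stage arrangement below only loosely mirrors the a/b-then-c pipeline, and the patent's exact compressor widths and wiring may differ:

```python
def compress_3_to_2(x: int, y: int, z: int):
    """3:2 compressor (carry-save adder): reduce three addend rows to a
    sum row and a carry row with no carry propagation. Per bit this is
    a full adder: sum = XOR, carry = majority shifted to its weight."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c

# Reduce four rows with two compressor stages, then a single final
# carry-propagate addition at the end.
rows = [0b1011, 0b0110, 0b1110, 0b0011]
s1, c1 = compress_3_to_2(rows[0], rows[1], rows[2])  # first stage
s2, c2 = compress_3_to_2(s1, c1, rows[3])            # second stage
assert s2 + c2 == sum(rows)                          # final adder
```

Because each stage costs only one full-adder delay regardless of operand width, stacking compressors before a single adder is faster than summing all rows with carry-propagate adders.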
In one embodiment, when the input data are of mixed precision, performing the summation operation on the first partial product generation part and the second partial product generation part and obtaining the target sum based on the summation operation includes:
obtaining the highest bit width of the input data, and comparing the highest bit width of the input data with the highest bit width of the multiplier;
when the highest bit width of the input data equals the highest bit width of the multiplier, feeding the first partial product generation part and the second partial product generation part into first-stage compressor a and first-stage compressor b, respectively;
feeding the output results of first-stage compressor a and first-stage compressor b into a first adder and a second adder, respectively, and taking the sum of the output results of the first adder and the second adder as the target sum.
In one embodiment, when the input data are of mixed precision, performing the summation operation on the first partial product generation part and the second partial product generation part and obtaining the target sum based on the summation operation includes:
obtaining the highest bit width of the input data, and comparing the highest bit width of the input data with the highest bit width of the multiplier;
when the highest bit width of the input data is greater than the highest bit width of the multiplier, the multiplier includes a first multiplier and a second multiplier, the second multiplier being the low-order operation multiplier; the first multiplier outputs the first partial product generation part, and the second multiplier outputs the second partial product generation part;
feeding the first partial product generation part directly into the first adder;
splitting the second partial product generation part and feeding the resulting pieces into the first adder and the second adder, respectively;
taking the sum of the output results of the first adder and the second adder as the target sum.
In one embodiment, truncating the target sum and taking the data obtained after truncation as the dot-product result of the input data includes:
determining the truncation bit width according to the precision of the input data;
according to the truncation bit width, performing the truncation operation on the target sum starting from bit 0, and taking the data obtained after the truncation operation as the dot-product result of the input data.
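The truncation from bit 0 amounts to keeping the low bits of the target sum up to the truncation bit width. A minimal sketch, in which the width of 8 bits chosen for the 4bit×4bit case is an illustrative assumption; the actual truncation width is set by the mode signal for each precision:

```python
def truncate_from_bit0(target_sum: int, width: int) -> int:
    """Select bits [0, width) of the target sum; everything above the
    truncation bit width is discarded."""
    return target_sum & ((1 << width) - 1)

# For a single 4bit×4bit product the full result needs 8 bits, so a
# truncation width of 8 (assumed mode-signal setting) keeps all of it.
assert truncate_from_bit0(0x1A5C3, 8) == 0xC3
```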
In one embodiment, the method further includes:
determining the partial product generation part corresponding to the most significant bit of the input data, and taking this partial product generation part as the partial product generation part to be adjusted;
when the most significant bit of the input data indicates a negative number, inverting the partial product generation part to be adjusted and adding one.
In a second aspect, an embodiment of the present invention further provides a fixed-point multiply-add operation unit suitable for mixed-precision neural networks, wherein the operation unit includes:
a position determination module, configured to acquire a mode signal and input data, determine a data input position according to the mode signal, and feed the input data into a multiplier from the data input position;
a partial product processing module, configured to process the partial products generated by the multiplier according to the mode signal, perform a summation operation, and take the data obtained after the summation operation as a target sum;
a result generation module, configured to truncate the target sum and take the data obtained after truncation as the dot-product result of the input data.
Beneficial effects of the present invention: by feeding input data of different precisions into the multiplier from different positions, controlling the multiplier according to the mode signal to mask the partial products of a specified area and then output the partial product generation parts, and performing the summation operation on the output partial product generation parts by the methods corresponding to the different precisions, the present invention realizes mixed-precision dot-product operations. With a single kind of multiplier, the present invention realizes the dot-product operation of a mixed-precision neural network, solving the problems in the prior art, such as excessive hardware overhead and redundant idle resources, caused by using multiple processing units of different precisions to handle mixed-precision operations.
Brief Description of the Drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a fixed-point multiply-add operation method suitable for mixed-precision neural networks provided by an embodiment of the present invention.
FIG. 2 is a schematic diagram of the partial products generated in a conventional 8bit×8bit multiplier.
FIG. 3 is the adder tree structure used by a conventional 8bit×8bit multiplier.
FIG. 4 is a reference diagram of implementing the multiplication of 4 groups of 2bit×2bit input data based on one group of 8bit×8bit multipliers.
FIG. 5 is a reference diagram of implementing the multiplication of 2 groups of 4bit×4bit input data based on one group of 8bit×8bit multipliers.
FIG. 6 is a reference diagram of implementing the multiplication of 1bit×1bit input data based on an 8bit×8bit multiplier.
FIG. 7 is a reference diagram of implementing the multiplication of 3bit×3bit input data based on an 8bit×8bit multiplier.
FIG. 8 is a reference diagram of implementing the multiplication of 5bit×5bit input data based on an 8bit×8bit multiplier.
FIG. 9 is a reference diagram of implementing the multiplication of 6bit×6bit input data based on an 8bit×8bit multiplier.
FIG. 10 is a reference diagram of implementing the multiplication of 7bit×7bit input data based on an 8bit×8bit multiplier.
FIG. 11 is a reference diagram of implementing the multiplication of two 4bit×8bit mixed-precision inputs by splitting and summing the partial product generation parts, based on an 8bit×8bit multiplier.
FIG. 12 is a reference diagram of implementing the multiplication of mixed-precision 8bit×16bit input data based on two groups of 8bit×8bit multipliers.
FIG. 13 is a schematic diagram of accumulating the output data of the first multiplier and the second multiplier under mixed precision.
FIG. 14 is a schematic diagram of implementing 8bit×xbit multiplication, x = 9 to 15 bit, based on the two-group 8bit×8bit multiplier architecture.
FIG. 15 is a reference diagram of implementing the multiplication of mixed-precision 8bit×15bit input data based on two groups of 8bit×8bit multipliers.
FIG. 16 is a schematic diagram of the partial products, including the sign bit, in an 8bit×8bit multiplier.
FIG. 17 is a reference diagram of the internal modules of the operation unit.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention, not to limit it.
It should be noted that if the embodiments of the present invention involve directional indications (such as up, down, left, right, front, back, ...), the directional indications are only used to explain the relative positional relationships, movements, and so on between the components in a particular posture (as shown in the drawings); if the particular posture changes, the directional indications change accordingly.
Artificial intelligence algorithms are currently used in many commercial fields. To improve the performance of network computing, quantizing different layers of a network is one of the important means of improving computational efficiency. As the computing carriers of these algorithms, artificial intelligence chips face a growing proportion of mixed-precision operations in data processing, driven by the characteristics of network design. Conventional processors handle mixed-precision operations with multiple processing units of different precisions. This approach incurs excessive hardware overhead, leaves idle resources redundant, and adds excessive latency when switching between hardware of different precisions, reducing throughput; it also cannot adjust its configuration to application needs or make maximal use of the hardware to improve energy efficiency and throughput, wasting both run time and chip area.
In view of the above defects of the prior art, the present invention provides a fixed-point multiply-add operation method suitable for mixed-precision neural networks: input data of different precisions are fed into the multiplier from different positions, the multiplier is controlled according to the mode signal to mask the partial products of a specified area and then output the partial product generation parts, and the summation operation is performed on the output partial product generation parts by the methods corresponding to the different precisions, thereby realizing mixed-precision dot-product operations. With a single kind of multiplier, the present invention realizes the dot-product operation of a mixed-precision neural network, solving the problems in the prior art, such as excessive hardware overhead and redundant idle resources, caused by using multiple processing units of different precisions to handle mixed-precision operations.
As shown in FIG. 1, the method includes the following:
Step S100: acquire a mode signal and input data, determine a data input position according to the mode signal, and feed the input data into the multiplier from the data input position.
Since this embodiment uses a unified multiplier for the dot-product operations of a mixed-precision neural network, and the bit width at the multiplier's input position is fixed, the precision of the input data may not match the highest bit width of the multiplier. To make the multiplier applicable to input data of different precisions, this embodiment acquires a mode signal and input data, determines the data input position according to the mode signal, and then feeds the input data into the multiplier from the data input position. By feeding input data of different precisions into the multiplier from different data input positions, this embodiment performs the dot-product operations of a mixed-precision neural network with a unified multiplier.
In one implementation, step S100 specifically includes the following steps:
Step S110: acquire the mode signal and input data, and determine the number of multipliers to call according to the precision of the input data;
Step S120: when the highest precision of the input data is higher than the most significant bit of the multiplier, the number of multipliers called is greater than one;
Step S130: determine the data input position according to the mode signal, split the highest-precision data among the input data, and feed the input data obtained after splitting into the multipliers from the data input position;
Step S140: when the highest precision of the input data is lower than or equal to the most significant bit of the multiplier, the number of multipliers called is one;
Step S150: determine the data input position according to the mode signal, and feed the input data into the multiplier from the data input position.
Since this embodiment uses a unified multiplier with a fixed highest bit width, the precision of the multiplier may not match the precision of the input data; for example, the multiplier may be an 8bit×8bit multiplier while the input data precision is 3bit×3bit, or the multiplier may be an 8bit×8bit multiplier while the input data precision is 8bit×16bit. The number of multipliers to call therefore needs to be determined by the precision of the input data. It can be understood that if the precision of the input data exceeds the precision of the multiplier, the input data cannot be multiplied by a single multiplier, and multiple multipliers must be called.
Specifically, when the highest precision of the input data is higher than the most significant bit of the multiplier, the number of multipliers called is greater than one; the data input position is then determined according to the mode signal, the highest-precision data among the input data is split, and the input data obtained after splitting is fed into the multipliers from the data input position. For example, suppose the input data is mixed-precision 8bit×16bit and the multiplier is an 8bit×8bit multiplier; two 8bit×8bit multipliers must then be called to multiply the mixed-precision 8bit×16bit input data. The 8-bit part of the data can be fed directly into the multipliers from the specified data input position, while the 16-bit part must be split before it can be fed into the two multipliers (as shown in FIG. 12).
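The split just described is the standard high/low decomposition of the wide operand. A minimal sketch for unsigned values; the patent additionally applies sign handling to the high half, and the variable names here are illustrative:

```python
def mul_8x16_with_two_8x8(a: int, b: int) -> int:
    """Multiply an 8-bit operand a by a 16-bit operand b with two
    8x8 multiplications: a*b = (a*b_hi << 8) + a*b_lo (unsigned)."""
    b_lo = b & 0xFF     # low half, handled by the second (low-order) multiplier
    b_hi = b >> 8       # high half, handled by the first multiplier
    part_lo = a * b_lo  # each product fits an 8x8 multiplier
    part_hi = a * b_hi
    return (part_hi << 8) + part_lo

assert mul_8x16_with_two_8x8(0xA7, 0x1C3B) == 0xA7 * 0x1C3B
```

The left shift by 8 corresponds to the shift processing applied to the output of the low-order multiplier's counterpart when the two partial results are accumulated.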
When the highest precision of the input data is lower than or equal to the most significant bit of the multiplier, the number of multipliers called is one; the data input position is determined according to the mode signal, and the input data is fed into the multiplier from the data input position. For example, when the precision of the input data is 3bit×3bit and the multiplier is an 8bit×8bit multiplier, only one 8bit×8bit multiplier needs to be called to multiply the input data. Since the highest precision of the input data does not exceed the most significant bit of the multiplier, the input data can be fed directly into the multiplier from the specified data input position (as shown in FIG. 7).
The output of the multiplier must then be obtained. As shown in FIG. 1, the method further includes the following step:
Step S200: obtaining a mode signal, processing the partial products generated by the multiplier according to the mode signal, performing a summation operation, and taking the data obtained after the summation operation as a target sum.
Specifically, since this embodiment uses a unified multiplier to compute input data of different precisions, the highest bit width of the multiplier may not match the precision of the input data. To make the multiplier output consistent with the input data, this embodiment introduces the concept of a mode signal: the partial products generated by the multiplier are processed according to the mode signal so that only the partial product generation parts corresponding to the input data remain. In short, the mode signal acts as a control instruction by which the control system applies different processing to the partial products generated in different regions of the multiplier.
In one implementation, the mode signal is determined by the precision of the input data, and the processing includes at least one of the following two operations. 1. Masking the partial products generated by the multiplier in a preset region. For example, suppose the multiplier is an 8bit×8bit multiplier. On top of this 8bit×8bit multiplier, the partial products it generates are gated and selected by the mode signal, and partial products not needed under a given mode signal are masked. In one implementation, the masking can be realized by setting the outputs of the unneeded partial product generation parts to 0 or to 1 (extension of the high-order two's-complement bits). FIG. 4 shows the multiply-accumulate operation of four groups of 2bit×2bit input data; blocks of the same shade represent the multiplier input data, the multiplicand input data, or the corresponding partial product generation parts of the same group. For these four groups of input data, a specific mode signal is generated, and all partial products other than those corresponding to the four groups are masked.
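The row-gating idea behind this masking can be modeled in software for the simple case of a single n-bit × n-bit input on an 8×8 array. This is a deliberately simplified sketch (our own model, not the patent's gating logic, and it does not cover the packed four-group case of FIG. 4): the mode signal keeps only the partial-product rows belonging to the n-bit operand and zeroes the rest.

```python
def masked_partial_products(a: int, b: int, n: int) -> int:
    """Software model of partial-product row gating in an 8x8 array.

    For an n-bit x n-bit input (n <= 8), only the n partial-product rows
    that belong to the input are kept; the remaining rows are masked
    (forced to contribute 0), so the array still yields the correct
    n-bit product. Illustrative sketch only.
    """
    total = 0
    for i in range(8):                       # 8 partial-product rows
        bit = (b >> i) & 1
        keep = i < n                         # mode signal: gate unused rows
        total += (a * bit) << i if keep else 0
    return total

# 3bit x 3bit inputs on the 8x8 array: only rows 0-2 are kept
assert masked_partial_products(0b101, 0b110, 3) == 0b101 * 0b110
```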
2. When the number of called multipliers is greater than 1, shifting the partial product generation parts output by the multiplier that performs the low-order computation. For example, when the multiplier used is an 8bit×8bit multiplier and the precision of the input data is 8bit×16bit, the maximum precision of the input data exceeds the highest bit width of the multiplier, so the multiplication cannot be completed with a single multiplier and two multipliers must be called. A specific mode signal is generated according to the precision of the input data, and under this mode signal the partial product generation parts output by the multiplier performing the low-order computation are shifted (as shown in FIG. 12).
After the processing is complete, the resulting partial product generation parts are split into a first partial product generation part and a second partial product generation part. A summation operation is then performed on the first partial product generation part and the second partial product generation part, and the data obtained after the summation operation is taken as the target sum. Specifically, in this embodiment the summation over the first and second partial product generation parts falls into the following three cases:
When the input data have the same precision and the highest bit of the input data is less than or equal to the highest bit of the multiplier, the first partial product generation part and the second partial product generation part can be fed into a first compressor and a second compressor respectively; the outputs of the first compressor and the second compressor are then fed into the same adder, and the output of the adder is taken as the target sum. Specifically, in practical applications the speed of a floating-point multiplication is largely determined by the speed of mantissa processing, and mantissa processing produces a large number of partial products. Accumulating these partial products directly would greatly lengthen the mantissa processing time, so the partial products are first compressed from n down to 2, and the 2 partial products obtained after compression are then accumulated; the result of this accumulation is the target sum required by this embodiment. It should be noted that the compressor in this embodiment is in fact a special kind of adder.
For example, suppose the multiplier used in this embodiment is an 8bit×8bit multiplier. As shown in FIG. 2 and FIG. 3, a conventional 8bit×8bit multiplier implementation produces eight progressively shifted partial products. The eight partial products PP0–PP7 are split into two halves, each passing through one of two first-level 4:2 compressors (CSA42); the outputs of these two 4:2 compressors are jointly fed into a second-level 4:2 compressor (CSA42), and the output of this second-level compressor is then fed into a carry-propagate adder (CPA) to obtain the final sum, i.e., the target sum. As shown in FIG. 5, let the two first-level compressors be a and b and the second-level compressor be c, and suppose the input data are two 4bit×4bit numbers. The eight partial product generation parts in FIG. 5 are then split into two halves: from top to bottom, the first four partial product generation parts form the first partial product generation part, and the last four form the second partial product generation part. The first and second partial product generation parts are fed into first-level compressor a and first-level compressor b respectively; the outputs of compressors a and b are jointly fed into second-level compressor c; the output of compressor c is fed into the adder; and the output of the adder is taken as the target sum. FIG. 6 shows the distribution of the partial product generation parts in the multiplier when the input data is 1bit×1bit; FIG. 7 shows the distribution for 3bit×3bit input data; FIG. 8 for 5bit×5bit; FIG. 9 for 6bit×6bit; and FIG. 10 for 7bit×7bit. The embodiments corresponding to these figures all satisfy the condition that the input operands have the same precision, so the splitting, compression, and summation of their partial product generation parts proceed as in the embodiment shown in FIG. 5.
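The two-level CSA42 compressor tree described above (two first-level 4:2 compressors, one second-level 4:2 compressor, then a CPA) can be modeled in software as follows. This is an illustrative sketch of the carry-save arithmetic, not the hardware netlist; function names are our own.

```python
def csa32(x: int, y: int, z: int):
    """3:2 carry-save adder: three addends -> (sum, shifted carry)."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c  # invariant: x + y + z == s + c

def csa42(a: int, b: int, c: int, d: int):
    """4:2 compressor built from two 3:2 stages: four addends -> two."""
    s1, c1 = csa32(a, b, c)
    return csa32(s1, c1, d)

def sum_partial_products(pps):
    """Reduce 8 partial products with two CSA42 levels plus a final
    carry-propagate add, mirroring the CSA42/CPA tree in the text."""
    s_a, c_a = csa42(*pps[0:4])           # first-level compressor a
    s_b, c_b = csa42(*pps[4:8])           # first-level compressor b
    s_c, c_c = csa42(s_a, c_a, s_b, c_b)  # second-level compressor c
    return s_c + c_c                      # carry-propagate adder (CPA)

# 8 shifted partial products of 0x5A * 0x3C, as an 8x8 array would form them
pps = [(0x5A * ((0x3C >> i) & 1)) << i for i in range(8)]
assert sum_partial_products(pps) == 0x5A * 0x3C
```

The carry-save invariant `x + y + z == s + c` is what lets each compressor level run without carry propagation; only the final CPA pays the carry-chain cost.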
When the input data is of mixed precision, this embodiment takes a different approach to obtain the corresponding target sum. First, the highest bit width among the input data is obtained and compared with the highest bit width of the multiplier. When the two are equal, the highest precision of the input data does not exceed the highest bit width of the multiplier, and only one multiplier needs to be called for the multiplication. After the first partial product generation part and the second partial product generation part are obtained, they are fed into first-level compressor a and first-level compressor b respectively; the outputs of compressors a and b are then fed into a first adder and a second adder respectively; and finally the sum of the outputs of the first adder and the second adder is taken as the target sum. In short, for mixed-precision input data this embodiment sums the compressed partial product generation parts separately, i.e., the two partial product generation parts obtained after compression are fed into different adders for summation.
For example, as shown in FIG. 11, suppose this embodiment uses a conventional 8bit×8bit multiplier to handle two 4bit×8bit mixed-precision inputs. The eight partial product generation parts produced in this case are divided, from top to bottom, into two halves. The first four form the first partial product generation part and are summed separately: they are fed into one compressor for compression and then into one adder for summation. The last four form the second partial product generation part and are likewise summed separately: they are fed into another compressor for compression and then into another adder for summation. The outputs of the two adders are then summed.
With mixed-precision input data, however, the highest bit width of the input data may also exceed the highest bit width of the multiplier. It will be appreciated that in this case the multiplication cannot be performed by a single multiplier, and two multipliers must be called. As shown in FIG. 13, the highest bit width of the input data is obtained and compared with the highest bit width of the multiplier. When the highest bit width of the input data is greater, this embodiment divides the two called multipliers into a first multiplier and a second multiplier, where the second multiplier is the one performing the low-order computation. To distinguish the partial product generation parts produced by the two multipliers, this embodiment takes the partial product generation part produced by the first multiplier as the first partial product generation part and that produced by the second multiplier as the second partial product generation part. The first partial product generation part can be fed directly into a first adder (CPA1), whereas the second partial product generation part must be split and then fed into the first adder and a second adder (CPA2) respectively. The sum of the outputs of the first adder and the second adder is then taken as the target sum. In short, for mixed-precision input data whose highest bit width exceeds that of the multiplier, the data is fed directly into the adders without passing through a compressor, in order to avoid excessive timing delay. Moreover, since two multipliers are called in this case, at the system accumulation level the partial product generation parts produced by the multiplier performing the low-order computation must be shifted right as a whole for the subsequent summation to be correct; therefore, in addition to the single adder conventionally used, a further adder must be called to sum the extra portion that extends beyond the word after the right shift. For example, FIG. 12 shows the multiplication of 8bit×16bit input data based on a two-multiplier 8bit×8bit architecture; FIG. 14 shows the multiplication of 8bit×xbit input data, x = 9–15 bits, on the same architecture; and FIG. 15 shows the multiplication of 8bit×15bit input data on the same architecture. All of these cases involve input data whose highest bit width exceeds that of the multiplier, and thus all require the above method to sum the partial product generation parts.
After the target sum is obtained, the required dot product result still has to be extracted. As shown in FIG. 1, the method further includes the following step:
Step S300: truncating the target sum, and taking the data obtained after the truncation as the dot product result of the input data.
Specifically, after the target sum is obtained in this embodiment, the sum must be truncated to different bit widths before the dot product result matching the mode signal and the input data can finally be obtained.
In one implementation, step S300 specifically includes the following steps:
Step S310: determining a truncation bit width according to the precision of the input data;
Step S320: truncating the target sum starting from bit 0 according to the truncation bit width, and taking the data obtained after the truncation as the dot product result of the input data.
In this embodiment the truncation bit width is related to the precision of the input data. Specifically, for input data of the same precision, the truncation window spans bit 0 to bit 8−n, where n is the precision of the input data; for example, for 3bit×3bit input data the window spans bit 0 to bit 5. For input data of different precisions, the truncation window spans bit 0 to bit 16−x, where x is the highest bit width of the input data, taking values from 9 to 15; for example, for 8bit×12bit input data the window spans bit 0 to bit 4. Once the truncation bit width is determined, the target sum is truncated starting from bit 0 according to that width, and the data obtained after the truncation is taken as the dot product result of the input data.
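The window-selection rule stated above can be written out directly. This is a sketch of the rule exactly as the text states it (the function name is our own, and the rule is specific to this embodiment's 8-bit base multiplier):

```python
def truncation_range(n_a: int, n_b: int):
    """Truncation window per the text's rule for an 8-bit base multiplier:
    equal n-bit precisions keep bits 0..(8-n); mixed precisions with
    highest width x in 9..15 keep bits 0..(16-x). Returns (low, high)
    bit indices, inclusive. Illustrative sketch of this embodiment only.
    """
    hi = max(n_a, n_b)
    if n_a == n_b:
        return (0, 8 - hi)   # same-precision case
    return (0, 16 - hi)      # mixed-precision case

assert truncation_range(3, 3) == (0, 5)   # 3bit x 3bit -> bits 0..5
assert truncation_range(8, 12) == (0, 4)  # 8bit x 12bit -> bits 0..4
```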
In one implementation, this embodiment not only supports dot product operations of different precisions but also supports both signed and unsigned operations. The method therefore further includes the following steps:
Step S1: determining the partial product generation part corresponding to the most significant bit of the input data, and taking that partial product generation part as the partial product generation part to be adjusted;
Step S2: when the most significant bit of the input data indicates a negative number, applying an invert-and-add-one operation to the partial product generation part to be adjusted.
Specifically, to support signed operations, this embodiment first determines the partial product generation part involved in signed computation. In practical applications, a signed fixed-point multiplier operates on two's-complement inputs: the two's complement of a positive number is the number itself, while that of a negative number is obtained by inverting the signed binary representation (including the sign bit) bit by bit and adding one. This embodiment takes the partial product generation part corresponding to the most significant bit of the input data as the part to be adjusted; when the most significant bit indicates a negative number, an invert-and-add-one operation is applied to that part, thereby realizing signed computation.
For example, FIG. 16 shows the generation of the partial product generation parts of an 8bit×8bit multiplier. The first seven partial product generation parts, PP0–PP6, are generated exactly as in unsigned fixed-point multiplication, while the eighth partial product generation part (PP7) requires special handling: when the sign bit B7 is 0 (a positive number), PP7 is 0; when B7 is 1 (a negative number), PP7 is A7A6A5A4A3A2A1A0 inverted plus one. Similarly, in 2bit×2bit operation, PP1, PP3, PP5, and PP7 must be handled this way, and in 4bit×4bit and 4bit×8bit operation, PP3 and PP7 must be: the part takes the value 0 when the sign bit is 0 and is inverted plus one when the sign bit is 1. It should be noted, however, that in 8bit×16bit operation such handling is reduced: PP7 of the second multiplier, which performs the low-order computation, does not need this treatment; only PP7 of the first multiplier does. In addition, because the computation is in two's complement, when the bit width is extended on the left side of the data during addition, the added bits must equal the most significant bit of the original data so that the numerical value is preserved. Similarly, as shown in FIG. 4 and FIG. 5, in 2bit×2bit and 4bit×4bit operation, the unused data positions on the left side of the two figures must, when fed into the addition-tree operation, carry the same value as the most significant bit of the actual valid data, rather than simply being zero-filled.
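The sign-row rule above (PP0–PP6 formed as in the unsigned case, PP7 replaced by the negated multiplicand when B7 = 1) can be checked with a small software model. This is our own illustrative sketch, working modulo 2^16 as two's-complement hardware would:

```python
def signed_pp_rows(a: int, b: int, width: int = 8) -> int:
    """Software model of signed partial-product generation for a
    width-bit two's-complement multiplier: rows PP0..PP(width-2) are
    formed as in unsigned multiplication, while the sign row is 0 for
    a non-negative multiplier and the inverted-plus-one (negated)
    multiplicand, shifted into place, otherwise. Result is the product
    modulo 2^(2*width). Illustrative sketch only.
    """
    mask = (1 << (2 * width)) - 1
    total = 0
    for i in range(width - 1):               # PP0..PP6: unsigned-style rows
        total += (a * ((b >> i) & 1)) << i
    sign = (b >> (width - 1)) & 1            # B7: sign bit of multiplier b
    if sign:                                 # negative: invert a, add one
        total += ((~a + 1) << (width - 1)) & mask
    return total & mask

# b = 251 is the 8-bit two's-complement pattern for -5, so 3 * (-5) = -15
assert signed_pp_rows(3, 251) == (3 * -5) % (1 << 16)
```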
Based on the above embodiments, the present invention further provides a fixed-point multiply-add operation unit suitable for mixed-precision neural networks. As shown in FIG. 17, the operation unit includes:
a position determination module 01, configured to obtain a mode signal and input data, determine a data input position according to the mode signal, and feed the input data into the multiplier at the data input position;
a partial product processing module 02, configured to process the partial products generated by the multiplier according to the mode signal, perform a summation operation, and take the data obtained after the summation operation as a target sum;
a result generation module 03, configured to truncate the target sum and take the data obtained after the truncation as the dot product result of the input data.
Specifically, this embodiment uses a unified multiplier for computation, but the number of multipliers is not fixed: the number of multipliers called by the operation unit adapts to the precision of the input data. It will be appreciated that when the highest bit width of the input data is less than or equal to that of the multiplier, the operation unit can call a single multiplier to process the input data, whereas when the highest bit width of the input data exceeds that of the multiplier, the operation unit must call more than one multiplier. For example, when the multiplier in the operation unit is a conventional 8bit×8bit multiplier and 3bit×3bit or 4bit×8bit input data is obtained, the operation unit can call a single multiplier, control it via the mode signal to mask the partial products of the specified region and output the partial product generation parts, and then perform the summation on those parts by the method corresponding to the given precision. When 8bit×16bit input data is obtained, the operation unit must call two multipliers, control both via the mode signal to mask the partial products of the specified region and output the partial product generation parts, and perform the summation on those parts by the method corresponding to the given precision.
In summary, the present invention discloses a fixed-point multiply-add operation unit and method suitable for mixed-precision neural networks. Input data of different precisions are fed into the multiplier at different positions; the multiplier is controlled by the mode signal to mask the partial products of specified regions and output the partial product generation parts; and the summation over those parts is performed by the method corresponding to each precision, thereby realizing mixed-precision dot product operations. The present invention realizes the dot product operations of a mixed-precision neural network with a single kind of multiplier, solving the problems of the prior art, in which mixed-precision computation requires multiple processing units of different precisions, leading to excessive hardware overhead and redundant idle resources.
It should be understood that the application of the present invention is not limited to the above examples. Those of ordinary skill in the art may make improvements or transformations in light of the above description, and all such improvements and transformations shall fall within the protection scope of the appended claims of the present invention.

Claims (10)

  1. A fixed-point multiply-add operation method suitable for mixed-precision neural networks, characterized in that the method comprises:
    obtaining a mode signal and input data, determining a data input position according to the mode signal, and feeding the input data into a multiplier at the data input position;
    processing the partial products generated by the multiplier according to the mode signal, performing a summation operation, and taking the data obtained after the summation operation as a target sum; and
    truncating the target sum, and taking the data obtained after the truncation as the dot product result of the input data.
  2. The fixed-point multiply-add operation method suitable for mixed-precision neural networks according to claim 1, characterized in that said obtaining a mode signal and input data, determining a data input position according to the mode signal, and feeding the input data into a multiplier at the data input position comprises:
    obtaining a mode signal and input data, and determining the number of multipliers to be called according to the precision of the input data;
    when the highest precision of the input data is higher than the highest bit of the multiplier, the number of called multipliers being greater than 1;
    determining the data input positions according to the mode signal, splitting the highest-precision data among the input data, and feeding the input data obtained after the splitting into the multipliers at the data input positions;
    when the highest precision of the input data is lower than or equal to the highest bit of the multiplier, the number of called multipliers being 1; and
    determining the data input position according to the mode signal, and feeding the input data into the multiplier at the data input position.
  3. The fixed-point multiply-add operation method suitable for mixed-precision neural networks according to claim 2, characterized in that said obtaining a mode signal, processing the partial products generated by the multiplier according to the mode signal, performing a summation operation, and taking the data obtained after the summation operation as a target sum comprises:
    obtaining a mode signal, and processing the partial products generated by the multiplier according to the mode signal;
    splitting the partial product generation parts obtained after the processing into a first partial product generation part and a second partial product generation part; and
    performing a summation operation on the first partial product generation part and the second partial product generation part, and taking the data obtained after the summation operation as the target sum.
  4. The fixed-point multiply-add operation method suitable for mixed-precision neural networks according to claim 3, characterized in that the mode signal is determined by the precision of the input data, and the processing comprises at least one of the following operations:
    masking the partial products generated by the multiplier in a preset region; and
    when the number of called multipliers is greater than 1, shifting the partial product generation parts output by the multiplier performing the low-order computation.
  5. The fixed-point multiply-add operation method suitable for a mixed-precision neural network according to claim 3, wherein when the input data are of the same precision and the highest bit of the input data is less than or equal to the highest bit of the multiplier, performing the summation operation on the first partial product generation part and the second partial product generation part and obtaining the target sum based on the summation operation comprises:
    inputting the first partial product generation part and the second partial product generation part into a first-stage compressor a and a first-stage compressor b, respectively;
    inputting the output results of the first-stage compressor a and the first-stage compressor b together into a second-stage compressor c;
    inputting the output result of the second-stage compressor c into an adder, and using the output result of the adder as the target sum.
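The two-level compressor arrangement of claim 5 can be modeled in software with carry-save arithmetic. The 3:2 and 4:2 compressors below are the standard textbook reductions; the six-operand grouping and the function names are illustrative assumptions rather than the patent's exact wiring.

```python
def compress_3_2(x, y, z):
    """3:2 compressor (carry-save adder): reduces three operands to a
    sum word and a carry word so that x + y + z == s + c, with no
    carry propagation across bit positions."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def compress_4_2(w, x, y, z):
    """4:2 compressor built from two 3:2 stages: w + x + y + z == s + c."""
    s1, c1 = compress_3_2(w, x, y)
    return compress_3_2(s1, c1, z)

def target_sum(partials_a, partials_b):
    """First-stage compressors a and b each reduce one partial product
    group; a second-stage compressor c merges their outputs; a single
    carry-propagate adder then produces the target sum."""
    s_a, c_a = compress_3_2(*partials_a)          # first-stage compressor a
    s_b, c_b = compress_3_2(*partials_b)          # first-stage compressor b
    s_c, c_c = compress_4_2(s_a, c_a, s_b, c_b)   # second-stage compressor c
    return s_c + c_c                              # final adder
```

Only the last `+` involves carry propagation; everything before it is carry-free, which is the point of compressing before adding.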
  6. The fixed-point multiply-add operation method suitable for a mixed-precision neural network according to claim 3, wherein when the input data are of mixed precision, performing the summation operation on the first partial product generation part and the second partial product generation part and obtaining the target sum based on the summation operation comprises:
    obtaining the highest bit width of the input data, and comparing the highest bit width of the input data with the highest bit width of the multiplier;
    when the highest bit width of the input data is equal to the highest bit width of the multiplier, inputting the first partial product generation part and the second partial product generation part into a first-stage compressor a and a first-stage compressor b, respectively;
    inputting the output results of the first-stage compressor a and the first-stage compressor b into a first adder and a second adder, respectively, and using the sum of the output results of the first adder and the second adder as the target sum.
  7. The fixed-point multiply-add operation method suitable for a mixed-precision neural network according to claim 3, wherein when the input data are of mixed precision, performing the summation operation on the first partial product generation part and the second partial product generation part and obtaining the target sum based on the summation operation comprises:
    obtaining the highest bit width of the input data, and comparing the highest bit width of the input data with the highest bit width of the multiplier;
    when the highest bit width of the input data is greater than the highest bit width of the multiplier, the multiplier comprises a first multiplier and a second multiplier, the second multiplier being the multiplier performing the low-order operation; the first multiplier outputs the first partial product generation part, and the second multiplier outputs the second partial product generation part;
    inputting the first partial product generation part directly into a first adder;
    splitting the second partial product generation part and inputting the resulting parts into the first adder and a second adder, respectively;
    using the sum of the output results of the first adder and the second adder as the target sum.
  8. The fixed-point multiply-add operation method suitable for a mixed-precision neural network according to claim 1, wherein truncating the target sum and using the data obtained after the truncation as the dot-product result of the input data comprises:
    determining a truncation bit width according to the precision of the input data;
    performing, according to the truncation bit width, a truncation operation on the target sum starting from bit 0, and using the data obtained after the truncation operation as the dot-product result of the input data.
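The truncation step of claim 8 amounts to masking off the low-order bits of the target sum. The claim only states that the truncation width is determined by the input precision; the doubling rule below (the natural width of a product of two equal-precision operands) is an illustrative assumption.

```python
def truncate_target_sum(target_sum: int, precision_bits: int) -> int:
    """Select bits of the target sum starting from bit 0.
    Assumes (illustratively) a truncation width of 2 * precision_bits."""
    width = 2 * precision_bits
    return target_sum & ((1 << width) - 1)
```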
  9. The fixed-point multiply-add operation method suitable for a mixed-precision neural network according to claim 1, wherein the method further comprises:
    determining the partial product generation part corresponding to the highest bit of the input data, and using that partial product generation part as the partial product generation part to be adjusted;
    when the highest bit of the input data is negative, performing inversion-and-add-one processing on the partial product generation part to be adjusted.
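The inversion-and-add-one of claim 9 is the usual two's-complement correction: in a signed operand the most significant bit carries negative weight, so its partial product row must be subtracted, which hardware implements as bitwise inversion plus adding one. A sketch assuming a signed n-bit multiplier operand and an unsigned m-bit multiplicand (the widths and function names are illustrative):

```python
def neg_w(x, w):
    """Two's-complement negation within w bits: invert, then add one."""
    return (~x + 1) & ((1 << w) - 1)

def signed_by_unsigned_mul(a, b, n, m):
    """a: n-bit two's-complement, b: m-bit unsigned. Every bit of a
    contributes a shifted copy of b, except the sign bit, whose row
    has negative weight and is therefore inverted and incremented."""
    w = n + m
    mask_w = (1 << w) - 1
    ua = a & ((1 << n) - 1)          # raw bit pattern of a
    total = 0
    for i in range(n - 1):           # positive-weight rows
        if (ua >> i) & 1:
            total = (total + (b << i)) & mask_w
    if (ua >> (n - 1)) & 1:          # sign-bit row: negative weight
        total = (total + neg_w(b << (n - 1), w)) & mask_w
    return total                     # w-bit two's complement of a * b
```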
  10. A fixed-point multiply-add operation unit suitable for a mixed-precision neural network, wherein the operation unit comprises:
    a position determination module, configured to acquire a mode signal and input data, determine a data input position according to the mode signal, and input the input data into a multiplier from the data input position;
    a partial product processing module, configured to process the partial products generated by the multiplier according to the mode signal, perform a summation operation, and use the data obtained after the summation operation as a target sum;
    a result generation module, configured to truncate the target sum and use the data obtained after the truncation as the dot-product result of the input data.
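The three modules of claim 10 can be mirrored end to end by a small software model: multiply per operand pair, sum into a target sum, then truncate. Everything below — the class name, the use of the precision as the mode signal, and the truncation width — is an illustrative assumption, not the patent's implementation.

```python
class FixedPointMACUnit:
    """Toy software model of the claim-10 pipeline (assumed structure)."""

    def __init__(self, precision_bits: int):
        self.precision_bits = precision_bits  # stands in for the mode signal

    def dot(self, xs, ys):
        # position determination + multiplication: one product per pair
        partials = [x * y for x, y in zip(xs, ys)]
        # partial product processing: summation into the target sum
        target = sum(partials)
        # result generation: truncate from bit 0 by a precision-derived width
        width = 4 * self.precision_bits  # illustrative truncation rule
        return target & ((1 << width) - 1)
```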
PCT/CN2021/131800 2021-02-09 2021-11-19 Fixed-point multiply-add operation unit and method suitable for mixed-precision neural network WO2022170811A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110178992.7 2021-02-09
CN202110178992.7A CN113010148B (en) 2021-02-09 2021-02-09 Fixed-point multiply-add operation unit and method suitable for mixed precision neural network

Publications (1)

Publication Number Publication Date
WO2022170811A1 true WO2022170811A1 (en) 2022-08-18

Family

ID=76383947

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131800 WO2022170811A1 (en) 2021-02-09 2021-11-19 Fixed-point multiply-add operation unit and method suitable for mixed-precision neural network

Country Status (2)

Country Link
CN (1) CN113010148B (en)
WO (1) WO2022170811A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010148B (en) * 2021-02-09 2022-11-11 南方科技大学 Fixed-point multiply-add operation unit and method suitable for mixed precision neural network
US20240004952A1 (en) * 2022-06-29 2024-01-04 Mediatek Singapore Pte. Ltd. Hardware-Aware Mixed-Precision Quantization

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916177A (en) * 2010-07-26 2010-12-15 清华大学 Configurable multi-precision fixed point multiplying and adding device
US8706790B1 (en) * 2009-03-03 2014-04-22 Altera Corporation Implementing mixed-precision floating-point operations in a programmable integrated circuit device
CN108287681A (en) * 2018-02-14 2018-07-17 中国科学院电子学研究所 A kind of single-precision floating point fusion point multiplication operation unit
CN108459840A (en) * 2018-02-14 2018-08-28 中国科学院电子学研究所 A kind of SIMD architecture floating-point fusion point multiplication operation unit
CN108694038A (en) * 2017-04-12 2018-10-23 英特尔公司 Dedicated processes mixed-precision floating-point operation circuit in the block
CN113010148A (en) * 2021-02-09 2021-06-22 南方科技大学 Fixed-point multiply-add operation unit and method suitable for mixed precision neural network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100405289C (en) * 2005-03-08 2008-07-23 中国科学院计算技术研究所 Floating-point multiplicator and method of compatible double-prepcision and double-single precision computing
CN102591615A (en) * 2012-01-16 2012-07-18 中国人民解放军国防科学技术大学 Structured mixed bit-width multiplying method and structured mixed bit-width multiplying device
US11068238B2 (en) * 2019-05-21 2021-07-20 Arm Limited Multiplier circuit
CN110531954B (en) * 2019-08-30 2024-07-19 上海寒武纪信息科技有限公司 Multiplier, data processing method, chip and electronic equipment
CN210109863U (en) * 2019-08-30 2020-02-21 上海寒武纪信息科技有限公司 Multiplier, device, neural network chip and electronic equipment
CN110780845B (en) * 2019-10-17 2021-11-30 浙江大学 Configurable approximate multiplier for quantization convolutional neural network and implementation method thereof
CN111522528B (en) * 2020-04-22 2023-03-28 星宸科技股份有限公司 Multiplier, multiplication method, operation chip, electronic device, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8706790B1 (en) * 2009-03-03 2014-04-22 Altera Corporation Implementing mixed-precision floating-point operations in a programmable integrated circuit device
CN101916177A (en) * 2010-07-26 2010-12-15 清华大学 Configurable multi-precision fixed point multiplying and adding device
CN108694038A (en) * 2017-04-12 2018-10-23 英特尔公司 Dedicated processes mixed-precision floating-point operation circuit in the block
CN108287681A (en) * 2018-02-14 2018-07-17 中国科学院电子学研究所 A kind of single-precision floating point fusion point multiplication operation unit
CN108459840A (en) * 2018-02-14 2018-08-28 中国科学院电子学研究所 A kind of SIMD architecture floating-point fusion point multiplication operation unit
CN113010148A (en) * 2021-02-09 2021-06-22 南方科技大学 Fixed-point multiply-add operation unit and method suitable for mixed precision neural network

Also Published As

Publication number Publication date
CN113010148B (en) 2022-11-11
CN113010148A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN115934030B (en) Arithmetic logic unit, method and equipment for floating point number multiplication
CN110780845B (en) Configurable approximate multiplier for quantization convolutional neural network and implementation method thereof
CN107451658B (en) Fixed-point method and system for floating-point operation
US20210349692A1 (en) Multiplier and multiplication method
CN110852416B (en) CNN hardware acceleration computing method and system based on low-precision floating point data representation form
WO2022170811A1 (en) Fixed-point multiply-add operation unit and method suitable for mixed-precision neural network
CN110852434B (en) CNN quantization method, forward calculation method and hardware device based on low-precision floating point number
US20210182026A1 (en) Compressing like-magnitude partial products in multiply accumulation
CN116400883A (en) Floating point multiply-add device capable of switching precision
Venkatachalam et al. Approximate sum-of-products designs based on distributed arithmetic
CN112434801A (en) Convolution operation acceleration method for carrying out weight splitting according to bit precision
CN117908835B (en) Method for accelerating SM2 cryptographic algorithm based on floating point number computing capability
KR20220031098A (en) Signed Multi-Word Multiplier
CN114860193A (en) Hardware operation circuit for calculating Power function and data processing method
JPH04332036A (en) Floating decimal point multiplier and its multiplying system
CN113608718A (en) Method for realizing acceleration of prime number domain large integer modular multiplication calculation
JPH04205026A (en) Divider circuit
WO2023078364A1 (en) Operation method and apparatus for matrix multiplication
CN112558920A (en) Signed/unsigned multiply-accumulate device and method
CN115827555A (en) Data processing method, computer device, storage medium and multiplier structure
US20220075598A1 (en) Systems and Methods for Numerical Precision in Digital Multiplier Circuitry
CN115062768A (en) Softmax hardware implementation method and system of logic resource limited platform
JPH05204602A (en) Method and device of control signal
Zhang et al. A multiple-precision multiply and accumulation design with multiply-add merged strategy for AI accelerating
Kumar et al. Complex multiplier: implementation using efficient algorithms for signal processing application

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21925458

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21925458

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20.02.2024)