WO2018196750A1 - Device for processing multiplication and addition operations and method for processing multiplication and addition operations - Google Patents


Info

Publication number
WO2018196750A1
WO2018196750A1 PCT/CN2018/084275 CN2018084275W WO2018196750A1 WO 2018196750 A1 WO2018196750 A1 WO 2018196750A1 CN 2018084275 W CN2018084275 W CN 2018084275W WO 2018196750 A1 WO2018196750 A1 WO 2018196750A1
Authority
WO
WIPO (PCT)
Prior art date
Application number
PCT/CN2018/084275
Other languages
French (fr)
Chinese (zh)
Inventor
徐斌
陈清龙
戎建江
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2018196750A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/5443 Sum of products

Definitions

  • The present application relates to the field of computers, and more particularly to an apparatus for processing multiply-add operations and a method of processing multiply-add operations.
  • Computers often use multiply-add operations when processing input data.
  • When a computer performs a multiply-add operation, it first multiplies the input data and then adds the products. Because the input data is generally in the linear domain, where data occupies a relatively large bit width (for example, 32 bits), the computer needs considerable resources to perform multiply-add operations.
  • In addition, a multiply-add operation includes a large number of multiplications, which are computationally expensive and relatively slow, so the computer's efficiency in performing multiply-add operations is low.
  • The prior art proposes a scheme for processing multiply-add operations that converts input data in the linear domain into data in the logarithmic domain, thereby converting multiplications in the linear domain into additions in the logarithmic domain.
  • Taking the logarithm reduces the bit width occupied by the data (for example, the original data is 32-bit data, and its logarithm occupies only 5 bits). Converting multiplications in the linear domain into additions in the logarithmic domain also increases computational efficiency.
  • However, the above scheme must also convert the data in the logarithmic domain back into data in the linear domain and add the linear-domain data to obtain the final multiply-add result.
  • As a result, the computer still needs considerable resources when performing the addition.
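The prior-art scheme described above can be illustrated with a minimal sketch. The function names and the choice of base 2 are assumptions made for illustration and do not come from the source. Each product becomes an addition of logarithms, but the final sum still requires converting back to the linear domain.

```python
import math

def to_log_domain(x: float, a: float = 2.0) -> float:
    """Return log_a(x) for x > 0."""
    return math.log(x, a)

def log_domain_product(x: float, y: float, a: float = 2.0) -> float:
    """Return log_a(x * y), computed as an addition of two logarithms."""
    return to_log_domain(x, a) + to_log_domain(y, a)

def prior_art_mac(A: float, B: float, C: float, D: float, a: float = 2.0) -> float:
    """Compute A*B + C*D the prior-art way: multiply in the log domain,
    then convert back to the linear domain before adding."""
    m = log_domain_product(A, B, a)   # log_a(A*B)
    n = log_domain_product(C, D, a)   # log_a(C*D)
    # The wide linear-domain addition below is the step the application aims to avoid.
    return a ** m + a ** n
```

For example, `prior_art_mac(3, 4, 5, 6)` returns a value very close to 42, up to floating-point rounding.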
  • The present application provides an apparatus and a method for processing multiply-add operations, to reduce computational power consumption.
  • An apparatus for processing a multiply-add operation, comprising: a first adder, configured to add input first data and second data to obtain first intermediate data, where the values of the first data and the second data are $\log_a A$ and $\log_a B$ respectively, the value of the first intermediate data is $m$, and the first data and the second data are obtained by taking the logarithm of first original data $A$ and second original data $B$, respectively, of a plurality of original data; a second adder, configured to add input third data and fourth data to obtain second intermediate data, where the values of the third data and the fourth data are $\log_a C$ and $\log_a D$ respectively, the value of the second intermediate data is $n$, and the third data and the fourth data are obtained by taking the logarithm of third original data $C$ and fourth original data $D$, respectively, of the plurality of original data, where $a$ is an integer greater than 0 and not equal to 1, $m$ and $n$ are real numbers, and $m \ge n$; and a logarithmic adder, whose input port is connected to the output ports of the first adder and the second adder, configured to obtain $a^{n-m}$ from the $m$ and $n$ input by the first adder and the second adder, and to approximately determine the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$; where the first adder, the second adder, and the logarithmic adder are implemented by hardware circuits.
  • The first adder, the second adder, and the logarithmic adder may be implemented by various hardware circuits such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
  • In the present application, a high-bit-width data operation is converted into a low-bit-width data operation, which reduces resource usage during calculation.
  • Specifically, the addition of the high-bit-width data $a^m$ and $a^n$ is converted into the addition of the low-bit-width data $m$ and $a^{n-m}$. This avoids the use of a high-bit-width adder, which reduces the area of the computing chip and the computational power consumption. It should also be understood that $A$, $B$, $C$, and $D$ above are all real numbers greater than zero.
  • Approximately determining the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$ may mean taking the sum of $m$ and $a^{n-m}$ as an approximate value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • The logarithmic adder may be further configured to obtain $a^{n-m}$ from the $m$ and $n$ input by the first adder and the second adder, and to determine the sum of $m$ and $-a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B-C\cdot D)$.
  • The above multiply-add operation is a generalized multiply-add operation, which may include an addition between products or a subtraction between products.
  • For example, the above multiply-add operation may include $A\cdot B+C\cdot D$ or $A\cdot B-C\cdot D$.
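The approximation above can be checked numerically with a small sketch. It rests on the identity $\ln(a^m \pm a^n) = m\ln a + \ln(1 \pm a^{n-m})$ and the first-order approximation $\ln(1 \pm x) \approx \pm x$. One assumption should be noted: the sketch scales $m$ by $\ln a$, which is what the identity requires for a general base, whereas the text states the sum simply as $m + a^{n-m}$ (the two coincide when $a = e$). Function names are illustrative only.

```python
import math

def exact_ln_sum(m: float, n: float, a: float = 2.0) -> float:
    """Reference value: (log_e a) * log_a(a**m + a**n) = ln(a**m + a**n)."""
    return math.log(a ** m + a ** n)

def approx_ln_sum(m: float, n: float, a: float = 2.0, subtract: bool = False) -> float:
    """First-order log-domain addition (or subtraction) for m >= n.

    Assumption: m is scaled by ln(a) so the identity holds for any base a.
    """
    if m < n:
        raise ValueError("expected m >= n")
    x = a ** (n - m)                 # 0 < x <= 1 when a > 1
    return m * math.log(a) + (-x if subtract else x)
```

With `m = 10`, `n = 2`, and `a = 2`, the term $a^{n-m}$ is $2^{-8}$ and the first-order result differs from the exact value by less than $10^{-5}$. The error grows as $n$ approaches $m$, which is the case the error compensation described later addresses.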
  • The logarithmic adder being configured to obtain $a^{n-m}$ from the $m$ and $n$ input by the first adder and the second adder, and to approximately determine the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$, includes: determining a target accuracy to be achieved when processing the plurality of original data; and, when the target accuracy is lower than a first precision, approximately determining the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • The first precision may be preset. When the target accuracy is lower than the first precision, the accuracy required for processing the original data may be considered low.
  • By comparing the target accuracy with the first precision, the accuracy requirement for processing the original data can be determined. When the requirement is low, $m+a^{n-m}$ can be directly determined, approximately, as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • The value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$ can thus be determined flexibly according to the accuracy requirement for processing the original data, which meets that requirement while improving computing efficiency.
  • The logarithmic adder is specifically configured to: determine an error compensation value of $a^{n-m}$ according to an error compensation table, where the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing $[-1, 1]$ into K parts, and the K error compensation values are obtained by substituting the K values into an error compensation term, where K and L are integers greater than 1; and approximately determine the sum of $m+a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • Taking the error compensation value of $a^{n-m}$ into account when determining the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$ can further improve the calculation accuracy.
  • The logarithmic adder approximately determining the sum of $m+a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$ includes: determining a target accuracy to be achieved when processing the plurality of original data; and, when the target accuracy is higher than a second precision, approximately determining the sum of $m+a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • When the target accuracy is higher than the second precision, the precision required for processing the original data may be considered high. In this case, the error compensation value of $a^{n-m}$ is taken into account when determining the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$, to ensure the accuracy of that value.
  • The second precision may be the same as the first precision, or may be greater than the first precision.
  • K is determined based on the target accuracy.
  • When the target accuracy is high, K can be a large value, and when the target accuracy is low, K can be a small value.
  • L is determined based on the target accuracy.
  • The larger the value of L, the more terms the error compensation term has, and the more accurate the error compensation value obtained from it. Therefore, when the target accuracy is high, L can be a large value, and when the target accuracy is low, L can be a smaller value.
  • The logarithmic adder specifically includes: a shift circuit, configured to perform a shift operation on $a$ according to $n-m$ to obtain $a^{n-m}$; and a sub-addition circuit, configured to add $m$ and $a^{n-m}$ to obtain $m+a^{n-m}$.
  • The logarithmic adder further includes: a subtraction circuit, configured to subtract $m$ and $n$ to obtain $m-n$ or $n-m$; a comparison circuit, configured to compare $m-n$ or $n-m$ with zero; and a selection circuit, configured to select $m$ and $n-m$ when $m-n$ is greater than or equal to zero, or to select $m$ and $n-m$ when $n-m$ is less than or equal to zero.
  • The apparatus further comprises: a converter, configured to approximately obtain the value of $A\cdot B+C\cdot D$ from the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$, where the converter is implemented by a hardware circuit.
  • The apparatus further includes: a quantizer, configured to quantize the value of $A\cdot B+C\cdot D$ to reach a preset data bit width.
  • A method for processing a multiply-add operation, comprising: adding input first data and second data to obtain first intermediate data, where the values of the first data and the second data are $\log_a A$ and $\log_a B$ respectively, the value of the first intermediate data is $m$, and the first data and the second data are obtained by taking the logarithm of first original data $A$ and second original data $B$, respectively, of a plurality of original data; adding input third data and fourth data to obtain second intermediate data, where the values of the third data and the fourth data are $\log_a C$ and $\log_a D$ respectively, the value of the second intermediate data is $n$, and the third data and the fourth data are obtained by taking the logarithm of third original data $C$ and fourth original data $D$, respectively, of the plurality of original data, where $a$ is an integer greater than 0 and not equal to 1, $m$ and $n$ are real numbers, and $m \ge n$; and obtaining $a^{n-m}$ from $m$ and $n$, and approximately determining the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • Obtaining $a^{n-m}$ from $m$ and $n$ and approximately determining the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$ includes: determining a target accuracy to be achieved when processing the plurality of original data; and, when the target accuracy is lower than a first precision, approximately determining the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • The method further comprises: determining an error compensation value of $a^{n-m}$ according to an error compensation table, where the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing $[-1, 1]$ into K parts, and the K error compensation values are obtained by substituting the K values into an error compensation term, where K and L are integers greater than 1; and approximately determining the sum of $m+a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • Approximately determining the sum of $m+a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$ includes: determining a target accuracy to be achieved when processing the plurality of original data; and, when the target accuracy is higher than a second precision, approximately determining the sum of $m+a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • K is determined based on the target accuracy.
  • Obtaining $a^{n-m}$ from $m$ and $n$ and approximately determining the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$ includes: performing a shift operation on $a$ according to $n-m$ to obtain $a^{n-m}$; and adding $m$ and $a^{n-m}$ to obtain $m+a^{n-m}$.
  • Obtaining $a^{n-m}$ from $m$ and $n$ and approximately determining the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$ includes: subtracting $m$ and $n$ to obtain $m-n$ or $n-m$; comparing $m-n$ or $n-m$ with zero; and selecting $m$ and $n-m$ when $m-n$ is greater than or equal to zero, or selecting $m$ and $n-m$ when $n-m$ is less than or equal to zero.
  • The method further comprises: approximately obtaining the value of $A\cdot B+C\cdot D$ from the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$, using a converter implemented by a hardware circuit.
  • The method further comprises: quantizing the value of $A\cdot B+C\cdot D$ to reach a preset data bit width.
  • FIG. 1 is a schematic flow chart of a method for processing a multiply-and-accumulate operation in the prior art
  • FIG. 2 is a schematic block diagram of an apparatus for processing a multiply-and-accumulate operation according to an embodiment of the present application
  • FIG. 3 is a schematic block diagram of an apparatus for processing a multiply-and-accumulate operation in an embodiment of the present application
  • FIG. 4 is a schematic block diagram of an apparatus for processing a multiply-and-accumulate operation according to an embodiment of the present application
  • FIG. 5 is a schematic flowchart of a method for processing a multiply-and-accumulate operation according to an embodiment of the present application
  • FIG. 6 is a schematic flowchart of a method for processing a multiply and add operation in an embodiment of the present application.
  • FIG. 1 shows a schematic flow chart of a method of processing a multiply-and-accumulate operation in the prior art.
  • Four multipliers (a first multiplier, a second multiplier, a third multiplier, and a fourth multiplier) each multiply one of four pairs of data to obtain four 32-bit data. A first adder and a second adder then add the four 32-bit data output by the four multipliers to obtain two 32-bit data, a third adder adds the two 32-bit data output by the first adder and the second adder to obtain one 32-bit datum, and finally that 32-bit datum is quantized to obtain 16-bit data.
  • the prior art proposes a scheme for processing the multiply-and-accumulate operation. This scheme converts data in a linear domain into data in a logarithmic domain, thereby transforming multiplication operations in the linear domain into addition operations in the logarithmic domain.
  • However, data in the linear domain occupies more bits (for example, $2^{x+y}$ and $2^{z+w}$ occupy a data width of 32 bits). Therefore, after the data in the logarithmic domain is converted into data in the linear domain, a high-bit-width adder is still needed to perform the addition, so the computer still needs considerable resources when performing the addition.
  • The embodiments of the present application therefore propose a device for processing a multiply-and-accumulate operation that converts an addition between higher-bit-width data in exponential form into an addition of lower-bit-width data, which reduces resource usage during calculation and thereby reduces computational power consumption.
  • FIG. 2 is a schematic block diagram of an apparatus for processing data according to an embodiment of the present application.
  • the apparatus 200 of Figure 2 includes:
  • A first adder 210, configured to add input first data and second data to obtain first intermediate data, where the values of the first data and the second data are $\log_a A$ and $\log_a B$ respectively, the value of the first intermediate data is $m$, and the first data and the second data are obtained by taking the logarithm of first original data $A$ and second original data $B$, respectively, of a plurality of original data;
  • A second adder 220, configured to add input third data and fourth data to obtain second intermediate data, where the values of the third data and the fourth data are $\log_a C$ and $\log_a D$ respectively, the value of the second intermediate data is $n$, and the third data and the fourth data are obtained by taking the logarithm of third original data $C$ and fourth original data $D$, respectively, of the plurality of original data, where $a$ is an integer greater than 0 and not equal to 1, $m$ and $n$ are real numbers, and $m$ is greater than or equal to $n$.
  • the above raw data may be RGB pixel data when the image is processed.
  • the value of a above may be 2.
  • the product operation between the original data may be first converted into an addition operation in the logarithmic domain, and then a plurality of intermediate data in an exponential form are obtained.
  • A logarithmic adder 230, whose input port is connected to the output ports of the first adder 210 and the second adder 220, configured to obtain $a^{n-m}$ from the $m$ and $n$ input by the first adder 210 and the second adder 220, and to approximately determine the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • The first adder 210, the second adder 220, and the logarithmic adder 230 described above may be implemented by hardware circuits. Specifically, they may be implemented based on various hardware circuits such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
  • In the present application, a high-bit-width data operation is converted into a low-bit-width data operation, which reduces resource usage during calculation.
  • Specifically, the high-bit-width addition of $a^m$ and $a^n$ is converted into the low-bit-width addition of $m$ and $a^{n-m}$, which reduces the occupation of system resources during calculation and improves computational efficiency.
  • The logarithmic adder 230 may approximately determine the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$, or may approximately determine the sum of $m$ and $-a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B-C\cdot D)$.
  • the above multiplication and addition operation is a generalized multiplication and addition operation, and may include an addition operation between products, or may include a subtraction operation between products.
  • the multiply-accumulate operation may include A*B+C*D or A*B-C*D.
  • The logarithmic adder 230 obtaining $a^{n-m}$ from the $m$ and $n$ input by the first adder 210 and the second adder 220, and approximately determining the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$, specifically includes: determining the target accuracy to be achieved when processing the plurality of original data; and, when the target accuracy is lower than the first precision, approximately determining the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • the first precision described above may be preset, and when the target accuracy is lower than the first precision, the accuracy required for processing the original data may be considered to be low.
  • By comparing the target accuracy with the first precision, the accuracy requirement for processing the original data can be determined.
  • When the requirement is low, $m+a^{n-m}$ can be directly determined, approximately, as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$. The present application can therefore determine the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$ flexibly according to the precision requirement for processing the original data, which meets that requirement while improving operation efficiency.
  • The logarithmic adder 230 is specifically configured to: determine an error compensation value of $a^{n-m}$ according to an error compensation table, where the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing $[-1, 1]$ into K parts, and the K error compensation values are obtained by substituting the K values into an error compensation term, where K and L are integers greater than 1; and approximately determine the sum of $m+a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • Taking the error compensation value of $a^{n-m}$ into account when determining the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$ can further improve the calculation accuracy.
  • Alternatively, the K values may be obtained by dividing $[0, 1]$ into K parts.
  • Alternatively, the K values may be obtained by dividing $[-1, 0]$ into K parts.
  • Determining the error compensation value of $a^{n-m}$ according to the error compensation table may mean determining it by querying the table. Specifically, the value closest to $a^{n-m}$ among the K values may first be looked up in the error compensation table, and the error compensation value of that value is then determined as the error compensation value of $a^{n-m}$.
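The table lookup described above can be sketched as follows. Several details are assumptions made for illustration, since the source does not reproduce the error compensation term: the term is taken to be the Taylor tail of $\ln(1+x) - x$ truncated at order L, the interval $[0, 1]$ is used, and the K table values are taken to be the midpoints of K equal sub-intervals. All names are hypothetical.

```python
def error_term(x: float, L: int) -> float:
    """Assumed error compensation term: terms of order 2..L of ln(1 + x),
    i.e. -x**2/2 + x**3/3 - ... up to x**L / L."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(2, L + 1))

def build_table(K: int, L: int, lo: float = 0.0, hi: float = 1.0):
    """Split [lo, hi] into K equal parts and store (value, compensation) pairs,
    using the midpoint of each part as the table value (an assumption)."""
    step = (hi - lo) / K
    points = [lo + (i + 0.5) * step for i in range(K)]
    return [(p, error_term(p, L)) for p in points]

def lookup(table, x: float) -> float:
    """Return the compensation stored for the table value closest to x."""
    return min(table, key=lambda entry: abs(entry[0] - x))[1]
```

For instance, with `K = 4` the table values are 0.125, 0.375, 0.625, and 0.875, so `lookup(build_table(4, 2), 0.4)` returns the compensation stored for 0.375. A larger K gives a finer grid and a larger L gives a more accurate stored value, which matches the roles the text assigns to K and L.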
  • The logarithmic adder 230 approximately determining the sum of $m+a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$ specifically includes: determining the target accuracy to be achieved when processing the plurality of original data; and, when the target accuracy is higher than the second precision, approximately determining the sum of $m+a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • When the target accuracy is higher than the second precision, the precision required for processing the original data may be considered high. In this case, the error compensation value of $a^{n-m}$ is taken into account when determining the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$, to ensure the accuracy of that value.
  • the second precision described above may be the same as the first precision.
  • When determining the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$, the logarithmic adder 230 may determine the magnitude relationship between the absolute value of $n-m$ and a first threshold; if the absolute value of $n-m$ is greater than or equal to the first threshold, the logarithmic adder 230 may directly determine $m$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • For example, when the first threshold is 5, $m$ is 10, and $n-m$ is $-8$, the absolute value of $n-m$ is greater than the first threshold; $a^{-8}$ is much smaller than 10, so its value may be ignored and 10 is directly determined as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • When the absolute value of $n-m$ is less than the first threshold, the logarithmic adder 230 still approximately determines the sum of $m$ and $a^{n-m}$ as the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
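The threshold shortcut can be sketched as below. As a stated assumption, $m$ is scaled by $\ln a$ so that the result approximates $\ln(a^m + a^n)$ for a general base; the text expresses the result simply as $m$ or $m + a^{n-m}$. The function name and default values are illustrative.

```python
import math

def log_add_with_threshold(m: float, n: float, a: float = 2.0,
                           threshold: float = 5.0) -> float:
    """Approximate ln(a**m + a**n), skipping the small term when |n - m|
    reaches the threshold."""
    if m < n:
        m, n = n, m                      # keep m as the larger exponent
    if abs(n - m) >= threshold:
        return m * math.log(a)           # a**(n - m) is negligible, use m alone
    return m * math.log(a) + a ** (n - m)
```

With the example values from the text (`m = 10`, `n - m = -8`, threshold 5), the small term is dropped. For `a = 2` the resulting error relative to the exact value is about $2^{-8}$.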
  • K is determined based on the target accuracy. Specifically, K may be a larger value when the target accuracy is higher, and a smaller value when the target accuracy is lower.
  • L is determined based on the target accuracy.
  • The larger the value of L, the more terms the error compensation term has and the more accurate the error compensation value obtained from it; the smaller the value of L, the fewer terms the error compensation term has.
  • Processing of the original data can thus be adjusted flexibly by setting the values of K and L.
  • The logarithmic adder 230 specifically includes:
  • A shift circuit 2301, configured to perform a shift operation on $a$ according to $n-m$ to obtain $a^{n-m}$;
  • A sub-addition circuit 2302, configured to add $m$ and $a^{n-m}$ to obtain $m+a^{n-m}$.
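One way to picture the shift and sub-addition steps is in fixed-point arithmetic. The sketch below assumes $a = 2$, an integer difference $n-m \le 0$, and a hypothetical number of fractional bits `frac_bits`. None of these details are specified in the source.

```python
def pow2_by_shift(exp: int, frac_bits: int = 8) -> int:
    """Return 2**exp as a fixed-point integer with frac_bits fractional bits,
    for integer exp <= 0, using only a right shift."""
    if exp > 0:
        raise ValueError("expected exp <= 0, i.e. m >= n")
    return (1 << frac_bits) >> (-exp)

def sub_add(m: int, n: int, frac_bits: int = 8) -> int:
    """Return m + 2**(n - m) in the same fixed-point format."""
    return (m << frac_bits) + pow2_by_shift(n - m, frac_bits)
```

With 8 fractional bits, `pow2_by_shift(-3)` returns 32, which represents $2^{-3} = 32/256$. Once $m - n$ exceeds the number of fractional bits the result underflows to 0, which is consistent with ignoring the small term for large differences.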
  • The logarithmic adder 230 further includes:
  • A subtraction circuit 2303, configured to perform subtraction on $m$ and $n$ to obtain $m-n$ or $n-m$;
  • A comparison circuit 2304, configured to compare $m-n$ or $n-m$ with zero;
  • A selection circuit 2305, configured to select $m$ and $n-m$ if $m-n$ is greater than or equal to zero, or to select $m$ and $n-m$ if $n-m$ is less than or equal to zero.
  • The shift circuit 2301 may first acquire $n-m$ from the selection circuit 2305 before performing the shift operation on $a$ according to $n-m$, and the sub-addition circuit 2302 may first acquire $m$ from the selection circuit 2305 before adding $m$ and $a^{n-m}$.
  • When the subtraction circuit 2303 performs subtraction on $m$ and $n$, either one may be the minuend and the other the subtrahend, thereby obtaining $m-n$ or $n-m$.
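The subtraction, comparison, and selection steps can be mimicked in a few lines. This is a behavioural sketch only, with a hypothetical function name. It returns the larger of the two inputs together with the non-positive exponent that the shift circuit would use.

```python
def subtract_compare_select(m: float, n: float):
    """Return (larger, exponent), where exponent = smaller - larger <= 0."""
    diff = m - n              # subtraction circuit
    if diff >= 0:             # comparison circuit: compare the difference with zero
        return m, -diff       # selection circuit: m is larger, exponent is n - m
    return n, diff            # otherwise n is larger, exponent is m - n
```

For example, both `subtract_compare_select(7, 3)` and `subtract_compare_select(3, 7)` yield `(7, -4)`, so the later stages do not depend on the input order.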
  • The foregoing apparatus 200 further includes: a converter 240, configured to approximately obtain the value of $A\cdot B+C\cdot D$ from the value of $(\log_e a)\cdot\log_a(A\cdot B+C\cdot D)$.
  • The apparatus 200 further includes: a quantizer 250, configured to quantize the value of $A\cdot B+C\cdot D$ to reach a preset data bit width.
  • The converter 240 and the quantizer 250 can be implemented by hardware circuits. Specifically, they can be implemented based on hardware circuits such as an ASIC or an FPGA.
  • Quantization refers to matching data of different bit widths.
  • For example, if the bit width of the data obtained in a first step is 8 bits and the bit width required by the second-step operation is 5 bits, the 8-bit data is truncated into 5-bit data to meet the bit-width requirement of the second step.
  • A specific implementation may be to adjust values of the 8-bit data that are greater than the 5-bit maximum to the 5-bit maximum, adjust values that are less than the 5-bit minimum to the 5-bit minimum, and leave other values unchanged.
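The clamping behaviour described above is commonly called saturation and can be sketched as follows. Whether the data is signed is not stated in the source, so the `signed` parameter and the two's-complement range are assumptions.

```python
def saturate(value: int, bits: int = 5, signed: bool = True) -> int:
    """Clamp value to the range representable in the given bit width;
    values already in range are returned unchanged."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1   # e.g. -16..15
    else:
        lo, hi = 0, (1 << bits) - 1                          # e.g. 0..31
    return max(lo, min(hi, value))
```

With the defaults, `saturate(100)` returns 15, `saturate(-100)` returns -16, and `saturate(7)` returns 7.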
  • FIG. 3 is a schematic block diagram of a logarithmic adder 300 for processing a multiply-and-accumulate operation in an embodiment of the present application.
  • the logarithmic adder 300 specifically includes a subtraction circuit 310, a comparison circuit 320, a selection circuit 330, a shift circuit 340, an error compensation circuit 350, and an addition circuit 360.
  • $n$ and $m$ are the input 5-bit data (assuming $m>n$), and sign indicates whether the sign bits of $n$ and $m$ are the same. For example, when sign is 1, $a^m$ and $a^n$ have the same sign, and when sign is 0, $a^m$ and $a^n$ have different signs.
  • Taking the case where sign is 1 as an example, the specific steps by which the apparatus 300 calculates $a^m+a^n$ are as follows:
  • The subtraction circuit 310 takes the difference between $n$ and $m$ to obtain $n-m$ or $m-n$;
  • The comparison circuit 320 obtains the result $n-m$ or $m-n$ calculated by the subtraction circuit 310 and compares it with zero;
  • The selection circuit 330 selects the larger number $m$ from $n$ and $m$, together with $n-m$, according to the magnitude relationship between $n-m$ or $m-n$ and zero;
  • The shift circuit 340 performs a shift operation on $a$ according to $n-m$ to obtain $a^{n-m}$;
  • The error compensation circuit 350 calculates an error compensation value of $a^{n-m}$.
  • The error compensation circuit 350 may specifically be a combinational circuit built from multi-way selectors.
  • the error compensation circuit 350 may also be referred to as an error compensation table, that is, a dotted line portion in the figure.
  • error(x) represents the sum of the quadratic term and the higher-order terms in the expansion; as long as enough terms are retained, sufficiently high precision can be ensured.
  • error($a^{n-m}$) is expanded as a Taylor series. Depending on the accuracy requirement, terms up to the third order, the fourth order, or higher are retained; the value range of $x$, $[-1, 1]$, is divided into K equal parts (K is a positive integer); and the results are recorded in a K-to-1 selector combinational circuit, which is called the error compensation table. For scenarios with high computational accuracy requirements, the error compensation value is added to the results of the other parts of the logarithmic addition circuit; for scenarios with low computational accuracy requirements, all circuits related to the error compensation table can be turned off, and this function is not used.
  • The addition circuit 360 adds $m$, $a^{n-m}$, and the error compensation value of $a^{n-m}$ to obtain the value of $(\log_e a)\cdot\log_a(a^m+a^n)$.
  • The logarithmic adder 300 may further determine the value of $a^m+a^n$ from the value of $(\log_e a)\cdot\log_a(a^m+a^n)$, or may leave $a^m+a^n$ uncalculated and instead pass the value of $(\log_e a)\cdot\log_a(a^m+a^n)$ to other arithmetic circuits for calculation.
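The steps of the logarithmic adder 300 can be put together in one behavioural sketch. Two assumptions are made: $m$ is scaled by $\ln a$ so that the result approximates $\ln(a^m + a^n)$ for a general base, and the error compensation is computed directly from a truncated Taylor series of $\ln(1+x) - x$ rather than read from a K-entry table. The parameter names `compensate` and `order` are hypothetical.

```python
import math

def log_adder(m: float, n: float, a: float = 2.0,
              compensate: bool = True, order: int = 4) -> float:
    """Approximate (log_e a) * log_a(a**m + a**n) = ln(a**m + a**n)."""
    diff = m - n                                                  # subtraction
    larger, exp = (m, -diff) if diff >= 0 else (n, diff)          # compare and select
    x = a ** exp                                                  # shift: a**(n - m)
    result = larger * math.log(a) + x                             # first-order sum
    if compensate:
        # Assumed compensation: Taylor terms of order 2..order of ln(1 + x).
        result += sum((-1) ** (k + 1) * x ** k / k for k in range(2, order + 1))
    return result
```

For `m = 5`, `n = 3`, and `a = 2`, the exact value is $\ln 40$. With compensation up to fourth order the sketch is within $10^{-3}$ of it, while with `compensate=False` the error is above $10^{-2}$. This reflects the accuracy trade-off the text describes for high-precision and low-precision scenarios.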
  • the device 400 of FIG. 4 is composed of a central processing unit (CPU), a double data rate synchronous dynamic random access memory (DDR) memory, an AXI bus, and a computing chip.
  • the computing chip includes an input buffer module, a calculation engine module, an output control module, and the like.
  • the input buffer module is configured to store the input raw data
  • the calculation engine module is used to calculate the original data
  • the output control module controls the output of the calculation result output by the calculation engine module.
  • the apparatus 200 shown in FIG. 2 and the apparatus 300 shown in FIG. 3 may correspond to the computing chip in FIG. 4, which is capable of implementing the processing of data by the apparatus 200 and the apparatus 300 above.
  • the above apparatus 200 and apparatus 300 may also directly correspond to the calculation engine module in FIG. 4, which is capable of implementing the processing of data by the apparatus 200 and the apparatus 300 above.
  • the above calculation engine module may also be implemented based on a hardware circuit.
  • FIG. 5 is a schematic flowchart of a multiplication and addition operation performed by the apparatus for processing multiplication and addition operations in the embodiment of the present application. Specifically, FIG. 5 may represent a schematic flowchart of the multiplication and addition operation performed by the device 400 described above. It should be understood that FIG. 5 may represent a calculation process of multiplying and accumulating a plurality of data.
  • the input buffer module converts image data in the buffered linear domain into data in a logarithmic domain
  • the calculation engine module adds the values in the logarithmic domain to calculate a result of multiplying the values in the linear domain;
  • the calculation engine module adds the results obtained by multiplying the data in the linear domain, completing the addition of the exponential terms through the comparison circuit, the shift circuit, and the error compensation circuit to obtain a processing result.
  • the output control module quantizes the data output by the calculation engine module, aligns it to the data bit width of the next-stage operation, and outputs the data.
  • steps 502 to 504 may be repeated in the actual calculation process.
  • the apparatus for processing the multiply-and-accumulate operation of the embodiment of the present application is described in detail above with reference to FIG. 2 to FIG. 4 .
  • the method for processing the multiplication and addition operation of the embodiment of the present application will be described below with reference to FIG. 6 .
  • the apparatus for processing multiply-add operations in FIGS. 2 to 4 can implement the method for processing multiply-add operations in FIG. 6, and the method in FIG. 6 corresponds to the apparatus described with reference to FIGS. 2 to 5. For the sake of brevity, repeated description is omitted below where appropriate.
  • FIG. 6 is a schematic flowchart of a method for processing data according to an embodiment of the present application.
  • the method of FIG. 6 can be performed by the apparatus 200, the apparatus 300, or the apparatus 400 that processes the data described above.
  • the method 600 of Figure 6 includes:
  • high-bit-width data operations are converted into low-bit-width data operations, which reduces resource usage during calculation.
  • the sum of the higher-bit-width data a^m and a^n is calculated through the sum of lower-bit-width data.
  • the use of a high-bit-width adder can be avoided, which reduces the area of the computing chip and the computational power consumption.
  • the above a may specifically be 2.
  • obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: determining a target accuracy that needs to be achieved when processing the plurality of original data; and, in the case where the target accuracy is lower than the first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
  • the first precision described above may be preset, and when the target accuracy is lower than the first precision, the accuracy required for processing the original data may be considered to be low.
  • by comparing the target accuracy with the preset precision, the accuracy requirement for processing the original data can be determined; when the accuracy requirement is low, m+a^(n-m) can be directly taken as the approximate value of (log_e a)*log_a(A*B+C*D). The value of (log_e a)*log_a(A*B+C*D) can thus be determined flexibly according to the accuracy requirement, which satisfies that requirement while improving computing efficiency.
  • the method 600 further includes: determining an error compensation value of a^(n-m) according to an error compensation table, where the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, the K error compensation values are obtained by substituting the K values into the error compensation term, and K and L are both integers greater than 1; and approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
  • in addition to m+a^(n-m), the error compensation value of a^(n-m) can be taken into account when determining the value of (log_e a)*log_a(A*B+C*D), which further improves the calculation accuracy.
  • approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) includes: determining a target accuracy that needs to be achieved when processing the plurality of original data; and, in the case where the target accuracy is higher than the second precision, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
  • when the target accuracy is higher than the second precision, the accuracy required for processing the original data can be considered high; in this case the error compensation value of a^(n-m) can be taken into account when determining the value of (log_e a)*log_a(A*B+C*D), to ensure the accuracy of that value.
  • the second precision described above may be the same as the first precision.
  • the K is determined according to the target accuracy.
  • the L is determined according to the target accuracy.
  • when the target accuracy is high, K can be a large value, and when the target accuracy is low, K can be a small value.
  • the larger the value of L, the more terms the error compensation term contains and the more accurate the error compensation value obtained from it. Therefore, when the target accuracy is high, L can be a large value, and when the target accuracy is low, L can be a smaller value.
  • obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: performing a shift operation on a according to n-m to obtain a^(n-m); and adding m and a^(n-m) to obtain m+a^(n-m).
  • obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: subtracting m and n to obtain m-n or n-m; comparing m-n or n-m with zero; and selecting m and n-m in the case where m-n is greater than or equal to zero, or selecting m and n-m in the case where n-m is less than or equal to zero.
  • the method 600 further includes: obtaining a value of A*B+C*D according to (log e a )*log a (A*B+C*D).
  • the foregoing method 600 further includes: quantizing the value of the A*B+C*D to reach a preset data bit width.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
  • the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)
  • Image Processing (AREA)

Abstract

The present application provides a device and method for processing multiplication and addition operations. The device comprises: a first adder, used for performing an addition operation on inputted first data and second data to obtain first intermediate data, wherein the values of the first data and the second data are respectively log_a A and log_a B; a second adder, used for performing an addition operation on inputted third data and fourth data to obtain second intermediate data, wherein the values of the third data and the fourth data are respectively log_a C and log_a D, and the value of the second intermediate data is n; a logarithm adder, used for obtaining a^(n-m) according to m and n inputted by the first adder and the second adder and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), wherein the first adder, the second adder, and the logarithm adder are implemented by hardware circuits. According to the present application, calculation power consumption can be reduced during a calculation process.

Description

Device for processing multiplication and addition operations and method for processing multiplication and addition operations
This application claims priority to Chinese Patent Application No. 201710269126.2, filed with the Chinese Patent Office on April 24, 2017 and entitled "Device for processing multiplication and addition operations and method for processing multiplication and addition operations", which is incorporated herein by reference in its entirety.
Technical field
The present application relates to the field of computers, and more particularly, to an apparatus for processing multiply-add operations and a method for processing multiply-add operations.
Background
Computers often use multiply-add operations when processing input data. To perform a multiply-add operation, a computer first multiplies the input data and then adds the data obtained from the multiplications. Input data is generally data in the linear domain, and data in the linear domain generally occupies a relatively large bit width (for example, 32 bits), so the computer needs considerable resources to perform multiply-add operations. In addition, a multiply-add operation contains a large number of multiplications, which are computationally expensive and relatively slow, so the computer performs multiply-add operations with low efficiency.
To solve the above problem, the prior art proposes a scheme for processing multiply-add operations that converts input data in the linear domain into data in the logarithmic domain, thereby converting multiplications in the linear domain into additions in the logarithmic domain. Converting linear-domain data into logarithmic-domain data reduces the bit width occupied by the data (for example, 32-bit original data occupies 5 bits after the logarithm is taken), and converting multiplications in the linear domain into additions in the logarithmic domain also improves computational efficiency.
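The conversion described above can be illustrated with a minimal Python sketch. It assumes base 2 and floating-point logarithms; the function names `to_log2` and `multiply_in_log_domain` are hypothetical and are not taken from the application.

```python
import math

def to_log2(x):
    """Convert a positive linear-domain value to the base-2 log domain."""
    if x <= 0:
        raise ValueError("log-domain conversion needs x > 0")
    return math.log2(x)

def multiply_in_log_domain(log_x, log_y):
    """A product in the linear domain is a sum in the log domain."""
    return log_x + log_y

A, B = 8.0, 4.0
m = multiply_in_log_domain(to_log2(A), to_log2(B))  # equals log2(A * B)
product = 2.0 ** m                                  # back to the linear domain
```

The adder replaces the multiplier here; the cost moves to the two conversions at the boundaries of the log domain.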
However, after completing the additions in the logarithmic domain, the above scheme still needs to convert the data in the logarithmic domain back into data in the linear domain and add these linear-domain data to obtain the final multiply-accumulate result. Because data in the linear domain occupies a large bit width, the computer still needs considerable resources when performing this addition.
Summary of the invention
The present application provides an apparatus and a method for processing multiply-add operations, so as to reduce computational power consumption.
In a first aspect, an apparatus for processing multiply-add operations is provided. The apparatus includes: a first adder, configured to add input first data and second data to obtain first intermediate data, where the values of the first data and the second data are log_a A and log_a B respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking logarithms of first original data A and second original data B among a plurality of original data; a second adder, configured to add input third data and fourth data to obtain second intermediate data, where the values of the third data and the fourth data are log_a C and log_a D respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking logarithms of third original data C and fourth original data D among the plurality of original data, where a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n; and a logarithmic adder, whose input port is connected to the output ports of the first adder and the second adder, the logarithmic adder being configured to obtain a^(n-m) according to m and n input by the first adder and the second adder, and to approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D); where the first adder, the second adder, and the logarithmic adder are implemented by hardware circuits.
The first adder, the second adder, and the logarithmic adder may be implemented based on various hardware circuits such as an application-specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
In the present application, the sum of data in exponential form is converted into a sum of values with lower bit widths, so that high-bit-width data operations are converted into low-bit-width data operations. This reduces resource usage during calculation and thereby lowers computational power consumption.
It should be understood that, compared with a^m and a^n, m and a^(n-m) are data with lower bit widths. Calculating the sum of the higher-bit-width data a^m and a^n through the sum of lower-bit-width data avoids the use of a high-bit-width adder, which reduces the area of the computing chip and the computational power consumption. It should also be understood that A, B, C, and D are all real numbers greater than 0.
Approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) may mean taking the sum of m and a^(n-m) as an approximate value of (log_e a)*log_a(A*B+C*D).
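As a numerical check of the idea, the sketch below verifies the exact identity log_a(a^m + a^n) = m + log_a(1 + a^(n-m)) for m ≥ n, and then a first-order estimate based on ln(1 + x) ≈ x. The text states the approximation as m + a^(n-m); how the factor log_e a is absorbed is not spelled out in this section, so the scaling `m * ln(a)` used in `first_order_ln_sum` is an assumption of this sketch, as are the function names.

```python
import math

def exact_ln_sum(m, n, a=2.0):
    """Reference value: ln(a**m + a**n), i.e. (log_e a) * log_a(a**m + a**n)."""
    return math.log(a ** m + a ** n)

def identity_log_a_sum(m, n, a=2.0):
    """Exact identity for m >= n: log_a(a**m + a**n) = m + log_a(1 + a**(n-m))."""
    return m + math.log(1.0 + a ** (n - m)) / math.log(a)

def first_order_ln_sum(m, n, a=2.0):
    """Assumed first-order estimate: m*ln(a) + a**(n-m), using ln(1+x) ~ x."""
    return m * math.log(a) + a ** (n - m)

m, n = 5.0, 2.0                      # A*B = 2**5 = 32, C*D = 2**2 = 4
error = abs(first_order_ln_sum(m, n) - exact_ln_sum(m, n))
```

The estimate improves as m - n grows, because a^(n-m) shrinks and the neglected higher-order terms shrink faster.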
It should be understood that a may specifically be 2.
Optionally, the logarithmic adder may further be configured to obtain a^(n-m) according to m and n input by the first adder and the second adder, and to approximately determine the sum of m and -a^(n-m) as the value of (log_e a)*log_a(A*B-C*D).
The above multiply-add operation is a multiplication operation in a broad sense, which may include addition between products as well as subtraction between products. For example, the operation may include A*B+C*D or A*B-C*D.
With reference to the first aspect, in some implementations of the first aspect, the logarithmic adder being configured to obtain a^(n-m) according to m and n input by the first adder and the second adder, and to approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: determining a target accuracy that needs to be achieved when processing the plurality of original data; and, in the case where the target accuracy is lower than a first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
The first precision may be preset. When the target accuracy is lower than the first precision, the accuracy required for processing the original data can be considered low.
By comparing the target accuracy with the preset precision, the accuracy requirement for processing the original data can be determined. When the accuracy requirement is low, m+a^(n-m) can be directly taken as the approximate value of (log_e a)*log_a(A*B+C*D). The value of (log_e a)*log_a(A*B+C*D) can thus be determined flexibly according to the accuracy requirement, which satisfies that requirement while improving computing efficiency.
With reference to the first aspect, in some implementations of the first aspect, the logarithmic adder is specifically configured to: determine an error compensation value of a^(n-m) according to an error compensation table, where the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into the error compensation term
Figure PCTCN2018084275-appb-000001
where K and L are both integers greater than 1; and approximately determine the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
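The table lookup can be sketched as follows. The error compensation term itself is given only as a figure that is not reproduced here, so the sketch takes the term as a parameter; `placeholder_term` is a purely hypothetical stand-in used to make the example runnable, and sampling each of the K parts at its midpoint is likewise an assumption.

```python
def build_compensation_table(term, K):
    """Split [-1, 1] into K equal parts and tabulate term() at each midpoint."""
    if K <= 1:
        raise ValueError("K must be an integer greater than 1")
    width = 2.0 / K
    points = [-1.0 + (i + 0.5) * width for i in range(K)]
    return points, [term(x) for x in points]

def lookup_compensation(table, x):
    """Return the tabulated compensation value for the part containing x."""
    points, values = table
    K = len(points)
    # Index of the part of [-1, 1] that contains x, clamped to a valid index.
    index = min(K - 1, max(0, int((x + 1.0) * K / 2.0)))
    return values[index]

def placeholder_term(x):
    # Hypothetical stand-in, NOT the term from the application.
    return -x * x / 2.0

table = build_compensation_table(placeholder_term, K=4)
compensation = lookup_compensation(table, 0.3)   # falls in the part [0.0, 0.5)
```

A larger K gives narrower parts and therefore a tabulated value closer to the term evaluated at the actual input, at the cost of a larger table.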
In addition to m+a^(n-m), the error compensation value of a^(n-m) can be taken into account when determining the value of (log_e a)*log_a(A*B+C*D), which further improves the calculation accuracy.
With reference to the first aspect, in some implementations of the first aspect, the logarithmic adder approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) includes: determining a target accuracy that needs to be achieved when processing the plurality of original data; and, in the case where the target accuracy is higher than a second precision, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
When the target accuracy is higher than the second precision, the accuracy required for processing the original data can be considered high. In this case the error compensation value of a^(n-m) can be taken into account when determining the value of (log_e a)*log_a(A*B+C*D), to ensure the accuracy of that value. In addition, the second precision may be the same as the first precision, or may be greater than the first precision.
With reference to the first aspect, in some implementations of the first aspect, K is determined according to the target accuracy.
When the target accuracy is high, K can be a large value, and when the target accuracy is low, K can be a small value.
The larger the value of K, the more finely [-1, 1] is divided, so that a more accurate result is obtained when the error compensation table is queried to determine the error compensation value of a^(n-m).
With reference to the first aspect, in some implementations of the first aspect, L is determined according to the target accuracy.
The larger the value of L, the more terms the error compensation term contains and the more accurate the error compensation value obtained from it. Therefore, when the target accuracy is high, L can be a large value, and when the target accuracy is low, L can be a smaller value.
With reference to the first aspect, in some implementations of the first aspect, the logarithmic adder specifically includes: a shift circuit, configured to perform a shift operation on a according to n-m to obtain a^(n-m); and a sub-adder circuit, configured to add m and a^(n-m) to obtain m+a^(n-m).
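With a = 2, the shift circuit can be modelled in software as a right shift of a fixed-point representation of 1. The sketch below assumes integer m and n (the text allows real numbers) and a hypothetical format with 16 fractional bits; neither choice comes from the application.

```python
FRACTION_BITS = 16  # hypothetical fixed-point format: 16 fractional bits

def pow2_by_shift(n_minus_m):
    """Return 2**(n-m) as a fixed-point integer, for integer n-m <= 0."""
    if n_minus_m > 0:
        raise ValueError("expects n - m <= 0, i.e. m >= n")
    one = 1 << FRACTION_BITS
    return one >> (-n_minus_m)        # shifting right by (m - n) divides by 2**(m-n)

def add_m_and_power(m, n):
    """Sub-adder step: m + 2**(n-m), both held in the same fixed-point format."""
    return (m << FRACTION_BITS) + pow2_by_shift(n - m)

result = add_m_and_power(5, 2) / (1 << FRACTION_BITS)   # back to a float for display
```

No multiplier or exponentiation unit is needed in this model; the power of two is produced by wiring alone, which is the point of choosing a = 2.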
With reference to the first aspect, in some implementations of the first aspect, the logarithmic adder further includes: a subtraction circuit, configured to subtract m and n to obtain m-n or n-m; a comparison circuit, configured to compare m-n or n-m with zero; and a selection circuit, configured to select m and n-m in the case where m-n is greater than or equal to zero, or to select m and n-m in the case where n-m is less than or equal to zero.
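The subtract, compare, and select stages can be sketched as one small function. The text covers the case m ≥ n; swapping the roles of the operands when m < n is an assumption added here so that the function is total, and the name `select_operands` is hypothetical.

```python
def select_operands(m, n):
    """Return (larger exponent, non-positive difference) for the later stages."""
    diff = m - n                 # subtraction circuit
    if diff >= 0:                # comparison circuit: is m - n >= 0 ?
        return m, -diff          # selection circuit: pass on m and n - m
    return n, diff               # assumed swap: pass on n and m - n

larger, difference = select_operands(5, 2)
```

Passing on a non-positive difference guarantees that a^(difference) is at most 1, which keeps the shifted term small relative to the selected exponent.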
With reference to the first aspect, in some implementations of the first aspect, the apparatus further includes: a converter, configured to approximately obtain the value of A*B+C*D according to (log_e a)*log_a(A*B+C*D), where the converter is implemented by a hardware circuit.
With reference to the first aspect, in some implementations of the first aspect, the apparatus further includes: a quantizer, configured to quantize the value of A*B+C*D to reach a preset data bit width.
In a second aspect, a method for processing multiply-add operations is provided. The method includes: adding input first data and second data to obtain first intermediate data, where the values of the first data and the second data are log_a A and log_a B respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking logarithms of first original data A and second original data B among a plurality of original data; adding input third data and fourth data to obtain second intermediate data, where the values of the third data and the fourth data are log_a C and log_a D respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking logarithms of third original data C and fourth original data D among the plurality of original data, where a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n; and obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
With reference to the second aspect, in some implementations of the second aspect, obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: determining a target accuracy that needs to be achieved when processing the plurality of original data; and, in the case where the target accuracy is lower than a first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
With reference to the second aspect, in some implementations of the second aspect, the method further includes: determining an error compensation value of a^(n-m) according to an error compensation table, where the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into the error compensation term
Figure PCTCN2018084275-appb-000002
where K and L are both integers greater than 1; and approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
With reference to the second aspect, in some implementations of the second aspect, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) includes: determining a target accuracy that needs to be achieved when processing the plurality of original data; and, in the case where the target accuracy is higher than a second precision, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
With reference to the second aspect, in some implementations of the second aspect, K is determined according to the target accuracy.
With reference to the second aspect, in some implementations of the second aspect, obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: performing a shift operation on a according to n-m to obtain a^(n-m); and adding m and a^(n-m) to obtain m+a^(n-m).
With reference to the second aspect, in some implementations of the second aspect, obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: subtracting m and n to obtain m-n or n-m; comparing m-n or n-m with zero; and selecting m and n-m in the case where m-n is greater than or equal to zero, or selecting m and n-m in the case where n-m is less than or equal to zero.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: approximately obtaining the value of A*B+C*D according to (log_e a)*log_a(A*B+C*D), where the converter is implemented by a hardware circuit.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: quantizing the value of A*B+C*D to reach a preset data bit width.
Brief description of drawings
FIG. 1 is a schematic flowchart of a method for processing multiply-add operations in the prior art;
FIG. 2 is a schematic block diagram of an apparatus for processing multiply-add operations according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of an apparatus for processing multiply-add operations according to an embodiment of the present application;
FIG. 4 is a schematic block diagram of an apparatus for processing multiply-add operations according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of a method for processing multiply-add operations according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of a method for processing multiply-add operations according to an embodiment of the present application.
Detailed description
The technical solutions in the present application are described below with reference to the accompanying drawings. To better understand the apparatus for processing data in the embodiments of the present application, the prior-art method for processing multiply-add operations is first briefly introduced with reference to FIG. 1.
FIG. 1 shows a schematic flowchart of a method for processing multiply-add operations in the prior art.
In FIG. 1, four multipliers (a first multiplier, a second multiplier, a third multiplier, and a fourth multiplier) each multiply one of four pairs of data, producing four 32-bit values. Next, the first adder and the second adder add the four 32-bit values output by the four multipliers, producing two 32-bit values. The third adder then adds the two 32-bit values output by the first adder and the second adder, producing one 32-bit value. Finally, that 32-bit value is quantized to obtain 16-bit data.
由于乘法器的能耗和芯片面积均远大于加法器,因此,在计算机内部如果乘法器过多则会导致能耗较高,计算效率也比较低。为了解决该问题,现有技术提出了一种处理乘加运算的方案。该方案将线性域中的数据转化为对数域中的数据,从而将线性域中的乘法运算转化为对数域中的加法运算。Since the energy consumption and chip area of the multiplier are much larger than the adder, if there are too many multipliers inside the computer, the energy consumption is high and the calculation efficiency is relatively low. In order to solve this problem, the prior art proposes a scheme for processing the multiply-and-accumulate operation. This scheme converts data in a linear domain into data in a logarithmic domain, thereby transforming multiplication operations in the linear domain into addition operations in the logarithmic domain.
Taking data A, B, C and D in the linear domain as an example, the calculation of A*B+C*D proceeds as follows:
First, A, B, C and D in the linear domain are converted into data in the logarithmic domain:
$x=\log_2 A$, $y=\log_2 B$, $z=\log_2 C$, $w=\log_2 D$, where $A=2^x$, $B=2^y$, $C=2^z$, $D=2^w$.
Second, the multiplications in the linear domain are converted into additions in the logarithmic domain:
$A*B+C*D=2^{x+y}+2^{z+w}$
The multiplication of A and B thus becomes the addition of x and y, and the multiplication of C and D becomes the addition of z and w. Finally, $2^{x+y}$ and $2^{z+w}$ are computed from x+y and z+w respectively, and the two are added to give the result of A*B+C*D.
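As a rough illustration of the conversion described above, the following sketch evaluates A*B+C*D through base-2 logarithms. The function name is hypothetical, and the operands are assumed to be powers of two so that the logarithms are exact.

```python
import math

def mac_via_log2(A, B, C, D):
    """Compute A*B + C*D by adding exponents instead of multiplying."""
    # Linear domain -> logarithmic domain
    x, y, z, w = (math.log2(v) for v in (A, B, C, D))
    # Each multiplication becomes an addition of exponents
    m = x + y          # log2(A*B)
    n = z + w          # log2(C*D)
    # Logarithmic domain -> linear domain, followed by one wide addition
    return 2 ** m + 2 ** n

result = mac_via_log2(8, 4, 2, 16)   # 8*4 + 2*16
```

The last line of the function still adds the wide values $2^m$ and $2^n$; only the two multiplications have been replaced.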
Although this scheme converts multiplication in the linear domain into addition in the logarithmic domain and thereby avoids multiplication, once the additions in the logarithmic domain are complete the log-domain data (x, y, z, w) must still be converted into linear-domain data ($2^{x+y}$, $2^{z+w}$) and then added. Log-domain data occupies a small bit width (for example, x, y, z and w each occupy 5 bits), whereas linear-domain data occupies a large one (for example, $2^{x+y}$ and $2^{z+w}$ each occupy 32 bits). A wide adder is therefore still needed after the conversion back to the linear domain, so the computer still consumes considerable resources when performing the addition.
The embodiments of this application therefore propose an apparatus for processing multiply-add operations. The apparatus converts an addition between wide data in exponential form into an addition between narrower data, which reduces resource usage during calculation and thus lowers the calculation power consumption.
FIG. 2 is a schematic block diagram of a data processing apparatus according to an embodiment of this application. The apparatus 200 in FIG. 2 includes:
a first adder 210, configured to add input first data and second data to obtain first intermediate data, where the values of the first data and the second data are $\log_a A$ and $\log_a B$ respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking the logarithms of first original data A and second original data B among a plurality of pieces of original data;
a second adder 220, configured to add input third data and fourth data to obtain second intermediate data, where the values of the third data and the fourth data are $\log_a C$ and $\log_a D$ respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking the logarithms of third original data C and fourth original data D among the plurality of pieces of original data, where a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n.
The original data may be RGB pixel data used in image processing.
The value of a may be 2.
When the plurality of pieces of original data are processed to obtain a plurality of pieces of intermediate data, the product operations between the original data may first be converted into additions in the logarithmic domain, which yields a plurality of pieces of intermediate data in exponential form.
a logarithmic adder 230, whose input ports are connected to the output ports of the first adder 210 and the second adder 220, the logarithmic adder 230 being configured to obtain $a^{n-m}$ from the m and n input by the first adder 210 and the second adder 220, and to approximately determine the sum of m and $a^{n-m}$ as the value of $(\log_e a)*\log_a(A*B+C*D)$.
The first adder 210, the second adder 220 and the logarithmic adder 230 may be implemented by hardware circuits. Specifically, they may be implemented on the basis of various hardware circuits such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
In this application, converting an addition between data in exponential form into an addition between values of lower bit width turns a wide data operation into a narrow one, which reduces resource usage during calculation and thus lowers the calculation power consumption.
Specifically, because m and $a^{n-m}$ occupy a smaller data bit width than $a^m$ and $a^n$, the embodiments of this application convert the wide addition of $a^m$ and $a^n$ into a narrow addition of m and $a^{n-m}$, which reduces the use of system resources during calculation and improves computational efficiency.
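The difference in operand width can be seen in a small sketch. It assumes a = 2 and the illustrative values m = 20 and n = 17, neither of which comes from the text.

```python
m, n = 20, 17                      # assumed log-domain values with m >= n

wide_m = (2 ** m).bit_length()     # bits needed to hold 2^m as an integer
wide_n = (2 ** n).bit_length()     # bits needed to hold 2^n as an integer
narrow_m = m.bit_length()          # bits needed to hold m itself

# 2^(n-m) = 2^-3 is a single set bit three places below the binary point,
# so only the shift distance m - n has to be carried around.
shift = m - n
```

With these values the linear-domain operands need 21 and 18 bits, while m fits in 5 bits and the shift distance is 3.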
Optionally, the logarithmic adder 230 may approximately determine the sum of m and $a^{n-m}$ as the value of $(\log_e a)*\log_a(A*B+C*D)$, or may approximately determine the sum of m and $-a^{n-m}$ as the value of $(\log_e a)*\log_a(A*B-C*D)$.
The multiply-add operation here is a multiply-add operation in the broad sense: it may include both the addition of products and the subtraction of products. For example, it may include A*B+C*D as well as A*B-C*D.
Optionally, obtaining $a^{n-m}$ from the m and n input by the first adder 210 and the second adder 220 and approximately determining the sum of m and $a^{n-m}$ as the value of $(\log_e a)*\log_a(A*B+C*D)$ by the logarithmic adder 230 specifically includes: determining the target precision to be achieved when the plurality of pieces of original data are processed; and, when the target precision is lower than a first precision, approximately determining the sum of m and $a^{n-m}$ as the value of $(\log_e a)*\log_a(A*B+C*D)$.
The first precision may be preset. When the target precision is lower than the first precision, the precision required for processing the original data can be considered low. Comparing the target precision with the preset precision establishes the precision requirement for processing the original data; when that requirement is low, $m+a^{n-m}$ can be directly and approximately determined as the value of $(\log_e a)*\log_a(A*B+C*D)$. This application can therefore determine the value of $(\log_e a)*\log_a(A*B+C*D)$ flexibly according to the precision requirement, meeting the precision requirement of the original data while improving operation efficiency.
In some embodiments, the logarithmic adder 230 is specifically configured to: determine an error compensation value of $a^{n-m}$ according to an error compensation table, where the error compensation table contains K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into the error compensation term
Figure PCTCN2018084275-appb-000003
where K and L are both integers greater than 1; and approximately determine the sum of $m+a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a)*\log_a(A*B+C*D)$.
When the value of $(\log_e a)*\log_a(A*B+C*D)$ is determined, the error compensation value of $a^{n-m}$ can be taken into account in addition to $m+a^{n-m}$, which further improves the calculation precision.
Optionally, when $(\log_e a)*\log_a(A*B+C*D)$ is determined from $m+a^{n-m}$ and the error compensation value of $a^{n-m}$, the K values may be obtained by dividing [0, 1] into K parts.
When $(\log_e a)*\log_a(A*B-C*D)$ is determined from $m-a^{n-m}$ and the error compensation value of $-a^{n-m}$, the K values may be obtained by dividing [-1, 0] into K parts.
It should be understood that dividing [-1, 1], [0, 1] or [-1, 0] into K parts may mean dividing the interval evenly to obtain the K values.
It should be understood that determining the error compensation value of $a^{n-m}$ according to the error compensation table may mean looking it up in the table. Specifically, the value closest to $a^{n-m}$ among the K values may first be found in the error compensation table, and the error compensation value of that value is then taken as the error compensation value of $a^{n-m}$.
Optionally, approximately determining the sum of $m+a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a)*\log_a(A*B+C*D)$ by the logarithmic adder 230 specifically includes: determining the target precision to be achieved when the plurality of pieces of original data are processed; and, when the target precision is higher than a second precision, approximately determining the sum of $m+a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a)*\log_a(A*B+C*D)$.
When the target precision is higher than the second precision, the precision required for processing the original data can be considered high. In that case the error compensation value of $a^{n-m}$ can be taken into account when the value of $(\log_e a)*\log_a(A*B+C*D)$ is determined, to ensure the precision of that value. The second precision may be the same as the first precision.
Optionally, in an embodiment, when determining the value of $(\log_e a)*\log_a(A*B+C*D)$, the logarithmic adder 230 may first compare the absolute value of n-m with a first threshold. If the absolute value of n-m is greater than or equal to the first threshold, the logarithmic adder 230 may directly and approximately determine m as the value of $(\log_e a)*\log_a(A*B+C*D)$.
When the absolute value of n-m is large, $a^{n-m}$ is very small compared with m. $a^{n-m}$ can therefore be ignored in the calculation and m directly and approximately determined as the value of $(\log_e a)*\log_a(A*B+C*D)$, which reduces the computational complexity.
For example, with n = 2, m = 10 and a first threshold of 5, the absolute value of n-m is greater than the first threshold. The value of $a^{-8}$ is very small compared with 10, so $a^{-8}$ can be ignored and 10 directly determined as the value of $(\log_e a)*\log_a(A*B+C*D)$.
When the absolute value of n-m is less than the first threshold, the logarithmic adder 230 still approximately determines the sum of m and $a^{n-m}$ as the value of $(\log_e a)*\log_a(A*B+C*D)$.
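The threshold rule can be sketched as follows, using the values from the example (n = 2, m = 10, first threshold 5) and assuming a = 2. The function name and the default arguments are hypothetical.

```python
def log_add_with_threshold(m, n, a=2, threshold=5):
    """Return m + a**(n-m), or just m when |n-m| reaches the threshold.

    Assumes m >= n, as in the text.
    """
    if abs(n - m) >= threshold:
        return m                  # a**(n-m) is negligible next to m
    return m + a ** (n - m)

skipped = log_add_with_threshold(10, 2)   # |n-m| = 8 >= 5, the term is dropped
kept = log_add_with_threshold(10, 8)      # |n-m| = 2 < 5, the term is added
dropped_term = 2 ** -8                    # size of the ignored term
```

The ignored term is $2^{-8} = 0.00390625$, which is small next to m = 10.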
In some embodiments, K is determined according to the target precision. Specifically, K may be a larger value when the target precision is high and a smaller value when the target precision is low.
Specifically, the larger K is, the more finely [-1, 1] is divided and the more data the error compensation table contains, so that looking up the error compensation value of $a^{n-m}$ in the table gives a more accurate result.
In some embodiments, L is determined according to the target precision.
Specifically, the larger L is, the more terms the error compensation term has and the more accurate the resulting error compensation value is; the smaller L is, the fewer terms the error compensation term has and the less accurate the resulting error compensation value is. L may therefore be a larger value when the target precision is high and a smaller value when the target precision is low.
In this application, the precision with which the original data is processed can be adjusted flexibly by setting the values of K and L.
Optionally, in some embodiments, the logarithmic adder 230 specifically includes:
a shift circuit 2301, configured to perform a shift operation on a according to n-m to obtain $a^{n-m}$;
a sub-adder circuit 2302, configured to add m and $a^{n-m}$ to obtain $m+a^{n-m}$.
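When a = 2 and n ≤ m, $a^{n-m}$ is a power of two no larger than 1, so it can be produced by a shift instead of an exponentiation. The sketch below assumes integer m and n and a fixed-point format with `FRAC_BITS` fractional bits; the format and all names are assumptions made for illustration.

```python
FRAC_BITS = 8                       # assumed number of fractional bits
ONE = 1 << FRAC_BITS                # fixed-point encoding of 1.0

def shift_term(m, n):
    """Fixed-point 2**(n-m) for integers m >= n, as a right shift of 1.0."""
    return ONE >> (m - n)

def sub_add(m, n):
    """Fixed-point m + 2**(n-m)."""
    return (m << FRAC_BITS) + shift_term(m, n)

value = sub_add(10, 8) / ONE        # converted to a float only for inspection
```

With 8 fractional bits, `shift_term` returns zero once m - n exceeds 8, so very small terms drop out on their own.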
Optionally, in some embodiments, the logarithmic adder 230 further includes:
a subtraction circuit 2303, configured to perform a subtraction on m and n to obtain m-n or n-m;
a comparison circuit 2304, configured to compare m-n or n-m with zero;
a selection circuit 2305, configured to select m and n-m when m-n is greater than or equal to zero, or to select m and n-m when n-m is less than or equal to zero.
It should be understood that, before performing the shift operation on a according to n-m, the shift circuit 2301 may first obtain n-m from the selection circuit 2305, and that, before adding m and $a^{n-m}$, the sub-adder circuit 2302 may first obtain m from the selection circuit 2305.
In addition, when subtracting m and n, the subtraction circuit 2303 may take either one as the minuend and the other as the subtrahend, thereby obtaining m-n or n-m.
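The cooperation of the subtraction, comparison and selection circuits can be sketched in software as follows. The function name is hypothetical, and the two inputs are deliberately not assumed to be ordered.

```python
def select_larger_and_diff(p, q):
    """Return (larger input, non-positive exponent) for two log-domain inputs."""
    diff = p - q                 # subtraction circuit: either order is allowed
    if diff >= 0:                # comparison circuit: compare with zero
        return p, -diff          # selection circuit: p is the larger, exponent is q - p
    return q, diff               # q is the larger, exponent is p - q

m, n_minus_m = select_larger_and_diff(8, 10)
```

Whichever input arrives first, the outputs are the larger value m and the exponent n-m ≤ 0 that the shift circuit needs.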
Optionally, in an embodiment, the apparatus 200 further includes a converter 240, configured to approximately obtain the value of A*B+C*D from $(\log_e a)*\log_a(A*B+C*D)$.
Optionally, in an embodiment, the apparatus 200 further includes a quantizer 250, configured to quantize the value of A*B+C*D to reach a preset data bit width.
Both the converter 240 and the quantizer 250 may be implemented by hardware circuits. Specifically, they may be implemented on the basis of hardware circuits such as an ASIC or an FPGA.
Here, quantization means matching data of different bit widths. For example, if the data produced by a first calculation step is 8 bits wide and a second step requires 5 bits, the 8-bit data must be truncated to 5 bits to meet the bit-width requirement of the second step. Specifically, every 8-bit value greater than the 5-bit maximum is set to the 5-bit maximum, every value less than the 5-bit minimum is set to the 5-bit minimum, and all other values are left unchanged.
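The clamping rule described above can be sketched as a saturating quantizer. The text does not state the number format, so a signed two's-complement range is assumed here, and the function name is hypothetical.

```python
def saturate(value, bits):
    """Clamp an integer to the signed two's-complement range of `bits` bits."""
    hi = (1 << (bits - 1)) - 1   # largest representable value, 15 for 5 bits
    lo = -(1 << (bits - 1))      # smallest representable value, -16 for 5 bits
    return max(lo, min(hi, value))

samples = [100, -100, 7, 15, -16, 16]
quantized = [saturate(v, 5) for v in samples]
```

Values already inside the 5-bit range pass through unchanged; only out-of-range values are moved to the nearest bound.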
FIG. 3 is a schematic block diagram of a logarithmic adder 300 for processing multiply-add operations according to an embodiment of this application.
The logarithmic adder 300 specifically includes a subtraction circuit 310, a comparison circuit 320, a selection circuit 330, a shift circuit 340, an error compensation circuit 350 and an adder circuit 360.
Suppose there are original data A, B, C and D and the value of A*B+C*D is to be calculated. Logarithms of the original data are taken first, giving $x=\log_a A$, $y=\log_a B$, $z=\log_a C$ and $w=\log_a D$, and we let x+y=m and z+w=n. Then $A*B+C*D=a^m+a^n$; that is, the value of A*B+C*D can be obtained by calculating $a^m+a^n$.
The process by which the logarithmic adder 300 determines the value of $a^m+a^n$ is described in detail below. Here n and m are input 5-bit data (assume m > n), and sign indicates whether the sign bits of n and m are the same: for example, sign = 1 indicates that $a^m$ and $a^n$ have the same sign, and sign = 0 indicates that they have opposite signs (the case sign = 1 is described here). The apparatus 300 calculates $a^m+a^n$ in the following steps:
401. The subtraction circuit 310 takes the difference of n and m to obtain n-m or m-n;
402. The comparison circuit 320 obtains the result n-m or m-n calculated by the subtraction circuit 310 and compares it with zero;
403. The selection circuit 330 selects, according to the relationship between n-m or m-n and zero, the larger number m from n and m, together with n-m;
404. The shift circuit 340 performs a shift operation on a according to n-m to obtain $a^{n-m}$;
405. The error compensation circuit 350 calculates the error compensation value of $a^{n-m}$;
The error compensation circuit 350 may specifically be a many-to-one selector combinational circuit. It may also be referred to as an error compensation table, namely the dashed part in the figure.
The generation of the error compensation table is described in detail below.
Expanding ln(1+x) according to Taylor's formula gives:
Figure PCTCN2018084275-appb-000004
The series converges for $x\in(-1,1]$. Formula (1) can therefore be written as:
$\ln(1+x)=x+\mathrm{error}(x)$               (2)
In formula (2), error(x) denotes the sum of the quadratic and higher-order terms of the expansion. Sufficiently high precision can be guaranteed as long as terms of sufficiently high order are retained.
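A small numerical sketch of formula (2), using the standard expansion $\ln(1+x)=x-x^2/2+x^3/3-\dots$ and assuming that error(x) is truncated after the term of degree `order`. The parameter name and the sample point x = 0.25 are illustrative assumptions.

```python
import math

def error_term(x, order):
    """Sum of the Taylor terms of ln(1+x) from degree 2 up to `order`."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(2, order + 1))

x = 0.25
exact = math.log1p(x)                               # ln(1 + x)
gap_no_comp = abs(exact - x)                        # no compensation at all
gap_order_3 = abs(exact - (x + error_term(x, 3)))   # terms up to x^3
gap_order_6 = abs(exact - (x + error_term(x, 6)))   # terms up to x^6
```

Each additional retained term shrinks the gap to ln(1+x) at this sample point, which is the trade-off between precision and the number of retained terms.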
Because $\log_a(x)$ and $\ln(x)$ differ only by a constant factor, that is, $\log_a(x)=C*\ln(x)$ with $C=\log_a e$, $\log_a(x)$ can also be expanded in the form of formula (2).
When x > y,
$\log_a(a^x+a^y)=x+\log_a(1+a^{y-x})$
$=C[x+a^{y-x}+\mathrm{error}(a^{y-x})]$                       (3)
Similarly,
$\log_a(a^x-a^y)=C[x-a^{y-x}+\mathrm{error}(-a^{y-x})]$                    (4)
From formula (3):
$\log_a(a^m+a^n)=C[m+a^{n-m}+\mathrm{error}(a^{n-m})]$                   (5)
$(\log_e a)*\log_a(a^m+a^n)=m+a^{n-m}+\mathrm{error}(a^{n-m})$                   (6)
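The identity in the first line of formula (3), $\log_a(a^m+a^n)=m+\log_a(1+a^{n-m})$, can be checked numerically. In the sketch below the constant $C=\log_a e$ is kept explicit and applied to the small term only; that placement, the function names and the sample values are assumptions of this sketch.

```python
import math

def log_sum_direct(m, n, a=2.0):
    """log_a(a**m + a**n), evaluated in the linear domain."""
    return math.log(a ** m + a ** n, a)

def log_sum_via_difference(m, n, a=2.0):
    """Same quantity from m and the small term t = a**(n-m), with m >= n."""
    t = a ** (n - m)
    return m + math.log1p(t) / math.log(a)    # m + log_a(1 + t)

def log_sum_first_order(m, n, a=2.0):
    """First-order version: ln(1 + t) is replaced by t, with no compensation."""
    t = a ** (n - m)
    return m + t / math.log(a)                # m + C * t, where C = log_a(e)

m, n, a = 10, 8, 2.0
direct = log_sum_direct(m, n, a)
via_diff = log_sum_via_difference(m, n, a)
first_order = log_sum_first_order(m, n, a)
```

The first two functions agree to rounding error. The first-order version differs by the omitted higher-order terms, and that gap shrinks quickly as m - n grows.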
error($a^{n-m}$) is expanded as a Taylor series and, depending on the precision requirement, terms up to the third, fourth or a higher order are retained. The value range [-1, 1] of x is divided evenly into K parts (K is a positive integer), and the results are recorded in a K-to-1 selector combinational circuit; this selector is called the error compensation table. In scenarios that require high calculation precision, the error compensation value is added to the results of the other parts of the logarithmic adder circuit. In scenarios that require only low calculation precision, all circuits related to the error compensation table can be switched off and this function left unused.
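A software sketch of such a table, under several assumptions: the compensation value is the Taylor sum of ln(1+x) from degree 2 up to `order`, the K sample points are evenly spaced and include both endpoints, and the interval [0, 1] is used because it is the one relevant to the addition case. All names are hypothetical.

```python
import math

def error_term(x, order):
    """Taylor terms of ln(1+x) from degree 2 up to `order` (assumed truncation)."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(2, order + 1))

def build_table(K, order, lo=0.0, hi=1.0):
    """K evenly spaced sample points on [lo, hi] and their compensation values."""
    step = (hi - lo) / (K - 1)
    points = [lo + i * step for i in range(K)]
    return points, [error_term(p, order) for p in points]

def lookup(points, values, t):
    """K-to-1 selection: return the entry whose sample point is nearest to t."""
    idx = min(range(len(points)), key=lambda i: abs(points[i] - t))
    return values[idx]

points, values = build_table(K=17, order=4)
t = 2 ** -2                                   # a^(n-m) with a = 2 and n - m = -2
compensated = t + lookup(points, values, t)   # t + error(t)
residual = abs(math.log1p(t) - compensated)   # distance from ln(1 + t)
```

A larger K gives a finer grid for the nearest-point lookup, and a larger `order` brings each stored value closer to the full error term.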
406. The adder circuit 360 adds m, $a^{n-m}$ and the error compensation value of $a^{n-m}$ to obtain the value of $(\log_e a)*\log_a(a^m+a^n)$.
From formulas (1) to (6), $(\log_e a)*\log_a(A*B+C*D)$ equals the sum of $m+a^{n-m}$ and the error term error($a^{n-m}$). Because only a finite number of terms of the Taylor series can be expanded when error($a^{n-m}$) is actually calculated, $m+a^{n-m}$, or the sum of $m+a^{n-m}$ and the error term error($a^{n-m}$), is only an approximation of the value of $(\log_e a)*\log_a(A*B+C*D)$.
It should be understood that, after determining the value of $(\log_e a)*\log_a(a^m+a^n)$, the logarithmic adder 300 may further determine the value of $a^m+a^n$ from it, or may leave $a^m+a^n$ uncalculated and instead pass the value of $(\log_e a)*\log_a(a^m+a^n)$ to other operation circuits for further calculation.
FIG. 4 is a schematic block diagram of an apparatus for processing multiply-add operations according to an embodiment of this application. The apparatus 400 in FIG. 4 consists of a main control central processing unit (CPU), double data rate synchronous dynamic random access memory (DDR), an AXI bus and a computing chip. The computing chip includes an input buffer module, a calculation engine module, an output control module and so on. The input buffer module stores the input original data, the calculation engine module performs calculations on the original data, and the output control module controls the output of the calculation results produced by the calculation engine module.
It should be understood that the apparatus 200 shown in FIG. 2 and the apparatus 300 shown in FIG. 3 may correspond to the computing chip in FIG. 4, which can implement the data processing performed by the apparatus 200 and the apparatus 300 described above. Alternatively, the apparatus 200 and the apparatus 300 may correspond directly to the calculation engine module in FIG. 4, which can likewise implement that data processing. The calculation engine module may also be implemented on the basis of hardware circuits.
FIG. 5 is a schematic flowchart of a multiply-add operation performed by the apparatus for processing multiply-add operations according to an embodiment of this application. Specifically, FIG. 5 may represent the multiply-add operation performed by the apparatus 400 described above. It should be understood that FIG. 5 may represent a process of multiplying and accumulating a plurality of pieces of data.
501. The input buffer module converts the buffered image data in the linear domain into data in the logarithmic domain;
502. The calculation engine module adds the values in the logarithmic domain, thereby calculating the result of multiplying the values in the linear domain;
503. The calculation engine module adds the results obtained by multiplying the linear-domain data, completing the addition of the exponentials by means of the comparison circuit, the shift circuit, the error compensation circuit and so on, to obtain a processing result;
504. The output control module quantizes the data output by the calculation engine module, aligns it to the data bit width of the next operation stage, and outputs it.
Steps 502 to 504 may be repeated during an actual calculation.
The apparatus for processing multiply-add operations according to the embodiments of this application has been described in detail above with reference to FIG. 2 to FIG. 4. The method for processing multiply-add operations according to the embodiments of this application is described below with reference to FIG. 6. It should be understood that the apparatuses in FIG. 2 to FIG. 4 can implement the method in FIG. 6, and that the method in FIG. 6 corresponds to the apparatuses in FIG. 2 to FIG. 5. For brevity, repeated descriptions are omitted below where appropriate.
FIG. 6 is a schematic flowchart of a data processing method according to an embodiment of this application. The method in FIG. 6 may be performed by the data processing apparatus 200, the apparatus 300 or the apparatus 400 described above. The method 600 in FIG. 6 includes:
610. Add input first data and second data to obtain first intermediate data, where the values of the first data and the second data are $\log_a A$ and $\log_a B$ respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking the logarithms of first original data A and second original data B among a plurality of pieces of original data;
620. Add input third data and fourth data to obtain second intermediate data, where the values of the third data and the fourth data are $\log_a C$ and $\log_a D$ respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking the logarithms of third original data C and fourth original data D among the plurality of pieces of original data, where a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n;
630. Obtain $a^{n-m}$ from the m and n input by the first adder and the second adder, and approximately determine the sum of m and $a^{n-m}$ as the value of $(\log_e a)*\log_a(A*B+C*D)$.
In this application, the sum of data in exponential form is converted into a sum of values with a lower bit width, so that a high-bit-width data operation becomes a low-bit-width data operation. This reduces resource usage during computation and therefore lowers computing power consumption. Specifically, compared with $a^m$ and $a^n$, the values m and $a^{n-m}$ have a lower bit width. Computing the sum of the high-bit-width data $a^m$ and $a^n$ by adding low-bit-width data avoids the need for a high-bit-width adder, which reduces the area of the computing chip and the computing power consumption.
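The computation in steps 610 to 630 can be sketched in Python as follows. This is a minimal illustration, not the hardware described here: the function name `log_domain_mac` is hypothetical, and floating-point `math` calls stand in for the adders. The factor $\log_e a$ applied to m comes from multiplying the identity $\log_a(a^m + a^n) = m + \log_a(1 + a^{n-m})$ by $\log_e a$, which is this sketch's reading rather than a statement from the text. The first-order step $\ln(1+t) \approx t$ is likewise an assumption about where the approximation comes from.

```python
import math

def log_domain_mac(A, B, C, D, a=2):
    """Return (exact, approx) values of ln(A*B + C*D) computed in the log domain."""
    # Steps 610 and 620: each product becomes an addition of base-a logarithms.
    m = math.log(A, a) + math.log(B, a)   # m = log_a(A*B)
    n = math.log(C, a) + math.log(D, a)   # n = log_a(C*D)
    if m < n:                             # the text assumes m >= n
        m, n = n, m
    t = a ** (n - m)                      # 0 < t <= 1 because n - m <= 0
    # Exact identity: a^m + a^n = a^m * (1 + a^(n-m)).
    exact = m * math.log(a) + math.log1p(t)
    # First-order approximation ln(1 + t) ~ t (assumption of this sketch).
    approx = m * math.log(a) + t
    return exact, approx
```

Because $t > \ln(1+t)$ for every $t > 0$, the approximate value in this sketch always slightly overestimates the exact one, and the gap shrinks as m and n move further apart.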
In addition, a may specifically be 2.
Optionally, in an embodiment, obtaining $a^{n-m}$ from m and n input by the first adder and the second adder, and approximately determining the sum of m and $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$, includes: determining the target precision that must be reached when processing the plurality of original data; and, when the target precision is lower than a first precision, approximately determining the sum of m and $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$.
The first precision may be preset. When the target precision is lower than the first precision, the precision required for processing the original data can be considered low. Comparing the target precision with the preset precision determines the precision requirement for processing the original data. When that requirement is low, $m + a^{n-m}$ can be taken directly as the approximate value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$. The value can thus be determined flexibly according to the precision requirement, which meets the precision requirement of the original data while improving computational efficiency.
Optionally, in an embodiment, the method 600 further includes: determining an error compensation value of $a^{n-m}$ according to an error compensation table, where the error compensation table contains K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into the error compensation term
Figure PCTCN2018084275-appb-000005
where K and L are both integers greater than 1; and approximately determining the sum of $m + a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$.
When determining the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$, the error compensation value of $a^{n-m}$ can be taken into account in addition to $m + a^{n-m}$, which further improves the computational precision.
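A table lookup of this kind might be sketched as follows. The actual error compensation term is given only as a figure that is not reproduced in this text, so the term used here is purely an assumption: the Taylor terms of $\ln(1+x)$ of order 2 through L, which is what remains after the first-order step $\ln(1+x) \approx x$. The function names and the choice of sub-interval midpoints as the K values are also hypothetical.

```python
def build_compensation_table(K, L):
    """Tabulate an assumed compensation term at K points of [-1, 1]."""
    width = 2.0 / K
    table = []
    for k in range(K):
        x = -1.0 + (k + 0.5) * width   # midpoint of the k-th sub-interval (assumption)
        # Assumed term: Taylor terms of ln(1 + x) of order 2..L.
        comp = sum((-1) ** (i + 1) * x ** i / i for i in range(2, L + 1))
        table.append((x, comp))
    return table

def lookup(table, x):
    """Return the compensation stored for the sub-interval of [-1, 1] containing x."""
    K = len(table)
    k = min(int((x + 1.0) * K / 2.0), K - 1)   # clamp x = 1 into the last sub-interval
    return table[k][1]
```

With such a table, a compensated estimate of the correction would be `x + lookup(table, x)` for `x` standing in for $a^{n-m}$; under the stated assumption this tracks $\ln(1+x)$ more closely than `x` alone.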
Optionally, in an embodiment, approximately determining the sum of $m + a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$ includes: determining the target precision that must be reached when processing the plurality of original data; and, when the target precision is higher than a second precision, approximately determining the sum of $m + a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$.
When the target precision is higher than the second precision, the precision required for processing the original data can be considered high. In that case, the error compensation value of $a^{n-m}$ can be taken into account when determining the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$, to ensure the precision of that value. In addition, the second precision may be the same as the first precision.
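The choice between the two paths can be sketched as a simple dispatch. This is an illustration only: the text does not define how precision is measured, so a plain number where larger means stricter is assumed here, the first and second precision are taken to be the same `threshold` (which the text permits), and the `compensate` callable is a hypothetical stand-in for the table lookup. The text does not say what happens when the target precision equals the threshold; this sketch takes the uncompensated path in that case.

```python
def log_add_correction(t, target_precision, threshold, compensate):
    """Return the correction term added to m, for t standing in for a**(n-m)."""
    if target_precision > threshold:
        # High-precision path: include the error compensation value.
        return t + compensate(t)
    # Low-precision path: use a**(n-m) directly.
    return t
```

For example, with a hypothetical second-order compensation `lambda t: -t * t / 2`, the call `log_add_correction(0.25, 3, 2, lambda t: -t * t / 2)` takes the compensated path, while a target precision of 1 returns `0.25` unchanged.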
Optionally, in an embodiment, K is determined according to the target precision.
Optionally, in an embodiment, L is determined according to the target precision.
When the target precision is high, K may be a larger value, and when the target precision is low, K may be a smaller value. The larger K is, the more finely [-1, 1] is divided, so that looking up the error compensation table yields a more accurate error compensation value for $a^{n-m}$.
The larger L is, the more terms the error compensation term has, and the more accurate the error compensation value obtained from it. Therefore, when the target precision is high, L may be a larger value, and when the target precision is low, L may be a smaller value.
Optionally, in an embodiment, obtaining $a^{n-m}$ from m and n input by the first adder and the second adder, and approximately determining the sum of m and $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$, includes: performing a shift operation on a according to n-m to obtain $a^{n-m}$; and adding m and $a^{n-m}$ to obtain $m + a^{n-m}$.
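For a = 2, the shift-then-add step can be sketched in fixed-point arithmetic as follows. This is an assumption-laden illustration: the 16-bit fractional format, the names `FRAC_BITS`, `ONE`, `pow2_by_shift`, and `shift_and_add`, and the restriction to integer m and n are all choices of this sketch, not details taken from the text. Fractional exponents would need additional handling that is not shown.

```python
FRAC_BITS = 16            # hypothetical fixed-point format: 16 fractional bits
ONE = 1 << FRAC_BITS      # fixed-point encoding of 1.0

def pow2_by_shift(n_minus_m):
    """Return 2**(n-m) in fixed point for an integer exponent n-m <= 0."""
    if n_minus_m > 0:
        raise ValueError("expects n - m <= 0, i.e. m >= n")
    # A right shift by (m - n) bits divides 1.0 by 2**(m - n).
    return ONE >> (-n_minus_m)

def shift_and_add(m, n):
    """Return m + 2**(n-m) in fixed point for integers m >= n."""
    return (m << FRAC_BITS) + pow2_by_shift(n - m)
```

Dividing the result of `shift_and_add(5, 2)` by `ONE` gives 5.125, that is, 5 plus $2^{-3}$, without any multiplier being involved.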
Optionally, in an embodiment, obtaining $a^{n-m}$ from m and n input by the first adder and the second adder, and approximately determining the sum of m and $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$, includes: subtracting m and n to obtain m-n or n-m; comparing m-n or n-m with zero; and, when m-n is greater than or equal to zero, selecting m and n-m, or, when n-m is less than or equal to zero, selecting m and n-m.
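The subtract, compare, and select sequence can be sketched as below. The function name `select_operands` is hypothetical. The branch for m < n is an extension added by this sketch for illustration, since the text itself assumes m is greater than or equal to n and only describes selecting m and n-m.

```python
def select_operands(m, n):
    """Return (base, exponent): the larger of m and n, and a non-positive difference."""
    diff = m - n              # subtraction step
    if diff >= 0:             # comparison with zero
        return m, -diff       # selection: m and n - m, as in the text
    # Hypothetical extension for m < n: swap the roles of m and n.
    return n, diff            # n and m - n, which is negative here
```

Either ordering of the inputs then yields the same pair, for example `(5, -3)` for the values 5 and 2, so the exponent passed to the shift step is never positive.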
Optionally, in an embodiment, the method 600 further includes: approximately obtaining the value of A*B+C*D from $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$.
Optionally, in an embodiment, the method 600 further includes: quantizing the value of A*B+C*D to reach a preset data bit width.
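The conversion back to the linear domain and the quantization step might look like the following sketch. Since $(\log_e a) \cdot \log_a(X) = \ln X$, the linear value is recovered with the natural exponential. The quantization scheme shown (round to nearest, then saturate to an unsigned code of `bits` bits) is an assumption of this sketch; the text only states that the value is quantized to a preset data bit width.

```python
import math

def to_linear(v):
    """Recover A*B + C*D from v, where v approximates ln(A*B + C*D)."""
    return math.exp(v)

def quantize(value, bits):
    """Round to the nearest integer and saturate to an unsigned `bits`-bit range (assumed scheme)."""
    max_code = (1 << bits) - 1
    return max(0, min(max_code, int(round(value))))
```

For instance, `quantize(to_linear(math.log(36)), 8)` yields 36, while a value above the 8-bit range, such as 1000.0, saturates to 255.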
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered beyond the scope of this application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely a description of specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (18)

  1. An apparatus for processing a multiply-add operation, comprising:
    a first adder, configured to add input first data and second data to obtain first intermediate data, wherein the values of the first data and the second data are $\log_a A$ and $\log_a B$ respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking logarithms of first original data A and second original data B, respectively, among a plurality of original data;
    a second adder, configured to add input third data and fourth data to obtain second intermediate data, wherein the values of the third data and the fourth data are $\log_a C$ and $\log_a D$ respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking logarithms of third original data C and fourth original data D, respectively, among the plurality of original data, wherein a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n;
    a logarithmic adder, wherein an input port of the logarithmic adder is connected to output ports of the first adder and the second adder, and the logarithmic adder is configured to obtain $a^{n-m}$ from m and n input by the first adder and the second adder, and to approximately determine the sum of m and $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$;
    wherein the first adder, the second adder, and the logarithmic adder are implemented by hardware circuits.
  2. The apparatus according to claim 1, wherein the logarithmic adder being configured to obtain $a^{n-m}$ from m and n input by the first adder and the second adder, and to approximately determine the sum of m and $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$, comprises:
    determining the target precision that must be reached when processing the plurality of original data;
    when the target precision is lower than a first precision, approximately determining the sum of m and $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$.
  3. The apparatus according to claim 1, wherein the logarithmic adder is further configured to:
    determine an error compensation value of $a^{n-m}$ according to an error compensation table, wherein the error compensation table contains K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into the error compensation term
    Figure PCTCN2018084275-appb-100001
    wherein K and L are both integers greater than 1;
    approximately determine the sum of $m + a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$.
  4. The apparatus according to claim 3, wherein the logarithmic adder approximately determining the sum of $m + a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$ comprises:
    determining the target precision that must be reached when processing the plurality of original data;
    when the target precision is higher than a second precision, approximately determining the sum of $m + a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$.
  5. The apparatus according to claim 3 or 4, wherein K is determined according to the target precision.
  6. The apparatus according to any one of claims 3 to 5, wherein L is determined according to the target precision.
  7. The apparatus according to any one of claims 1 to 6, wherein the logarithmic adder specifically comprises:
    a shift circuit, configured to perform a shift operation on a according to n-m to obtain $a^{n-m}$;
    a sub-addition circuit, configured to add m and $a^{n-m}$ to obtain $m + a^{n-m}$.
  8. The apparatus according to claim 7, wherein the logarithmic adder further comprises:
    a subtraction circuit, configured to subtract m and n to obtain m-n or n-m;
    a comparison circuit, configured to compare m-n or n-m with zero;
    a selection circuit, configured to select m and n-m when m-n is greater than or equal to zero,
    or configured to select m and n-m when n-m is less than or equal to zero.
  9. The apparatus according to any one of claims 1 to 8, wherein the apparatus further comprises:
    a converter, configured to approximately obtain the value of A*B+C*D from $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$, wherein the converter is implemented by a hardware circuit.
  10. A method for processing a multiply-add operation, comprising:
    adding input first data and second data to obtain first intermediate data, wherein the values of the first data and the second data are $\log_a A$ and $\log_a B$ respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking logarithms of first original data A and second original data B, respectively, among a plurality of original data;
    adding input third data and fourth data to obtain second intermediate data, wherein the values of the third data and the fourth data are $\log_a C$ and $\log_a D$ respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking logarithms of third original data C and fourth original data D, respectively, among the plurality of original data, wherein a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n;
    obtaining $a^{n-m}$ from m and n input by the first adder and the second adder, and approximately determining the sum of m and $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$.
  11. The method according to claim 10, wherein obtaining $a^{n-m}$ from m and n input by the first adder and the second adder, and approximately determining the sum of m and $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$, comprises:
    determining the target precision that must be reached when processing the plurality of original data;
    when the target precision is lower than a first precision, approximately determining the sum of m and $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$.
  12. The method according to claim 10, wherein the method further comprises:
    determining an error compensation value of $a^{n-m}$ according to an error compensation table, wherein the error compensation table contains K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into the error compensation term
    Figure PCTCN2018084275-appb-100002
    wherein K and L are both integers greater than 1;
    approximately determining the sum of $m + a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$.
  13. The method according to claim 12, wherein approximately determining the sum of $m + a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$ comprises:
    determining the target precision that must be reached when processing the plurality of original data;
    when the target precision is higher than a second precision, approximately determining the sum of $m + a^{n-m}$ and the error compensation value of $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$.
  14. The method according to claim 12 or 13, wherein K is determined according to the target precision.
  15. The method according to any one of claims 12 to 14, wherein L is determined according to the target precision.
  16. The method according to any one of claims 10 to 15, wherein obtaining $a^{n-m}$ from m and n input by the first adder and the second adder, and approximately determining the sum of m and $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$, comprises:
    performing a shift operation on a according to n-m to obtain $a^{n-m}$;
    adding m and $a^{n-m}$ to obtain $m + a^{n-m}$.
  17. The method according to claim 16, wherein obtaining $a^{n-m}$ from m and n input by the first adder and the second adder, and approximately determining the sum of m and $a^{n-m}$ as the value of $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$, comprises:
    subtracting m and n to obtain m-n or n-m;
    comparing m-n or n-m with zero;
    when m-n is greater than or equal to zero, selecting m and n-m,
    or, when n-m is less than or equal to zero, selecting m and n-m.
  18. The method according to any one of claims 10 to 17, wherein the method further comprises:
    approximately obtaining the value of A*B+C*D from $(\log_e a) \cdot \log_a(A \cdot B + C \cdot D)$.
PCT/CN2018/084275 2017-04-24 2018-04-24 Device for processing multiplication and addition operations and method for processing multiplication and addition operations WO2018196750A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710269126.2 2017-04-24
CN201710269126.2A CN107220025B (en) 2017-04-24 2017-04-24 Apparatus for processing multiply-add operation and method for processing multiply-add operation

Publications (1)

Publication Number Publication Date
WO2018196750A1 true WO2018196750A1 (en) 2018-11-01

Family

ID=59945435

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/084275 WO2018196750A1 (en) 2017-04-24 2018-04-24 Device for processing multiplication and addition operations and method for processing multiplication and addition operations

Country Status (2)

Country Link
CN (1) CN107220025B (en)
WO (1) WO2018196750A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220025B (en) * 2017-04-24 2020-04-21 华为机器有限公司 Apparatus for processing multiply-add operation and method for processing multiply-add operation
WO2019165602A1 (en) * 2018-02-28 2019-09-06 深圳市大疆创新科技有限公司 Data conversion method and device
GB2577132B (en) * 2018-09-17 2021-05-26 Apical Ltd Arithmetic logic unit, data processing system, method and module
US11243743B2 (en) * 2018-10-18 2022-02-08 Facebook, Inc. Optimization of neural networks using hardware calculation efficiency and adjustment factors

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996027839A1 (en) * 1995-03-03 1996-09-12 Motorola Inc. Computational array circuit for providing parallel multiplication
US5956264A (en) * 1992-02-29 1999-09-21 Hoefflinger; Bernd Circuit arrangement for digital multiplication of integers
US20060101243A1 (en) * 2004-11-10 2006-05-11 Nvidia Corporation Multipurpose functional unit with multiply-add and logical test pipeline
CN105867876A (en) * 2016-03-28 2016-08-17 武汉芯泰科技有限公司 Multiply accumulator, multiply accumulator array, digital filter and multiply accumulation method
CN107220025A (en) * 2017-04-24 2017-09-29 华为机器有限公司 Apparatus for processing multiply-add operation and method for processing multiply-add operation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100340972C (en) * 2005-06-07 2007-10-03 北京北方烽火科技有限公司 Method for implementing logarithm computation by field programmable gate array in digital auto-gain control
JP2008257407A (en) * 2007-04-04 2008-10-23 Fujitsu Microelectronics Ltd Logarithmic computing unit and logarithmic computing method
GB2554167B (en) * 2014-05-01 2019-06-26 Imagination Tech Ltd Approximating functions
CN106528046B (en) * 2016-11-02 2019-06-07 上海集成电路研发中心有限公司 Long bit wide timing adds up multiplier

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5956264A (en) * 1992-02-29 1999-09-21 Hoefflinger; Bernd Circuit arrangement for digital multiplication of integers
WO1996027839A1 (en) * 1995-03-03 1996-09-12 Motorola Inc. Computational array circuit for providing parallel multiplication
US20060101243A1 (en) * 2004-11-10 2006-05-11 Nvidia Corporation Multipurpose functional unit with multiply-add and logical test pipeline
CN105867876A (en) * 2016-03-28 2016-08-17 武汉芯泰科技有限公司 Multiply accumulator, multiply accumulator array, digital filter and multiply accumulation method
CN107220025A (en) * 2017-04-24 2017-09-29 华为机器有限公司 Apparatus for processing multiply-add operation and method for processing multiply-add operation

Also Published As

Publication number Publication date
CN107220025A (en) 2017-09-29
CN107220025B (en) 2020-04-21

Similar Documents

Publication Publication Date Title
WO2018196750A1 (en) Device for processing multiplication and addition operations and method for processing multiplication and addition operations
US11249721B2 (en) Multiplication circuit, system on chip, and electronic device
US10491239B1 (en) Large-scale computations using an adaptive numerical format
US11074041B2 (en) Method and system for elastic precision enhancement using dynamic shifting in neural networks
CN111240746B (en) Floating point data inverse quantization and quantization method and equipment
CN110888623B (en) Data conversion method, multiplier, adder, terminal device and storage medium
CN110222833B (en) Data processing circuit for neural network
TW202115560A (en) Multiplier and method for floating-point arithmetic, integrated circuit chip, and computing device
CN110852434A (en) CNN quantization method, forward calculation method and device based on low-precision floating point number
US9983850B2 (en) Shared hardware integer/floating point divider and square root logic unit and associated methods
US8346831B1 (en) Systems and methods for computing mathematical functions
CN114612996A (en) Method for operating neural network model, medium, program product, and electronic device
US6182100B1 (en) Method and system for performing a logarithmic estimation within a data processing system
KR100433131B1 (en) A pipelined divider with small lookup table and a method for dividing the same
WO2018210339A1 (en) Data processing apparatus and method
CN115827555B (en) Data processing method, computer device, storage medium, and multiplier structure
Hanuman et al. Hardware implementation of 24-bit vedic multiplier in 32-bit floating-point divider
CN114860193A (en) Hardware operation circuit for calculating Power function and data processing method
Kim et al. Low-power multiplierless DCT for image/video coders
Chen et al. Design and analysis of an approximate 2D convolver
JP2015015026A (en) Model calculation unit for calculating function model based on data using data on various numeric format, and control device
Naregal et al. Design and implementation of high efficiency vedic binary multiplier circuit based on squaring circuits
CN114691082A (en) Multiplier circuit, chip, electronic device, and computer-readable storage medium
Wang et al. A multiplier structure based on a novel real-time CSD recoding
US20240069865A1 (en) Fractional logarithmic number system adder

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18792008

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18792008

Country of ref document: EP

Kind code of ref document: A1