WO2018196750A1 - Device for processing multiplication and addition operations and method for processing multiplication and addition operations

Info

Publication number: WO2018196750A1
Application number: PCT/CN2018/084275
Authority: WO
Inventors: 徐斌; 陈清龙; 戎建江
Original assignee: 华为技术有限公司
Priority date: 2017-04-24
Filing date: 2018-04-24
Publication date: 2018-11-01
Also published as: CN107220025A; CN107220025B

Abstract

The present application provides a device and a method for processing multiply-add operations. The device comprises: a first adder, configured to add inputted first data and second data to obtain first intermediate data, wherein the values of the first data and the second data are log_a A and log_a B respectively; a second adder, configured to add inputted third data and fourth data to obtain second intermediate data, wherein the values of the third data and the fourth data are log_a C and log_a D respectively, and the value of the second intermediate data is n; and a logarithmic adder, configured to obtain a^(n-m) according to m and n inputted by the first adder and the second adder, and to approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), wherein m is the value of the first intermediate data, and the first adder, the second adder, and the logarithmic adder are implemented by hardware circuits. The present application can reduce power consumption during calculation.

Description

Device for processing multiplication and addition operations and method for processing multiplication and addition operations

This application claims priority to Chinese Patent Application No. 201710269126.2, filed with the Chinese Patent Office on April 24, 2017 and entitled "Device for processing multiplication and addition operations and method for processing multiplication and addition operations", which is incorporated herein by reference in its entirety.

Technical field

The present application relates to the field of computers, and more particularly to an apparatus for processing multiply-add operations and a method of processing multiply-add operations.

Background art

Computers frequently use multiply-add operations when processing input data. To perform a multiply-add operation, a computer first multiplies the input data and then adds the resulting products. Because the input data is generally in the linear domain, where data occupies a relatively large bit width (for example, 32 bits), multiply-add operations consume considerable resources. In addition, a multiply-add operation contains a large number of multiplications, which are computationally expensive and relatively slow, so the computer performs multiply-add operations with low computational efficiency.

To solve the above problem, the prior art proposes a scheme for processing multiply-add operations that converts input data in the linear domain into data in the logarithmic domain, thereby converting multiplications in the linear domain into additions in the logarithmic domain. Converting the data into the logarithmic domain reduces the bit width it occupies (for example, original 32-bit data occupies only 5 bits after the logarithm is taken). Converting multiplications in the linear domain into additions in the logarithmic domain also increases computational efficiency.

However, after completing the additions in the logarithmic domain, the above scheme must convert the data in the logarithmic domain back into data in the linear domain and then add the linear-domain data to obtain the final multiply-add result. Because the data in the linear domain occupies a large bit width, the computer still consumes considerable resources when performing this addition.

Summary of the invention

The present application provides an apparatus and method for processing a multiply-accumulate operation to reduce computational power consumption.

In a first aspect, an apparatus for processing a multiply-add operation is provided. The apparatus comprises: a first adder, configured to add inputted first data and second data to obtain first intermediate data, wherein the values of the first data and the second data are log_a A and log_a B respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking the logarithms of first original data A and second original data B, respectively, among a plurality of original data; a second adder, configured to add inputted third data and fourth data to obtain second intermediate data, wherein the values of the third data and the fourth data are log_a C and log_a D respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking the logarithms of third original data C and fourth original data D, respectively, among the plurality of original data, wherein a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n; and a logarithmic adder, whose input port is connected to the output ports of the first adder and the second adder, and which is configured to obtain a^(n-m) according to m and n inputted by the first adder and the second adder, and to approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D); wherein the first adder, the second adder, and the logarithmic adder are implemented by hardware circuits.

The first adder, the second adder, and the logarithmic adder may be implemented by using various hardware circuits such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).

In the present application, the sum of data in exponential form is converted into a sum of values having a lower bit width, so that a high-bit-width data operation is converted into a low-bit-width data operation. This reduces the resources used during calculation and thereby reduces computing power consumption.

It should be understood that, compared with a^m and a^n, m and a^(n-m) are data of lower bit width. Converting the addition of the higher-bit-width data a^m and a^n into the addition of the lower-bit-width data m and a^(n-m) avoids the use of a high-bit-width adder, which reduces the area of the computing chip and the calculation power consumption. It should also be understood that A, B, C, and D above are all real numbers greater than zero.

Approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) may mean taking the sum of m and a^(n-m) as an approximate value of (log_e a)*log_a(A*B+C*D).

It should be understood that the above a may specifically be 2.

Optionally, the logarithmic adder may be further configured to obtain a^(n-m) according to m and n inputted by the first adder and the second adder, and to determine the sum of m and -a^(n-m) as the value of (log_e a)*log_a(A*B-C*D).

The above multiply-add operation is a generalized multiply-add operation, which may include an addition between products or a subtraction between products. For example, the multiply-add operation may include A*B+C*D or A*B-C*D.

With reference to the first aspect, in some implementations of the first aspect, the logarithmic adder being configured to obtain a^(n-m) according to m and n inputted by the first adder and the second adder, and to approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: determining a target precision to be achieved when processing the plurality of original data; and in a case where the target precision is lower than a first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).

The first precision may be preset. When the target precision is lower than the first precision, the precision required for processing the original data may be considered low.

By comparing the target precision with the preset precision, the precision requirement for processing the original data can be determined. When the precision requirement is low, m+a^(n-m) can be directly determined as the approximate value of (log_e a)*log_a(A*B+C*D). In this way, the value of (log_e a)*log_a(A*B+C*D) can be determined flexibly according to the precision requirement for processing the original data, which satisfies the precision requirement while improving computing efficiency.

With reference to the first aspect, in some implementations of the first aspect, the logarithmic adder is specifically configured to: determine an error compensation value of a^(n-m) according to an error compensation table, wherein the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, the K error compensation values are obtained by substituting the K values into an error compensation term (the expression of the term is not reproduced in this text), and K and L are integers greater than 1; and approximately determine the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).

In addition to m+a^(n-m), the error compensation value of a^(n-m) is taken into account when determining the value of (log_e a)*log_a(A*B+C*D), which further improves the calculation accuracy.

With reference to the first aspect, in some implementations of the first aspect, the logarithmic adder approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) includes: determining a target precision to be achieved when processing the plurality of original data; and in a case where the target precision is higher than a second precision, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).

When the target precision is higher than the second precision, the precision required for processing the original data can be considered high. In this case, the error compensation value of a^(n-m) is taken into account when determining the value of (log_e a)*log_a(A*B+C*D), to ensure the accuracy of that value. In addition, the second precision may be the same as the first precision, or the second precision may be greater than the first precision.

With reference to the first aspect, in some implementations of the first aspect, K is determined based on the target precision.

When the target precision is high, K can be a large value, and when the target precision is low, K can be a small value.

The larger the value of K, the more finely [-1, 1] is divided, so that a more accurate error compensation value of a^(n-m) can be obtained when the error compensation table is queried.

With reference to the first aspect, in some implementations of the first aspect, L is determined based on the target precision.

The larger the value of L, the more terms the error compensation term contains, and the more accurate the error compensation value obtained from the error compensation term. Therefore, L can be a larger value when the target precision is high, and a smaller value when the target precision is low.

With reference to the first aspect, in some implementations of the first aspect, the logarithmic adder specifically includes: a shift circuit, configured to perform a shift operation on a according to n-m to obtain a^(n-m); and a sub-addition circuit, configured to add m and a^(n-m) to obtain m+a^(n-m).

With reference to the first aspect, in some implementations of the first aspect, the logarithmic adder further includes: a subtraction circuit, configured to perform subtraction on m and n to obtain m-n or n-m; a comparison circuit, configured to compare m-n or n-m with zero; and a selection circuit, configured to select m and n-m in a case where m-n is greater than or equal to zero, or to select m and n-m in a case where n-m is less than or equal to zero.

With reference to the first aspect, in some implementations of the first aspect, the apparatus further comprises: a converter, configured to approximately obtain the value of A*B+C*D according to the approximate value of (log_e a)*log_a(A*B+C*D), wherein the converter is implemented by a hardware circuit.

With reference to the first aspect, in some implementations of the first aspect, the apparatus further includes: a quantizer, configured to quantize the value of A*B+C*D to reach a preset data bit width.

In a second aspect, a method for processing a multiply-add operation is provided. The method comprises: adding inputted first data and second data to obtain first intermediate data, wherein the values of the first data and the second data are log_a A and log_a B respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking the logarithms of first original data A and second original data B, respectively, among a plurality of original data; adding third data and fourth data to obtain second intermediate data, wherein the values of the third data and the fourth data are log_a C and log_a D respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking the logarithms of third original data C and fourth original data D, respectively, among the plurality of original data, wherein a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n; and obtaining a^(n-m) according to m and n inputted by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).

With reference to the second aspect, in some implementations of the second aspect, obtaining a^(n-m) according to m and n inputted by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: determining a target precision to be achieved when processing the plurality of original data; and in a case where the target precision is lower than a first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).

With reference to the second aspect, in some implementations of the second aspect, the method further comprises: determining an error compensation value of a^(n-m) according to an error compensation table, wherein the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into an error compensation term (the expression of the term is not reproduced in this text).

With reference to the second aspect, in some implementations of the second aspect, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) includes: determining a target precision to be achieved when processing the plurality of original data; and in a case where the target precision is higher than a second precision, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).

With reference to the second aspect, in some implementations of the second aspect, K is determined based on the target precision.

With reference to the second aspect, in some implementations of the second aspect, obtaining a^(n-m) according to m and n inputted by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: performing a shift operation on a according to n-m to obtain a^(n-m); and adding m and a^(n-m) to obtain m+a^(n-m).

With reference to the second aspect, in some implementations of the second aspect, obtaining a^(n-m) according to m and n inputted by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: performing subtraction on m and n to obtain m-n or n-m; comparing m-n or n-m with zero; and selecting m and n-m in a case where m-n is greater than or equal to zero, or selecting m and n-m in a case where n-m is less than or equal to zero.

With reference to the second aspect, in some implementations of the second aspect, the method further comprises: approximately obtaining the value of A*B+C*D according to the approximate value of (log_e a)*log_a(A*B+C*D), wherein the converter is implemented by a hardware circuit.

With reference to the second aspect, in some implementations of the second aspect, the method further comprises: quantizing the value of A*B+C*D to reach a preset data bit width.

Description of the drawings

FIG. 1 is a schematic flowchart of a method for processing a multiply-add operation in the prior art;

FIG. 2 is a schematic block diagram of an apparatus for processing a multiply-add operation according to an embodiment of the present application;

FIG. 3 is a schematic block diagram of an apparatus for processing a multiply-add operation according to an embodiment of the present application;

FIG. 4 is a schematic block diagram of an apparatus for processing a multiply-add operation according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of a method for processing a multiply-add operation according to an embodiment of the present application;

FIG. 6 is a schematic flowchart of a method for processing a multiply-add operation according to an embodiment of the present application.

Detailed description

The technical solutions in the present application are described below with reference to the accompanying drawings. To better understand the apparatus for processing data in the embodiments of the present application, a prior-art method for processing a multiply-add operation is first described briefly with reference to FIG. 1.

FIG. 1 shows a schematic flow chart of a method of processing a multiply-and-accumulate operation in the prior art.

In FIG. 1, four multipliers (a first multiplier, a second multiplier, a third multiplier, and a fourth multiplier) each multiply one of four pairs of data to obtain four 32-bit values. The first adder and the second adder then add the four 32-bit values output by the four multipliers to obtain two 32-bit values. Next, the third adder adds the two 32-bit values output by the first adder and the second adder to obtain one 32-bit value. Finally, this 32-bit value is quantized to obtain 16-bit data.
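The data flow of FIG. 1 can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, Python integers stand in for the 32-bit intermediate values, and the final quantization is assumed to saturate to a signed 16-bit range, which the text does not specify.

```python
def saturate_signed(value, bits):
    """Clamp an integer to the signed range of the given bit width (assumed quantizer)."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def prior_art_multiply_add(pairs):
    """Linear-domain flow of FIG. 1 for exactly four (x, y) input pairs."""
    if len(pairs) != 4:
        raise ValueError("FIG. 1 uses four multipliers, so four pairs are expected")
    products = [x * y for x, y in pairs]   # four multipliers
    s1 = products[0] + products[1]         # first adder
    s2 = products[2] + products[3]         # second adder
    total = s1 + s2                        # third adder
    return saturate_signed(total, 16)      # quantization to 16 bits (assumed saturation)


result = prior_art_multiply_add([(1, 2), (3, 4), (5, 6), (7, 8)])
```

Every stage before the quantizer works on wide values, which is the cost the log-domain schemes described next try to avoid.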

Because a multiplier consumes much more energy and chip area than an adder, a computer containing too many multipliers has high energy consumption and relatively low computational efficiency. To solve this problem, the prior art proposes a scheme for processing multiply-add operations that converts data in the linear domain into data in the logarithmic domain, thereby transforming multiplications in the linear domain into additions in the logarithmic domain.

The following takes the data A, B, C, and D in the linear domain as an example to describe the calculation process of A*B+C*D in detail:

First, convert A, B, C, and D in the linear domain into data in the logarithmic domain to get:

x = log_2 A, y = log_2 B, z = log_2 C, w = log_2 D, where A = 2^x, B = 2^y, C = 2^z, D = 2^w

Second, converting the multiplication operations in the linear domain into additions in the logarithmic domain yields:

A*B+C*D = 2^(x+y) + 2^(z+w)

Therefore, the multiplication of A and B is converted into the addition of x and y, and the multiplication of C and D is converted into the addition of z and w. Finally, 2^(x+y) and 2^(z+w) are calculated from x+y and z+w respectively, and then 2^(x+y) and 2^(z+w) are added to obtain the result of A*B+C*D.
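The steps above can be sketched in Python. The function name is hypothetical, and floating-point arithmetic is used purely for illustration; it does not model the 5-bit log-domain values mentioned in the text.

```python
import math


def multiply_add_via_logs(A, B, C, D):
    """Compute A*B + C*D with the multiplications replaced by log-domain additions."""
    # Step 1: convert the linear-domain inputs to the base-2 logarithmic domain.
    x, y, z, w = (math.log2(v) for v in (A, B, C, D))
    # Step 2: each multiplication becomes an addition of logarithms.
    m = x + y
    n = z + w
    # Step 3: convert back to the linear domain and add (the wide addition).
    return 2.0 ** m + 2.0 ** n


value = multiply_add_via_logs(4, 8, 2, 16)  # 4*8 + 2*16
```

The final line of the function is the step that still needs a high-bit-width adder.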

Although this scheme avoids multiplication by converting multiplications in the linear domain into additions in the logarithmic domain, after the additions in the logarithmic domain are completed, the data in the logarithmic domain (x, y, z, w) must still be converted into data in the linear domain (2^(x+y), 2^(z+w)) and then added. Data in the logarithmic domain occupies a small bit width (for example, x, y, z, and w each occupy 5 bits), whereas data in the linear domain occupies a large bit width (for example, 2^(x+y) and 2^(z+w) each occupy 32 bits). Therefore, after the data in the logarithmic domain is converted into data in the linear domain, a high-bit-width adder is still needed to perform the addition, so the computer still consumes considerable resources when performing the addition.

Therefore, the embodiments of the present application propose an apparatus for processing a multiply-add operation that converts an addition between higher-bit-width data in exponential form into an addition of lower-bit-width data, which reduces the resources used during calculation and thereby reduces computing power consumption.

FIG. 2 is a schematic block diagram of an apparatus for processing data according to an embodiment of the present application. The apparatus 200 of FIG. 2 includes:

a first adder 210, configured to add inputted first data and second data to obtain first intermediate data, wherein the values of the first data and the second data are log_a A and log_a B respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking the logarithms of first original data A and second original data B, respectively, among a plurality of original data;

a second adder 220, configured to add inputted third data and fourth data to obtain second intermediate data, wherein the values of the third data and the fourth data are log_a C and log_a D respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking the logarithms of third original data C and fourth original data D, respectively, among the plurality of original data, wherein a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n.

The above raw data may be RGB pixel data when the image is processed.

The value of a above may be 2.

When processing the plurality of original data to obtain a plurality of intermediate data, the product operation between the original data may be first converted into an addition operation in the logarithmic domain, and then a plurality of intermediate data in an exponential form are obtained.

a logarithmic adder 230, whose input port is connected to the output ports of the first adder 210 and the second adder 220, and which is configured to obtain a^(n-m) according to m and n inputted by the first adder 210 and the second adder 220, and to approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).

The first adder 210, the second adder 220, and the logarithmic adder 230 may be implemented by hardware circuits. Specifically, they may be implemented based on various hardware circuits such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).

In the present application, the addition between data in exponential form is converted into an addition of values having a lower bit width, so that a high-bit-width data operation is converted into a low-bit-width data operation. This reduces the resources used during calculation and thereby reduces computing power consumption.

Specifically, because the data bit width occupied by m and a^(n-m) is smaller than that occupied by a^m and a^n, the embodiment of the present application converts the high-bit-width addition of a^m and a^n into the low-bit-width addition of m and a^(n-m), which reduces the occupation of system resources during calculation and improves computational efficiency.
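The relationship between the high-bit-width sum a^m + a^n and the low-bit-width quantities m and a^(n-m) can be checked numerically. The sketch below is an illustration, not the circuit of the embodiment. The function names are hypothetical, and the placement of the ln a factor follows from the standard identity log_a(a^m + a^n) = m + log_a(1 + a^(n-m)) together with the first-order approximation ln(1 + x) ≈ x, which is an assumption about how the approximation in the text is intended to be read.

```python
import math


def log_sum_exact(m, n, a=2.0):
    """Exact log_a(a**m + a**n), computed from m and a**(n-m) only."""
    if m < n:
        m, n = n, m                    # keep m >= n, as the text requires
    x = a ** (n - m)                   # lies in (0, 1] because n - m <= 0
    return m + math.log1p(x) / math.log(a)


def log_sum_first_order(m, n, a=2.0):
    """Same quantity with ln(1 + x) replaced by x (assumed first-order form)."""
    if m < n:
        m, n = n, m
    x = a ** (n - m)
    return m + x / math.log(a)


exact = log_sum_exact(5, 3)            # log2(2**5 + 2**3) = log2(40)
approx = log_sum_first_order(5, 3)
```

Neither function ever forms a^m or a^n, so the widest value it handles is m plus a fraction no larger than 1.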

Optionally, the logarithmic adder 230 may approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), or may approximately determine the sum of m and -a^(n-m) as the value of (log_e a)*log_a(A*B-C*D).

The above multiplication and addition operation is a generalized multiplication and addition operation, and may include an addition operation between products, or may include a subtraction operation between products. For example, the multiply-accumulate operation may include A*B+C*D or A*B-C*D.

Optionally, the logarithmic adder 230 obtaining a^(n-m) according to m and n inputted by the first adder 210 and the second adder 220, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), specifically includes: determining a target precision to be achieved when processing the plurality of original data; and in a case where the target precision is lower than a first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).

The first precision may be preset. When the target precision is lower than the first precision, the precision required for processing the original data may be considered low. By comparing the target precision with the preset precision, the precision requirement for processing the original data can be determined. When the precision requirement is low, m+a^(n-m) can be directly determined as the approximate value of (log_e a)*log_a(A*B+C*D). Therefore, the present application can determine the value of (log_e a)*log_a(A*B+C*D) flexibly according to the precision requirement for processing the original data, which satisfies the precision requirement while improving operation efficiency.

In some embodiments, the logarithmic adder 230 is specifically configured to: determine an error compensation value of a^(n-m) according to an error compensation table, wherein the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into the error compensation term.

Optionally, when (log_e a)*log_a(A*B+C*D) is determined according to m+a^(n-m) and the error compensation value of a^(n-m), the K values may be obtained by dividing [0, 1] into K parts.

When (log_e a)*log_a(A*B-C*D) is determined according to m-a^(n-m) and the error compensation value of -a^(n-m), the K values may be obtained by dividing [-1, 0] into K parts.

It should be understood that when dividing [-1, 1], [0, 1] or [-1, 0] into K shares, it is possible to divide the intervals equally to obtain K values.

It should be understood that determining the error compensation value of a^(n-m) according to the error compensation table may mean determining it by querying the error compensation table. Specifically, the value closest to a^(n-m) among the K values may first be looked up in the error compensation table, and the error compensation value of that value is then determined as the error compensation value of a^(n-m).
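The table query described above can be sketched as a nearest-value lookup. The sketch rests on several assumptions that are not taken from the text: the K values are K equally spaced points of [0, 1] including both endpoints, the helper names are hypothetical, and `assumed_term` is only a placeholder for the error compensation term, whose actual expression is not given here.

```python
import math


def build_table(k, term, lo=0.0, hi=1.0):
    """Return k (value, compensation) pairs on an equally spaced grid of [lo, hi]."""
    step = (hi - lo) / (k - 1)
    return [(lo + i * step, term(lo + i * step)) for i in range(k)]


def lookup(table, x):
    """Return the compensation stored for the table value closest to x."""
    _, compensation = min(table, key=lambda entry: abs(entry[0] - x))
    return compensation


def assumed_term(x):
    """Placeholder compensation: residual of the approximation ln(1 + x) ~ x."""
    return math.log1p(x) - x


table = build_table(9, assumed_term)   # K = 9 gives a grid spacing of 0.125
x = 2.0 ** (3 - 5)                     # a**(n-m) with a = 2, m = 5, n = 3
compensation = lookup(table, x)
```

A larger K gives a finer grid, so the nearest table value lies closer to a^(n-m), at the cost of a larger table.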

Optionally, the logarithmic adder 230 approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) specifically includes: determining a target precision to be achieved when processing the plurality of original data; and in a case where the target precision is higher than a second precision, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).

When the target precision is higher than the second precision, the precision required for processing the original data can be considered high. In this case, the error compensation value of a^(n-m) is taken into account when determining the value of (log_e a)*log_a(A*B+C*D), to ensure the accuracy of that value. In addition, the second precision may be the same as the first precision.

Optionally, as an embodiment, when determining the value of (log_e a)*log_a(A*B+C*D), the logarithmic adder 230 may compare the absolute value of n-m with a first threshold; if the absolute value of n-m is greater than or equal to the first threshold, the logarithmic adder 230 may directly determine m as the value of (log_e a)*log_a(A*B+C*D).

When the absolute value of n-m is large, the value of a^(n-m) is very small compared with m. Therefore, a^(n-m) can be ignored in the calculation and the value of m can be directly determined as the value of (log_e a)*log_a(A*B+C*D), which reduces the computational complexity.

For example, if n = 2, m = 10, and the first threshold is 5, the absolute value of n-m is 8, which is greater than the first threshold. The value of a^(-8) is much smaller than 10, so a^(-8) can be ignored and 10 is directly determined as the value of (log_e a)*log_a(A*B+C*D).

When the absolute value of n-m is less than the first threshold, the logarithmic adder 230 still approximately determines the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
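The threshold rule can be sketched as follows. The function name and default arguments are hypothetical; the returned value is the quantity m or m + a^(n-m) that the text describes, with no claim about scaling.

```python
def log_add_with_threshold(m, n, threshold, a=2.0):
    """Return m alone when |n - m| >= threshold, otherwise m + a**(n - m)."""
    if m < n:
        m, n = n, m                    # keep m >= n, as the text requires
    if abs(n - m) >= threshold:
        return float(m)                # a**(n - m) is negligible, so skip it
    return m + a ** (n - m)


skipped = log_add_with_threshold(10, 2, threshold=5)   # |n - m| = 8 >= 5
kept = log_add_with_threshold(3, 2, threshold=5)       # |n - m| = 1 < 5
```

With the numbers from the example, the skipped term is 2^(-8), about 0.004, which is small next to m = 10.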

In some embodiments, K is determined based on target accuracy. Specifically, K may be a larger value when the target precision is higher, and K may be a smaller value when the target precision is lower.

Specifically, the larger the value of K, the more finely [-1, 1] is divided and the more data the error compensation table contains. In this case, a more accurate error compensation value of a^(n-m) can be obtained from the error compensation table.

In some embodiments, L is determined based on target accuracy.

Specifically, the larger the value of L, the more terms the error compensation term contains and the more accurate the error compensation value obtained from it; the smaller the value of L, the fewer terms the error compensation term contains and the less accurate the error compensation value obtained from it. Therefore, L can be a larger value when the target precision is higher, and a smaller value when the target precision is lower.

In the present application, the precision of processing the original data can be adjusted flexibly by setting the values of K and L.

Optionally, in some embodiments, the logarithmic adder 230 specifically includes:

The shift circuit 2301 is configured to perform a shift operation on a according to nm to obtain a ^nm ;

Sub-addition circuit 2302 is used to add m and a ^nm to obtain m+a ^nm .

Optionally, in some embodiments, the logarithmic adder 230 further includes:

The subtraction circuit 2303 is configured to perform subtraction on m and n to obtain m-n or n-m;

Comparation circuit 2304, for comparing the magnitude relationship of m-n or n-m with zero;

The selection circuit 2305 is configured to select m and n-m if m-n is greater than or equal to zero, or to select m and n-m if n-m is less than or equal to zero.

It should be understood that the shift circuit 2301 may first obtain n-m from the selection circuit 2305 before performing the shift operation on a according to n-m, and the sub-addition circuit 2302 may first obtain m from the selection circuit 2305 before adding m and a^(n-m).

Further, when the subtraction circuit 2303 performs subtraction on m and n, either one may serve as the minuend and the other as the subtrahend, thereby obtaining m-n or n-m.
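The subtract, compare, and select stages described above can be sketched in Python as follows. The function name select_larger_and_diff is hypothetical, and the sketch assumes that the selection stage outputs the larger operand together with the non-positive difference (smaller minus larger), which plays the role of n-m in the text.

```python
def select_larger_and_diff(m, n):
    """Model of the subtraction, comparison, and selection stages (illustrative only)."""
    diff = m - n             # subtraction circuit: m - n
    if diff >= 0:            # comparison circuit: compare the difference with zero
        return m, -diff      # m is the larger operand; exponent is n - m
    return n, diff           # n is the larger operand; exponent is m - n

larger, exponent = select_larger_and_diff(3, 5)
```

Whichever operand order the subtraction uses, the returned exponent is never positive, so a raised to that exponent stays in (0, 1] for a greater than 1.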

Optionally, as an embodiment, the foregoing apparatus 200 further includes: a converter 240, configured to approximately obtain the value of A*B+C*D according to (log_e a)*log_a(A*B+C*D).

Optionally, as an embodiment, the apparatus 200 further includes: a quantizer 250, configured to quantize the value of the A*B+C*D to reach a preset data bit width.

The converter 240 and the quantizer 250 can be implemented by hardware circuits. Specifically, the converter 240 and the quantizer 250 can be implemented based on hardware circuits such as an ASIC and an FPGA.

Here, quantization refers to matching data of different bit widths. For example, if the bit width of the data obtained in the first step is 8 bits and the bit width required for the second-step operation is 5 bits, the 8-bit data is truncated to 5-bit data to meet the bit width requirement of the second step. A specific implementation may adjust values in the 8-bit data that exceed the 5-bit maximum to the 5-bit maximum, adjust values that are less than the 5-bit minimum to the 5-bit minimum, and leave the other values unchanged.
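The saturating behaviour in this example can be sketched as follows. The sketch assumes signed two's-complement integers, so that a 5-bit value lies in [-16, 15]; the application does not state the number format, and saturate is a hypothetical name.

```python
def saturate(value, bits=5):
    """Clamp an integer to the signed range representable in `bits` bits."""
    lo = -(1 << (bits - 1))        # smallest representable value, e.g. -16 for 5 bits
    hi = (1 << (bits - 1)) - 1     # largest representable value, e.g. 15 for 5 bits
    return max(lo, min(hi, value))

samples = [100, -100, 7]
quantized = [saturate(v) for v in samples]
```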

FIG. 3 is a schematic block diagram of a logarithmic adder 300 for processing a multiply-and-accumulate operation in an embodiment of the present application.

The logarithmic adder 300 specifically includes a subtraction circuit 310, a comparison circuit 320, a selection circuit 330, a shift circuit 340, an error compensation circuit 350, and an addition circuit 360.

Assume that there are raw data A, B, C, and D, and that the value of A*B+C*D needs to be calculated. First, take the logarithm of the original data to obtain x=log_a A, y=log_a B, z=log_a C, and w=log_a D, and let x+y=m and z+w=n. Then A*B+C*D=a^m+a^n; that is, the value of A*B+C*D is obtained by calculating the value of a^m+a^n.
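A short numerical check of this identity, as a Python sketch with a = 2 and positive sample values chosen only for illustration (log_domain_terms is a hypothetical helper name):

```python
import math

def log_domain_terms(A, B, C, D, a=2.0):
    """Return (m, n) with m = log_a(A) + log_a(B) and n = log_a(C) + log_a(D)."""
    x, y = math.log(A, a), math.log(B, a)
    z, w = math.log(C, a), math.log(D, a)
    return x + y, z + w

a = 2.0
A, B, C, D = 8.0, 4.0, 2.0, 2.0
m, n = log_domain_terms(A, B, C, D, a)
linear = A * B + C * D          # multiply-add in the linear domain
via_logs = a ** m + a ** n      # the same quantity rebuilt from m and n
```

The two multiplications become two additions of logarithms; only the final sum a^m+a^n still needs the logarithmic adder.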

The process by which the logarithmic adder 300 determines the value of a^m+a^n is described in detail below. n and m are the input 5-bit data (assuming m>n), and sign indicates whether n and m have the same sign bit. For example, when sign is 1, a^m and a^n have the same sign, and when sign is 0, a^m and a^n have different signs (the case where sign is 1 is taken as an example here). The specific steps by which the device 300 calculates a^m+a^n are as follows:

401. The subtraction circuit 310 subtracts n and m to obtain n-m or m-n;

402. The comparison circuit 320 obtains the result n-m or m-n calculated by the subtraction circuit 310 and compares it with zero;

403. The selection circuit 330 selects the larger number m from n and m, together with n-m, according to the magnitude relationship between n-m or m-n and zero;

404. The shift circuit 340 performs a shift operation on a according to n-m to obtain a^(n-m);

405. The error compensation circuit 350 calculates an error compensation value of a^(n-m).

The error compensation circuit 350 may specifically be a combinational circuit built from multiplexers. The error compensation circuit 350 may also be referred to as an error compensation table, that is, the portion enclosed by the dotted line in the figure.

The following describes the generation process of the error compensation table in detail.

According to the Taylor formula, ln(1+x) is expanded to get:

ln(1+x) = x - x^2/2 + x^3/3 - ... + (-1)^(k-1)*x^k/k + ... (1)

When x ∈ (-1, 1], the series converges. Therefore, formula (1) can be written as:

ln(1+x)=x+error(x) (2)

In formula (2), error(x) represents the sum of the quadratic term and the higher-order terms in the expansion. As long as enough terms are retained, sufficiently high precision can be ensured.

Since log_a(x) and ln(x) differ by a constant factor, that is, log_a(x)=C*ln(x), where C=log_a e, log_a(x) can also be expanded into the form of formula (2).

When x>y,

log_a(a^x+a^y)=x+log_a(1+a^(y-x))

=C[x+a^(y-x)+error(a^(y-x))] (3)

Similarly,

log_a(a^x-a^y)=C[x-a^(y-x)+error(a^(y-x))] (4)

According to formula (3):

log_a(a^m+a^n)=C[m+a^(n-m)+error(a^(n-m))] (5)

(log_e a)*log_a(a^m+a^n)=m+a^(n-m)+error(a^(n-m)) (6)

error(a^(n-m)) is expanded according to the Taylor series. Depending on the accuracy requirement, terms up to the third order, the fourth order, or higher are retained; the value range [-1, 1] of x is divided into K equal parts (K is a positive integer); and the results are recorded in a K-to-1 selector combinational circuit, which is called the error compensation table. For scenarios with high computational accuracy requirements, the error compensation value is added to the results of the other parts of the logarithmic addition circuit. For scenarios with low computational accuracy requirements, all circuits related to the error compensation table can be turned off, and this function is not used.

406. The addition circuit 360 adds m, a^(n-m), and the error compensation value of a^(n-m) to obtain the value of (log_e a)*log_a(a^m+a^n).

It can be seen from formulas (1) to (6) that (log_e a)*log_a(A*B+C*D) is equal to the sum of m+a^(n-m) and the error term error(a^(n-m)). Because only a finite number of terms of the Taylor series can be expanded when error(a^(n-m)) is actually calculated, m+a^(n-m), or the sum of m+a^(n-m) and the error term error(a^(n-m)), is only an approximate value of (log_e a)*log_a(A*B+C*D).
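Steps 401 to 406 can be traced numerically with the Python sketch below. It is not the circuit of FIG. 3: it uses floating-point arithmetic, computes the compensation from a truncated Taylor series instead of a table, and, as an assumption of the sketch, scales only the correction term by C = log_a e so that the result is numerically close to log_a(a^m+a^n). The names log_adder and error_term, and the use of L as the highest retained degree, are hypothetical.

```python
import math

def error_term(x, L):
    """Taylor terms of ln(1+x) of degree 2..L; returns 0 when L < 2."""
    return sum((-1) ** (i - 1) * x ** i / i for i in range(2, L + 1))

def log_adder(m, n, a=2.0, L=0):
    """Approximate log_a(a**m + a**n); L=0 disables error compensation."""
    diff = m - n                                            # step 401: subtract
    larger, d = (m, -diff) if diff >= 0 else (n, diff)      # steps 402-403: compare, select
    x = a ** d                                              # step 404: a^(n-m), in (0, 1]
    comp = error_term(x, L)                                 # step 405: compensation value
    C = math.log(math.e, a)                                 # C = log_a(e), as in the text
    return larger + C * (x + comp)                          # step 406: final addition (sketch's scaling)

exact = math.log(2.0 ** 3 + 2.0 ** 1, 2.0)   # log_2(10)
coarse = log_adder(3.0, 1.0)                 # without compensation
fine = log_adder(3.0, 1.0, L=8)              # with compensation up to degree 8
```

With these sample inputs the uncompensated result is already within a few hundredths of the exact logarithm, and enabling compensation narrows the gap further, which mirrors the high-accuracy and low-accuracy modes described above.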

It should be understood that after the logarithmic adder 300 determines the value of (log_e a)*log_a(a^m+a^n), the value of a^m+a^n may further be determined from the value of (log_e a)*log_a(a^m+a^n). Alternatively, the value of a^m+a^n is not calculated, and the value of (log_e a)*log_a(a^m+a^n) is input to other arithmetic circuits for calculation.

FIG. 4 is a schematic block diagram of an apparatus for processing a multiply-and-accumulate operation in an embodiment of the present application. The device 400 of FIG. 4 is composed of a central processing unit (CPU), a double data rate synchronous dynamic random access memory (DDR SDRAM), an AXI bus, and a computing chip. The computing chip includes an input buffer module, a calculation engine module, an output control module, and the like. The input buffer module stores the input raw data, the calculation engine module performs calculations on the raw data, and the output control module controls the output of the calculation results produced by the calculation engine module.

It should be understood that the apparatus 200 shown in FIG. 2 and the apparatus 300 shown in FIG. 3 may correspond to the computing chip in FIG. 4, which is capable of implementing the processing of data by the apparatus 200 and the apparatus 300 above. In addition, the above apparatus 200 and apparatus 300 may also directly correspond to the calculation engine module in FIG. 4, which is capable of implementing the processing of data by the apparatus 200 and the apparatus 300 above. In addition, the above calculation engine module may also be implemented based on a hardware circuit.

FIG. 5 is a schematic flowchart of a multiplication and addition operation performed by the apparatus for processing multiplication and addition operations in the embodiment of the present application. Specifically, FIG. 5 may specifically represent a schematic flowchart of the above-described multiplication and addition operation of the device 400. It should be understood that FIG. 5 may represent a calculation process of multiplying and accumulating a plurality of data.

501. The input buffer module converts image data in the buffered linear domain into data in a logarithmic domain;

502. The calculation engine module adds the values in the logarithmic domain to calculate a result of multiplying the values in the linear domain;

503. The calculation engine module adds the products of the linear-domain data, completing the addition of the exponential terms through the comparison circuit, the shift circuit, and the error compensation circuit to obtain a processing result.

504. The output control module quantizes the data output by the calculation engine module to match the data bit width required by the next-stage operation, and outputs the data.

The calculation process of steps 502 to 504 may be repeated in the actual calculation process.
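The repeated flow of steps 501 to 503 can be sketched for a short vector as follows. This is only an illustration under stated assumptions: all inputs are positive, a = 2, and the logarithmic addition is computed exactly with math.log as a stand-in for the comparison, shift, and error compensation circuits. Step 504 (quantization) is not modeled, and the function names are hypothetical.

```python
import math

def to_log(values, a=2.0):
    """Step 501: convert positive linear-domain values to the logarithmic domain."""
    return [math.log(v, a) for v in values]

def log_add(p, q, a=2.0):
    """Step 503 stand-in: exact log_a(a**p + a**q), larger operand factored out."""
    hi, lo = (p, q) if p >= q else (q, p)
    return hi + math.log(1.0 + a ** (lo - hi), a)

def mac_log_domain(xs, ws, a=2.0):
    """Return log_a(sum(x * w)) using only log-domain additions."""
    products = [p + q for p, q in zip(to_log(xs, a), to_log(ws, a))]  # step 502
    acc = products[0]
    for term in products[1:]:        # repeat the logarithmic addition per product
        acc = log_add(acc, term, a)
    return acc

result_log = mac_log_domain([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
result_linear = 2.0 ** result_log    # back to the linear domain for checking
```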

The apparatus for processing the multiply-and-accumulate operation of the embodiments of the present application is described in detail above with reference to FIG. 2 to FIG. 4. The method for processing the multiply-and-accumulate operation of the embodiments of the present application is described below with reference to FIG. 6. It should be understood that the apparatus in FIG. 2 to FIG. 4 can implement the method in FIG. 6, and that the method in FIG. 6 corresponds to the apparatus and processing described with reference to FIG. 2 to FIG. 5. For the sake of brevity, repeated description is appropriately omitted below.

FIG. 6 is a schematic flowchart of a method for processing data according to an embodiment of the present application. The method of FIG. 6 can be performed by the apparatus 200, the apparatus 300, or the apparatus 400 that processes the data described above. The method 600 of Figure 6 includes:

610. Add the input first data and second data to obtain first intermediate data, where the values of the first data and the second data are log_a A and log_a B, respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking logarithms of the first original data A and the second original data B of the plurality of original data, respectively;

620. Add the input third data and fourth data to obtain second intermediate data, where the values of the third data and the fourth data are log_a C and log_a D, respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking logarithms of the third original data C and the fourth original data D of the plurality of original data, respectively, where a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n;

630. Obtain a^(n-m) according to m and n input by the first adder and the second adder, and approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).

In the present application, by converting the sum of data in exponential form into a sum of values having a lower bit width, a high-bit-width data operation is converted into a low-bit-width data operation, which can reduce the resources used during calculation and thereby reduce computing power consumption. Specifically, compared with a^m and a^n, m and a^(n-m) are data of lower bit width. By converting the addition of the higher-bit-width data a^m and a^n into an addition of lower-bit-width data, the use of a high-bit-width adder can be avoided, which can reduce the area of the computing chip and reduce the calculation power consumption.

In addition, the above a may specifically be 2.
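With a = 2, obtaining a^(n-m) by a shift can be sketched as follows. The sketch assumes that n-m is a non-positive integer and that values are held in a fixed-point format with frac_bits fractional bits; both the format and the name pow2_fixed are hypothetical, since the application does not specify them.

```python
def pow2_fixed(d, frac_bits=8):
    """Return 2**d as a fixed-point integer with `frac_bits` fractional bits (d <= 0)."""
    if d > 0:
        raise ValueError("this sketch only handles d = n - m <= 0")
    one = 1 << frac_bits      # fixed-point representation of 1.0
    return one >> (-d)        # each right shift halves the value

quarter = pow2_fixed(-2)              # 2^(-2) in fixed point
as_float = quarter / (1 << 8)         # convert back to a real number for checking
```

Once -d exceeds frac_bits the result underflows to zero in this format, which corresponds to the smaller term becoming negligible relative to the larger one.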

Optionally, as an embodiment, obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: determining a target accuracy that needs to be achieved when processing the plurality of original data; and in the case where the target accuracy is lower than the first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).

The first precision described above may be preset. When the target accuracy is lower than the first precision, the accuracy required for processing the original data may be considered low. By comparing the target accuracy with the preset precision, the accuracy requirement for processing the original data can be determined. When the accuracy requirement is low, m+a^(n-m) can be directly and approximately determined as the value of (log_e a)*log_a(A*B+C*D). In this way, the value of (log_e a)*log_a(A*B+C*D) can be determined flexibly according to the accuracy requirement for processing the original data, which ensures that the accuracy requirement is met while improving computing efficiency.

Optionally, as an embodiment, the method 600 further includes: determining an error compensation value of a^(n-m) according to the error compensation table, where the error compensation table includes K values and the error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into the error compensation term

Optionally, as an embodiment, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) includes: determining a target accuracy that needs to be achieved when processing the plurality of original data; and in the case where the target accuracy is higher than the second precision, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).

Optionally, as an embodiment, the K is determined according to the target accuracy.

Optionally, as an embodiment, the L is determined according to the target accuracy.

When the target precision is high, K can be a large value, and when the target precision is low, K can be a small value. The larger the value of K, the more finely [-1, 1] is divided, so that a more accurate error compensation value of a^(n-m) can be obtained by querying the error compensation table.

Optionally, as an embodiment, obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: shifting a according to n-m to obtain a^(n-m); and adding m and a^(n-m) to obtain m+a^(n-m).

Optionally, as an embodiment, obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), includes: subtracting m and n to obtain m-n or n-m; comparing m-n or n-m with zero; and selecting m and n-m in the case where m-n is greater than or equal to zero, or selecting m and n-m in the case where n-m is less than or equal to zero.

Optionally, as an embodiment, the method 600 further includes: approximately obtaining the value of A*B+C*D according to (log_e a)*log_a(A*B+C*D).

Optionally, as an embodiment, the foregoing method 600 further includes: quantizing the value of the A*B+C*D to reach a preset data bit width.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present application.

A person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function; in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical, mechanical, or in another form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing is only a specific embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall be covered by the scope of protection of the present application. Therefore, the scope of protection of the present application shall be determined by the scope of the claims.

Claims

1. An apparatus for processing a multiply-and-accumulate operation, comprising:

a first adder, configured to add input first data and second data to obtain first intermediate data, wherein the values of the first data and the second data are log_a A and log_a B, respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking logarithms of first original data A and second original data B of a plurality of original data, respectively;

a second adder, configured to add input third data and fourth data to obtain second intermediate data, wherein the values of the third data and the fourth data are log_a C and log_a D, respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking logarithms of third original data C and fourth original data D of the plurality of original data, respectively, wherein a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n;

a logarithmic adder, an input port of the logarithmic adder being coupled to output ports of the first adder and the second adder, the logarithmic adder being configured to obtain a^(n-m) according to m and n input by the first adder and the second adder, and to approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D);

wherein the first adder, the second adder, and the logarithmic adder are implemented by hardware circuits.
2. The apparatus according to claim 1, wherein the logarithmic adder being configured to obtain a^(n-m) according to m and n input by the first adder and the second adder, and to approximately determine the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), comprises:

determining a target accuracy that needs to be achieved when processing the plurality of original data;

in the case where the target accuracy is lower than a first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
3. The apparatus according to claim 1, wherein the logarithmic adder is further configured to:

determine an error compensation value of a^(n-m) according to an error compensation table, wherein the error compensation table includes K values and error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into the error compensation term, wherein K and L are both integers greater than one;

approximately determine the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
4. The apparatus according to claim 3, wherein the logarithmic adder approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) comprises:

determining a target accuracy that needs to be achieved when processing the plurality of original data;

in the case where the target accuracy is higher than a second precision, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
5. The apparatus according to claim 3 or 4, wherein K is determined based on the target accuracy.
6. The apparatus according to any one of claims 3 to 5, wherein L is determined based on the target accuracy.
7. The apparatus according to any one of claims 1 to 6, wherein the logarithmic adder specifically comprises:

a shift circuit for shifting a according to n-m to obtain a^(n-m);

a sub-addition circuit for adding m and a^(n-m) to obtain m+a^(n-m).
8. The apparatus according to claim 7, wherein the logarithmic adder further comprises:

a subtraction circuit for subtracting m and n to obtain m-n or n-m;

a comparison circuit for comparing m-n or n-m with zero;

a selection circuit for selecting m and n-m in the case where m-n is greater than or equal to zero,

or for selecting m and n-m in the case where n-m is less than or equal to zero.
9. The apparatus according to any one of claims 1 to 8, wherein the apparatus further comprises:

a converter for approximately obtaining the value of A*B+C*D according to (log_e a)*log_a(A*B+C*D), wherein the converter is implemented by a hardware circuit.
10. A method for processing a multiply-and-accumulate operation, comprising:

adding input first data and second data to obtain first intermediate data, wherein the values of the first data and the second data are log_a A and log_a B, respectively, the value of the first intermediate data is m, and the first data and the second data are obtained by taking logarithms of first original data A and second original data B of a plurality of original data, respectively;

adding input third data and fourth data to obtain second intermediate data, wherein the values of the third data and the fourth data are log_a C and log_a D, respectively, the value of the second intermediate data is n, and the third data and the fourth data are obtained by taking logarithms of third original data C and fourth original data D of the plurality of original data, respectively, wherein a is an integer greater than 0 and not equal to 1, m and n are real numbers, and m is greater than or equal to n;

obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
11. The method according to claim 10, wherein obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), comprises:

determining a target accuracy that needs to be achieved when processing the plurality of original data;

in the case where the target accuracy is lower than a first precision, approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
12. The method according to claim 10, wherein the method further comprises:

determining an error compensation value of a^(n-m) according to an error compensation table, wherein the error compensation table includes K values and error compensation values of the K values, the K values are obtained by dividing [-1, 1] into K parts, and the K error compensation values are obtained by substituting the K values into the error compensation term, wherein K and L are both integers greater than one;

approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
13. The method according to claim 12, wherein approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D) comprises:

determining a target accuracy that needs to be achieved when processing the plurality of original data;

in the case where the target accuracy is higher than a second precision, approximately determining the sum of m+a^(n-m) and the error compensation value of a^(n-m) as the value of (log_e a)*log_a(A*B+C*D).
14. The method according to claim 12 or 13, wherein K is determined based on the target accuracy.
15. The method according to any one of claims 12 to 14, wherein L is determined based on the target accuracy.
16. The method according to any one of claims 10 to 15, wherein obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), comprises:

shifting a according to n-m to obtain a^(n-m);

adding m and a^(n-m) to obtain m+a^(n-m).
17. The method according to claim 16, wherein obtaining a^(n-m) according to m and n input by the first adder and the second adder, and approximately determining the sum of m and a^(n-m) as the value of (log_e a)*log_a(A*B+C*D), comprises:

subtracting m and n to obtain m-n or n-m;

comparing m-n or n-m with zero;

selecting m and n-m in the case where m-n is greater than or equal to zero,

or selecting m and n-m in the case where n-m is less than or equal to zero.
18. The method according to any one of claims 10 to 17, wherein the method further comprises:

approximately obtaining the value of A*B+C*D according to (log_e a)*log_a(A*B+C*D).