CN110598172A

CN110598172A - Convolution operation method and circuit based on CSA adder

Info

Publication number: CN110598172A
Application number: CN201910779278.6A
Authority: CN
Inventors: 廖裕民; 张义群
Original assignee: Fuzhou Rockchip Electronics Co Ltd
Current assignee: Fuzhou Rockchip Electronics Co Ltd
Priority date: 2019-08-22
Filing date: 2019-08-22
Publication date: 2019-12-20
Anticipated expiration: 2039-08-22
Also published as: CN110598172B

Abstract

The invention discloses a convolution operation method and circuit based on a CSA adder. The circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers. The method comprises the following steps: each multiplier acquires a first multiplier and a second multiplier and multiplies them to obtain a multiplication operation result; the first addition operation unit performs a first addition operation on the multiplication operation results of the multipliers to obtain an addition intermediate value; the patch operation unit generates corresponding patch information according to the multiplication result; the second addition operation unit performs a second addition operation on the addition intermediate values and the patch information to obtain a convolution operation result. In this scheme the patch information is computed separately and added back only once, in the final addition, which greatly reduces circuit consumption and algorithmic difficulty and effectively improves the efficiency of the convolution operation.

Description

Convolution operation method and circuit based on CSA adder

Technical Field

The invention relates to the field of chip circuits, in particular to a convolution operation method and circuit based on a CSA adder.

Background

With the rapid development of the artificial intelligence industry, users demand ever higher speed from neural network operations. Group convolution (group_convolution) is an important algorithm in neural networks, but the prior art proposes no hardware acceleration circuit structure for it. The convolution is still handed to a CPU, which computes each group convolution independently according to the convolution task, so the overall operation efficiency is very low.

Disclosure of Invention

Therefore, a technical solution for convolution operation based on a CSA adder is needed, so as to solve the problem that existing neural network circuits perform convolution operations inefficiently.

In order to achieve the above object, the inventors provide a convolution operation circuit based on a CSA adder, the circuit including a multiplication operation unit, a first addition operation unit, a patch operation unit, and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with a first addition unit, and the first addition unit and the patch operation unit are also connected with a second addition unit respectively;

the multiplier is used for acquiring a first multiplier and a second multiplier to carry out multiplication operation to obtain a multiplication operation result;

the first addition operation unit is used for performing first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value;

the patch operation unit is used for generating corresponding patch information according to the multiplication result;

and the second addition operation unit is used for executing second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result.

Furthermore, the multiplier is a radix-4 multiplier, and the radix-4 multiplier comprises a data splitting unit, a first data cache unit, a low-order zero padding unit, a second data cache unit, a radix-4 coding unit, a coding cache unit, a coding table look-up operation unit and a displacement unit; the data splitting unit is connected with the first data cache unit, the first data cache unit is connected with the low-order zero padding unit, the low-order zero padding unit is connected with the second data cache unit, the second data cache unit is connected with the radix-4 coding unit, the radix-4 coding unit is connected with the coding cache unit, the coding cache unit is connected with the coding table look-up operation unit, and the coding table look-up operation unit is connected with the displacement unit;

the displacement unit is connected to the first addition unit.

Further, the circuit also comprises a negative number counting unit;

the negative number counting unit is used for counting negative number result information in the multiplication result and sending the negative number result information to the patch operation unit; the negative result information comprises negative indication bit information of a calculation result corresponding to each split data calculated by the encoding table look-up operation unit, and the split data is obtained by splitting the second multiplier through the data splitting unit.

Furthermore, the patch operation unit comprises a plurality of patch storage units, a gating unit and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit, and the gating unit is also connected with the logic operation unit; patch sub information is stored in the patch storage unit, and the patch sub information stored in different storage units is different;

the gating unit is used for selecting and sending the corresponding patch sub information to the logic operation unit according to the negative number indicating bit information;

the logic operation unit is used for carrying out logic OR operation on the received patch sub information to obtain the patch information.

Further, the circuit further comprises a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition unit, the second addition unit and the patch unit;

the grouping configuration unit is used for configuring grouping information of convolution operation and determining a path signal of a path selector corresponding to the grouping configuration information according to the grouping configuration information so that the second addition operation unit outputs a corresponding grouping convolution operation result.

The inventor also provides a convolution operation method based on the CSA adder, which is applied to a convolution operation circuit based on the CSA adder, wherein the circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with a first addition unit, and the first addition unit and the patch operation unit are also connected with a second addition unit respectively; the method comprises the following steps:

the multiplier acquires a first multiplier and a second multiplier to carry out multiplication operation to obtain a multiplication operation result;

the first addition operation unit performs first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value;

the patch operation unit generates corresponding patch information according to the multiplication result;

the second addition operation unit is used for executing a second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result.

the displacement unit is connected to the first addition unit.

Further, the circuit also comprises a negative number counting unit; the method comprises the following steps:

the negative number counting unit counts negative number result information in the multiplication result and sends the negative number result information to the patch operation unit;

the negative result information comprises negative indication bit information of a calculation result corresponding to each split data calculated by the encoding table look-up operation unit, and the split data is obtained by splitting the second multiplier through the data splitting unit.

Furthermore, the patch operation unit comprises a plurality of patch storage units, a gating unit and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit, and the gating unit is also connected with the logic operation unit; patch sub information is stored in the patch storage unit, and the patch sub information stored in different storage units is different; the method comprises the following steps:

the gating unit selects and sends the corresponding patch sub-information to the logic operation unit according to the negative number indicating bit information;

the logic operation unit performs logic OR operation on the received patch sub information to obtain patch information.

Further, the circuit further comprises a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition unit, the second addition unit and the patch unit; the method comprises the following steps:

the grouping configuration unit configures the grouping information of the convolution operation, and determines the path signal of the path selector corresponding to the grouping configuration information according to the grouping configuration information, so that the second addition operation unit outputs the corresponding grouping convolution operation result.

In the convolution operation method and circuit based on a CSA adder according to the above technical solution, the circuit includes a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with the first addition operation unit, and the first addition operation unit and the patch operation unit are each also connected with the second addition operation unit. The method comprises the following steps: each multiplier acquires a first multiplier and a second multiplier and multiplies them to obtain a multiplication operation result; the first addition operation unit performs a first addition operation on the multiplication operation results of the multipliers to obtain an addition intermediate value; the patch operation unit generates corresponding patch information according to the multiplication result; the second addition operation unit performs a second addition operation on the addition intermediate values and the patch information to obtain a convolution operation result. In this scheme the patch information is computed separately and added back only once, in the final addition, which greatly reduces circuit consumption and algorithmic difficulty and effectively improves the efficiency of the convolution operation.

Drawings

FIG. 1 is a schematic diagram of a convolutional neural network according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a CSA adder according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a CSA adder according to another embodiment of the present invention;

FIG. 4 is a diagram illustrating a CSA adder based convolution operation according to an embodiment of the present invention;

FIG. 5 is a circuit diagram of a radix-4 multiplier according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating a patch operation unit according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a path selector according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a path selector according to another embodiment of the present invention;

FIG. 9 is a diagram of an encoding operation table according to another embodiment of the present invention;

FIG. 10 is a diagram of a circuit for CSA adder-based convolution operation according to an embodiment of the present invention;

FIG. 11 is a flowchart illustrating a method for CSA adder-based convolution according to an embodiment of the present invention;

Description of reference numerals:

101. a multiplication unit;

102. a first addition operation unit;

103. a patch operation unit;

104. a second addition operation unit;

105. a negative number counting unit.

Detailed Description

To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.

Fig. 1 is a schematic diagram of a convolutional neural network according to an embodiment of the present invention. As can be seen from fig. 1, the neural network includes a convolutional layer, a sampling layer and a fully-connected layer, and the convolutional layer and the activation layer are the most computationally intensive when the neural network performs computation. The multiplier is applied to neural network recognition calculation. Among the various calculation types of a neural network, convolution accounts for the main part; convolution comprises multiplication and addition operations, and both require corresponding hardware circuit resources.

As shown in fig. 10, the present invention provides a convolution operation circuit based on a CSA adder, which includes a multiplication operation unit 101, a first addition operation unit 102, a patch operation unit 103, and a second addition operation unit 104; the multiplication operation unit 101 includes a plurality of multipliers; each multiplier is connected to the first addition operation unit 102, and the first addition operation unit 102 and the patch operation unit 103 are each connected to the second addition operation unit 104.

The first addition operation unit 102 is a CSA adder, and the second addition operation unit 104 is a normal adder. As shown in fig. 2, when the CSA adder adds two addends it outputs two results: one is the exclusive OR of the two addends (i.e., the CSA value), and the other is the logical AND of the two addends shifted 1 bit to the left (i.e., the CSP value). A CSA adder may also receive more than two addends. For example, the CSA adder 10 in fig. 4 receives the results of the CSA adder 00 and the CSA adder 01; since the CSA adder 00 outputs two values (CSA00 and CSP00) and the CSA adder 01 outputs two values (CSA01 and CSP01), the CSA adder 10 receives 4 values. For the CSA addition of the 4 values, as shown in fig. 3, the CSA adder 10 first performs CSA addition on CSA00 and CSP00. Because CSP00 is shifted 1 bit to the left relative to CSA00, CSA00 is zero-padded at the high bit and CSP00 is zero-padded at the low bit; the two are then XORed to give one intermediate result value, and ANDed and shifted left by one bit to give the other. The two intermediate result values are then combined with CSA01 and with CSP01 in turn in the same way (exclusive OR, or AND followed by a one-bit left shift), so that the final CSA10 value and CSP10 value are obtained, namely the result output by the CSA adder 10.
The CSA10 value is the result of XORing the four values CSA00, CSP00, CSA01 and CSP01 in sequence; the CSP10 value is the result of applying a first operation to the four values CSA00, CSP00, CSA01 and CSP01 in sequence, wherein the first operation is: AND, then shift left by 1 bit.
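
The two-addend case of fig. 2 can be illustrated with a minimal sketch. The function name `csa_pair` and the sample operands are assumptions chosen for illustration and do not appear in the drawings; the sketch only shows that the CSA value and the CSP value together carry the full sum, so that a single ordinary addition recovers it.

```python
def csa_pair(a, b):
    """Two-input carry-save step: return (CSA value, CSP value)."""
    csa = a ^ b          # bitwise sum, ignoring carries
    csp = (a & b) << 1   # carries, moved one bit to the left
    return csa, csp


# Hypothetical operands; the ordinary adder resolves the deferred carries.
csa, csp = csa_pair(0b0110, 0b0011)
assert csa + csp == 0b0110 + 0b0011
```

The carry chain is therefore only propagated once, in the ordinary adder that follows.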

In the process of addition, when a plurality of addends need to be added, the first two addends can be added first, then the third added to the result, and so on. Alternatively, the addends can be paired and added, and the resulting sums paired and added again. The time consumed by the algorithm lies mainly in the fact that the carry has to be propagated from the low bits to the high bits every time a summation result is obtained. In the case of multiple addends, the carry of the current stage can instead be stored and passed, as an operation result, to the summation at the next stage, so that it need not be resolved immediately. For example, if the output results of the multipliers 0 to 7 in fig. 4 are to be added, they may be added layer by layer in the order shown in fig. 4, and the CSA20 value and the CSP20 value output by the CSA adder 20 may be output to the second addition operation unit (i.e., the normal adder connected to the CSA adder 20 in fig. 4) as the output result of the entire first addition operation unit, to perform the second addition operation.
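
The idea of storing the carry and resolving it only at the end can be sketched as follows. This sketch substitutes the textbook 3:2 compressor (a full adder applied bitwise) for the cell of fig. 2 and does not reproduce the exact wiring of fig. 4; the names `compress_3_to_2` and `carry_save_sum` are assumptions, and non-negative Python integers stand in for fixed-width buses.

```python
def compress_3_to_2(a, b, c):
    """Textbook 3:2 compressor: three addends in, (sum, carry) out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def carry_save_sum(addends):
    """Reduce many addends to two without propagating carries,
    then perform one ordinary addition at the end."""
    values = list(addends)
    while len(values) > 2:
        a, b, c = values.pop(), values.pop(), values.pop()
        values.extend(compress_3_to_2(a, b, c))
    return sum(values)  # the single carry-propagating addition


# Hypothetical outputs of eight multipliers.
outputs = [3, 5, 7, 9, 11, 13, 15, 17]
assert carry_save_sum(outputs) == sum(outputs)
```

Each compression step preserves the total, so only the final `sum` corresponds to a carry-propagating adder.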

When the circuit carries out convolution operation, the multiplier is used for obtaining a first multiplier and a second multiplier to carry out multiplication operation so as to obtain a multiplication operation result; the first addition operation unit is used for performing first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value; the patch operation unit is used for generating corresponding patch information according to the multiplication result; and the second addition operation unit is used for executing second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result.

In this embodiment, the multiplier is a radix-4 multiplier. As shown in fig. 5, the radix-4 multiplier includes a data splitting unit, a first data cache unit, a low-order zero padding unit, a second data cache unit, a radix-4 coding unit, a coding cache unit, a coding table look-up operation unit, and a displacement unit; the data splitting unit is connected with the first data cache unit, the first data cache unit is connected with the low-order zero padding unit, the low-order zero padding unit is connected with the second data cache unit, the second data cache unit is connected with the radix-4 coding unit, the radix-4 coding unit is connected with the coding cache unit, the coding cache unit is connected with the coding table look-up operation unit, and the coding table look-up operation unit is connected with the displacement unit; the displacement unit is connected to the first addition operation unit.

The operation principle of the radix-4 multiplier is as follows. Assume an 8-bit binary multiplier 0111_1110. Before the multiplication is calculated, an auxiliary bit 0 is appended at the end, and the bits are divided into groups of three (adjacent high bit, current bit, adjacent low bit) that overlap by one bit, giving four groups of split data: 01(1), 11(1), 11(1), 10(0). Looking these up in the coding table shown in fig. 9 (X in the table corresponds to the input data a in fig. 5, and Y corresponds to the input data b in fig. 5) gives: 10 (2X), 00 (0X), 00 (0X), -10 (-2X), i.e. 10_00_00_-10. Multiplying by 2 is a shift operation in binary. With the Booth algorithm, the multiplication is thus converted into an operation involving only addition, subtraction and shifting, which greatly simplifies the multiplication and reduces the amount of computation. For a negative intermediate result of the multiplication (i.e. where subtraction is required), such as -2Y and -Y in the table, negating a binary number is equivalent to taking its two's complement, i.e. inverting each bit of the original number and then adding 1.

Since the add-1 operation itself involves carry propagation between different bits, performing it immediately after the inversion would make the original operation more complicated. Therefore, in the present embodiment, the add-1 operation for each group of split data is completed by the patch operation unit, thereby simplifying the addition.

Specifically, as shown in fig. 5, the circuit further includes a negative number counting unit 105, where the negative number counting unit 105 is configured to count negative number result information in the multiplication result and send the negative number result information to the patch operation unit; the negative result information comprises negative indication bit information of a calculation result corresponding to each split data calculated by the encoding table look-up operation unit, and the split data is obtained by splitting the second multiplier through the data splitting unit.

For example, the input data a in fig. 5 is 0111_1110. As described above, four groups of split data 01(1), 11(1), 11(1), 10(0) are obtained after the data splitting unit; looking up the table, the results of multiplying the four groups by the input data b are 2b, 0, 0, -2b in sequence, and adding these 4 results gives the product of the multiplier a and the multiplier b. For the result 2b, b is shifted left by one bit (multiplying by 2 in binary is equivalent to a one-bit left shift). For the result -2b, b is shifted left by one bit and its complement is then taken (i.e., each bit is inverted and then 1 is added), with the add-1 operation performed by the patch operation unit. Thus, when the radix-4 multiplier multiplies the input data 0111_1110 by any multiplier b, b is shifted left by one bit to obtain a first addition parameter, and b is shifted left by one bit and then inverted to obtain a second addition parameter. The first and second addition parameters are then added with an offset according to the positions of the split data 01(1) and 10(0) in the original multiplier 0111_1110, giving the intermediate operation result of the multiplier 0111_1110 and the multiplier b. (The final result additionally requires a +1 at the corresponding position on top of this intermediate operation result; in the present application this +1 is performed in the patch operation unit, and the multiplier is only responsible for calculating the intermediate operation result.) When the multiplier a takes other values, the calculation is similar and is not described again here.
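
The division of work between the multiplier and the patch operation unit can be sketched as follows. This is a behavioural model under stated assumptions, not the circuit of fig. 5: the recoding table is the standard radix-4 Booth table (fig. 9 is not reproduced in the text), the name `split_multiply` is hypothetical, and unbounded Python integers replace fixed-width registers.

```python
BOOTH_TABLE = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def split_multiply(a_bits, b, width=8):
    """Return (intermediate, patch) with a * b == intermediate + patch,
    where a is the signed value of the width-bit pattern a_bits."""
    padded = (a_bits & ((1 << width) - 1)) << 1
    intermediate = 0
    patch = 0
    for i in range(0, width, 2):
        digit = BOOTH_TABLE[(padded >> i) & 0b111]
        if digit >= 0:
            partial = digit * b       # 0, b, or b shifted left by one bit
        else:
            partial = ~(-digit * b)   # invert only; the +1 is deferred
            patch |= 1 << i           # remember where the +1 belongs
        intermediate += partial << i  # offset by the group position
    return intermediate, patch


intermediate, patch = split_multiply(0b0111_1110, 5)
assert patch == 0b0000_0001           # only the lowest group is negative
assert intermediate + patch == 126 * 5
```

The multiplier thus never performs an add-1; it only reports, through the patch value, where the deferred 1s are to be added.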

Preferably, the offset addition of the addition parameters is performed by a CSA adder. Specifically, as shown in fig. 5, each displacement unit is connected to the CSA adder in fig. 5, which performs the offset addition on the addition parameters corresponding to the split data to obtain a CSA value and a CSP value as the output result of the multiplier.

As shown in fig. 6, in some embodiments, the patch operation unit includes a plurality of patch storage units, a gating unit, and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit, and the gating unit is also connected with the logic operation unit; patch sub information is stored in the patch storage unit, and the patch sub information stored in different storage units is different;

the gating unit is used for selecting and sending the corresponding patch sub information to the logic operation unit according to the negative number indicating bit information; the logic operation unit is used for carrying out logic OR operation on the received patch sub information to obtain the patch information.

The patch operation adds a number of 1s to the intermediate result computed by the multiplier. The number of 1s is determined by the number of negative results obtained from the split data (because only negative results require the complement operation, i.e. inversion followed by +1). Because the split data are offset from one another (adjacent split data differ by 2 bit positions), the bit at which the "1" is added also differs for different split data. Specifically, the 4 cases are shown in fig. 6: the 1st, 3rd, 5th and 7th bits of the 8-bit multiplier. Taking the input data a = 0111_1110 again as an example, the first 3 split data 01(1), 11(1), 11(1) give the intermediate multiplication results 2b, 0, 0, none of which is negative, so the gating units 2, 3 and 4 each transmit patch sub information of eight 0s (i.e. 8'b0 in the drawing) to the logic operation unit (i.e. the logical OR operation circuit in fig. 6). The split data 10(0) gives the intermediate multiplication result -2b; since the add-1 operation corresponding to 10(0) falls on the least significant bit, the gating unit 1 transmits the patch sub information 0000_0001 stored in its patch storage unit to the logical OR operation circuit. The logical OR operation circuit ORs all the received patch sub information and outputs the final patch information. Because the add-1 bits of the different split data are staggered, no two pieces of patch sub information have a 1 in the same bit position, so the addition of the patch sub information can be simplified to a logical OR operation, which improves operation efficiency.
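
The gating and OR steps can be sketched as follows. Only the stored value 0000_0001 is stated in the text; the other three stored values are inferred from the 1st, 3rd, 5th and 7th bit positions and are an assumption, as are the names `PATCH_STORE` and `patch_info`. Flags are ordered from gating unit 1 (least significant split data) to gating unit 4.

```python
# One stored constant per patch storage unit (units 1 to 4).
PATCH_STORE = [0b0000_0001, 0b0000_0100, 0b0001_0000, 0b0100_0000]


def patch_info(negative_flags):
    """Gate each stored constant by its negative indication bit,
    then combine the selected values with a logical OR."""
    patch = 0
    for flag, stored in zip(negative_flags, PATCH_STORE):
        selected = stored if flag else 0  # gating unit: constant or 8'b0
        patch |= selected                 # logical OR operation circuit
    return patch


# a = 0111_1110: only the lowest split data, 10(0), gives a negative result.
assert patch_info([True, False, False, False]) == 0b0000_0001
# The stored constants never overlap, so OR and addition agree.
assert patch_info([True, True, True, True]) == sum(PATCH_STORE)
```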

As shown in fig. 4, the addition intermediate values are added layer by layer by the CSA adders at the different levels, finally yielding two values (CSA20 and CSP20 in fig. 4) that are transmitted to the normal adder; the normal adder then adds the CSA20 value and the CSP20 value to obtain the intermediate result of the CSA addition. Similarly, the patch information may also be added layer by layer: the patch information obtained by the patch operation units 0 to 7 is added successively by the first-stage, second-stage and third-stage ordinary adders to obtain the final patch information, which is then added to the intermediate result of the CSA addition calculated by the normal adder to obtain the convolution operation result. The invention computes the patch information and the intermediate result of the CSA addition in parallel and adds the patch information back only once, in the final addition, which greatly reduces circuit consumption and algorithmic difficulty and effectively improves the efficiency of the convolution operation.
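
The final combination can be sketched as follows. The function name `combine` and the sample pairs are assumptions; each pair is a hypothetical (intermediate result, patch information) output of one multiplier, and plain integer sums stand in for the CSA tree and the ordinary adders of fig. 4.

```python
def combine(pairs):
    """Sum intermediates and patches on separate paths, then add once."""
    intermediate_sum = sum(i for i, _ in pairs)  # CSA tree + normal adder
    patch_sum = sum(p for _, p in pairs)         # ordinary adder stages
    return intermediate_sum + patch_sum          # patches added back once


# Hypothetical pairs: 629 + 1 = 630 = 126 * 5, and -8 + 1 = -7 = (-1) * 7.
pairs = [(629, 1), (-8, 1)]
assert combine(pairs) == 126 * 5 + (-1) * 7
```

Unlike the patch sub information inside one multiplier, patches from different multipliers may have a 1 in the same bit position (both are 1 here), which is consistent with the text using ordinary adders rather than a logical OR at this level.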

In practical applications, a neural network often performs grouped convolution in addition to complete convolution. In grouped convolution, the number of output results corresponds to the number of groups: for example, if the number of groups is 2, two numerical results are finally output; if the number of groups is 4, four numerical results are finally output.

In order to make the circuit structure of the present invention also suitable for grouped convolution, as shown in fig. 6 and 7, the circuit further includes a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition operation unit, the second addition operation unit and the patch operation unit; the grouping configuration unit is used for configuring grouping information of the convolution operation and determining, according to the grouping configuration information, the path signal of the corresponding path selector, so that the second addition operation unit outputs the corresponding grouped convolution operation result.

For example, when the current number of groups is 4, every two of the multipliers 0 to 7 form a group. The calculation results of the CSA adder 00, the CSA adder 01, the CSA adder 02 and the CSA adder 03 are then not transmitted to the next-stage CSA adder but to the corresponding ordinary adders (i.e., the second addition operation unit). The patch operation unit is handled similarly: after patch 0 and patch 1 are added by the first-stage ordinary adder and selected by the path selector, their sum is added directly to the result of the ordinary adder corresponding to the CSA adder 00 to obtain the convolution operation result of group 1, and is not passed on to the second-stage adder. The results for patches 2 and 3, patches 4 and 5, and patches 6 and 7 are obtained in the same way and are not described again here.

When the current number of groups is 2, assuming that multipliers 0 to 3 form one group and multipliers 4 to 7 form the other, the addition results of the CSA adder 10 and the CSA adder 11 are not transmitted to the next-stage CSA adder 20, but are transmitted directly to their respective ordinary adders, which add the CSA10 value and the CSP10 value, or the CSA11 value and the CSP11 value. In the patch operation unit, the patch 10 is added directly to the sum of the CSA10 value and the CSP10 value to obtain the convolution operation result of group 1; the patch 11 is added directly to the sum of the CSA11 value and the CSP11 value to obtain the convolution operation result of group 2.

In short, the output data trend of each adder can be selected through each path selector, so that the grouping convolution function is realized, and the operation requirement of grouping convolution under different grouping conditions is met.
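
The effect of the path selectors can be sketched behaviourally as follows. The function name `grouped_outputs` and the sample products are assumptions; the sketch models only which multiplier outputs end up in which result, not the selector wiring of the drawings.

```python
def grouped_outputs(products, groups):
    """Sum equal-sized consecutive slices, one result per group, as if
    the adder tree were tapped at the level matching the group count."""
    if groups <= 0 or len(products) % groups != 0:
        raise ValueError("products must divide evenly into groups")
    size = len(products) // groups
    return [sum(products[g * size:(g + 1) * size]) for g in range(groups)]


# Hypothetical outputs of multipliers 0 to 7.
products = [1, 2, 3, 4, 5, 6, 7, 8]
assert grouped_outputs(products, 1) == [36]             # complete convolution
assert grouped_outputs(products, 2) == [10, 26]         # tap after level 1
assert grouped_outputs(products, 4) == [3, 7, 11, 15]   # tap after level 0
```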

As shown in fig. 11, the inventor further provides a convolution operation method based on a CSA adder, which is applied to a convolution operation circuit based on a CSA adder, wherein the circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with a first addition unit, and the first addition unit and the patch operation unit are also connected with a second addition unit respectively; the method comprises the following steps:

first, in step S1101, each multiplier acquires a first multiplier and a second multiplier (its two operands) and multiplies them to obtain a multiplication result;

then, in step S1102, the first addition operation unit performs a first addition operation on the multiplication results of the multipliers to obtain an addition intermediate value;

then, in step S1103, the patch operation unit generates the corresponding patch information according to the multiplication results;

finally, in step S1104, the second addition operation unit performs a second addition operation on each addition intermediate value and each piece of patch information to obtain the convolution operation result.
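The four steps can be traced in a small Python model. This is a sketch under assumptions rather than the patented implementation: it assumes that the patch information is the "+1" that completes a two's complement negation in radix-4 Booth multiplication, that a negative partial product is therefore emitted as a bitwise inversion only, and that a plain integer sum may stand in for the CSA tree of the first addition. The names `BOOTH_RADIX4`, `multiply_without_patch` and `convolve` are hypothetical.

```python
# Radix-4 Booth table: overlapping 3-bit group -> signed digit.
BOOTH_RADIX4 = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def multiply_without_patch(x, y, bits=8):
    """S1101: multiply x by a signed `bits`-wide y, deferring every '+1'."""
    padded = (y & ((1 << bits) - 1)) << 1      # low-order zero padding
    partials, patch = [], 0
    for i in range(bits // 2):
        digit = BOOTH_RADIX4[(padded >> (2 * i)) & 0b111]
        magnitude = abs(digit) * x
        if digit < 0:
            partials.append((~magnitude) << (2 * i))  # inverted, '+1' missing
            patch |= 1 << (2 * i)                     # record the missing '+1'
        else:
            partials.append(magnitude << (2 * i))
    return partials, patch


def convolve(xs, ys, bits=8):
    """Multiply-accumulate with the patches added back once at the end."""
    intermediate, patches = 0, 0
    for x, y in zip(xs, ys):
        partials, patch = multiply_without_patch(x, y, bits)
        intermediate += sum(partials)   # S1102: first addition (CSA stand-in)
        patches += patch                # S1103: patch information
    return intermediate + patches       # S1104: second addition


print(convolve([3, -5, 7], [-3, 6, -128]))  # 3*(-3) + (-5)*6 + 7*(-128) = -935
```

The point of the arrangement is visible in `convolve`: no individual partial product is corrected inside the accumulation loop, and the accumulated patch value is added a single time in the last step.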

In certain embodiments, the circuit further comprises a negative statistics unit, and the method comprises the following steps: the negative statistics unit collects negative result information from the multiplication results and sends it to the patch operation unit. The negative result information comprises the negative indication bit of the result that the coding table look-up operation unit computes for each piece of split data, where the split data is obtained by the data splitting unit splitting the second multiplier.
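As a hedged illustration of what the negative indication bits might look like, the Python sketch below splits a second multiplier into overlapping 3-bit groups, as in standard radix-4 Booth encoding, and flags the groups whose encoded digit is negative. The function name, the 8-bit default width and the use of the standard Booth table are assumptions made for the example.

```python
def negative_indicator_bits(second_multiplier, bits=8):
    """Return one flag per split group; True means the encoded digit is negative."""
    padded = (second_multiplier & ((1 << bits) - 1)) << 1  # low-order zero padding
    flags = []
    for i in range(bits // 2):
        group = (padded >> (2 * i)) & 0b111
        # Groups 100, 101 and 110 encode -2, -1 and -1; 111 encodes 0.
        flags.append(group in (0b100, 0b101, 0b110))
    return flags
```

For example, a second multiplier of 6 (binary 00000110) produces a negative digit only in the lowest group, because 6 is encoded as 2 × 4 − 2.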

In some embodiments, the patch operation unit includes a plurality of patch storage units, a plurality of gating units, and a logic operation unit; each gating unit is connected with a corresponding patch storage unit and with the logic operation unit; each patch storage unit stores patch sub-information, and different storage units store different patch sub-information; the method comprises the following steps:

the gating unit selects the corresponding patch sub-information according to the negative indication bit information and sends it to the logic operation unit; the logic operation unit performs a logical OR operation on the received patch sub-information to obtain the patch information.
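The gating and OR steps can be sketched in Python as follows. The contents of the patch storage units are an assumption: each unit is taken to hold a single distinct bit, 1 << (2 * i) for split position i, so that a bitwise OR of the selected entries equals their arithmetic sum. The function name `patch_information` is hypothetical.

```python
def patch_information(negative_bits):
    """Combine the patch sub-information selected by the negative indication bits."""
    # Assumed contents of the patch storage units: one distinct bit per position.
    patch_storage = [1 << (2 * i) for i in range(len(negative_bits))]
    patch = 0
    for selected, sub_info in zip(negative_bits, patch_storage):
        if selected:            # gating unit passes this sub-information on
            patch |= sub_info   # logic operation unit: bitwise OR
    return patch
```

Because the assumed sub-information values never share a set bit, the OR loses no information, which is one plausible reason an OR gate can replace an adder at this point.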

In some embodiments, the circuit further comprises a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition operation unit, the second addition operation unit and the patch operation unit; the method comprises the following steps:

It should be noted that, although the above embodiments have been described herein, the scope of protection of the invention is not limited thereto. Changes and modifications made to the embodiments described herein on the basis of the innovative concept of the invention, equivalent structures or equivalent processes derived from the content of this specification and the accompanying drawings, and direct or indirect application of the above technical solutions in other related technical fields all fall within the scope of protection of the invention.

Claims

1. A convolution operation circuit based on a CSA adder is characterized by comprising a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with a first addition unit, and the first addition unit and the patch operation unit are also connected with a second addition unit respectively;

2. The CSA adder-based convolution operation circuit of claim 1, wherein the multiplier is a radix-4 multiplier, the radix-4 multiplier comprising a data splitting unit, a first data buffer unit, a low-order zero padding unit, a second data buffer unit, a radix-4 coding unit, a coding buffer unit, a coding table look-up operation unit, and a displacement unit; the data splitting unit is connected with the first data buffer unit, the first data buffer unit is connected with the low-order zero padding unit, the low-order zero padding unit is connected with the second data buffer unit, the second data buffer unit is connected with the radix-4 coding unit, the radix-4 coding unit is connected with the coding buffer unit, the coding buffer unit is connected with the coding table look-up operation unit, and the coding table look-up operation unit is connected with the displacement unit;

the displacement unit is connected to the first addition unit.

3. The CSA adder-based convolution operation circuit of claim 2 further comprising a negative statistics unit;

4. The CSA adder-based convolution operation circuit of claim 3, wherein the patch operation unit includes a plurality of patch storage units, a gating unit, and a logical operation unit; each gating unit is correspondingly connected with one patch storage unit, and the gating unit is also connected with the logic operation unit; patch sub information is stored in the patch storage unit, and the patch sub information stored in different storage units is different;

5. The CSA adder-based convolution operation circuit of any of claims 1 to 4, further comprising a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition unit, the second addition unit and the patch operation unit;

6. The convolution operation method based on the CSA adder is characterized by being applied to a convolution operation circuit based on the CSA adder, wherein the circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with a first addition unit, and the first addition unit and the patch operation unit are also connected with a second addition unit respectively; the method comprises the following steps:

7. The CSA adder-based convolution operation method of claim 6, wherein the multiplier is a radix-4 multiplier, the radix-4 multiplier comprising a data splitting unit, a first data buffer unit, a low-order zero padding unit, a second data buffer unit, a radix-4 coding unit, a coding buffer unit, a coding table look-up operation unit and a displacement unit; the data splitting unit is connected with the first data buffer unit, the first data buffer unit is connected with the low-order zero padding unit, the low-order zero padding unit is connected with the second data buffer unit, the second data buffer unit is connected with the radix-4 coding unit, the radix-4 coding unit is connected with the coding buffer unit, the coding buffer unit is connected with the coding table look-up operation unit, and the coding table look-up operation unit is connected with the displacement unit;

the displacement unit is connected to the first addition unit.

8. The CSA adder-based convolution operation method of claim 7 wherein the circuit further comprises a negative statistics unit; the method comprises the following steps:

9. The CSA adder-based convolution operation method of claim 8, wherein the patch operation unit includes a plurality of patch storage units, a gating unit, and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit, and the gating unit is also connected with the logic operation unit; patch sub-information is stored in the patch storage unit, and the patch sub-information stored in different storage units is different; the method comprises the following steps:

10. The CSA adder-based convolution operation method according to any of claims 6 to 9, wherein the circuit further includes a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition unit, the second addition unit and the patch operation unit; the method comprises the following steps: