CN110598172B - Convolution operation method and circuit based on CSA adder - Google Patents

Publication number
CN110598172B
Authority
CN
China
Prior art keywords
unit
patch
addition
operation unit
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910779278.6A
Other languages
Chinese (zh)
Other versions
CN110598172A (en)
Inventor
廖裕民
张义群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rockchip Electronics Co Ltd
Original Assignee
Rockchip Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rockchip Electronics Co Ltd
Priority to CN201910779278.6A
Publication of CN110598172A
Application granted
Publication of CN110598172B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/15: Correlation function computation including computation of convolution operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a convolution operation method and circuit based on a CSA adder. The circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers. The method comprises the following steps: each multiplier obtains a first multiplier and a second multiplier and multiplies them to obtain a multiplication operation result; the first addition operation unit performs a first addition operation on the multiplication operation results of the multipliers to obtain an addition intermediate value; the patch operation unit generates corresponding patch information according to the multiplication operation result; the second addition operation unit performs a second addition operation on the addition intermediate values and the patch information to obtain a convolution operation result. In this scheme the patch information is computed separately from the additions and added back only once at the end, which greatly reduces circuit cost and algorithmic complexity and effectively improves the efficiency of the convolution operation.

Description

Convolution operation method and circuit based on CSA adder
Technical Field
The invention relates to the field of chip circuits, in particular to a convolution operation method and circuit based on a CSA adder.
Background
With the rapid development of the artificial intelligence industry, users demand ever higher neural network computation speed. Group convolution (group_convolution) is an important algorithm in neural networks, but the prior art provides no corresponding hardware acceleration circuit for it. The convolution is still handed to the CPU, where each group convolution is performed independently according to the convolution task, so the overall operation efficiency is very low.
Disclosure of Invention
Therefore, a technical solution for convolution operation based on a CSA adder is needed, so as to solve the problem that existing neural network circuits are inefficient when performing convolution operations.
In order to achieve the above object, the inventors provide a convolution operation circuit based on a CSA adder, the circuit including a multiplication operation unit, a first addition operation unit, a patch operation unit, and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with a first addition unit, and the first addition unit and the patch operation unit are also connected with a second addition unit respectively;
the multiplier is used for acquiring a first multiplier and a second multiplier to carry out multiplication operation to obtain a multiplication operation result;
the first addition operation unit is used for performing first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value;
the patch operation unit is used for generating corresponding patch information according to the multiplication result;
and the second addition operation unit is used for executing second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result.
Furthermore, the multiplier is a radix-4 multiplier, and the radix-4 multiplier comprises a data splitting unit, a first data cache unit, a low-order zero padding unit, a second data cache unit, a radix-4 coding unit, a coding cache unit, a coding table look-up operation unit and a displacement unit; the data splitting unit is connected with the first data cache unit, the first data cache unit is connected with the low-order zero padding unit, the low-order zero padding unit is connected with the second data cache unit, the second data cache unit is connected with the radix-4 coding unit, the radix-4 coding unit is connected with the coding cache unit, the coding cache unit is connected with the coding table look-up operation unit, and the coding table look-up operation unit is connected with the displacement unit;
the displacement unit is connected to the first addition unit.
Further, the circuit also comprises a negative number statistical unit;
the negative number counting unit is used for counting negative number result information in a multiplication result and sending the negative number result information to the patch operation unit; the negative result information comprises negative indication bit information of a calculation result corresponding to each split data calculated by the encoding table look-up operation unit, and the split data is obtained by splitting the second multiplier through the data splitting unit.
Furthermore, the patch operation unit comprises a plurality of patch storage units, a gating unit and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit, and the gating unit is also connected with the logic operation unit; patch sub information is stored in the patch storage unit, and the patch sub information stored in different storage units is different;
the gating unit is used for selecting and sending the corresponding patch sub-information to the logic operation unit according to the negative number indicating bit information;
the logic operation unit is used for carrying out logic OR operation on the received patch sub information to obtain the patch information.
Further, the circuit further comprises a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition unit, the second addition unit and the patch operation unit;
the grouping configuration unit is used for configuring grouping information of convolution operation and determining a path signal of a path selector corresponding to the grouping information according to the grouping information so that the second addition operation unit outputs a corresponding grouping convolution operation result.
The inventor also provides a convolution operation method based on the CSA adder, which is applied to a convolution operation circuit based on the CSA adder, wherein the circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with a first addition unit, and the first addition unit and the patch operation unit are also connected with a second addition unit respectively; the method comprises the following steps:
the multiplier acquires a first multiplier and a second multiplier to carry out multiplication operation to obtain a multiplication operation result;
the first addition operation unit performs first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value;
the patch operation unit generates corresponding patch information according to the multiplication result;
the second addition operation unit performs a second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result.
Furthermore, the multiplier is a radix-4 multiplier, and the radix-4 multiplier comprises a data splitting unit, a first data cache unit, a low-order zero padding unit, a second data cache unit, a radix-4 coding unit, a coding cache unit, a coding table look-up operation unit and a displacement unit; the data splitting unit is connected with the first data cache unit, the first data cache unit is connected with the low-order zero padding unit, the low-order zero padding unit is connected with the second data cache unit, the second data cache unit is connected with the radix-4 coding unit, the radix-4 coding unit is connected with the coding cache unit, the coding cache unit is connected with the coding table look-up operation unit, and the coding table look-up operation unit is connected with the displacement unit;
the displacement unit is connected to the first addition unit.
Further, the circuit also comprises a negative number statistical unit; the method comprises the following steps:
the negative number counting unit counts negative number result information in the multiplication result and sends the negative number result information to the patch operation unit;
the negative result information comprises negative indication bit information of a calculation result corresponding to each split data calculated by the encoding table look-up operation unit, and the split data is obtained by splitting the second multiplier through the data splitting unit.
Furthermore, the patch operation unit comprises a plurality of patch storage units, a gating unit and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit, and the gating unit is also connected with the logic operation unit; patch sub information is stored in the patch storage unit, and the patch sub information stored in different storage units is different; the method comprises the following steps:
the gating unit selects and sends the corresponding patch sub-information to the logic operation unit according to the negative number indicating bit information;
the logic operation unit performs logic OR operation on the received patch sub information to obtain patch information.
Further, the circuit further comprises a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition unit, the second addition unit and the patch operation unit; the method comprises the following steps:
the grouping configuration unit configures the grouping information of the convolution operation, and determines the path signal of the path selector corresponding to the grouping information according to the grouping information, so that the second addition operation unit outputs the corresponding grouping convolution operation result.
In the convolution operation method and circuit based on a CSA adder according to the above technical solution, the circuit includes a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with the first addition operation unit, and the first addition operation unit and the patch operation unit are each connected with the second addition operation unit. The method comprises the following steps: each multiplier obtains a first multiplier and a second multiplier and multiplies them to obtain a multiplication operation result; the first addition operation unit performs a first addition operation on the multiplication operation results of the multipliers to obtain an addition intermediate value; the patch operation unit generates corresponding patch information according to the multiplication operation result; the second addition operation unit performs a second addition operation on the addition intermediate values and the patch information to obtain a convolution operation result. In this scheme the patch information is computed separately from the additions and added back only once at the end, which greatly reduces circuit cost and algorithmic complexity and effectively improves the efficiency of the convolution operation.
Drawings
FIG. 1 is a schematic diagram of a convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a CSA adder according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a CSA adder according to another embodiment of the present invention;
FIG. 4 is a diagram illustrating a convolution operation based on a CSA adder according to an embodiment of the present invention;
FIG. 5 is a circuit diagram of a radix-4 multiplier according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a patch operation unit according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a path selector according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a path selector according to another embodiment of the present invention;
FIG. 9 is a diagram of an encoding operation table according to another embodiment of the present invention;
FIG. 10 is a diagram illustrating a circuit for performing a CSA adder-based convolution operation according to an embodiment of the present invention;
FIG. 11 is a flowchart illustrating a method for CSA adder-based convolution according to an embodiment of the present invention;
Description of the reference numerals:
101. a multiplication operation unit;
102. a first addition operation unit;
103. a patch operation unit;
104. a second addition operation unit;
105. a negative number counting unit.
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
Fig. 1 is a schematic diagram of a convolutional neural network according to an embodiment of the present invention. As can be seen from fig. 1, the neural network includes a convolutional layer, a sampling layer and a fully-connected layer; the convolutional layer and the activation layer are the most computationally intensive when the neural network performs computation. The multiplier is applied to neural network recognition computation. Among the various types of computation in a neural network, convolution accounts for the main part; it consists of multiplication and addition operations, both of which require corresponding hardware circuit resources.
As shown in fig. 10, the present invention provides a convolution operation circuit based on a CSA adder, which includes a multiplication operation unit 101, a first addition operation unit 102, a patch operation unit 103, and a second addition operation unit 104; the multiplication operation unit 101 includes a plurality of multipliers; each multiplier is connected to the first addition operation unit 102, and the first addition operation unit 102 and the patch operation unit 103 are each connected to the second addition operation unit 104.
The first addition operation unit 102 is a CSA adder, and the second addition operation unit 104 is an ordinary adder. As shown in fig. 2, when the CSA adder adds two addends it outputs two results: one is the bitwise exclusive-OR of the two addends (the CSA value), and the other is the bitwise AND of the two addends shifted left by 1 bit (the CSP value). A CSA adder may also receive more than two addends. For example, the CSA adder 10 in fig. 4 receives the results of the CSA adder 00 and the CSA adder 01; since the CSA adder 00 outputs two values (CSA00 and CSP00) and the CSA adder 01 outputs two values (CSA01 and CSP01), the CSA adder 10 receives 4 values. The CSA addition of these 4 values is shown in fig. 3. First, the CSA addition is performed on CSA00 and CSP00: since CSP00 is shifted left by 1 bit relative to CSA00, CSA00 is zero-padded at the high-order end and CSP00 at the low-order end so that the two are aligned; the two are then XORed, and separately ANDed and shifted left by one bit, giving two numerical results. Next, these two results are combined with CSA01 in the same way (XOR, and AND followed by a left shift of one bit) to obtain two intermediate values. Finally, the two intermediate values are combined with CSP01 in the same way to obtain the final CSA10 value and CSP10 value, which are the outputs of the CSA adder 10.
The CSA10 value is the result of XORing the four values CSA00, CSP00, CSA01 and CSP01 in sequence; the CSP10 value is the result of applying a first operation to the same four values in sequence, where the first operation is an AND followed by a left shift of 1 bit.
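The two-output behaviour of the CSA adder for two addends can be sketched in Python as follows. This is a minimal illustration rather than the circuit of the figures: the function names `csa_step` and `resolve` are hypothetical, and the operands are assumed to be non-negative integers.

```python
def csa_step(a: int, b: int) -> tuple[int, int]:
    """Two-input CSA step: XOR gives the CSA value, AND shifted left by 1 gives the CSP value."""
    return a ^ b, (a & b) << 1


def resolve(csa: int, csp: int) -> int:
    """Fold the saved carries back in by repeating csa_step until none remain.

    Assumes non-negative operands so that the loop terminates.
    """
    while csp:
        csa, csp = csa_step(csa, csp)
    return csa


csa, csp = csa_step(0b0110, 0b0101)
# The pair (csa, csp) represents the same total as the two addends.
assert csa + csp == 0b0110 + 0b0101
assert resolve(csa, csp) == 0b0110 + 0b0101
```

The point of the sketch is the invariant: the CSA value plus the CSP value always equals the sum of the two addends, so carry propagation can be postponed to a later stage.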
When several addends must be added, the first two can be added first, then the third added to the result, and so on. Alternatively, the addends can be added in pairs, and the resulting sums paired and added again. In either case the time is dominated by carry propagation from the low-order bits to the high-order bits each time a sum is resolved. With multiple addends, the carries of the current stage can instead be saved and passed to the next stage as a value still to be summed, so the sum does not need to be resolved immediately. For example, to add the outputs of the multipliers 0 to 7 in fig. 4, the outputs are added layer by layer in the order shown in fig. 4, and the outputs of the CSA adder 20 (the CSA20 value and the CSP20 value) are passed, as the output of the whole first addition operation unit, to the second addition operation unit (the ordinary adder connected to the CSA adder 20 in fig. 4), which performs the second addition operation.
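The layer-by-layer reduction can be sketched in Python as below. Note the assumptions: each tree node is modelled with the textbook 3-to-2 carry-save compressor (XOR for the sum bits, majority for the carry bits) applied twice, which is a stand-in for the sequential XOR/AND procedure of fig. 3 and not necessarily the same wiring; the names `csa_3to2`, `csa_4to2` and `csa_tree` are hypothetical; and the number of input pairs is assumed to be a power of two, as with the multipliers 0 to 7.

```python
def csa_3to2(a: int, b: int, c: int) -> tuple[int, int]:
    """Textbook carry-save compressor: three addends in, (sum bits, carry bits) out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def csa_4to2(p: tuple[int, int], q: tuple[int, int]) -> tuple[int, int]:
    """One tree node: merge two (CSA, CSP) pairs into one pair without resolving carries."""
    s, c = csa_3to2(p[0], p[1], q[0])
    return csa_3to2(s, c, q[1])


def csa_tree(pairs: list[tuple[int, int]]) -> tuple[int, int]:
    """Reduce the pairs level by level until a single (CSA, CSP) pair remains."""
    level = list(pairs)
    while len(level) > 1:
        level = [csa_4to2(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Eight illustrative multiplier outputs, each already in (CSA, CSP) form.
pairs = [(i * 7, i + 3) for i in range(8)]
csa20, csp20 = csa_tree(pairs)
# Only this last addition propagates carries (the ordinary adder).
assert csa20 + csp20 == sum(a + b for a, b in pairs)
```

With eight inputs the loop runs three times, matching the three levels of CSA adders (00 to 03, 10 to 11, and 20) described for fig. 4.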
When the circuit carries out convolution operation, the multiplier is used for obtaining a first multiplier and a second multiplier to carry out multiplication operation so as to obtain a multiplication operation result; the first addition operation unit is used for performing first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value; the patch operation unit is used for generating corresponding patch information according to the multiplication result; and the second addition operation unit is used for executing second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result.
In this embodiment, the multiplier is a radix-4 multiplier. As shown in fig. 5, the radix-4 multiplier includes a data splitting unit, a first data cache unit, a low-order zero padding unit, a second data cache unit, a radix-4 coding unit, a coding cache unit, a coding table look-up operation unit, and a displacement unit; the data splitting unit is connected with the first data cache unit, the first data cache unit is connected with the low-order zero padding unit, the low-order zero padding unit is connected with the second data cache unit, the second data cache unit is connected with the radix-4 coding unit, the radix-4 coding unit is connected with the coding cache unit, the coding cache unit is connected with the coding table look-up operation unit, and the coding table look-up operation unit is connected with the displacement unit; the displacement unit is connected to the first addition operation unit.
The operating principle of the radix-4 multiplier is as follows. Assume an 8-bit binary multiplier: 0111_1110. Before the multiplication, it is divided into groups of three bits (adjacent high bit, current bit, adjacent low bit), with adjacent groups overlapping by one bit and an auxiliary bit 0 appended at the low-order end. This gives four groups of split data: 01(1), 11(1), 11(1), 10(0). Looking these up in the coding table shown in fig. 9 (X in the table corresponds to the input data a in fig. 5, and Y corresponds to the input data b in fig. 5) gives 10 (2X), 00 (0X), 00 (0X), -10 (-2X), that is, the digit sequence 10, 00, 00, -10. Multiplying by 2 is a shift operation in binary. With the Booth algorithm, the multiplication is thus converted into an operation involving only addition, subtraction and shifting, which greatly simplifies it and reduces the amount of computation. For a negative intermediate result of the multiplication (i.e., one that requires subtraction), such as -2Y or -Y in the table, the negative number is represented in binary by the two's complement of the original number, which is obtained by inverting each bit of the original number and then adding 1.
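The splitting and table look-up can be sketched in Python as follows. The table used here is the standard radix-4 Booth recoding table; it is assumed, not confirmed, to match fig. 9, which is not reproduced in this text. The names `BOOTH_TABLE` and `booth_radix4_digits` are hypothetical.

```python
# Standard radix-4 Booth recoding: (high bit, current bit, low bit) -> digit.
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_radix4_digits(value: int, width: int = 8) -> list[int]:
    """Return the radix-4 Booth digits of a `width`-bit value, most significant first."""
    padded = (value & ((1 << width) - 1)) << 1  # append the auxiliary low-order 0
    digits = []
    for i in range(0, width, 2):  # groups overlap by one bit
        group = (padded >> i) & 0b111
        bits = ((group >> 2) & 1, (group >> 1) & 1, group & 1)
        digits.append(BOOTH_TABLE[bits])
    return digits[::-1]


digits = booth_radix4_digits(0b0111_1110)
assert digits == [2, 0, 0, -2]
# The digits are weighted by powers of 4 and reproduce the original value.
assert sum(d * 4 ** i for i, d in enumerate(reversed(digits))) == 0b0111_1110
```

For 0111_1110 the four groups 011, 111, 111 and 100 map to the digits 2, 0, 0 and -2, which is the sequence given in the worked example.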
Since the add-1 operation itself involves carry propagation across bit positions, performing it immediately after the inversion would make the operation more complicated. Therefore, in this embodiment, the add-1 operation for each group of split data is completed by the patch operation unit, which simplifies the addition.
Specifically, as shown in fig. 5, the circuit further includes a negative number counting unit 105, where the negative number counting unit 105 is configured to count negative number result information in the multiplication result and send the negative number result information to the patch operation unit; the negative result information comprises negative indication bit information of a calculation result corresponding to each split data calculated by the encoding table look-up operation unit, and the split data is obtained by splitting the second multiplier through the data splitting unit.
For example, let the input data a in fig. 5 be 0111_1110. As described above, the data splitting unit produces four groups of split data, 01(1), 11(1), 11(1) and 10(0). Looking these up in the table, the results of multiplying the four groups with the input data b are 2b, 0, 0 and -2b in sequence, and adding these four partial results at their respective bit positions gives the product of the multiplier a and the multiplier b. For the result 2b, b is shifted left by one bit (multiplying by 2 in binary is equivalent to a left shift by one bit). For the result -2b, b is shifted left by one bit and then the two's complement is taken (i.e., each bit is inverted and then 1 is added), with the add-1 operation performed by the patch operation unit. Thus, when the radix-4 multiplier multiplies the input data 0111_1110 by any multiplier b, it only needs to shift b left by one bit to obtain a first addition parameter, and shift b left by one bit and invert it to obtain a second addition parameter. The first and second addition parameters are then added in a staggered manner, according to the positions of the split data 01(1) and 10(0) in the original multiplier 0111_1110, to obtain the intermediate operation result of the multiplier 0111_1110 and the multiplier b. (The final result still requires a +1 operation at the corresponding position on top of the intermediate operation result; in this application the +1 operation is performed in the patch operation unit, and the multiplier is only responsible for calculating the intermediate operation result.) When the multiplier a takes other values, the calculation is similar and is not described again.
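The worked example for a = 0111_1110 can be checked with a few lines of Python. This is a sketch under stated assumptions: the function name `intermediate_for_0111_1110` is hypothetical, and Python's unbounded integers stand in for a two's complement datapath that is wide enough for the product.

```python
def intermediate_for_0111_1110(b: int) -> int:
    """Intermediate result for a = 0111_1110: only shifts, one inversion and an addition."""
    first = b << 1        # first addition parameter: 2b, for the highest group 01(1)
    second = ~(b << 1)    # second addition parameter: 2b inverted, for the lowest group 10(0)
    # Staggered addition: the highest group is offset by 6 bit positions, the lowest by 0.
    return (first << 6) + second


b = 5
intermediate = intermediate_for_0111_1110(b)
patch = 0b0000_0001  # the deferred +1 for the single negative partial result
assert intermediate + patch == 0b0111_1110 * b
```

The multiplier itself never performs the +1; the product is only exact once the patch is added back.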
Preferably, the staggered addition of the addition parameters is performed by a CSA adder. Specifically, as shown in fig. 5, each displacement unit is connected to the CSA adder in fig. 5, which performs the staggered addition on the addition parameters corresponding to the split data and outputs a CSA value and a CSP value as the output of the multiplier.
As shown in fig. 6, in some embodiments, the patch operation unit includes a plurality of patch storage units, a gating unit, and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit and is also connected with the logic operation unit; patch sub information is stored in the patch storage unit, and the patch sub information stored in different storage units is different;
the gating unit is used for selecting and sending the corresponding patch sub information to the logic operation unit according to the negative number indicating bit information; the logic operation unit is used for carrying out logic OR operation on the received patch sub information to obtain the patch information.
The patch operation adds 1s on top of the intermediate result computed by the multiplier. The number of add-1 operations equals the number of negative results obtained from the split data (only negative results require the two's complement operation, i.e., inversion followed by +1). Because the groups of split data are offset from one another (adjacent groups differ by 2 bit positions), the bit at which the 1 is added differs from group to group. There are 4 cases, shown in fig. 6: the 1st, 3rd, 5th and 7th bits of the 8-bit multiplier. Again taking the input data a as 0111_1110: the intermediate multiplication results 2b, 0 and 0 corresponding to the first three groups of split data 01(1), 11(1) and 11(1) involve no negative number, so the gating units 2, 3 and 4 send patch sub-information consisting of 8 zeros (8'b0 in the figure) to the logic operation unit (the logical OR circuit in fig. 6). The intermediate multiplication result of the split data 10(0) is -2b, and since the add-1 operation for 10(0) falls on the lowest bit, the gating unit 1 sends the patch sub-information 0000_0001 stored in its patch storage unit to the logical OR circuit. The logical OR circuit ORs all the received patch sub-information and outputs the final patch information. Because the add-1 positions of the different groups of split data are staggered, no two pieces of patch sub-information have a 1 at the same bit, so the addition of the patch sub-information can be simplified to a logical OR, which improves operation efficiency.
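The gating and OR logic of the patch operation unit can be sketched in Python as follows. The names `PATCH_STORE` and `patch_info` are hypothetical, and the flags are assumed to be ordered from the lowest group of split data to the highest.

```python
# One stored patch sub-information word per group of split data, lowest group first.
PATCH_STORE = [0b0000_0001, 0b0000_0100, 0b0001_0000, 0b0100_0000]


def patch_info(negative_flags: list[bool]) -> int:
    """Gate each stored word by its negative indication bit, then OR the results together."""
    patch = 0
    for flag, stored in zip(negative_flags, PATCH_STORE):
        patch |= stored if flag else 0b0000_0000  # a gating unit passes 8'b0 when not negative
    return patch


# a = 0111_1110: only the lowest group 10(0) gives a negative result (-2b).
assert patch_info([True, False, False, False]) == 0b0000_0001
# The stored words never overlap, so OR and addition agree even when every group is negative.
assert patch_info([True, True, True, True]) == sum(PATCH_STORE)
```

Because each stored word has its single 1 at a different bit position, the OR never loses a carry, which is why no adder is needed inside the patch operation unit.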
As shown in fig. 4, the addition intermediate values are added layer by layer by the CSA adders at the different levels, finally giving two values (CSA20 and CSP20 in fig. 4) that are passed to the ordinary adder, which adds the CSA20 value and the CSP20 value to obtain the intermediate result of the CSA addition. Similarly, the patch information can also be added layer by layer: the patch information obtained by the patch operation units 0 to 7 is added in turn by the first-level, second-level and third-level ordinary adders to obtain the final patch information, which is then added to the intermediate result of the CSA addition to obtain the convolution operation result. The invention computes the patch information and the intermediate result of the CSA addition in parallel and adds the patch information back only once, at the final addition, which greatly reduces circuit cost and algorithmic complexity and effectively improves the efficiency of the convolution operation.
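The two parallel paths and the single add-back can be sketched end to end in Python. This is an illustrative model under several assumptions: the names `multiplier_parts` and `convolve` are hypothetical; the Booth digit is computed with the standard formula rather than a table; plain `sum` stands in for the CSA tree and the adder levels; and the encoded operand is assumed to be a signed 8-bit value.

```python
def multiplier_parts(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Return (intermediate result, patch information) for signed `width`-bit a times b."""
    padded = (a & ((1 << width) - 1)) << 1  # low-order zero padding
    intermediate, patch = 0, 0
    for i in range(width // 2):
        group = (padded >> (2 * i)) & 0b111
        digit = -2 * (group >> 2) + ((group >> 1) & 1) + (group & 1)
        term = abs(digit) * b  # 0, b or 2b (a shift in hardware)
        if digit < 0:
            term = ~term            # invert only
            patch |= 1 << (2 * i)   # remember the deferred +1 at this group's position
        intermediate += term << (2 * i)
    return intermediate, patch


def convolve(xs: list[int], ws: list[int]) -> int:
    """Accumulate intermediates and patches separately, then add the patch total back once."""
    parts = [multiplier_parts(x, w) for x, w in zip(xs, ws)]
    intermediate_total = sum(p[0] for p in parts)  # CSA tree followed by the ordinary adder
    patch_total = sum(p[1] for p in parts)         # patch adder levels
    return intermediate_total + patch_total


xs = [126, -3, 7, 0, 55, -128, 1, 90]
ws = [5, 4, -2, 9, 3, 1, -7, 2]
assert convolve(xs, ws) == sum(x * w for x, w in zip(xs, ws))
```

The final assertion shows that deferring every +1 to a single add-back leaves the multiply-accumulate result unchanged.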
In practice, a neural network often uses grouped convolution in addition to full convolution. In a grouped convolution, the number of output results equals the number of groups: with 2 groups, two numerical results are output; with 4 groups, four.
To make the circuit structure of the present invention applicable to grouped convolution as well, as shown in fig. 6 and 7, the circuit further includes a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition operation unit, the second addition operation unit and the patch operation unit; the grouping configuration unit is used for configuring grouping information of the convolution operation and determining, according to the grouping information, the path signal of the path selector corresponding to the grouping information, so that the second addition operation unit outputs the corresponding grouped convolution operation result.
For example, when the current number of groups is 4, multipliers 0 to 7 are grouped in pairs. The results of CSA adder 00, CSA adder 01, CSA adder 02 and CSA adder 03 are then not passed on to the next-level CSA adders but to their respective ordinary adders (i.e., the second addition operation unit). The patch operation units are handled in the same way: after patch 0 and patch 1 are added by the first-level ordinary adder, the path selector routes their sum directly to be added to the result of the ordinary adder corresponding to CSA adder 00, instead of passing it to the second-level adder, which yields the convolution operation result of group 1. The results for patches 2 and 3, patches 4 and 5, and patches 6 and 7 are obtained in the same way and are not described again here.
When the current number of groups is 2, assume that multipliers 0 to 3 form one group and multipliers 4 to 7 form the other. The results of CSA adder 10 and CSA adder 11 are then not passed on to the next-level CSA adder 20 but directly to their corresponding ordinary adders, which add the CSA10 value and the CSP10 value, and the CSA11 value and the CSP11 value, respectively. In the patch operation unit, patch 10 is added directly to the sum of the CSA10 value and the CSP10 value to obtain the convolution operation result of group 1, and patch 11 is added to the sum of the CSA11 value and the CSP11 value to obtain the convolution operation result of group 2.
In short, the path selectors select where the output data of each adder is routed, which implements grouped convolution and meets the operation requirements of grouped convolution under different grouping configurations.
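The routing for 1, 2 or 4 groups can be sketched in Python as a choice of which level of the adder tree is tapped. This is only a behavioural model under stated assumptions: plain integer addition stands in for the CSA adders and ordinary adders, exactly eight multipliers are assumed, and the names `pairwise_levels`, `TAP_LEVEL` and `grouped_conv` are hypothetical.

```python
def pairwise_levels(values):
    """Return the outputs of each tree level, adding neighbours in pairs."""
    levels = []
    cur = list(values)
    while len(cur) > 1:
        cur = [cur[i] + cur[i + 1] for i in range(0, len(cur), 2)]
        levels.append(cur)
    return levels


# Number of groups -> tree level whose outputs leave the tree.
# 4 groups tap level 0 (adders 00..03), 2 groups tap level 1 (adders 10, 11),
# 1 group (full convolution) taps level 2 (adder 20).
TAP_LEVEL = {4: 0, 2: 1, 1: 2}


def grouped_conv(products, patches, groups):
    """Model the path selectors: one output value per group."""
    if len(products) != 8 or len(patches) != 8:
        raise ValueError("this sketch assumes eight multipliers")
    if groups not in TAP_LEVEL:
        raise ValueError("groups must be 1, 2 or 4")
    level = TAP_LEVEL[groups]
    sums = pairwise_levels(products)[level]        # main adder tree
    patch_sums = pairwise_levels(patches)[level]   # patch adder tree
    # The patch sum of each group is added back once to that group's result.
    return [s + p for s, p in zip(sums, patch_sums)]
```

With `products = [1, 2, 3, 4, 5, 6, 7, 8]` and `patches = [0, 1, 0, 0, 1, 0, 0, 1]`, four groups give `[4, 7, 12, 16]`, two groups give `[11, 28]`, and one group gives `[39]`.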
As shown in fig. 11, the inventor further provides a convolution operation method based on a CSA adder, which is applied to a convolution operation circuit based on a CSA adder, wherein the circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with a first addition unit, and the first addition unit and the patch operation unit are also connected with a second addition unit respectively; the method comprises the following steps:
first, in step S1101, each multiplier acquires a first multiplier and a second multiplier and performs a multiplication operation to obtain a multiplication operation result;
then, in step S1102, the first addition operation unit performs a first addition operation on the multiplication operation results of the multipliers to obtain addition intermediate values;
then, in step S1103, the patch operation unit generates corresponding patch information according to the multiplication operation results;
then, in step S1104, the second addition operation unit performs a second addition operation on the addition intermediate values and the patch information to obtain the convolution operation result.
In certain embodiments, the circuit further comprises a negative number statistics unit; the method further comprises the following steps: the negative number statistics unit collects the negative result information from the multiplication operation results and sends it to the patch operation unit; the negative result information comprises the negative indication bit of the calculation result that the encoding table look-up operation unit computes for each piece of split data, and the split data is obtained by splitting the second multiplier through the data splitting unit.
In some embodiments, the patch operation unit includes a plurality of patch storage units, a gating unit, and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit, and the gating unit is also connected with the logic operation unit; patch sub information is stored in the patch storage unit, and the patch sub information stored in different storage units is different; the method comprises the following steps:
the gating unit selects and sends the corresponding patch sub-information to the logic operation unit according to the negative number indicating bit information; the logic operation unit performs logic OR operation on the received patch sub information to obtain patch information.
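The interaction between the negative indication bits and the patch information can be sketched in Python with a radix-4 Booth multiplication. Several details here are assumptions rather than statements of the source: 8-bit signed operands, a 32-bit result width, and in particular the choice that the patch sub-information stored for split data i is the single bit `1 << (2 * i)`, which is what makes a logic OR of the selected entries equal to their sum. All function names are hypothetical.

```python
WIDTH = 32                      # hypothetical result width (assumption)
MASK = (1 << WIDTH) - 1

# Radix-4 Booth look-up: 3-bit window (y[2i+1], y[2i], y[2i-1]) -> digit.
BOOTH_TABLE = {0: 0, 1: 1, 2: 1, 3: 2, 4: -2, 5: -1, 6: -1, 7: 0}


def booth_digits(y, bits=8):
    """Split signed y into radix-4 Booth digits, least significant first."""
    padded = (y & ((1 << bits) - 1)) << 1       # low-order zero padding
    return [BOOTH_TABLE[(padded >> (2 * i)) & 0b111] for i in range(bits // 2)]


def partial_products(x, y, bits=8):
    """Return shifted partial products and the negative indication bits."""
    partials, neg_bits = [], []
    for i, d in enumerate(booth_digits(y, bits)):
        m = abs(d) * x
        if d < 0:
            partials.append(((~m) << (2 * i)) & MASK)  # inverted, +1 deferred
            neg_bits.append(1)
        else:
            partials.append((m << (2 * i)) & MASK)
            neg_bits.append(0)
    return partials, neg_bits


def patch_info(neg_bits):
    """Gate the stored patch sub-information and OR the selected entries."""
    patch = 0
    for i, neg in enumerate(neg_bits):
        if neg:                          # gating by the negative indication bit
            patch |= 1 << (2 * i)        # assumed patch sub-information
    return patch


def to_signed(v):
    """Interpret a WIDTH-bit value as a two's-complement integer."""
    return v - (1 << WIDTH) if v & (1 << (WIDTH - 1)) else v


def product(x, y, bits=8):
    """Sum the partial products, then add the patch back once."""
    partials, neg_bits = partial_products(x, y, bits)
    return to_signed((sum(partials) + patch_info(neg_bits)) & MASK)
```

Under these assumptions `product(7, -3)` returns -21, and the result matches ordinary multiplication for every pair of signed 8-bit operands, even though the partial products themselves never include the +1 of a negation.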
In some embodiments, the circuit further comprises a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition unit, the second addition unit and the patch operation unit; the method comprises the following steps:
the grouping configuration unit configures the grouping information of the convolution operation, and determines the path signal of the path selector corresponding to the grouping information according to the grouping information, so that the second addition operation unit outputs the corresponding grouping convolution operation result.
The invention discloses a convolution operation method and a circuit based on a CSA adder, wherein the circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; the method comprises the following steps: each multiplier acquires a first multiplier and a second multiplier and performs a multiplication operation to obtain a multiplication operation result; the first addition operation unit performs a first addition operation on the multiplication operation results of the multipliers to obtain addition intermediate values; the patch operation unit generates corresponding patch information according to the multiplication operation results; the second addition operation unit performs a second addition operation on the addition intermediate values and the patch information to obtain the convolution operation result. In this scheme, the patch information is computed separately and added back only once, at the final addition, which reduces circuit cost and algorithmic complexity and improves the efficiency of the convolution operation.
It should be noted that, although the above embodiments have been described herein, the scope of the present invention is not limited thereby. Therefore, based on the innovative concepts of the present invention, the technical solutions of the present invention can be directly or indirectly applied to other related technical fields by changing and modifying the embodiments described herein or by using the equivalent structures or equivalent processes of the content of the present specification and the attached drawings, and are included in the scope of the present invention.

Claims (6)

1. A convolution operation circuit based on a CSA adder is characterized by comprising a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with a first addition unit, and the first addition unit and the patch operation unit are also connected with a second addition unit respectively;
the multiplier is used for acquiring a first multiplier and a second multiplier to carry out multiplication operation to obtain a multiplication operation result;
the first addition operation unit is used for performing first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value;
the patch operation unit is used for generating corresponding patch information according to the multiplication result;
the second addition operation unit is used for executing second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result;
the circuit further comprises a negative number statistics unit;
the negative number counting unit is used for counting negative number result information in the multiplication result and sending the negative number result information to the patch operation unit; the negative result information comprises negative indication bit information of a calculation result corresponding to each split data calculated by the encoding table look-up operation unit, and the split data is obtained by splitting the second multiplier;
the patch operation unit comprises a plurality of patch storage units, a gating unit and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit and is also connected with the logic operation unit; patch sub information is stored in the patch storage unit, and the patch sub information stored in different storage units is different;
the gating unit is used for selecting and sending the corresponding patch sub information to the logic operation unit according to the negative number indicating bit information;
the logic operation unit is used for carrying out logic OR operation on the received patch sub information to obtain the patch information.
2. The CSA adder-based convolution operation circuit of claim 1, wherein the multiplier is a radix-4 multiplier, the radix-4 multiplier comprising a data splitting unit, a first data buffer unit, a low-order zero padding unit, a second data buffer unit, a radix-4 coding unit, a coding buffer unit, a coding table look-up operation unit, and a displacement unit; the data splitting unit is connected with the first data buffer unit, the first data buffer unit is connected with the low-order zero padding unit, the low-order zero padding unit is connected with the second data buffer unit, the second data buffer unit is connected with the radix-4 coding unit, the radix-4 coding unit is connected with the coding buffer unit, the coding buffer unit is connected with the coding table look-up operation unit, and the coding table look-up operation unit is connected with the displacement unit;
the displacement unit is connected to the first addition unit.
3. The CSA adder-based convolution operation circuit of claim 1 or 2, further comprising a packet configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition unit, the second addition unit and the patch operation unit;
the grouping configuration unit is used for configuring grouping information of convolution operation and determining a path signal of a path selector corresponding to the grouping information according to the grouping information so that the second addition operation unit outputs a corresponding grouping convolution operation result.
4. A convolution operation method based on a CSA adder, characterized by being applied to a convolution operation circuit based on a CSA adder, wherein the circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with the first addition operation unit, and the first addition operation unit and the patch operation unit are each also connected with the second addition operation unit; the method comprises the following steps:
the multiplier acquires a first multiplier and a second multiplier to carry out multiplication operation to obtain a multiplication operation result;
the first addition operation unit performs first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value;
the patch operation unit generates corresponding patch information according to the multiplication result;
the second addition operation unit is used for executing second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result;
the circuit further comprises a negative number statistics unit; the method further comprises the following steps:
the negative number counting unit counts negative number result information in the multiplication result and sends the negative number result information to the patch operation unit;
the negative result information comprises negative indication bit information of a calculation result corresponding to each split data calculated by the encoding table look-up operation unit, and the split data is obtained by splitting the second multiplier;
the patch operation unit comprises a plurality of patch storage units, a gating unit and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit, and the gating unit is also connected with the logic operation unit; the patch storage unit stores patch sub-information, and the patch sub-information stored in different storage units is different; the method further comprises the following steps:
the gating unit selects and sends the corresponding patch sub-information to the logic operation unit according to the negative number indicating bit information;
the logic operation unit performs logic OR operation on the received patch sub information to obtain patch information.
5. The CSA adder-based convolution operation method of claim 4, wherein the multiplier is a radix-4 multiplier, the radix-4 multiplier comprises a data splitting unit, a first data buffer unit, a low-order zero padding unit, a second data buffer unit, a radix-4 coding unit, a coding buffer unit, a coding table look-up operation unit and a displacement unit; the data splitting unit is connected with the first data buffer unit, the first data buffer unit is connected with the low-order zero padding unit, the low-order zero padding unit is connected with the second data buffer unit, the second data buffer unit is connected with the radix-4 coding unit, the radix-4 coding unit is connected with the coding buffer unit, the coding buffer unit is connected with the coding table look-up operation unit, and the coding table look-up operation unit is connected with the displacement unit;
the displacement unit is connected to the first addition unit.
6. The CSA adder-based convolution operation method of claim 4 or 5, wherein the circuit further comprises a packet configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition unit, the second addition unit and the patch operation unit; the method comprises the following steps:
the grouping configuration unit configures the grouping information of the convolution operation, and determines the path signal of the path selector corresponding to the grouping information according to the grouping information, so that the second addition operation unit outputs the corresponding grouping convolution operation result.
CN201910779278.6A 2019-08-22 2019-08-22 Convolution operation method and circuit based on CSA adder Active CN110598172B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910779278.6A CN110598172B (en) 2019-08-22 2019-08-22 Convolution operation method and circuit based on CSA adder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910779278.6A CN110598172B (en) 2019-08-22 2019-08-22 Convolution operation method and circuit based on CSA adder

Publications (2)

Publication Number Publication Date
CN110598172A CN110598172A (en) 2019-12-20
CN110598172B true CN110598172B (en) 2022-10-25

Family

ID=68855138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910779278.6A Active CN110598172B (en) 2019-08-22 2019-08-22 Convolution operation method and circuit based on CSA adder

Country Status (1)

Country Link
CN (1) CN110598172B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474458B2 (en) * 2017-04-28 2019-11-12 Intel Corporation Instructions and logic to perform floating-point and integer operations for machine learning
US10726514B2 (en) * 2017-04-28 2020-07-28 Intel Corporation Compute optimizations for low precision machine learning operations
KR101981109B1 (en) * 2017-07-05 2019-05-22 울산과학기술원 SIMD MAC unit with improved computation speed, Method for operation thereof, and Apparatus for Convolutional Neural Networks accelerator using the SIMD MAC array
CN109993272B (en) * 2017-12-29 2019-12-06 北京中科寒武纪科技有限公司 convolution and down-sampling operation unit, neural network operation unit and field programmable gate array integrated circuit
CN109388373B (en) * 2018-10-12 2023-03-14 芯来科技(武汉)有限公司 Multiplier-divider for low power consumption kernel

Also Published As

Publication number Publication date
CN110598172A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
US10877733B2 (en) Segment divider, segment division operation method, and electronic device
KR100591761B1 (en) Montgomery Modular Multiplication Method Using Montgomery Modular Multiplier and Carry Store Addition
US6704762B1 (en) Multiplier and arithmetic unit for calculating sum of product
CN101140511A (en) Cascaded carry binary adder
CN111936965A (en) Random rounding logic
CN110109646A (en) Data processing method, device and adder and multiplier and storage medium
US20040267853A1 (en) Method and apparatus for implementing power of two floating point estimation
US6847986B2 (en) Divider
CN110598172B (en) Convolution operation method and circuit based on CSA adder
CN112162724A (en) Quantum division operation method and device with precision
CN112214200A (en) Quantum subtraction operation method and device, electronic device and storage medium
WO2007083377A1 (en) Parity generation circuit, counter and counting method
US11494165B2 (en) Arithmetic circuit for performing product-sum arithmetic
CN110275693B (en) Multi-addend adder circuit for random calculation
KR0146065B1 (en) Absolute value calculating circuit
CN110705196A (en) Error-free adder based on random calculation
CN110890895A (en) Method for performing polar decoding by means of representation transformation and associated polar decoder
KR100858559B1 (en) Method for adding and multipying redundant binary and Apparatus for adding and multipying redundant binary
CN220208247U (en) Division operation circuit
US20210365239A1 (en) Logarithm calculation method and logarithm calculation circuit
CN111931441B (en) Method, device and medium for establishing FPGA fast carry chain time sequence model
KR20110068801A (en) Modulo n calculation method and apparatus thereof
KR950015180B1 (en) High speed adder
CN117077742A (en) Method for optimizing control power consumption aiming at neural network
US20130262549A1 (en) Arithmetic circuit and arithmetic method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 350003 building 18, No.89, software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant after: Ruixin Microelectronics Co.,Ltd.

Address before: 350003 building 18, No.89, software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant before: FUZHOU ROCKCHIP ELECTRONICS Co.,Ltd.

GR01 Patent grant