CN110598172A - Convolution operation method and circuit based on CSA adder - Google Patents


Info

Publication number
CN110598172A
Authority
CN
China
Prior art keywords
unit
patch
addition
multiplier
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910779278.6A
Other languages
Chinese (zh)
Other versions
CN110598172B (en)
Inventor
廖裕民
张义群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou Rockchip Electronics Co Ltd
Original Assignee
Fuzhou Rockchip Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou Rockchip Electronics Co Ltd
Priority to CN201910779278.6A
Publication of CN110598172A
Application granted
Publication of CN110598172B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a convolution operation method and circuit based on a CSA adder. The circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers. The method comprises the following steps: each multiplier acquires a first multiplier and a second multiplier and multiplies them to obtain a multiplication operation result; the first addition operation unit performs a first addition operation on the multiplication operation results of the multipliers to obtain an addition intermediate value; the patch operation unit generates corresponding patch information according to the multiplication result; and the second addition operation unit performs a second addition operation on each addition intermediate value and each piece of patch information to obtain a convolution operation result. In this scheme the patch information is computed separately and added back only once, in the final addition, which greatly reduces circuit cost and algorithmic complexity and effectively improves the efficiency of the convolution operation.

Description

Convolution operation method and circuit based on CSA adder
Technical Field
The invention relates to the field of chip circuits, in particular to a convolution operation method and circuit based on a CSA adder.
Background
With the rapid development of the artificial intelligence industry, users demand ever higher speed from neural network operations. Group convolution (group_convolution) is an important algorithm in neural networks, yet the prior art proposes no corresponding hardware acceleration circuit structure for it. The convolution is still handed to a CPU, which computes each group convolution independently according to the convolution task, so the overall operation efficiency is very low.
Disclosure of Invention
Therefore, a technical scheme of convolution operation based on the CSA adder is required to be provided, so as to solve the problem that the existing neural network circuit is low in efficiency when convolution operation is performed.
In order to achieve the above object, the inventors provide a convolution operation circuit based on a CSA adder, the circuit including a multiplication operation unit, a first addition operation unit, a patch operation unit, and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with a first addition unit, and the first addition unit and the patch operation unit are also connected with a second addition unit respectively;
the multiplier is used for acquiring a first multiplier and a second multiplier to carry out multiplication operation to obtain a multiplication operation result;
the first addition operation unit is used for performing first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value;
the patch operation unit is used for generating corresponding patch information according to the multiplication result;
and the second addition operation unit is used for executing second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result.
Furthermore, the multiplier is a radix-4 multiplier, and the radix-4 multiplier comprises a data splitting unit, a first data cache unit, a low-order zero padding unit, a second data cache unit, a radix-4 coding unit, a coding cache unit, a coding table look-up operation unit and a displacement unit; the data splitting unit is connected with a first data cache unit, the first data cache unit is connected with a low-order zero padding unit, the low-order zero padding unit is connected with a second data cache unit, the second data cache unit is connected with a base 4 coding unit, the base 4 coding unit is connected with a coding cache unit, the coding cache unit is connected with a coding table look-up operation unit, and the coding table look-up operation unit is connected with the displacement unit;
the displacement unit is connected to the first addition unit.
Further, the circuit also comprises a negative number statistical unit;
the negative number counting unit is used for counting negative number result information in the multiplication result and sending the negative number result information to the patch operation unit; the negative result information comprises negative indication bit information of a calculation result corresponding to each split data calculated by the encoding table look-up operation unit, and the split data is obtained by splitting the second multiplier through the data splitting unit.
Furthermore, the patch operation unit comprises a plurality of patch storage units, a gating unit and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit, and the gating unit is also connected with the logic operation unit; patch sub information is stored in the patch storage unit, and the patch sub information stored in different storage units is different;
the gating unit is used for selecting and sending the corresponding patch sub information to the logic operation unit according to the negative number indicating bit information;
the logic operation unit is used for carrying out logic OR operation on the received patch sub information to obtain the patch information.
Further, the circuit further comprises a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition unit, the second addition unit and the patch unit;
the grouping configuration unit is used for configuring grouping information of convolution operation and determining a path signal of a path selector corresponding to the grouping configuration information according to the grouping configuration information so that the second addition operation unit outputs a corresponding grouping convolution operation result.
The inventor also provides a convolution operation method based on the CSA adder, which is applied to a convolution operation circuit based on the CSA adder, wherein the circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with a first addition unit, and the first addition unit and the patch operation unit are also connected with a second addition unit respectively; the method comprises the following steps:
the multiplier acquires a first multiplier and a second multiplier to carry out multiplication operation to obtain a multiplication operation result;
the first addition operation unit performs first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value;
the patch operation unit generates corresponding patch information according to the multiplication result;
the second addition operation unit is used for executing a second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result.
Furthermore, the multiplier is a radix-4 multiplier, and the radix-4 multiplier comprises a data splitting unit, a first data cache unit, a low-order zero padding unit, a second data cache unit, a radix-4 coding unit, a coding cache unit, a coding table look-up operation unit and a displacement unit; the data splitting unit is connected with a first data cache unit, the first data cache unit is connected with a low-order zero padding unit, the low-order zero padding unit is connected with a second data cache unit, the second data cache unit is connected with a base 4 coding unit, the base 4 coding unit is connected with a coding cache unit, the coding cache unit is connected with a coding table look-up operation unit, and the coding table look-up operation unit is connected with the displacement unit;
the displacement unit is connected to the first addition unit.
Further, the circuit also comprises a negative number statistical unit; the method comprises the following steps:
the negative number counting unit counts negative number result information in the multiplication result and sends the negative number result information to the patch operation unit;
the negative result information comprises negative indication bit information of a calculation result corresponding to each split data calculated by the encoding table look-up operation unit, and the split data is obtained by splitting the second multiplier through the data splitting unit.
Furthermore, the patch operation unit comprises a plurality of patch storage units, a gating unit and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit, and the gating unit is also connected with the logic operation unit; patch sub information is stored in the patch storage unit, and the patch sub information stored in different storage units is different; the method comprises the following steps:
the gating unit selects and sends the corresponding patch sub-information to the logic operation unit according to the negative number indicating bit information;
the logic operation unit performs logic OR operation on the received patch sub information to obtain patch information.
Further, the circuit further comprises a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition unit, the second addition unit and the patch unit; the method comprises the following steps:
the grouping configuration unit configures the grouping information of the convolution operation, and determines the path signal of the path selector corresponding to the grouping configuration information according to the grouping configuration information, so that the second addition operation unit outputs the corresponding grouping convolution operation result.
In the convolution operation method and circuit based on a CSA adder described in the above technical solution, the circuit includes a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected with the first addition operation unit, and the first addition operation unit and the patch operation unit are each connected with the second addition operation unit. The method comprises the following steps: each multiplier acquires a first multiplier and a second multiplier and multiplies them to obtain a multiplication operation result; the first addition operation unit performs a first addition operation on the multiplication operation results of the multipliers to obtain an addition intermediate value; the patch operation unit generates corresponding patch information according to the multiplication result; and the second addition operation unit performs a second addition operation on each addition intermediate value and each piece of patch information to obtain a convolution operation result. In this scheme the patch information is computed separately and added back only once, in the final addition, which greatly reduces circuit cost and algorithmic complexity and effectively improves the efficiency of the convolution operation.
Drawings
FIG. 1 is a schematic diagram of a convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a CSA adder according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a CSA adder according to another embodiment of the present invention;
FIG. 4 is a diagram illustrating a CSA adder based convolution operation according to an embodiment of the present invention;
FIG. 5 is a circuit diagram of a radix-4 multiplier according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a patch operation unit according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a path selector according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a path selector according to another embodiment of the present invention;
FIG. 9 is a diagram of an encoding operation table according to another embodiment of the present invention;
FIG. 10 is a diagram of a circuit for CSA adder-based convolution operation according to an embodiment of the present invention;
FIG. 11 is a flowchart illustrating a method for CSA adder-based convolution according to an embodiment of the present invention;
Description of reference numerals:
101. a multiplication operation unit;
102. a first addition operation unit;
103. a patch operation unit;
104. a second addition operation unit;
105. a negative number counting unit.
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
Fig. 1 is a schematic diagram of a convolutional neural network according to an embodiment of the present invention. As can be seen from fig. 1, the neural network includes a convolutional layer, a sampling layer and a fully-connected layer; the convolutional layer and the activation layer account for most of the computation when the neural network runs. The multiplier is applied to neural network recognition computation. Among the computation types of a neural network, convolution makes up the major part; it consists of multiplication and addition operations, each of which requires corresponding hardware circuit resources.
As shown in fig. 10, the present invention provides a convolution operation circuit based on a CSA adder, which includes a multiplication operation unit 101, a first addition operation unit 102, a patch operation unit 103, and a second addition operation unit 104; the multiplication operation unit 101 includes a plurality of multipliers; each multiplier is connected to the first addition operation unit 102, and the first addition operation unit 102 and the patch operation unit 103 are each connected to the second addition operation unit 104.
The first addition operation unit 102 is a CSA adder, and the second addition operation unit 104 is a normal adder. As shown in fig. 2, when the CSA adder adds two addends, it outputs two results: one is the exclusive-OR of the two addends (the CSA value), and the other is the logical AND of the two addends shifted left by 1 bit (the CSP value). When a CSA adder receives more than two addends, as with CSA adder 10 in fig. 4, which receives the results of CSA adder 00 and CSA adder 01, it receives 4 values, because CSA adder 00 outputs two values (CSA00 and CSP00) and CSA adder 01 outputs two values (CSA01 and CSP01). The CSA addition of these 4 values is shown in fig. 3. CSA adder 10 first performs CSA addition on CSA00 and CSP00: since CSP00 is shifted left by 1 bit relative to CSA00, CSA00 is zero-padded at the high end and CSP00 is zero-padded at the low end, after which the two are XORed, and also ANDed and shifted left by 1 bit, giving two intermediate values. The two intermediate values are then combined in the same way with CSA01 (exclusive-OR, or AND followed by a left shift of one bit), and the results are in turn combined with CSP01, yielding the final CSA10 value and CSP10 value, i.e. the output of CSA adder 10.
The CSA10 value is a result obtained by carrying out exclusive OR operation on four values of CSA00, CSP00, CSA01 and CSP01 in sequence; the CSP10 value is a result obtained by sequentially performing a first operation on four values of CSA00, CSP00, CSA01 and CSP01, wherein the first operation is as follows: the AND operation is shifted left by 1 bit.
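The two-addend case described above can be sketched in a few lines of Python. This is a minimal illustration, not the circuit itself: the function name `csa_add` is hypothetical, and Python's unbounded integers stand in for fixed-width hardware words.

```python
def csa_add(a, b):
    """Two-input carry-save step: return (CSA value, CSP value)."""
    csa = a ^ b           # bitwise sum with all carries left out
    csp = (a & b) << 1    # the carries, shifted left by one bit
    return csa, csp


# The pair (CSA, CSP) still represents the same total as the two addends.
csa, csp = csa_add(0b0110, 0b0111)
total = csa + csp         # a normal adder resolves the pair at the end
```

The point of the split is that neither output needs carry propagation; only the final `csa + csp` does.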
When several addends must be summed, the first two can be added, then the third added to the result, and so on. Alternatively, the addends can be paired and added, and the resulting sums paired again. The time cost of addition lies mainly in propagating the carry from the low bits to the high bits each time a sum is resolved. With multiple addends, the carry of the current stage can instead be saved and passed to the next stage as an operand, so it need not be resolved immediately. For example, to add the output results of multipliers 0 to 7 in fig. 4, the outputs can be added layer by layer in the order shown in fig. 4, and the CSA20 value and CSP20 value output by CSA adder 20 are passed, as the output of the whole first addition operation unit, to the second addition operation unit (i.e., the normal adder connected to CSA adder 20 in fig. 4) for the second addition operation.
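The layer-by-layer reduction can be modelled as follows. This sketch is an assumption-laden stand-in: it uses the textbook 3:2 compressor (full-adder style carry-save) to merge two (CSA, CSP) pairs, which is not necessarily the gate-level wiring of fig. 3, and the names `csa3`, `merge` and `reduce_tree` are hypothetical.

```python
def csa3(x, y, z):
    """3:2 compressor: three addends in, (sum, carry) out, no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def merge(p, q):
    """Combine two (CSA, CSP) pairs into one pair, as one tree node would."""
    s, c = csa3(p[0], p[1], q[0])
    return csa3(s, c, q[1])


def reduce_tree(pairs):
    """Reduce a list of (CSA, CSP) pairs level by level to a single pair."""
    level = list(pairs)
    while len(level) > 1:
        merged = [merge(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd element passes through unchanged
            merged.append(level[-1])
        level = merged
    return level[0]


# Eight multiplier outputs, modelled as (value, 0) pairs, reduce in three levels.
products = [3, 14, 15, 92, 65, 35, 89, 79]
csa20, csp20 = reduce_tree([(p, 0) for p in products])
result = csa20 + csp20   # the single carry-propagating addition
```

With eight inputs the loop runs three times, which mirrors the three levels of CSA adders (00 to 03, 10 to 11, 20) in fig. 4.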
When the circuit carries out convolution operation, the multiplier is used for obtaining a first multiplier and a second multiplier to carry out multiplication operation so as to obtain a multiplication operation result; the first addition operation unit is used for performing first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value; the patch operation unit is used for generating corresponding patch information according to the multiplication result; and the second addition operation unit is used for executing second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result.
In this embodiment, the multiplier is a radix-4 multiplier. As shown in fig. 5, the radix-4 multiplier includes a data splitting unit, a first data cache unit, a low-order zero padding unit, a second data cache unit, a radix-4 coding unit, a coding cache unit, a coding table look-up operation unit, and a displacement unit; the data splitting unit is connected with the first data cache unit, the first data cache unit is connected with the low-order zero padding unit, the low-order zero padding unit is connected with the second data cache unit, the second data cache unit is connected with the radix-4 coding unit, the radix-4 coding unit is connected with the coding cache unit, the coding cache unit is connected with the coding table look-up operation unit, and the coding table look-up operation unit is connected with the displacement unit; the displacement unit is connected to the first addition operation unit.
The operating principle of the radix-4 multiplier is as follows. Assume an 8-bit binary multiplier 0111_1110. Before the multiplication is computed, an auxiliary bit 0 is appended at the low end and the multiplier is divided into groups of three bits (adjacent high bit, current bit, adjacent low bit), with neighbouring groups overlapping by one bit. This yields four groups of split data: 01(1), 11(1), 11(1), 10(0). Looking these up in the coding table shown in fig. 9 (X in the table corresponds to the input data a in fig. 5, and Y corresponds to the input data b in fig. 5) gives 10 (2X), 00 (0X), 00 (0X), -10 (-2X), which can be written as 10_00_00_-10. Multiplying by 2 is simply a left shift in binary. With the Booth algorithm, multiplication is thus converted into an operation involving only addition, subtraction and shifts, which greatly simplifies the multiplication and reduces the amount of computation. For a negative intermediate result of the multiplication (i.e., where a subtraction is required), such as -2Y and -Y in the table, negating a binary number is equivalent to taking its two's complement, i.e. inverting each bit of the original number and then adding 1.
Since the add-1 operation itself requires carry propagation across bit positions, performing it immediately after the inversion would complicate the original operation. In this embodiment, therefore, the add-1 operation for each group of split data is completed by the patch operation unit, which simplifies the addition.
Specifically, as shown in fig. 5, the circuit further includes a negative number counting unit 105, where the negative number counting unit 105 is configured to count negative number result information in the multiplication result and send the negative number result information to the patch operation unit; the negative result information comprises negative indication bit information of a calculation result corresponding to each split data calculated by the encoding table look-up operation unit, and the split data is obtained by splitting the second multiplier through the data splitting unit.
For example, let the input data a in fig. 5 be 0111_1110. As described above, the data splitting unit yields four groups of split data 01(1), 11(1), 11(1), 10(0). Looking up the product of each group with the input data b gives, in sequence, 2b, 0, 0 and -2b, and adding these four partial results gives the product of multiplier a and multiplier b. For the result 2b, b is shifted left by one bit (multiplying by 2 in binary is equivalent to a left shift by one bit). For the result -2b, b is shifted left by one bit and its two's complement is taken (i.e., each bit is inverted and then 1 is added); the add-1 operation is performed by the patch operation unit. Thus, when the radix-4 multiplier multiplies the input data 0111_1110 by any multiplier b, b shifted left by one bit gives a first addition parameter, and b shifted left by one bit and then inverted gives a second addition parameter. The two addition parameters are then added with an offset that reflects the positions of the split data 01(1) and 10(0) in the original multiplier 0111_1110, giving the intermediate operation result of 0111_1110 and b. (The final result additionally requires a +1 at the corresponding position of the intermediate operation result; in this application that +1 is performed in the patch operation unit, and the multiplier is responsible only for the intermediate operation result.) When the multiplier a takes other values, the calculation is similar and is not repeated here.
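A minimal Python model of this split between "invert in the multiplier" and "add 1 later" is given below. It assumes the standard radix-4 Booth table and uses unbounded Python integers rather than fixed-width words; the function name `radix4_intermediate` is hypothetical.

```python
BOOTH_TABLE = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def radix4_intermediate(a, b, bits=8):
    """Multiply a by b, deferring every +1 of the two's complement.

    Returns (intermediate result, negative flags per group, lowest group first).
    """
    padded = (a & ((1 << bits) - 1)) << 1        # append the auxiliary 0
    intermediate = 0
    neg_flags = []
    for i in range(bits // 2):
        digit = BOOTH_TABLE[(padded >> (2 * i)) & 0b111]
        term = abs(digit) * b                    # 0, b, or b shifted left by one
        if digit < 0:
            term = ~term                         # invert only; the +1 is deferred
        intermediate += term << (2 * i)          # offset by the group position
        neg_flags.append(digit < 0)
    return intermediate, neg_flags


inter, flags = radix4_intermediate(0b0111_1110, 5)
# Adding one 1 at each flagged group position restores the true product.
product = inter + sum(1 << (2 * i) for i, f in enumerate(flags) if f)
```

For a = 0111_1110 only the lowest group is negative, so a single 1 at the least significant bit completes the product.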
Preferably, the offset addition of each addition parameter is performed by a CSA adder, and specifically, as shown in fig. 5, each shifting unit is connected to the CSA adder in fig. 5, and is configured to perform the offset addition on the addition parameter corresponding to each split data, so as to obtain a CSA value and a CSP value as the output result corresponding to the multiplier.
As shown in fig. 6, in some embodiments, the patch operation unit includes a plurality of patch storage units, a gating unit, and a logic operation unit; each gating unit is correspondingly connected with one patch storage unit, and the gating unit is also connected with the logic operation unit; patch sub information is stored in the patch storage unit, and the patch sub information stored in different storage units is different;
the gating unit is used for selecting and sending the corresponding patch sub information to the logic operation unit according to the negative number indicating bit information; the logic operation unit is used for carrying out logic OR operation on the received patch sub information to obtain the patch information.
The patch operation adds a number of 1s to the intermediate result produced by the multiplier. The number of 1s equals the number of negative results obtained from the split data (only negative results require the complement operation, i.e. inversion followed by +1). Because adjacent groups of split data are offset from each other by 2 bits, the bit position at which the 1 is added differs from group to group. The four cases are shown in fig. 6: the 1st, 3rd, 5th and 7th bits of the 8-bit multiplier. Taking the input data a = 0111_1110 again as the example: the first three groups of split data 01(1), 11(1), 11(1), with intermediate multiplication results 2b, 0, 0, involve no negative operation, so gating units 2, 3 and 4 send patch sub-information of eight zeros (8'b0 in the drawing) to the logic operation unit (i.e., the logical OR circuit in fig. 6). The split data 10(0) gives the intermediate multiplication result -2b; since the add-1 operation for 10(0) falls on the least significant bit, gating unit 1 sends the patch sub-information 0000_0001 stored in its patch storage unit to the logical OR circuit. The logical OR circuit ORs all received patch sub-information and outputs the final patch information. Because the add-1 bit positions of the different split data are staggered, no two pieces of patch sub-information have a 1 in the same bit position, so the addition of the patch sub-information reduces to a logical OR, which improves operation efficiency.
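The gating and OR steps can be sketched as follows. The stored constants follow the bit positions named above (1st, 3rd, 5th and 7th bits, counted from 1); the names `PATCH_STORE` and `patch_info` are hypothetical.

```python
# One stored patch word per split-data group, lowest group first.
PATCH_STORE = [0b0000_0001, 0b0000_0100, 0b0001_0000, 0b0100_0000]


def patch_info(neg_flags):
    """Gate each stored word by its negative flag, then OR the gated words."""
    patch = 0
    for flag, sub_info in zip(neg_flags, PATCH_STORE):
        gated = sub_info if flag else 0   # gating unit: pass the word or 8'b0
        patch |= gated                    # logic operation unit: logical OR
    return patch


# For a = 0111_1110 only the lowest group, 10(0), is negative.
example = patch_info([True, False, False, False])
```

Because the stored words never share a set bit, the OR gives the same value as an arithmetic sum of the gated words.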
As shown in fig. 4, the addition intermediate values are added layer by layer by the CSA adders at the different levels, finally giving two values (CSA20 and CSP20 in fig. 4). These are passed to the normal adder, which adds the CSA20 value and the CSP20 value to obtain the intermediate result of the CSA addition. The patch information is likewise accumulated layer by layer: the patch information from patch operation units 0 to 7 is summed in turn by the first-level, second-level and third-level normal adders to obtain the final patch information, which is added to the intermediate result of the CSA addition to obtain the convolution operation result. The invention thus computes the patch information and the intermediate result of the CSA addition in parallel and adds the patch information back only once, in the final addition, which greatly reduces circuit cost and algorithmic complexity and effectively improves the efficiency of the convolution operation.
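Putting the pieces together, the following Python sketch models one full convolution output. Several things are assumptions for illustration: the standard radix-4 Booth table, a textbook 3:2 compressor applied as a sequential chain instead of the tree of fig. 4, unbounded Python integers, and the hypothetical names `multiply_with_patch`, `csa3` and `convolve`.

```python
BOOTH_TABLE = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def multiply_with_patch(a, b, bits=8):
    """Return (intermediate product without the deferred +1s, patch word)."""
    padded = (a & ((1 << bits) - 1)) << 1
    intermediate, patch = 0, 0
    for i in range(bits // 2):
        digit = BOOTH_TABLE[(padded >> (2 * i)) & 0b111]
        term = abs(digit) * b
        if digit < 0:
            term = ~term                 # inversion only
            patch |= 1 << (2 * i)        # remember where the +1 belongs
        intermediate += term << (2 * i)
    return intermediate, patch


def csa3(x, y, z):
    """3:2 compressor: keep the carries unresolved."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def convolve(weights, inputs):
    """Sum of products with the patches added back once at the end."""
    csa, csp, patch_total = 0, 0, 0
    for a, b in zip(weights, inputs):
        intermediate, patch = multiply_with_patch(a, b)
        csa, csp = csa3(csa, csp, intermediate)   # first addition (carry-save)
        patch_total += patch                      # separate patch adder path
    return (csa + csp) + patch_total              # second addition


weights = [126, -3, 7, 0, -128, 55, 1, -1]
inputs = [5, -7, 9, 4, 3, -2, 8, 6]
output = convolve(weights, inputs)
```

The carry-save path and the patch path never interact until the last line, which is the property the paragraph above relies on.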
In the practical application process, the neural network is often applied to the packet convolution operation in addition to the complete convolution operation. During the packet convolution operation, the number of output results is corresponding to the number of packets of the packet convolution, for example, the number of packets is 2, and then the number of final output numerical results is also two; the number of groups is 4, and the final output numerical result is 4.
In order to make the circuit structure of the present invention also applicable to the requirement of the packet convolution operation, as shown in fig. 6 and 7, the circuit further includes a packet configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, and the path selectors are respectively connected with the first addition unit, the second addition unit and the patch unit; the grouping configuration unit is used for configuring grouping information of convolution operation and determining a path signal of a path selector corresponding to the grouping configuration information according to the grouping configuration information so that the second addition operation unit outputs a corresponding grouping convolution operation result.
For example, when the current number of groups is 4, every two of multipliers 0 to 7 form one group. The calculation results of CSA adder 00, CSA adder 01, CSA adder 02 and CSA adder 03 are then not transmitted to the next-stage CSA adders but to the corresponding ordinary adders (i.e., the second addition operation unit). The patch operation unit is handled similarly: after patch 0 and patch 1 are added by the first-level ordinary adder, the path selector routes their sum directly to be added to the result of the ordinary adder corresponding to CSA adder 00, giving the convolution operation result of group 1, instead of passing it on to the second-level adder. The results for patches 2 and 3, patches 4 and 5, and patches 6 and 7 are obtained in the same way and are not described again here.
When the current number of groups is 2, assuming that multipliers 0 to 3 form one group and multipliers 4 to 7 form the other, the addition results of CSA adder 10 and CSA adder 11 are not transmitted to the next-stage CSA adder 20 but directly to their respective ordinary adders, which add the CSA10 value to the CSP10 value and the CSA11 value to the CSP11 value. In the patch operation unit, patch 10 is added directly to the sum of the CSA10 and CSP10 values to obtain the convolution operation result of group 1, and patch 11 is added directly to the sum of the CSA11 and CSP11 values to obtain the convolution operation result of group 2.
In short, the path selectors choose where the output data of each adder is routed, which implements grouped convolution and meets its operation requirements under different group counts.
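The path selection for 4 groups, 2 groups and full convolution can be illustrated with a short Python sketch. It is a behavioural model only: the CSA tree and its ordinary adder are abstracted as plain pairwise sums, path selection is modelled as choosing the tree level at which results are tapped, and the names `tree_levels` and `grouped_conv` and the sample values are hypothetical.

```python
def tree_levels(values):
    """Return every level of a pairwise adder tree; levels[0] is the input."""
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def grouped_conv(intermediate_values, patches, groups):
    """Tap both trees at the level whose width equals the number of groups."""
    n = len(intermediate_values)
    if n == 0 or n & (n - 1) or len(patches) != n:
        raise ValueError("expected equally sized, power-of-two-length inputs")
    value_levels = tree_levels(intermediate_values)
    patch_levels = tree_levels(patches)
    for level_values, level_patches in zip(value_levels, patch_levels):
        if len(level_values) == groups:
            # Each group's patch sum is added back once, to that group's own result.
            return [v + p for v, p in zip(level_values, level_patches)]
    raise ValueError("unsupported number of groups")


values = [1, 2, 3, 4, 5, 6, 7, 8]      # hypothetical results for multipliers 0 to 7
patches = [0, 1, 0, 0, 2, 0, 0, 1]     # hypothetical patches 0 to 7

print(grouped_conv(values, patches, 4))  # [4, 7, 13, 16]
print(grouped_conv(values, patches, 2))  # [11, 29]
print(grouped_conv(values, patches, 1))  # [40]
```

With 4 groups the results are taken after the first adder level, with 2 groups after the second, and with 1 group (full convolution) after the last, which mirrors the routing described for CSA adders 00 to 03, 10 and 11, and 20.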
As shown in fig. 11, the inventor further provides a convolution operation method based on a CSA adder, applied to a convolution operation circuit based on a CSA adder. The circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected to the first addition operation unit, and the first addition operation unit and the patch operation unit are each connected to the second addition operation unit. The method comprises the following steps:
first, in step S1101, each multiplier acquires a first multiplier and a second multiplier (its two operands) and multiplies them to obtain a multiplication result;
then, in step S1102, the first addition operation unit performs a first addition operation on the multiplication results of the multipliers to obtain addition intermediate values;
then, in step S1103, the patch operation unit generates corresponding patch information according to the multiplication results;
finally, in step S1104, the second addition operation unit performs a second addition operation on the addition intermediate values and the patch information to obtain the convolution operation result.
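Steps S1101 to S1104 can be walked through in a Python sketch. It rests on assumptions that are not spelled out in the steps themselves: operands are signed 8-bit integers, each multiplier uses radix-4 Booth encoding, a negative partial product is formed by bitwise inversion alone, and the patch information is the deferred "+1" that completes the two's-complement negation. All function names are hypothetical, and plain sums stand in for the CSA and ordinary adder trees.

```python
def booth_radix4_digits(y, bits=8):
    """Radix-4 Booth digits (each in -2..2) of a signed value, least significant first."""
    padded = (y & ((1 << bits) - 1)) << 1      # append one low-order zero
    digits = []
    for i in range(0, bits, 2):                # overlapping 3-bit windows
        low = (padded >> i) & 1
        mid = (padded >> (i + 1)) & 1
        high = (padded >> (i + 2)) & 1
        digits.append(low + mid - 2 * high)
    return digits


def multiply_with_patch(x, y, bits=8):
    """S1101/S1103: return (uncorrected partial-product sum, patch) with sum + patch == x * y."""
    uncorrected, patch = 0, 0
    for j, digit in enumerate(booth_radix4_digits(y, bits)):
        magnitude = abs(digit) * x
        if digit < 0:
            uncorrected += (~magnitude) << (2 * j)  # inversion only, the +1 is deferred
            patch |= 1 << (2 * j)                   # the deferred +1 at this digit's weight
        else:
            uncorrected += magnitude << (2 * j)
    return uncorrected, patch


def convolve(xs, ys, bits=8):
    results = [multiply_with_patch(x, y, bits) for x, y in zip(xs, ys)]  # S1101, S1103
    intermediate = sum(value for value, _ in results)   # S1102: stands in for the CSA path
    patch_total = sum(patch for _, patch in results)    # patch path
    return intermediate + patch_total                   # S1104: patch added back once


xs = [3, -5, 7, 100]        # hypothetical first multipliers
ys = [-3, 4, -128, 127]     # hypothetical second multipliers (signed 8-bit)
assert convolve(xs, ys) == sum(x * y for x, y in zip(xs, ys))
```

Because the correction is kept out of the partial products, the first addition never has to handle the "+1" terms; they are collected separately and enter the sum only in the last step.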
In some embodiments, the circuit further comprises a negative number statistics unit; the method comprises the following step: the negative number statistics unit collects the negative result information in the multiplication results and sends it to the patch operation unit; the negative result information comprises the negative indication bit information of the calculation result that the encoding table look-up operation unit computes for each piece of split data, the split data being obtained by the data splitting unit splitting the second multiplier.
In some embodiments, the patch operation unit includes a plurality of patch storage units, gating units, and a logic operation unit; each gating unit is connected to a corresponding patch storage unit and also to the logic operation unit; each patch storage unit stores patch sub-information, and the patch sub-information differs between storage units; the method comprises the following steps:
the gating units select the corresponding patch sub-information according to the negative indication bit information and send it to the logic operation unit; the logic operation unit performs a logical OR operation on the received patch sub-information to obtain the patch information.
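The gating and OR steps can be sketched in Python as follows. The stored patch sub-information values are an assumption made for illustration (a single set bit at the weight of each radix-4 digit, i.e. bits 0, 2, 4 and 6 for an 8-bit second multiplier); the names `PATCH_STORE` and `patch_info` are hypothetical.

```python
# Assumed contents of the patch storage units: one distinct bit per split datum.
PATCH_STORE = [1 << (2 * i) for i in range(4)]   # 0b1, 0b100, 0b10000, 0b1000000


def patch_info(negative_bits):
    """Gate each stored value by its negative indication bit, then OR the survivors."""
    selected = [sub for sub, negative in zip(PATCH_STORE, negative_bits) if negative]
    info = 0
    for sub in selected:
        info |= sub          # logical OR performed by the logic operation unit
    return info


print(bin(patch_info([1, 0, 1, 0])))   # 0b10001
print(bin(patch_info([0, 0, 0, 0])))   # 0b0
```

Under this assumption the stored values never share a set bit, so the OR gives the same value as an arithmetic sum of the selected sub-information while needing no carry logic.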
In some embodiments, the circuit further comprises a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, which are connected to the first addition operation unit, the second addition operation unit and the patch operation unit respectively; the method comprises the following steps:
the grouping configuration unit configures the grouping information of the convolution operation and determines, according to the grouping configuration information, the path signal of each corresponding path selector, so that the second addition operation unit outputs the corresponding grouped convolution result.
The invention discloses a convolution operation method and circuit based on a CSA adder, wherein the circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; the method comprises the following steps: each multiplier acquires a first multiplier and a second multiplier and multiplies them to obtain a multiplication result; the first addition operation unit performs a first addition operation on the multiplication results of the multipliers to obtain addition intermediate values; the patch operation unit generates corresponding patch information according to the multiplication results; the second addition operation unit performs a second addition operation on the addition intermediate values and the patch information to obtain the convolution operation result. In this scheme the patch information is computed separately and added back only once, in the final addition, which reduces circuit cost and algorithmic complexity and improves the efficiency of the convolution operation.
It should be noted that, although the above embodiments have been described herein, the scope of protection of the invention is not limited thereby. Any changes and modifications made to the embodiments described herein on the basis of the innovative concept of the present invention, and any equivalent structure or equivalent process derived from the content of this specification and the attached drawings, whether applied directly or indirectly in other related technical fields, fall within the scope of protection of the present invention.

Claims (10)

1. A convolution operation circuit based on a CSA adder, characterized by comprising a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected to the first addition operation unit, and the first addition operation unit and the patch operation unit are each connected to the second addition operation unit;
the multiplier is used for acquiring a first multiplier and a second multiplier to carry out multiplication operation to obtain a multiplication operation result;
the first addition operation unit is used for performing first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value;
the patch operation unit is used for generating corresponding patch information according to the multiplication result;
and the second addition operation unit is used for executing second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result.
2. The CSA adder-based convolution operation circuit of claim 1, wherein the multiplier is a radix-4 multiplier, the radix-4 multiplier comprising a data splitting unit, a first data cache unit, a low-order zero padding unit, a second data cache unit, a radix-4 coding unit, a coding cache unit, a coding table look-up operation unit, and a displacement unit; the data splitting unit is connected to the first data cache unit, the first data cache unit is connected to the low-order zero padding unit, the low-order zero padding unit is connected to the second data cache unit, the second data cache unit is connected to the radix-4 coding unit, the radix-4 coding unit is connected to the coding cache unit, the coding cache unit is connected to the coding table look-up operation unit, and the coding table look-up operation unit is connected to the displacement unit;
the displacement unit is connected to the first addition operation unit.
3. The CSA adder-based convolution operation circuit of claim 2, further comprising a negative number statistics unit;
the negative number statistics unit is used for collecting the negative result information in the multiplication results and sending it to the patch operation unit; the negative result information comprises the negative indication bit information of the calculation result that the coding table look-up operation unit computes for each piece of split data, the split data being obtained by the data splitting unit splitting the second multiplier.
4. The CSA adder-based convolution operation circuit of claim 3, wherein the patch operation unit includes a plurality of patch storage units, gating units, and a logic operation unit; each gating unit is connected to a corresponding patch storage unit and also to the logic operation unit; each patch storage unit stores patch sub-information, and the patch sub-information differs between storage units;
the gating units are used for selecting the corresponding patch sub-information according to the negative indication bit information and sending it to the logic operation unit;
the logic operation unit is used for performing a logical OR operation on the received patch sub-information to obtain the patch information.
5. The CSA adder-based convolution operation circuit of any of claims 1 to 4, further comprising a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, which are connected to the first addition operation unit, the second addition operation unit and the patch operation unit respectively;
the grouping configuration unit is used for configuring grouping information of convolution operation and determining a path signal of a path selector corresponding to the grouping configuration information according to the grouping configuration information so that the second addition operation unit outputs a corresponding grouping convolution operation result.
6. A convolution operation method based on a CSA adder, characterized by being applied to a convolution operation circuit based on a CSA adder, wherein the circuit comprises a multiplication operation unit, a first addition operation unit, a patch operation unit and a second addition operation unit; the multiplication operation unit comprises a plurality of multipliers; each multiplier is connected to the first addition operation unit, and the first addition operation unit and the patch operation unit are each connected to the second addition operation unit; the method comprises the following steps:
the multiplier acquires a first multiplier and a second multiplier to carry out multiplication operation to obtain a multiplication operation result;
the first addition operation unit performs first addition operation on the multiplication operation result of each multiplier to obtain an addition intermediate value;
the patch operation unit generates corresponding patch information according to the multiplication result;
the second addition operation unit performs a second addition operation on each addition intermediate value and each patch information to obtain a convolution operation result.
7. The CSA adder-based convolution operation method of claim 6, wherein the multiplier is a radix-4 multiplier, the radix-4 multiplier comprising a data splitting unit, a first data cache unit, a low-order zero padding unit, a second data cache unit, a radix-4 coding unit, a coding cache unit, a coding table look-up operation unit, and a displacement unit; the data splitting unit is connected to the first data cache unit, the first data cache unit is connected to the low-order zero padding unit, the low-order zero padding unit is connected to the second data cache unit, the second data cache unit is connected to the radix-4 coding unit, the radix-4 coding unit is connected to the coding cache unit, the coding cache unit is connected to the coding table look-up operation unit, and the coding table look-up operation unit is connected to the displacement unit;
the displacement unit is connected to the first addition operation unit.
8. The CSA adder-based convolution operation method of claim 7, wherein the circuit further comprises a negative number statistics unit; the method comprises the following steps:
the negative number statistics unit collects the negative result information in the multiplication results and sends it to the patch operation unit;
the negative result information comprises the negative indication bit information of the calculation result that the coding table look-up operation unit computes for each piece of split data, the split data being obtained by the data splitting unit splitting the second multiplier.
9. The CSA adder-based convolution operation method of claim 8, wherein the patch operation unit includes a plurality of patch storage units, gating units, and a logic operation unit; each gating unit is connected to a corresponding patch storage unit and also to the logic operation unit; each patch storage unit stores patch sub-information, and the patch sub-information differs between storage units; the method comprises the following steps:
the gating units select the corresponding patch sub-information according to the negative indication bit information and send it to the logic operation unit;
the logic operation unit performs a logical OR operation on the received patch sub-information to obtain the patch information.
10. The CSA adder-based convolution operation method according to any of claims 6 to 9, wherein the circuit further includes a grouping configuration unit and a path selection unit; the path selection unit comprises a plurality of path selectors, which are connected to the first addition operation unit, the second addition operation unit and the patch operation unit respectively; the method comprises the following steps:
the grouping configuration unit configures the grouping information of the convolution operation, and determines the path signal of the path selector corresponding to the grouping configuration information according to the grouping configuration information, so that the second addition operation unit outputs the corresponding grouping convolution operation result.
CN201910779278.6A 2019-08-22 2019-08-22 Convolution operation method and circuit based on CSA adder Active CN110598172B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910779278.6A CN110598172B (en) 2019-08-22 2019-08-22 Convolution operation method and circuit based on CSA adder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910779278.6A CN110598172B (en) 2019-08-22 2019-08-22 Convolution operation method and circuit based on CSA adder

Publications (2)

Publication Number Publication Date
CN110598172A true CN110598172A (en) 2019-12-20
CN110598172B CN110598172B (en) 2022-10-25

Family

ID=68855138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910779278.6A Active CN110598172B (en) 2019-08-22 2019-08-22 Convolution operation method and circuit based on CSA adder

Country Status (1)

Country Link
CN (1) CN110598172B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805791A (en) * 2017-04-28 2018-11-13 英特尔公司 The calculation optimization of low precise machines learning manipulation
CN108804077A (en) * 2017-04-28 2018-11-13 英特尔公司 For executing instruction and the logic of floating-point and integer operation for machine learning
KR20190005043A (en) * 2017-07-05 2019-01-15 울산과학기술원 SIMD MAC unit with improved computation speed, Method for operation thereof, and Apparatus for Convolutional Neural Networks accelerator using the SIMD MAC array
CN109388373A (en) * 2018-10-12 2019-02-26 胡振波 Multiplier-divider for low-power consumption kernel
CN109993272A (en) * 2017-12-29 2019-07-09 北京中科寒武纪科技有限公司 Convolution and down-sampled arithmetic element, neural network computing unit and field programmable gate array IC

Also Published As

Publication number Publication date
CN110598172B (en) 2022-10-25

Similar Documents

Publication Publication Date Title
US10877733B2 (en) Segment divider, segment division operation method, and electronic device
KR100591761B1 (en) Montgomery Modular Multiplication Method Using Montgomery Modular Multiplier and Carry Store Addition
US6704762B1 (en) Multiplier and arithmetic unit for calculating sum of product
JP2004326112A (en) Multiple modulus selector, accumulator, montgomery multiplier, method of generating multiple modulus, method of producing partial product, accumulating method, method of performing montgomery multiplication, modulus selector, and booth recorder
CN111936965A (en) Random rounding logic
US20040267853A1 (en) Method and apparatus for implementing power of two floating point estimation
US6847986B2 (en) Divider
CN110598172B (en) Convolution operation method and circuit based on CSA adder
US4754422A (en) Dividing apparatus
CN112162724A (en) Quantum division operation method and device with precision
CN112214200A (en) Quantum subtraction operation method and device, electronic device and storage medium
US4761757A (en) Carry-save-adder three binary dividing apparatus
WO2007083377A1 (en) Parity generation circuit, counter and counting method
CN115270155A (en) Method for obtaining maximum common divisor of big number expansion and hardware architecture
US11494165B2 (en) Arithmetic circuit for performing product-sum arithmetic
CN110275693B (en) Multi-addend adder circuit for random calculation
KR0146065B1 (en) Absolute value calculating circuit
CN220208247U (en) Division operation circuit
JP4317738B2 (en) Average value calculating apparatus and average value calculating method
KR100858559B1 (en) Method for adding and multipying redundant binary and Apparatus for adding and multipying redundant binary
CN110890895A (en) Method for performing polar decoding by means of representation transformation and associated polar decoder
US20210365239A1 (en) Logarithm calculation method and logarithm calculation circuit
CN111931441B (en) Method, device and medium for establishing FPGA fast carry chain time sequence model
KR20110068801A (en) Modulo n calculation method and apparatus thereof
KR950015180B1 (en) High speed adder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 350003 building 18, No.89, software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant after: Ruixin Microelectronics Co.,Ltd.

Address before: 350003 building 18, No.89, software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant before: FUZHOU ROCKCHIP ELECTRONICS Co.,Ltd.

GR01 Patent grant