CN115600062A - Convolution processing method, circuit, electronic device and computer readable storage medium - Google Patents
- Publication number: CN115600062A (application CN202211600016.7A)
- Authority: CN (China)
- Prior art keywords: matrix, transformation, sub, feature, feature matrix
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06F17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F17/15 — Correlation function computation including computation of convolution operations
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present application relates to a convolution processing method, a convolution processing circuit, an electronic device, and a computer-readable storage medium. The method comprises the following steps: partitioning the dot-product feature matrix obtained in the convolution processing to obtain a plurality of sub-feature matrices; when performing post-transformation processing on each sub-feature matrix, controlling the addition and subtraction unit to transform the sub-feature matrix according to the addition and subtraction transformation rule corresponding to it, obtaining a first transformation matrix; determining the shift amount corresponding to the sub-feature matrix, and controlling a shifter in the shifter group to shift-transform the first transformation matrix according to the shift amount, obtaining a second transformation matrix; and controlling the accumulator to accumulate the second transformation matrices corresponding to the sub-feature matrices, obtaining the post-transformation feature matrix generated by the convolution processing. With this method, post-transformation of the dot-product feature matrix is realized by controlling only the addition and subtraction unit, shifters, and accumulator, saving multiplication resources.
Description
Technical Field
The present disclosure relates to the field of neural network technologies, and in particular, to a convolution processing method, a convolution processing circuit, an electronic device, and a computer-readable storage medium.
Background
With the development of artificial intelligence technology, convolutional neural networks have emerged. Convolutional neural networks have great advantages and potential in target identification, and are therefore widely applied in fields such as target detection, error detection, and automatic driving. When a convolutional neural network is used for target identification, convolution operations are performed, and a large number of multiplications are involved in the convolution, so the convolution operation is slow. To increase the convolution rate, a convolution acceleration algorithm, such as the Winograd algorithm (a convolution acceleration implementation based on polynomial interpolation), is used for convolution processing.
A convolution acceleration algorithm operates on the two inputs of the convolution: after the neurons and the weights are tiled at a certain scale, each is linearly transformed (pre-transformation); the transformed neurons and weights are then dot-multiplied, and the dot-product result is linearly transformed again (post-transformation), finally yielding a convolution result equivalent to the original convolution operation.
However, when performing post-transformation in convolution processing, the conventional convolution acceleration algorithm directly inputs the whole dot product result to be multiplied by the post-transformation parameter matrix, which results in a large amount of multiplication resources being consumed when calculating the post-transformation.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a convolution processing method, a convolution processing circuit, an electronic device, and a computer-readable storage medium, which can achieve the effect of saving multiplication resources.
In a first aspect, the present application provides a convolution processing method, including:
partitioning the dot-product feature matrix obtained in the convolution processing to obtain a plurality of sub-feature matrices, the dot-product feature matrix being obtained by dot-multiplying the pre-transformation feature matrices using a convolution acceleration algorithm;
when performing post-transformation processing on each sub-feature matrix, controlling the addition and subtraction unit to transform the sub-feature matrix according to the addition and subtraction transformation rule corresponding to it, to obtain a first transformation matrix;
determining the shift amount corresponding to the sub-feature matrix, and controlling shifters in the shifter group to shift-transform the first transformation matrix according to the shift amount, to obtain a second transformation matrix, where each shifter corresponds to an element position of the post-transformation feature matrix to be generated, and the shift amount is determined according to the arrangement positions of the post-transformation parameters in the post-transformation parameter matrix;
controlling the accumulator to accumulate the second transformation matrices corresponding to the sub-feature matrices, to obtain the post-transformation feature matrix generated by the convolution processing, where the accumulator corresponds to an element position of the post-transformation feature matrix.
In a second aspect, the present application further provides a conversion control circuit, including:
the addition and subtraction unit, configured to transform, for each of the plurality of sub-feature matrices, the sub-feature matrix according to its corresponding addition and subtraction transformation rule, to obtain a first transformation matrix; the plurality of sub-feature matrices are obtained by partitioning the dot-product feature matrix obtained in the convolution processing; the dot-product feature matrix is obtained by dot-multiplying the pre-transformation feature matrices using a convolution acceleration algorithm;
the shifters, configured to determine the shift amount corresponding to the sub-feature matrix and shift-transform the first transformation matrix according to the shift amount, to obtain a second transformation matrix; each shifter corresponds to an element position of the post-transformation feature matrix to be generated; the shift amount is determined according to the arrangement positions of the post-transformation parameters in the post-transformation parameter matrix;
the accumulator, configured to accumulate the second transformation matrices corresponding to the sub-feature matrices, to obtain the post-transformation feature matrix generated by the convolution processing; the accumulator corresponds to an element position of the post-transformation feature matrix.
In a third aspect, the present application further provides an electronic device, which includes the above-mentioned conversion control circuit.
In a fourth aspect, the present application further provides an electronic device, where the electronic device includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the convolution processing method when executing the computer program.
In a fifth aspect, the present application further provides a computer-readable storage medium having a computer program stored thereon, which, when being executed by a processor, realizes the steps of the convolution processing method described above.
In a sixth aspect, the present application further provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the convolution processing method described above.
The convolution processing method, circuit, electronic device and computer-readable storage medium divide the dot-product feature matrix, obtained by dot-multiplying the pre-transformation feature matrices with a convolution acceleration algorithm, into a plurality of sub-feature matrices; control the addition and subtraction unit to transform each sub-feature matrix according to its corresponding addition and subtraction transformation rule to obtain a first transformation matrix; control a shifter in the shifter group to shift-transform the first transformation matrix according to the shift amount corresponding to the sub-feature matrix to obtain a second transformation matrix; and finally control an accumulator to accumulate the second transformation matrices corresponding to the sub-feature matrices to obtain the post-transformation feature matrix generated by the convolution processing. Compared with conventional post-transformation processing, the post-transformation of the dot-product feature matrix is realized by controlling only the addition and subtraction unit, the shifters and the accumulator, thereby saving multiplication resources.
Drawings
Fig. 1 is a schematic flowchart of a convolution processing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a division point-by-point feature matrix according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a convolution processing operation unit according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a conversion control circuit according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a convolution processing pipeline according to an embodiment of the present application;
fig. 6 is an internal structural diagram of a first electronic device according to an embodiment of the present application;
fig. 7 is an internal structural diagram of a second electronic device according to an embodiment of the present application;
fig. 8 is an internal structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
When a convolutional neural network is used for target identification, convolution operations are performed; because a large number of multiplications are involved, convolution is slow. To improve the speed of convolution, a convolution acceleration algorithm such as the Winograd algorithm is adopted: the operands of the convolution are linearly transformed, a transformation requiring the minimum number of multiplications is found, and the eliminated multiplications are replaced by additional additions. In hardware, compared with an adder, a multiplier has a more complex structure, larger area and power consumption, and worse synthesized performance, so a Winograd algorithm that replaces multiplication with addition has great advantages in processing two-dimensional convolution. For Winograd functions of the form F(n×n, k×k) (n even, k odd), the Winograd algorithm performs the convolution according to the formula:
S = A^T [ (G g G^T) ⊙ (B^T d B) ] A
where G g G^T and B^T d B are the pre-transformation feature matrices, and the dot (element-wise) product of G g G^T and B^T d B yields the dot-product feature matrix.
Y = (G g G^T) ⊙ (B^T d B) is the result of the dot-multiplication part, i.e. the dot-product feature matrix; Y is a matrix of size (n+k-1) × (n+k-1).
S is the post-transformation result, i.e. the post-transformation feature matrix, with S = A^T Y A; S is a matrix of size n × n.
A is the post-transformation parameter matrix, of size (n+k-1) × n. It can be understood that A^T is the transpose of A, so once A or A^T is determined, the post-transformation parameter matrix is determined.
In some embodiments, as shown in fig. 1, a convolution processing method is provided, which is described by taking an example of applying the method to an electronic device, and includes the following steps:
It can be understood that, to simplify the post-transformation of the dot-product feature matrix, the dot-product feature matrix is divided into a plurality of sub-feature matrices; each sub-matrix can be treated as a single block element, and block matrix operations can be performed as long as the shape requirements are satisfied. The convolution acceleration algorithm may be the Winograd algorithm (a convolution acceleration implementation method based on polynomial interpolation).
Step 102: when performing post-transformation processing on each sub-feature matrix, control the addition and subtraction unit to transform the sub-feature matrix according to the addition and subtraction transformation rule corresponding to the sub-feature matrix, obtaining a first transformation matrix.
The addition and subtraction transformation rule is the rule controlling the input signals of the addition and subtraction unit; that is, the input signals of the addition and subtraction unit are set according to the rule corresponding to the sub-feature matrix, so that the corresponding post-transformation processing is applied to the sub-feature matrix.
The first transformation matrix is obtained by feeding the sub-feature matrix into the addition and subtraction unit for transformation.
Exemplarily, the electronic device determines the addition and subtraction transformation rule (a linear transformation matrix) corresponding to each sub-feature matrix according to the position of the sub-feature matrix in the dot-product feature matrix, and then controls the addition and subtraction unit to transform the sub-feature matrix according to that rule, obtaining the first transformation matrix.
In some embodiments, according to the principles of the Winograd algorithm, when the chosen polynomial factors are of the form x ± 2^u, a Winograd post-transformation of the form F(n×n, k×k) (n even, k odd) decomposes into the combined accumulation of four matrix-operation processes; that is, the addition and subtraction transformation rules comprise left-right transformation, left transformation, right transformation, and no transformation. The addition and subtraction transformation rule corresponding to a sub-feature matrix can therefore be determined from the position of the sub-feature matrix in the dot-product feature matrix.
In some embodiments, the input sub-feature matrix is Y_(i,j), and the first transformation matrix output after post-transformation processing by the addition and subtraction unit is Y'_(i,j).
Left-right transformation means Y'_(i,j) = M^T Y_(i,j) M; the essence of the left-right transformation is that the sub-feature matrix is left-multiplied by M^T and right-multiplied by M.
Left transformation means Y'_(i,j) = M^T Y_(i,j); the essence of the left transformation is left-multiplying the sub-feature matrix by M^T.
Right transformation means Y'_(i,j) = Y_(i,j) M; the essence of the right transformation is right-multiplying the sub-feature matrix by M.
No transformation means Y'_(i,j) = Y_(i,j); the essence of no transformation is that the output sub-feature matrix equals the input sub-feature matrix.
Here both the input Y_(i,j) and the output Y'_(i,j) are matrices of size 2×2, and M is the displacement reference matrix in the post-transformation parameter matrix:
M = [ 1  -1
      1   1 ]
Step 103: determine the shift amount corresponding to the sub-feature matrix, and control the shifters in the shifter group to shift-transform the first transformation matrix according to the shift amount, obtaining a second transformation matrix. Each shifter corresponds to an element position of the post-transformation feature matrix to be generated; the shift amount is determined according to the arrangement positions of the post-transformation parameters in the post-transformation parameter matrix.
The shift amount is the number of bit positions by which the shifter logically shifts data in one clock cycle. For example, the binary representation of 11 is 0001011; a logical left shift by 2 yields 0101100 (decimal 44), i.e. shifting 11 left by two bits expands it by a factor of 4.
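As a sketch of why a shifter can stand in for a multiplier here, in Python a logical left shift by u multiplies by 2^u:

```python
x = 11                   # binary 0001011
shifted = x << 2         # shift amount 2: logical left shift by two bits
assert shifted == 44     # binary 0101100, i.e. 11 expanded by a factor of 2**2 = 4
assert shifted == x * 4  # the shifter replaces a multiply by a power of two
```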
The second transformation matrix is obtained by inputting the first transformation matrix into the shifters in the shifter group and transforming the first transformation matrix.
Exemplarily, the electronic device determines the shift amount corresponding to each element in the sub-feature matrix according to the correspondence between the sub-feature matrix and the arrangement positions of the post-transformation parameters in the post-transformation parameter matrix, and controls the shifters in the shifter group to shift-transform the first transformation matrix according to the shift amounts, so as to obtain the second transformation matrix.
Exemplarily, the electronic device inputs the sub-feature matrix into the addition and subtraction unit and the shifter to obtain a second transformation matrix, and accumulates the second transformation matrix, where the matrix obtained after accumulation is a post-transformation feature matrix generated by convolution processing.
In the convolution processing method, a dot-multiplied feature matrix obtained by dot-multiplying a pre-transformed feature matrix by using a convolution acceleration algorithm is divided into a plurality of sub-feature matrices, an addition and subtraction unit is controlled to transform the sub-feature matrices according to an addition and subtraction transformation rule corresponding to each sub-feature matrix to obtain a first transformation matrix, a shifter in a shifter group is controlled to shift and transform the first transformation matrix according to a displacement amount corresponding to the sub-feature matrix to obtain a second transformation matrix, and finally the second transformation matrix corresponding to each sub-feature matrix is accumulated by controlling an accumulator to obtain a post-transformed feature matrix generated by convolution processing. Compared with the traditional post-conversion processing method, the post-conversion processing of the point multiplication characteristic matrix is realized by only controlling the addition and subtraction unit, the shifter and the accumulator, so that multiplication resources are saved.
In some embodiments, partitioning the dot-product feature matrix obtained in the convolution processing to obtain a plurality of sub-feature matrices includes:
partitioning the edge position data in the dot-product feature matrix obtained in the convolution processing according to a preset sub-matrix size, the edge position data being the feature data located at edge positions in the dot-product feature matrix; and sequentially partitioning the feature data other than the edge position data in the dot-product feature matrix according to the preset sub-matrix size, to obtain a plurality of sub-feature matrices.
The edge position data is the data along the border of the dot-product feature matrix.
In some embodiments, the dot-product feature matrix Y is a 6×6 matrix:
Y = [ Y_00 Y_01 Y_02 Y_03 Y_04 Y_05
      Y_10 Y_11 Y_12 Y_13 Y_14 Y_15
      Y_20 Y_21 Y_22 Y_23 Y_24 Y_25
      Y_30 Y_31 Y_32 Y_33 Y_34 Y_35
      Y_40 Y_41 Y_42 Y_43 Y_44 Y_45
      Y_50 Y_51 Y_52 Y_53 Y_54 Y_55 ]
The first-row data Y_00–Y_05, the last-row data Y_50–Y_55, the first-column data Y_00–Y_50, and the last-column data Y_05–Y_55 are called edge position data.
In some embodiments, the preset sub-matrix size is 2×2, and both the edge position data and the non-edge position data are divided into 2×2 sub-feature matrices.
In the above embodiment, the dot-product feature matrix is divided into a plurality of sub-feature matrices, which facilitates increasing the processing speed of the convolution processing.
In some embodiments, partitioning the edge position data in the dot-product feature matrix obtained in the convolution processing according to a preset sub-matrix size includes:
grouping the corner data at the corner positions of the dot-product feature matrix obtained in the convolution processing into the same sub-feature matrix, where the sub-feature matrix formed by the corner data conforms to the preset sub-matrix size;
merging the head-tail remaining row data and the head-tail remaining column data in the dot-product feature matrix, and partitioning the merged head-tail remaining row data and the merged head-tail remaining column data respectively, to obtain sub-feature matrices conforming to the preset sub-matrix size;
where the head-tail remaining row data are the feature data in the head and tail rows other than the corner data, and the head-tail remaining column data are the feature data in the head and tail columns other than the corner data.
The corner positions are the positions of the four corners of the dot-product feature matrix. For example, in the dot-product feature matrix Y of the above embodiment, Y_00, Y_05, Y_50 and Y_55 are called corner data, and the positions where the corner data are located are called corner positions.
In some embodiments, as shown in fig. 2, a method for dividing the dot-product feature matrix Y (hereinafter Y) into sub-feature matrices is provided. For a Winograd function of the form F(n×n, k×k) (n even, k odd), the dot-product matrix Y is split into (n+k-1)^2/4 sub-matrices of size 2×2. The 2×2 sub-matrix formed by the 4 data at the corner positions of Y is denoted Y(0,0). The head-row and tail-row data of Y, each with its first and last elements removed (denoted Yup and Ydown), are merged; the merged head-tail remaining row data are divided from left to right, i.e. along the 2×2 dotted squares in fig. 2, into (n+k-3)/2 sub-matrices of size 2×2, denoted Y(0,j) (j ≠ 0). Likewise, the head-column and tail-column data, each with its first and last elements removed (denoted Yleft and Yright), are merged; the merged head-tail remaining column data are divided from top to bottom along the 2×2 dotted squares in fig. 2 into (n+k-3)/2 sub-matrices denoted Y(i,0) (i ≠ 0). The remaining data of Y after removing the head row, tail row, head column and tail column (denoted Ymid) are divided along the 2×2 dotted squares in fig. 2 into (n+k-3)^2/4 sub-matrices of size 2×2, denoted Y(i,j) (i ≠ 0, j ≠ 0). It can be understood that the 2×2 dotted squares in fig. 2 are only used to illustrate how the dot-product feature matrix Y is divided into sub-feature matrices according to the preset 2×2 sub-matrix size.
In some embodiments, the dot-multiplied feature matrix Y is divided according to the method in the above embodiments, and the obtained sub-feature matrices are as follows:
in some embodiments, controlling the addition and subtraction unit to transform the sub-feature matrix according to an addition and subtraction transformation rule corresponding to the sub-feature matrix to obtain a first transformation matrix, includes:
and under the condition that the addition and subtraction transformation rule corresponding to the sub-feature matrix represents that no transformation is performed, controlling the addition and subtraction unit not to transform the sub-feature matrix by controlling both the left transformation enabling signal and the right transformation enabling signal to be invalid, and controlling the data selector not to adjust the data sequence of the sub-feature matrix to obtain a first transformation matrix.
Illustratively, when the input sub-feature matrix is Y (0, 0), the addition and subtraction transformation rule corresponding to the sub-feature matrix characterizes no transformation, and the electronic device controls both the left transformation enable signal and the right transformation enable signal to be invalid, i.e., the left transformation enable signal len is 0, the right transformation enable signal ren is 0, and accordingly the inputs of all the addition and subtraction units are 0.
In some embodiments, the input unitThe feature matrix isThen, the sub-feature matrix is not transformed, and the first transformation matrix is [ Y ] 00 Y 05 Y 50 Y 55 ]。
In some embodiments, controlling the addition and subtraction unit to transform the sub-feature matrix according to its corresponding addition and subtraction transformation rule to obtain a first transformation matrix includes: when the addition and subtraction transformation rule corresponding to the sub-feature matrix indicates that target transformation processing is required, controlling the enable signal corresponding to the target transformation processing to be valid and controlling the addition and subtraction unit to perform the target transformation processing on the sub-feature matrix, obtaining an addition and subtraction transformation result, where the target transformation processing includes at least one of left transformation processing and right transformation processing; and controlling the data selector to adjust the data order in the addition and subtraction transformation result to obtain the first transformation matrix.
The enable signals determine which target transformation the sub-feature matrix undergoes; that is, they decide whether the sub-feature matrix undergoes left transformation processing, right transformation processing, or both simultaneously (left-right transformation processing).
Illustratively, when the input sub-feature matrix is Y (0, j) (j ≠ 0), the addition-subtraction transformation rule corresponding to the sub-feature matrix characterizes right transformation, and then the electronic device controls the left transformation enable signal to be invalid, i.e. the left transformation enable signal len =0, and the right transformation enable signal to be valid, i.e. the right transformation enable signal ren =1. Accordingly, the input of all the addition and subtraction units having the right transform enable signal as input is 1.
In some embodiments, the right transformation of a sub-feature matrix is illustrated.
The sub-feature matrix input to the addition and subtraction unit is Y(0,1) = [ Y_01 Y_02 ; Y_51 Y_52 ], with len = 0 and ren = 1; the first transformation matrix obtained by right transformation is [ Y_01+Y_02  Y_02-Y_01  Y_51+Y_52  Y_52-Y_51 ].
In addition, the essence of performing the right transformation on Y(0,1) is right-multiplying Y(0,1) by the matrix M.
Illustratively, when the input sub-feature matrix is Y (i, 0) (i ≠ 0), the addition-subtraction transformation rule corresponding to the sub-feature matrix characterizes left transformation, and then the electronic device controls the left transformation enable signal to be valid, i.e. the left transformation enable signal len =1, and the right transformation enable signal to be invalid, i.e. the right transformation enable signal ren =0. Accordingly, the input of all the addition and subtraction units having the left conversion enable signal as an input is 1.
In some embodiments, the left transformation of a sub-feature matrix is illustrated.
The sub-feature matrix input to the addition and subtraction unit is Y(1,0) = [ Y_10 Y_15 ; Y_20 Y_25 ], with len = 1 and ren = 0; the first transformation matrix obtained by left transformation, i.e. left-multiplying by M^T, is [ Y_10+Y_20  Y_15+Y_25  Y_20-Y_10  Y_25-Y_15 ].
Illustratively, when the input sub-feature matrix is Y (i, j) (i ≠ 0, j ≠ 0), the addition-subtraction transformation rule corresponding to the sub-feature matrix characterizes left-right transformation, and then the electronic device controls that both the left transformation enable signal and the right transformation enable signal are valid, i.e. the left transformation enable signal len =1 and the right transformation enable signal ren =1. Accordingly, the inputs of all the addition and subtraction units having the left transform enable signal as an input are 1, and the inputs of all the addition and subtraction units having the right transform enable signal as an input are 1.
In some embodiments, taking the sub-feature matrix Y(2,2) as an example, the left-right transformation is performed as follows.
The sub-feature matrix input to the addition and subtraction unit is Y(2,2) = [ Y_33 Y_34 ; Y_43 Y_44 ], with len = 1 and ren = 1; Y(2,2) is left-transformed and then right-transformed in turn, and the first transformation matrix obtained after the left-right transformation is
M^T Y(2,2) M = [ Y_33+Y_34+Y_43+Y_44   Y_34-Y_33+Y_44-Y_43
                 Y_43+Y_44-Y_33-Y_34   Y_33-Y_34-Y_43+Y_44 ]
In addition, the essence of performing the left-right transformation on Y(2,2) is left-multiplying it by M^T and right-multiplying it by M.
In the above embodiment, the corresponding transformation is executed according to the addition-subtraction transformation rule corresponding to the sub-feature matrix to obtain the first transformation matrix, which is simple and efficient.
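As an illustrative sketch (not part of the original disclosure), the four cases selected by the len/ren enable signals — no transform, left, right, and left-right — can be expressed as plain matrix arithmetic. The concrete M matrix below is an assumption chosen to match the sign pattern of the [Y10+Y20, ..., Y20-Y10, ...] example above; the patent's own M matrix is given only in its figures.

```python
import numpy as np

# Assumed add/sub transform matrix (hypothetical): its entries are
# +/-1, so applying it requires only additions and subtractions.
M = np.array([[1, 1],
              [-1, 1]])

def add_sub_unit(Y, len_en, ren_en):
    """Apply the transform selected by the enable signals.

    len_en=1 -> left-multiply by M; ren_en=1 -> right-multiply by M;
    both set -> left-right transform; neither -> pass through.
    """
    out = Y
    if len_en:
        out = M @ out      # left transformation
    if ren_en:
        out = out @ M      # right transformation
    return out

Y = np.array([[1, 2],
              [3, 4]])
print(add_sub_unit(Y, 1, 0))  # left transform only
```

With this sketch, len = ren = 0 leaves the sub-feature matrix unchanged, matching the "no transformation" rule for Y(0,0).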
In some embodiments, determining the displacement amount corresponding to the sub-feature matrix comprises:
determining a corresponding parameter sub-matrix in the post-transformation parameter matrix according to the target position; the target position is the position of the sub-feature matrix in the dot-product feature matrix;
determining a displacement reference matrix from the post-transformation parameter matrix;
and determining the displacement amount required by each shifter in the shifter group for performing the shifting operation on the characteristic data in the first transformation matrix according to the relative relation between the parameter submatrix and the displacement reference matrix.
Wherein the post-transformation parameter matrix is the A or Aᵀ matrix in the calculation formula S = Aᵀ[(GgGᵀ) ⊙ (BᵀdB)]A used when the Winograd algorithm executes the convolution operation. For example, for a Winograd function of the form F(2×2, 3×3), the post-transformation parameter matrix in the standard Winograd construction is Aᵀ = [[1, 1, 1, 0], [0, 1, -1, -1]]. For a Winograd function of the form F(4×4, 3×3), it is Aᵀ = [[1, 1, 1, 1, 1, 0], [0, 1, -1, 2, -2, 0], [0, 1, 1, 4, 4, 0], [0, 1, -1, 8, -8, 1]].
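Because every nonzero entry of the standard post-transformation parameter matrices is ±2^s, multiplying by an entry reduces to a sign flip plus a binary left shift — which is what makes a shifter-based implementation possible. A minimal sketch (assuming the standard F(4×4, 3×3) Aᵀ matrix, since the patent's own matrix appears only as a figure):

```python
# Standard A^T matrix for Winograd F(4x4, 3x3) -- an assumption here,
# used only to illustrate the sign/shift decomposition.
A_T = [
    [1, 1,  1, 1,  1, 0],
    [0, 1, -1, 2, -2, 0],
    [0, 1,  1, 4,  4, 0],
    [0, 1, -1, 8, -8, 1],
]

def shift_decompose(entry):
    """Split a parameter entry into (sign, shift) so that
    entry * x == sign * (x << shift). Returns None for 0."""
    if entry == 0:
        return None
    mag = abs(entry)
    shift = mag.bit_length() - 1
    assert 1 << shift == mag, "entry must be a power of two"
    return (1 if entry > 0 else -1, shift)

print(shift_decompose(8))   # multiply by 8  -> shift left by 3
print(shift_decompose(-2))  # multiply by -2 -> negate, shift by 1
```

Every nonzero entry of the assumed Aᵀ decomposes this way, so each multiplication in the post-transform can be realized by a shifter plus the add/sub sign.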
Illustratively, for a Winograd function of the form F (4 × 4,3 × 3), the post-transform process output is:
the output after dot multiplication is:
After the dot-product feature matrix is partitioned, the sub-feature matrices are respectively as follows:
and determining the displacement amount required by each shifter in the shifter group for performing the shifting operation on the characteristic data in the first transformation matrix according to the relation between the parameter submatrix corresponding to each sub-point multiplication matrix and the displacement reference matrix.
In some embodiments, the plurality of sub-feature matrices are post-transform processed sequentially; the control accumulator accumulates the second transformation matrixes corresponding to the sub-feature matrixes to obtain a post-transformation feature matrix generated by convolution processing, and the method comprises the following steps:
after the second transformation matrix corresponding to a sub-feature matrix is calculated, determining a target accumulator matched with the target position from the plurality of accumulators according to the target position of the sub-feature matrix in the dot-product feature matrix;
and controlling the target accumulator to open and shield the non-target accumulator so as to accumulate a second transformation matrix corresponding to the sub-feature matrix and the currently accumulated transformation matrix, and performing post-transformation processing on the next sub-feature matrix after accumulation.
Illustratively, when the input sub-feature matrix is Y(0,0), the inputs of all accumulators except S_A(0,0), S_A(0,n-1), S_A(n-1,0) and S_A(n-1,n-1) are switched to 0.

Illustratively, when the input sub-feature matrix is Y(0,j) (j ≠ 0), the inputs of all accumulators except S_A(0,:) and S_A(n-1,:) are switched to 0.

Illustratively, when the input sub-feature matrix is Y(i,0) (i ≠ 0), the inputs of all accumulators except S_A(:,0) and S_A(:,n-1) are switched to 0.
In some embodiments, the plurality of sub-feature matrices are post-transformed sequentially according to a post-transformation processing period; controlling the target accumulator to open and shield the non-target accumulator so as to accumulate a second transformation matrix corresponding to the sub-feature matrix and the currently accumulated transformation matrix, and performing post-transformation processing on the next sub-feature matrix after accumulation, wherein the post-transformation processing comprises the following steps:
controlling the target accumulator to open and shield the non-target accumulator so as to accumulate a second transformation matrix corresponding to the sub-feature matrix and the transformation matrix accumulated in the previous period, and entering the next post-transformation processing period so as to perform post-transformation processing on the next sub-feature matrix; wherein the preceding cycle is a post-transform processing cycle that has been previously performed.
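The masking rules above can be sketched as a boolean mask over the n×n accumulator array. This is a hedged reconstruction: the interior case (i ≠ 0 and j ≠ 0) is not spelled out in the text above and is assumed here to enable every accumulator.

```python
import numpy as np

def accumulator_mask(i, j, n):
    """Boolean n x n mask: True where the accumulator input is kept,
    False where it is switched to 0, per the masking rules above.
    Interior blocks (i != 0 and j != 0) are assumed to enable all
    accumulators -- an assumption, not stated in the source."""
    mask = np.zeros((n, n), dtype=bool)
    if i == 0 and j == 0:                       # corner block Y(0,0)
        mask[np.ix_([0, n - 1], [0, n - 1])] = True
    elif i == 0:                                # top-edge block Y(0,j)
        mask[[0, n - 1], :] = True
    elif j == 0:                                # left-edge block Y(i,0)
        mask[:, [0, n - 1]] = True
    else:                                       # interior block (assumed)
        mask[:, :] = True
    return mask

print(accumulator_mask(0, 0, 4).sum())  # only the 4 corner accumulators open
```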
In some embodiments, as shown in FIG. 3, a convolution processing arithmetic unit is provided. The following description is made with respect to fig. 3.
The addition and subtraction unit is used for realizing any one of left transformation, right transformation, left and right transformation and non-transformation on the sub-feature matrix.
Shifting is an operation that performs a binary shift of the input data to the left or right. The 4 data output by the addition and subtraction unit are connected to n²/4 shifter groups, which output n²/4 2×2 matrices; each group is denoted S_h(p,q) (p,q ∈ [0, n/2-1]). Each datum is connected to n²/4 shifters, and in total the n²/4 shifter groups constitute an n×n shifter array corresponding one-to-one to the data of the final result S.
The accumulator accumulates the input data and can choose not to accumulate the current data, i.e., an add-0 operation. An accumulator is connected behind each shifter group, forming an n×n accumulator array module; each accumulator is denoted S_A(r,t) (r,t ∈ [0, n-1]).
In some embodiments, for a Winograd function of the form F(n1×n1, k×k) and a Winograd function of the form F(n2×n2, k×k) (n1, n2 even, k odd), if n1 > n2, the convolution processing arithmetic unit for F(n1×n1, k×k) is compatible with the post-transform computation of F(n2×n2, k×k) by masking part of the accumulator outputs. If n is unchanged, the post-transforms of Winograd functions with different odd k can be completed by the same convolution processing arithmetic unit.
In some embodiments, there is provided a conversion control circuit comprising:
the addition and subtraction unit is used for transforming the sub-feature matrixes according to the addition and subtraction transformation rules corresponding to the sub-feature matrixes aiming at each sub-feature matrix in the plurality of sub-feature matrixes to obtain a first transformation matrix; the plurality of sub-feature matrices are obtained by partitioning the point-by-point feature matrix obtained in the convolution processing process; the point multiplication characteristic matrix is obtained by performing point multiplication on the pre-transformation characteristic matrix by using a convolution acceleration algorithm;
the shifter is used for determining the displacement amount corresponding to the sub-feature matrix, and performing shift transformation on the first transformation matrix according to the displacement amount to obtain a second transformation matrix; each shifter corresponds to the element position of the post-transformation feature matrix to be generated; the displacement is determined according to the arrangement position characteristics of the post-conversion parameters in the post-conversion parameter matrix;
the accumulator is used for accumulating the second transformation matrixes corresponding to the sub-feature matrixes to obtain post-transformation feature matrixes generated by convolution processing; wherein the accumulator corresponds to the position of the elements of the post-transform feature matrix.
In some embodiments, the transformation control circuit in the above embodiments further comprises a data selector that adjusts the output data order according to the left transformation enable signal and the right transformation enable signal.
In some embodiments, in a case where both the left transform enable signal and the right transform enable signal are inactive, the addition/subtraction unit is configured not to transform the sub-feature matrix, and the data selector is configured not to adjust a data order of the sub-feature matrix, resulting in the first transform matrix.
In some embodiments, in a case that an enable signal corresponding to the target transformation processing is valid, the addition and subtraction unit is configured to perform the target transformation processing on the sub-feature matrix to obtain an addition and subtraction transformation result, and the data selector is configured to adjust an order of data in the addition and subtraction transformation result to obtain a first transformation matrix; wherein the target transform process includes at least one of a left transform process and a right transform process.
In some embodiments, as shown in FIG. 4, a schematic of a transformation control circuit is provided.
The MUX-0 is a data selector capable of switching its input data to 0 before output; the 4MUX is a 4-way data selector; the shift block is a left-shift module, which can be implemented by a barrel-shifter circuit or the like and shifts the data left by shift_op bits. The circuit can realize the above three kinds of matrix transformation via the left transformation enable signal len and the right transformation enable signal ren. len and ren directly control the MUX-0 at the right input of the addition and subtraction unit; the corresponding transformation is not executed when that input is switched to 0. When no transformation is executed, the data output by the addition and subtraction unit is, from left to right, in the order {Y'_A, Y'_B, Y'_C, Y'_D}; the other three transformations (left, right, and left-right) all change the output data order, so the 4-way data selector rearranges the data output by the addition and subtraction unit back into the order {Y'_A, Y'_B, Y'_C, Y'_D}.
In some embodiments, as shown in FIG. 5, a schematic diagram of a convolution processing pipeline design is provided. Data latching and flow are realized through registers between the circuit stages, guaranteeing continuous input of data. The process of using the transformation control circuit to run a convolution processing pipeline of the form F(n × n, k × k) (n even, k odd) is as follows: split the dot-product feature matrix Y into (n+k-1)²/4 2×2 sub-feature matrices; one 2×2 sub-matrix is sent into the convolution processing unit every cycle, and one F(n × n, k × k) Winograd post-transform is completed after (n+k-1)²/4 cycles, yielding the n×n matrix result at the accumulator output.
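The cycle count follows directly from the tile size; for example, it can be checked as follows:

```python
def post_transform_cycles(n, k):
    """Number of 2x2 sub-feature matrices fed through the pipeline for
    one F(n x n, k x k) Winograd post-transform (n even, k odd), i.e.
    (n + k - 1)^2 / 4 cycles."""
    assert n % 2 == 0 and k % 2 == 1
    side = n + k - 1              # the dot-product matrix is side x side
    assert side % 2 == 0          # so it tiles exactly into 2x2 blocks
    return side * side // 4

print(post_transform_cycles(4, 3))  # F(4x4,3x3): 6*6/4 = 9 cycles
print(post_transform_cycles(2, 3))  # F(2x2,3x3): 4*4/4 = 4 cycles
```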
In some embodiments, the overall computational process of the convolution processing method is illustrated.
The Winograd function for F (4 × 4,3 × 3) is:
The output is:

The output after dot multiplication is:
The transformation control circuit comprises a group of addition and subtraction units, 4 groups of 4 left shifters, and 16 accumulators S_A(0,0) to S_A(3,3), corresponding respectively to S00 to S33. The dot-product feature matrix is divided to obtain a plurality of sub-feature matrices as follows:
each post-transformation processing cycle is sent into a group of sub-feature matrixes, wherein the sub-feature matrixes Y which are not transformed (0,0) The post-conversion processing steps are as follows: sequentially adding Y (0,0) Inputting the addition and subtraction unit, the 4-path data selector, the shifter group and the accumulator to obtain Y (0,0) The corresponding sub-post transformation matrix.
Sub-feature matrix Y for right transformation (0,1) The post-conversion processing steps are as follows: sequentially adding Y (0,1) Y is obtained by inputting an addition and subtraction unit, a 4-path data selector and a shifter group (0,1) Corresponding sub-post-transform matrix, the accumulator converts Y (0,0) And Y (0,1) And accumulating the corresponding sub post-transformation matrix.
Y (0,2) And Y (0,1) And similarly, right transformation processing is carried out. For Y (0,2) The processing steps for right transformation are as follows: sequentially adding Y (0,2) Y is obtained by inputting an addition and subtraction unit, a 4-path data selector and a shifter group (0,2) Corresponding sub-post-transform matrix, the accumulator converts Y (0,0) 、Y (0,1) And Y (0,2) The corresponding sub-post transformation matrices are accumulated.
Sub-feature matrix Y for left transformation (1,0) The post-conversion processing steps are as follows: sequentially reacting Y (1,0) Y is obtained by inputting an addition and subtraction unit, a 4-path data selector and a shifter group (1,0) Corresponding sub-post-transform matrix, the accumulator converts Y (0,0) 、Y (0,1) 、Y (0,2) And Y (1,0) The corresponding sub-post transformation matrices are accumulated.
Y (2,0) And Y (1,0) Likewise, the description is not repeated.
Sub-feature matrix Y for left-right transformation (2,2) The post-conversion processing steps are as follows: sequentially reacting Y (2,2) Y is obtained by inputting an addition and subtraction unit, a 4-path data selector and a shifter group (2,2) Corresponding sub-post-transform matrix, the accumulator converts Y (0,0) 、Y (0,1) 、Y (0,2) 、Y (1,0) And Y (2,2) And accumulating the corresponding sub post-transformation matrix.
Calculation and transformation of other sub-feature matrices and Y (2, 2) Likewise, the description is not repeated. And accumulating the sub post-transformation matrixes corresponding to all the sub characteristic matrixes by the accumulator to obtain a post-transformation characteristic matrix generated by convolution processing.
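The per-block flow above amounts to computing S = Aᵀ·Y·A by accumulating the contribution of one 2×2 block of Y per cycle. A hedged numerical sketch — using the standard F(4×4, 3×3) Aᵀ matrix as an assumption, since the patent's matrices appear only as figures — that checks the block-wise accumulation against the direct product:

```python
import numpy as np

# Assumed standard A^T for Winograd F(4x4, 3x3); every nonzero entry
# is +/- a power of two, so real hardware can use add/sub + shifts.
A_T = np.array([
    [1, 1,  1, 1,  1, 0],
    [0, 1, -1, 2, -2, 0],
    [0, 1,  1, 4,  4, 0],
    [0, 1, -1, 8, -8, 1],
])

rng = np.random.default_rng(0)
Y = rng.integers(-4, 5, size=(6, 6))  # stand-in 6x6 dot-product matrix

# Direct post-transform: S = A^T @ Y @ A
S_direct = A_T @ Y @ A_T.T

# Block-wise accumulation: split Y into 2x2 sub-feature matrices
# Y(i,j) and accumulate B_i @ Y(i,j) @ B_j^T, where B_i is the 4x2
# column block of A^T -- one sub-matrix per "cycle".
S_accum = np.zeros((4, 4), dtype=Y.dtype)
for i in range(3):
    for j in range(3):
        B_i = A_T[:, 2 * i:2 * i + 2]
        B_j = A_T[:, 2 * j:2 * j + 2]
        Y_ij = Y[2 * i:2 * i + 2, 2 * j:2 * j + 2]
        S_accum += B_i @ Y_ij @ B_j.T

print(np.array_equal(S_direct, S_accum))  # block accumulation matches
```

This confirms that feeding the nine 2×2 sub-feature matrices one per cycle and accumulating their contributions yields the same 4×4 result as the direct post-transform.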
It should be understood that, although the steps in the flowcharts related to the embodiments as described above are sequentially shown as indicated by arrows, the steps are not necessarily performed sequentially as indicated by the arrows. The steps are not limited to being performed in the exact order illustrated and, unless explicitly stated herein, may be performed in other orders. Moreover, at least a part of the steps in the flowcharts related to the above embodiments may include multiple steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of performing the steps or stages is not necessarily sequential, but may be performed alternately or alternately with other steps or at least a part of the steps or stages in other steps.
In some embodiments, an electronic device is provided, which may be a server, and the internal structure thereof may be as shown in fig. 6. The electronic device includes a processor, a memory, an Input/Output (I/O) interface, and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the electronic device is used for storing relevant data required for implementing the convolution processing method. The communication interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement the steps of the convolution processing method described above.
In some embodiments, an electronic device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 7. The electronic device includes a processor, a memory, an Input/Output (I/O) interface, and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the electronic device is used for communicating with an external terminal in a wired or wireless manner, and the wireless manner can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement the steps of the convolution processing method described above.
It will be understood by those skilled in the art that the configurations shown in fig. 6 or fig. 7 are only block diagrams of some configurations relevant to the present disclosure, and do not constitute a limitation on the electronic devices to which the present disclosure may be applied, and a particular electronic device may include more or less components than those shown in the drawings, or may combine some components, or have a different arrangement of components.
In some embodiments, an electronic device is provided, comprising a memory storing a computer program and a processor implementing the steps of the above-described method embodiments when the processor executes the computer program.
In some embodiments, a computer-readable storage medium 800 is provided, on which a computer program 820 is stored; the computer program 820, when executed by a processor, implements the steps in the above-described method embodiments. The internal structure of the storage medium is shown in fig. 8.
In some embodiments, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of the above-described method embodiments.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant country and region.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, the computer program can include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high-density embedded nonvolatile Memory, resistive Random Access Memory (ReRAM), magnetic Random Access Memory (MRAM), ferroelectric Random Access Memory (FRAM), phase Change Memory (PCM), graphene Memory, and the like. Volatile Memory can include Random Access Memory (RAM), external cache Memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases referred to in various embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a block chain based distributed database, and the like. The processors referred to in the various embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, or the like.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples only express several embodiments of the present application, and the description thereof is more specific and detailed, but not to be construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these are all within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.
Claims (12)
1. A convolution processing method, comprising:
partitioning the point-multiplied feature matrix obtained in the convolution processing process to obtain a plurality of sub-feature matrices; the point multiplication characteristic matrix is obtained by performing point multiplication on the pre-transformation characteristic matrix by using a convolution acceleration algorithm;
when post-transformation processing is carried out on each sub-feature matrix, controlling an addition and subtraction unit to transform the sub-feature matrix according to an addition and subtraction transformation rule corresponding to the sub-feature matrix to obtain a first transformation matrix;
determining the displacement corresponding to the sub-feature matrix, and controlling a shifter in a shifter group to perform shift transformation on the first transformation matrix according to the displacement to obtain a second transformation matrix; each shifter corresponds to the element position of the post-transformation feature matrix to be generated; the displacement is determined according to the arrangement position characteristics of post-conversion parameters in the post-conversion parameter matrix;
the accumulator is controlled to accumulate the second transformation matrixes corresponding to the sub-feature matrixes to obtain the post-transformation feature matrix generated by convolution processing; wherein the accumulator corresponds to an element position of the post-transform feature matrix.
2. The method of claim 1, wherein the partitioning the point-by-point feature matrix obtained in the convolution processing to obtain a plurality of sub-feature matrices comprises:
partitioning the edge position data in the point-by-point feature matrix obtained in the convolution processing process according to the size of a preset sub-matrix; the edge position data is feature data located at an edge position in the point-by-point feature matrix;
and sequentially blocking the feature data except the edge position data in the dot-product feature matrix according to the size of the preset sub-matrix to obtain a plurality of sub-feature matrices.
3. The method of claim 2, wherein the partitioning the edge position data in the point-by-point feature matrix obtained in the convolution processing according to a preset sub-matrix size comprises:
dividing the angular point data at the angular point position in the point-by-point characteristic matrix obtained in the convolution processing process into the same sub-characteristic matrix; the size of a sub-feature matrix formed by the corner data accords with the size of a preset sub-matrix;
merging the head and tail residual row data and the head and tail residual column data in the dot-product feature matrix, and respectively blocking the merged head and tail residual row data and the merged head and tail residual column data to obtain a sub-feature matrix in accordance with the size of the preset sub-matrix;
the head and tail residual line data are characteristic data in the head and tail line data except corner data in the head and tail lines; the head-to-tail remaining column data are feature data except corner data in the head-to-tail column data.
4. The method according to claim 1, wherein the controlling an addition and subtraction unit to transform the sub-feature matrix according to an addition and subtraction transformation rule corresponding to the sub-feature matrix to obtain a first transformation matrix comprises:
and under the condition that the addition and subtraction transformation rule corresponding to the sub-feature matrix represents that no transformation is performed, controlling an addition and subtraction unit not to transform the sub-feature matrix by controlling both a left transformation enabling signal and a right transformation enabling signal to be invalid, and controlling a data selector not to adjust the data sequence of the sub-feature matrix to obtain a first transformation matrix.
5. The method according to claim 1, wherein the controlling an addition and subtraction unit to transform the sub-feature matrix according to an addition and subtraction transformation rule corresponding to the sub-feature matrix to obtain a first transformation matrix comprises:
under the condition that the addition and subtraction transformation rule corresponding to the sub-feature matrix represents that target transformation processing is required, enabling signals corresponding to the target transformation processing are controlled to be effective, and an addition and subtraction unit is controlled to perform the target transformation processing on the sub-feature matrix to obtain an addition and subtraction transformation result; wherein the target transform process includes at least one of a left transform process and a right transform process;
and controlling a data selector to adjust the data sequence in the addition and subtraction transformation result to obtain a first transformation matrix.
6. The method according to claim 1, wherein the determining the displacement corresponding to the sub-feature matrix comprises:
determining a corresponding parameter sub-matrix of the sub-feature matrix in the post-transformation parameter matrix according to the target position; the target position is a position of the sub-feature matrix in the dot-product feature matrix;
determining a displacement reference matrix from the post-transformation parameter matrix;
and determining the displacement amount required by each shifter in the shifter group for performing the shifting operation on the feature data in the first transformation matrix according to the relative relation between the parameter sub-matrix and the displacement reference matrix.
7. The method according to any one of claims 1 to 6, wherein the plurality of sub-feature matrices are post-transformed in sequence; the control accumulator accumulates the second transformation matrix corresponding to each sub-feature matrix to obtain the post-transformation feature matrix generated by convolution processing, and the method comprises the following steps:
after each second transformation matrix corresponding to one sub-feature matrix is calculated, determining a target accumulator matched with the target position from a plurality of accumulators according to the target position of the sub-feature matrix in the point-by-point feature matrix;
and controlling the target accumulator to open and shield the non-target accumulator so as to accumulate a second transformation matrix corresponding to the sub-feature matrix and the currently accumulated transformation matrix, and performing post-transformation processing on the next sub-feature matrix after accumulation.
8. The method of claim 7, wherein the plurality of sub-feature matrices are post-transformed sequentially according to a post-transform processing period; the controlling the target accumulator to open and shield a non-target accumulator so as to accumulate the second transformation matrix corresponding to the sub-feature matrix and the currently accumulated transformation matrix, and performing post-transformation processing on the next sub-feature matrix after accumulation, includes:
controlling the target accumulator to open and shield a non-target accumulator so as to accumulate a second transformation matrix corresponding to the sub-feature matrix and the transformation matrix accumulated in the previous period, and entering the next post-transformation processing period so as to perform post-transformation processing on the next sub-feature matrix; wherein the preceding cycle is a post-transform processing cycle that has been previously performed.
9. A conversion control circuit, comprising:
the addition and subtraction unit is used for transforming each sub-feature matrix in the plurality of sub-feature matrices according to an addition and subtraction transformation rule corresponding to the sub-feature matrix to obtain a first transformation matrix; the plurality of sub-feature matrices are obtained by partitioning the point-by-point feature matrix obtained in the convolution processing process; the point multiplication characteristic matrix is obtained by performing point multiplication on the pre-transformation characteristic matrix by using a convolution acceleration algorithm;
the shifter is used for determining the displacement corresponding to the sub-feature matrix and performing shift transformation on the first transformation matrix according to the displacement to obtain a second transformation matrix; each shifter corresponds to the element position of the post-transformation feature matrix to be generated; the displacement is determined according to the arrangement position characteristics of post-conversion parameters in the post-conversion parameter matrix;
the accumulator is used for accumulating the second transformation matrixes corresponding to the sub-feature matrixes to obtain the post-transformation feature matrix generated by convolution processing; wherein the accumulator corresponds to an element position of the post-transform feature matrix.
10. An electronic device characterized in that the electronic device comprises the conversion control circuit according to claim 9.
11. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any of claims 1 to 8 when executing the computer program.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211600016.7A CN115600062B (en) | 2022-12-14 | 2022-12-14 | Convolution processing method, circuit, electronic device and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115600062A true CN115600062A (en) | 2023-01-13 |
CN115600062B CN115600062B (en) | 2023-04-07 |
Family
ID=84854068
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115600062B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106503797A (en) * | 2015-10-08 | 2017-03-15 | 上海兆芯集成电路有限公司 | Neural network unit with neural memory and an array of neural processing units that collectively shift rows of data received from the neural memory |
KR20200003661A (en) * | 2018-07-02 | 2020-01-10 | 한양대학교 산학협력단 | Matrix operator and matrix operation method for artificial neural network |
FR3085517A1 (en) * | 2018-08-31 | 2020-03-06 | Commissariat A L'energie Atomique Et Aux Energies Alternatives | CALCULATOR ARCHITECTURE OF A CONVOLUTION LAYER IN A CONVOLUTIONAL NEURAL NETWORK |
CN113283587A (en) * | 2021-05-28 | 2021-08-20 | 西安交通大学 | Winograd convolution operation acceleration method and acceleration module |
CN113496008A (en) * | 2021-09-06 | 2021-10-12 | 北京壁仞科技开发有限公司 | Method, computing device, and computer storage medium for performing matrix computations |
US20220012303A1 (en) * | 2020-07-07 | 2022-01-13 | NeoNexus Pte. Ltd. | Apparatus and method for matrix multiplication using processing-in-memory |
WO2022067508A1 (en) * | 2020-09-29 | 2022-04-07 | 华为技术有限公司 | Neural network accelerator, and acceleration method and device |
CN114399036A (en) * | 2022-01-12 | 2022-04-26 | 电子科技大学 | Efficient convolution calculation unit based on one-dimensional Winograd algorithm |
CN114707114A (en) * | 2022-04-25 | 2022-07-05 | 上海壁仞智能科技有限公司 | Blocking method and device, convolution operation method and device, and storage medium |
2022-12-14: application CN202211600016.7A filed; granted as patent CN115600062B (status: Active)
Non-Patent Citations (2)
Title |
---|
LIQIANG LU ET AL.: "Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs", 《2017 IEEE 25TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES》 * |
SHEN JUNZHONG ET AL.: "A Matrix Multiplication Accelerator Design Supporting an Optimized Blocking Strategy", 《COMPUTER ENGINEERING AND SCIENCE (计算机工程与科学)》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20240169017A1 (en) | Methods and systems for implementing a convolution transpose layer of a neural network | |
CN110352422B (en) | Implementing basic computation primitives using Matrix Multiplication Accelerators (MMA) | |
CN107665126A (en) | Processor and method for outer-product accumulate operations | |
CN110399591B (en) | Data processing method and device based on convolutional neural network | |
US11341400B1 (en) | Systems and methods for high-throughput computations in a deep neural network | |
JP7171883B2 (en) | efficient convolutional engine | |
WO2023065983A1 (en) | Computing apparatus, neural network processing device, chip, and data processing method | |
KR20190089685A (en) | Method and apparatus for processing data | |
CN113947200A (en) | Acceleration calculation method of neural network, accelerator and computer-readable storage medium | |
CN115600062B (en) | Convolution processing method, circuit, electronic device and computer readable storage medium | |
CN115485656A (en) | In-memory processing method for convolution operation | |
EP4206996A1 (en) | Neural network accelerator with configurable pooling processing unit | |
KR102372869B1 (en) | Matrix operator and matrix operation method for artificial neural network | |
CN113642722A (en) | Chip for convolution calculation, control method thereof and electronic device | |
EP4128064A1 (en) | Power reduction for machine learning accelerator | |
GB2604924A (en) | Methods and systems for generating the gradients of a loss function with respectto the weights of a convolution layer | |
US20220207332A1 (en) | Scalable neural network accelerator architecture | |
EP4345691A1 (en) | Methods and systems for performing channel equalisation on a convolution layer in a neural network | |
CN110765413A (en) | Matrix summation structure and neural network computing platform | |
CN113554163B (en) | Convolutional neural network accelerator | |
US20220391172A1 (en) | Implementation of Softmax and Exponential in Hardware | |
EP4160486A1 (en) | Neural network accelerator with a configurable pipeline | |
CN117077734A (en) | Convolution input conversion method, hardware accelerator and accelerator structure determination method | |
GB2611522A (en) | Neural network accelerator with a configurable pipeline | |
GB2614705A (en) | Neural network accelerator with configurable pooling processing unit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||