CN108416425B - Convolution operation method and device
- Publication number
- CN108416425B (application CN201810105945.8A)
- Authority
- CN
- China
- Prior art keywords
- weight coefficient
- convolution kernel
- target
- data
- code stream
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 39
- 230000001360 synchronised effect Effects 0.000 claims abstract description 90
- 230000006835 compression Effects 0.000 claims description 30
- 238000007906 compression Methods 0.000 claims description 30
- 238000003825 pressing Methods 0.000 claims description 8
- 238000012163 sequencing technique Methods 0.000 claims description 3
- 230000001960 triggered effect Effects 0.000 claims description 3
- 238000010586 diagram Methods 0.000 description 27
- 238000004590 computer program Methods 0.000 description 7
- 238000013527 convolutional neural network Methods 0.000 description 5
- 238000012986 modification Methods 0.000 description 5
- 230000004048 modification Effects 0.000 description 5
- 230000008569 process Effects 0.000 description 5
- 238000003860 storage Methods 0.000 description 5
- 230000006870 function Effects 0.000 description 4
- 238000013528 artificial neural network Methods 0.000 description 2
- 230000004075 alteration Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
Images
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a convolution operation method and device. The method comprises: for each convolution kernel, generating target weight coefficient synchronous data and target feature synchronous data corresponding to the target weight coefficient synchronous data according to an input feature map to be convolved and the position of each target weight coefficient in the convolution kernel, where a target weight coefficient is a non-zero weight coefficient in the convolution kernel; determining a convolution result of the convolution kernel according to the target weight coefficients in the convolution kernel and the target weight coefficient synchronous data and target feature synchronous data corresponding to the convolution kernel; and determining a target convolution result according to the convolution result of each convolution kernel. In the embodiments of the invention, the generated target weight coefficient synchronous data contains no zero weight coefficients, and the generated target feature synchronous data contains no feature data corresponding to zero weight coefficients, so that zero weight coefficients and the feature data corresponding to them need not be operated on, and the efficiency of the convolution operation is improved.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a convolution operation method and apparatus.
Background
A Convolutional Neural Network (CNN) is an important application of deep learning algorithms in the field of image processing. A convolutional neural network combines the two-dimensional discrete convolution operation of image processing with an artificial neural network, can automatically extract features, and is mainly applied to the recognition and detection of two-dimensional images. Convolutional layers are the most important layers of a convolutional neural network, from which the network takes its name. A convolutional layer is mainly used to extract shallow features of an image, such as edge and gradient information. The convolution operations of the convolutional layers account for 95% of the computation of the whole convolutional neural network and are the key to practical application of convolutional neural networks.
As shown in FIG. 1, assume the number of input feature maps (T_in) is N, the number of output feature maps (T_out) is M, the feature map width is P_W, the feature map height is P_H, the convolution kernel width is K_W, the convolution kernel height is K_H, and the number of convolution kernels is N×M. The number of multiplications required for the convolution operation is then P_W × P_H × K_W × K_H × N × M, so a convolution operation involves a very large number of multiplications, and the number of parallel multipliers in a hardware implementation directly determines the performance of the convolution operation. However, a multiplier occupies a large hardware area, and the number of multipliers is often limited by chip-cost considerations, so how to improve the efficiency of the convolution operation without increasing the number of multipliers is a problem that urgently needs to be solved.
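For a concrete sense of scale, the short sketch below evaluates the multiplication count P_W × P_H × K_W × K_H × N × M for an illustrative layer; the layer sizes are assumed example values, not figures taken from the patent.

```c
#include <stdio.h>

/* Illustrative only: multiplication count of one dense convolution layer,
 * using the formula from the background above. All sizes are assumed
 * example values, not numbers from the patent. */
int main(void) {
    long long p_w = 224, p_h = 224; /* feature map width and height        */
    long long k_w = 3,   k_h = 3;   /* convolution kernel width and height */
    long long n = 64,    m = 128;   /* input / output feature map counts   */

    long long muls = p_w * p_h * k_w * k_h * n * m;
    printf("multiplications for one layer: %lld\n", muls); /* 3,699,376,128 */
    return 0;
}
```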
Disclosure of Invention
The invention provides a convolution operation method and a convolution operation device, which are used for improving the efficiency of convolution operation.
The embodiment of the invention discloses a convolution operation method, which comprises the following steps:
aiming at each convolution kernel, generating target weight coefficient synchronous data and target feature synchronous data corresponding to the target weight coefficient synchronous data according to an input feature graph to be convolved and the position of each target weight coefficient in the convolution kernel, wherein the target weight coefficient is a non-zero weight coefficient in the convolution kernel;
for each convolution kernel, determining the convolution result of the convolution kernel according to the target weight coefficient in the convolution kernel, and the target weight coefficient synchronous data and the target characteristic synchronous data corresponding to the convolution kernel;
and determining a target convolution result according to the convolution result of each convolution kernel.
Further, the generating, for each convolution kernel, target weight coefficient synchronization data according to the input feature map to be convolved and the position of each target weight coefficient in the convolution kernel, and the target feature synchronization data corresponding to the target weight coefficient synchronization data includes:
for each convolution kernel, generating weight coefficient synchronous data and feature synchronous data corresponding to the convolution kernel according to an input feature graph to be convolved;
and for each convolution kernel, according to the position of each target weight coefficient in the convolution kernel, removing the non-target weight coefficient in the weight coefficient synchronous data corresponding to the convolution kernel, determining target weight coefficient synchronous data, removing the characteristic data corresponding to the non-target weight coefficient in the characteristic synchronous data corresponding to the convolution kernel, and determining target characteristic synchronous data, wherein the non-target weight coefficient is a zero weight coefficient.
Further, before generating, for each convolution kernel, target weight coefficient synchronization data and target feature synchronization data corresponding to the target weight coefficient synchronization data according to the input feature graph to be convolved and the position of each target weight coefficient in the convolution kernel, the method further includes:
A. the method comprises the steps of serially sequencing the weight coefficients of a convolution kernel according to a raster sequence, determining a sequenced weight coefficient sequence, and taking the weight coefficient corresponding to the start bit of the weight coefficient sequence as a first weight coefficient;
B. identifying whether the first weight coefficient is a target weight coefficient, if so, performing C, and if not, performing D;
C. pressing the first weight coefficient into a data code stream, judging whether a next weight coefficient adjacent to the first weight coefficient exists in the weight coefficient sequence, if so, taking the adjacent next weight coefficient as the first weight coefficient, performing B, and if not, performing E;
D. taking the first weight coefficient as an initial weight coefficient in the weight coefficient sequence, identifying the number of consecutive weight coefficients that are zero, pressing a preset characteristic value and the number into a data code stream, and judging whether a second weight coefficient, separated from the first weight coefficient by (the number minus 1) weight coefficients, exists in the weight coefficient sequence after the first weight coefficient; if so, taking the second weight coefficient as the first weight coefficient and carrying out B, and if not, carrying out E;
E. adding a code stream header to the data code stream to generate a compressed code stream, wherein the code stream header contains information of the size of the convolution kernel;
F. reading a code stream head and a data code stream in a compressed code stream, identifying the size of a convolution kernel in the code stream head, and determining the position of each target weight coefficient in each convolution kernel in the convolution kernel according to the size of the convolution kernel, each target weight coefficient in the data code stream, each preset characteristic value and the corresponding number of the preset characteristic values.
Further, before adding a bitstream header to the data bitstream and generating a compressed bitstream, the method further includes:
identifying a first length of the data code stream, and judging whether the first length is smaller than a second length of the weight coefficient sequence;
if yes, the subsequent steps are carried out.
Further, if the first length is not less than the second length of the sequence of weight coefficients, the method further comprises:
and taking the weight coefficient sequence as a data code stream.
Further, the bitstream header further includes information on whether the data bitstream is compressed and information on the length of the data bitstream.
The invention discloses a convolution operation device, comprising:
the generating module is used for generating target weight coefficient synchronous data and target feature synchronous data corresponding to the target weight coefficient synchronous data according to an input feature graph to be convolved and the position of each target weight coefficient in the convolution kernel aiming at each convolution kernel, wherein the target weight coefficient is a nonzero weight coefficient in the convolution kernel;
the first determining module is used for determining a convolution result of each convolution kernel according to a target weight coefficient in the convolution kernel, and target weight coefficient synchronous data and target characteristic synchronous data corresponding to the convolution kernel;
and the second determining module is used for determining a target convolution result according to the convolution result of each convolution kernel.
Further, the generating module is specifically configured to generate, for each convolution kernel, weight coefficient synchronization data and feature synchronization data corresponding to the convolution kernel according to the input feature map to be convolved; and for each convolution kernel, according to the position of each target weight coefficient in the convolution kernel, removing the non-target weight coefficient in the weight coefficient synchronous data corresponding to the convolution kernel, determining target weight coefficient synchronous data, removing the characteristic data corresponding to the non-target weight coefficient in the characteristic synchronous data corresponding to the convolution kernel, and determining target characteristic synchronous data, wherein the non-target weight coefficient is a zero weight coefficient.
Further, the apparatus further comprises: the compression determining module comprises a sorting unit, an identification unit, a first compression unit, a second compression unit, a generation unit and an identification determining unit; wherein,
the sorting unit is used for serially sorting the weight coefficients of the convolution kernel according to a raster sequence, determining a sorted weight coefficient sequence and taking the weight coefficient corresponding to the start bit of the weight coefficient sequence as a first weight coefficient;
the identification unit is used for identifying whether the first weight coefficient is a target weight coefficient, if so, triggering the first compression unit, and if not, triggering the second compression unit;
the first compression unit is used for pressing the first weight coefficient into a data code stream, judging whether a next weight coefficient adjacent to the first weight coefficient exists in the weight coefficient sequence, if so, taking the adjacent next weight coefficient as the first weight coefficient, and triggering the identification unit, and if not, triggering the generation unit;
the second compression unit is configured to, taking the first weight coefficient as an initial weight coefficient in the weight coefficient sequence, identify the number of consecutive weight coefficients that are zero, press a preset characteristic value and the number into a data code stream, and determine whether a second weight coefficient, separated from the first weight coefficient by (the number minus 1) weight coefficients, exists in the weight coefficient sequence after the first weight coefficient; if so, use the second weight coefficient as the first weight coefficient and trigger the identification unit, and if not, trigger the generation unit;
the generating unit is configured to add a bitstream header to the data bitstream to generate a compressed bitstream, where the bitstream header includes information of a size of a convolution kernel;
the identification determining unit is used for reading a code stream head and a data code stream in a compressed code stream, identifying the size of a convolution kernel in the code stream head, and determining the position of each target weight coefficient in each convolution kernel in the convolution kernel according to the size of the convolution kernel, each target weight coefficient in the data code stream, each preset characteristic value and the corresponding number of the preset characteristic values.
Further, the compression determination module further comprises:
a judging unit, configured to, before the code stream header is added to the data code stream to generate the compressed code stream, identify a first length of the data code stream and judge whether the first length is smaller than a second length of the weight coefficient sequence; if yes, trigger the generation unit.
Further, the compression determination module further comprises:
and the updating unit is used for taking the weight coefficient sequence as a data code stream if the judgment result of the judging unit is negative.
Further, the bitstream header further includes information on whether the data bitstream is compressed and information on the length of the data bitstream.
The invention discloses a convolution operation method and a device, wherein the method comprises the following steps: aiming at each convolution kernel, generating target weight coefficient synchronous data and target feature synchronous data corresponding to the target weight coefficient synchronous data according to an input feature graph to be convolved and the position of each target weight coefficient in the convolution kernel, wherein the target weight coefficient is a non-zero weight coefficient in the convolution kernel; for each convolution kernel, determining the convolution result of the convolution kernel according to the target weight coefficient in the convolution kernel, and the target weight coefficient synchronous data and the target characteristic synchronous data corresponding to the convolution kernel; and determining a target convolution result according to the convolution result of each convolution kernel. In the embodiment of the invention, the position of each target weight coefficient in the convolution kernel is identified for each convolution kernel, and the target weight coefficient synchronous data and the target feature synchronous data corresponding to the target weight coefficient synchronous data are generated according to the input feature map to be convolved, wherein the target weight coefficient is a non-zero weight coefficient, the generated target weight coefficient synchronous data does not contain a weight coefficient which is zero, and the generated target feature synchronous data does not contain feature data corresponding to the zero weight coefficient, so that when the convolution operation is performed, a multiplier in the electronic equipment does not need to perform operation on the weight coefficient which is zero in the convolution kernel and the feature data corresponding to the weight coefficient which is zero, and the efficiency of the convolution operation is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a diagram illustrating a convolution operation in the prior art;
fig. 2 is a schematic diagram of a convolution operation process provided in embodiment 1 of the present invention;
FIG. 3 is a schematic diagram illustrating a feature map to be convolved according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a convolution kernel according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a convolution process according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of target weight coefficient synchronization data and target feature synchronization data according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of weight coefficient synchronization data and feature synchronization data according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating a convolution kernel according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a compressed code stream according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a compressed code stream according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating a convolution operation apparatus according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a convolution operation apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
fig. 2 is a schematic diagram of a convolution operation process provided in an embodiment of the present invention, where the process includes:
s201: and aiming at each convolution kernel, generating target weight coefficient synchronous data and target characteristic synchronous data corresponding to the target weight coefficient synchronous data according to the input feature graph to be convolved and the position of each target weight coefficient in the convolution kernel, wherein the target weight coefficient is a non-zero weight coefficient in the convolution kernel.
The convolution operation method provided by the embodiment of the invention is applied to an electronic device, and the electronic device may be a tablet computer, a Personal Computer (PC), a server, or another such device. The feature map to be convolved may be a data map corresponding to an image.
Specifically, for each convolution kernel, the number of sliding steps (stride) of the convolution kernel when performing the convolution operation on the feature map to be convolved is determined according to the size of the convolution kernel and the size of the input feature map to be convolved, and each target weight coefficient in the convolution kernel is extracted in raster order according to its position in the convolution kernel, so as to generate the target weight coefficient synchronization data. When the convolution operation is performed on the feature map to be convolved with the convolution kernel, the target feature data corresponding to the target weight coefficients is determined, among the feature data of the feature map to be convolved corresponding to each sliding step of the convolution kernel, according to the position of each target weight coefficient in the convolution kernel; the target feature data is then extracted in raster order, following the sliding order of the convolution kernel over the feature map to be convolved, to generate the target feature synchronization data corresponding to the target weight coefficient synchronization data.
FIG. 3 is a schematic diagram of a feature map to be convolved according to an embodiment of the present invention, where D0,0, D0,1, … are the feature data in the feature map to be convolved, and FIG. 4 is a schematic diagram of a convolution kernel provided by an embodiment of the present invention, where C0,0, C0,1, … are the weight coefficients in the convolution kernel; the size of the convolution kernel is 3x3 and the weight coefficients in the shaded positions are zero. When the convolution kernel shown in FIG. 4 is used to perform the convolution operation on the feature map to be convolved shown in FIG. 3, the feature data in the feature map to be convolved corresponding to each sliding step of the convolution kernel are as shown in FIG. 5, where "X" represents multiplication of corresponding positions: at the first sliding step of the convolution kernel, C0,0 is multiplied by D0,0, C0,1 is multiplied by D0,1, and so on; at the second sliding step, C0,0 is multiplied by D0,1, C0,1 is multiplied by D0,2, and so on. The target weight coefficients in the convolution kernel shown in FIG. 4 are C0,0, C0,2, C1,1, etc. When the convolution kernel shown in FIG. 4 is used to perform the convolution operation on the feature map to be convolved shown in FIG. 3, the corresponding target weight coefficient synchronization data and target feature synchronization data are as shown in FIG. 6, where the target weight coefficient synchronization data contains only the target weight coefficients C0,0, C0,2, etc., and the target feature synchronization data contains only the feature data D0,0, D0,2, etc. corresponding to those target weight coefficients.
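As an illustration of S201, the sketch below builds the two synchronous-data buffers for a single 2-D feature map and a single kernel, both stored as row-major float arrays, with stride 1 and no padding; the function name, the flat buffer layout and these simplifications are assumptions for illustration, not the patent's actual data structures. For every sliding position, only the non-zero (target) weight coefficients and the feature data aligned with them are appended, which is what keeps zero weights away from the multipliers.

```c
#include <stddef.h>

/* A minimal sketch of S201 under the assumptions stated above: walk every
 * sliding position of the kernel over the feature map and emit, in raster
 * order, only the pairs (target weight coefficient, corresponding feature
 * data) into the synchronous-data buffers. */
size_t build_target_sync_data(const float *feat, int p_w, int p_h,
                              const float *kern, int k_w, int k_h,
                              float *w_sync, float *f_sync) {
    size_t n = 0;
    for (int oy = 0; oy + k_h <= p_h; oy++) {        /* vertical sliding steps   */
        for (int ox = 0; ox + k_w <= p_w; ox++) {    /* horizontal sliding steps */
            for (int ky = 0; ky < k_h; ky++) {
                for (int kx = 0; kx < k_w; kx++) {
                    float w = kern[ky * k_w + kx];
                    if (w != 0.0f) {                 /* target weight coefficient */
                        w_sync[n] = w;
                        f_sync[n] = feat[(oy + ky) * p_w + (ox + kx)];
                        n++;
                    }
                }
            }
        }
    }
    return n; /* number of weight/feature pairs emitted */
}
```

A caller would size w_sync and f_sync for the worst case (every coefficient non-zero) and use the returned count as the length of the target synchronous data.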
S202: and aiming at each convolution kernel, determining the convolution result of the convolution kernel according to the target weight coefficient in the convolution kernel, and the target weight coefficient synchronous data and the target characteristic synchronous data corresponding to the convolution kernel.
Specifically, for each convolution kernel, according to the target weight coefficient synchronization data and the target feature synchronization data corresponding to the convolution kernel, for each target weight coefficient in the target weight coefficient synchronization data, multiplying the target weight coefficient by the feature synchronization data corresponding to the target weight coefficient in the target feature synchronization data, and adding the product results corresponding to the same convolution kernel according to the number of the target weight coefficients in the convolution kernel to obtain a convolution result.
As shown in FIG. 6, for each target weight coefficient in the target weight coefficient synchronization data, the target weight coefficient is multiplied by the feature data corresponding to it in the target feature synchronization data, that is, C0,0 is multiplied by D0,0, C0,2 is multiplied by D0,2, and so on; the products belonging to the same convolution kernel are then added, according to the number of target weight coefficients in the convolution kernel, to obtain the convolution result.
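Continuing the same sketch (and the same assumed buffer layout), S202 reduces each sliding position's weight/feature pairs to one output value, so the multipliers only ever see target weight coefficients:

```c
#include <stddef.h>

/* A sketch of S202: multiply each target weight coefficient by its paired
 * feature data and accumulate per sliding position. Assumes the buffers
 * produced by build_target_sync_data() above and that every sliding position
 * contributes the same number of non-zero coefficients (nonzero_per_pos). */
void convolve_sync_data(const float *w_sync, const float *f_sync,
                        size_t nonzero_per_pos, size_t num_positions,
                        float *out) {
    for (size_t p = 0; p < num_positions; p++) {
        float acc = 0.0f;
        for (size_t i = 0; i < nonzero_per_pos; i++)
            acc += w_sync[p * nonzero_per_pos + i] *
                   f_sync[p * nonzero_per_pos + i];
        out[p] = acc; /* convolution result of this kernel at this position */
    }
}
```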
S203: and determining a target convolution result according to the convolution result of each convolution kernel.
Specifically, the convolution results of the convolution kernels are added to determine the target convolution result. In the embodiment of the present invention, determining the target convolution result according to the convolution result of each convolution kernel can be performed as in the prior art, and the details are not repeated here.
In the embodiment of the invention, the position of each target weight coefficient in the convolution kernel is identified for each convolution kernel, and the target weight coefficient synchronous data and the target feature synchronous data corresponding to the target weight coefficient synchronous data are generated according to the input feature map to be convolved, wherein the target weight coefficient is a non-zero weight coefficient, the generated target weight coefficient synchronous data does not contain a weight coefficient which is zero, and the generated target feature synchronous data does not contain feature data corresponding to the zero weight coefficient, so that when the convolution operation is performed, a multiplier in the electronic equipment does not need to perform operation on the weight coefficient which is zero in the convolution kernel and the feature data corresponding to the weight coefficient which is zero, and the efficiency of the convolution operation is improved.
Example 2:
in order to ensure the accuracy of the convolution operation, on the basis of the above embodiment, in the embodiment of the present invention, for each convolution kernel, generating target weight coefficient synchronization data according to an input feature map to be convolved and a position of each target weight coefficient in the convolution kernel, where the target weight coefficient synchronization data corresponds to the target weight coefficient synchronization data includes:
for each convolution kernel, generating weight coefficient synchronous data and feature synchronous data corresponding to the convolution kernel according to an input feature graph to be convolved;
and for each convolution kernel, according to the position of each target weight coefficient in the convolution kernel, removing the non-target weight coefficient in the weight coefficient synchronous data corresponding to the convolution kernel, determining target weight coefficient synchronous data, removing the characteristic data corresponding to the non-target weight coefficient in the characteristic synchronous data corresponding to the convolution kernel, and determining target characteristic synchronous data, wherein the non-target weight coefficient is a zero weight coefficient.
Specifically, for each convolution kernel, according to an input feature map to be convolved, generating weight coefficient synchronization data and feature synchronization data corresponding to the convolution kernel, according to the position of each target weight coefficient in the convolution kernel, determining the position of each non-target weight coefficient in the weight coefficient synchronization data corresponding to the convolution kernel, removing the non-target weight coefficient in the weight coefficient synchronization data corresponding to the convolution kernel, determining target weight coefficient synchronization data, according to the position of each non-target weight coefficient in the weight coefficient synchronization data corresponding to the convolution kernel, determining each non-target feature data corresponding to the non-target weight coefficient in the feature synchronization data, removing the feature data corresponding to the non-target weight coefficient in the feature synchronization data corresponding to the convolution kernel, and determining the target feature synchronization data. In the embodiment of the present invention, it is prior art to generate the weight coefficient synchronization data and the feature synchronization data according to the convolution kernel and the feature graph to be convolved, and details are not repeated.
Taking the feature map to be convolved shown in FIG. 3 and the convolution kernel shown in FIG. 4 as an example, the weight coefficient synchronization data and feature synchronization data generated from them are shown in FIG. 7. According to the position of each target weight coefficient in the convolution kernel, the position of each non-target weight coefficient in the weight coefficient synchronization data corresponding to the convolution kernel is determined; the non-target weight coefficients in the weight coefficient synchronization data corresponding to the convolution kernel are removed, and the target weight coefficient synchronization data shown in FIG. 6 is determined. According to the position of each non-target weight coefficient in the weight coefficient synchronization data corresponding to the convolution kernel, the non-target feature data corresponding to the non-target weight coefficients in the feature synchronization data are determined; the feature data corresponding to the non-target weight coefficients in the feature synchronization data corresponding to the convolution kernel are removed, and the target feature synchronization data shown in FIG. 6 is obtained.
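The "generate first, then filter" route of this embodiment can be sketched as a single in-place pass over already-generated synchronous data; the names and the flat-buffer layout are assumptions for illustration:

```c
#include <stddef.h>

/* A sketch of Example 2, assuming weight coefficient synchronous data and
 * feature synchronous data (w_all / f_all, both of length n_all) were first
 * generated for the full kernel, zeros included. Entries whose weight is zero
 * (non-target coefficients) are removed in place, leaving the target
 * synchronous data actually fed to the multipliers. */
size_t drop_zero_weight_pairs(float *w_all, float *f_all, size_t n_all) {
    size_t kept = 0;
    for (size_t i = 0; i < n_all; i++) {
        if (w_all[i] != 0.0f) {       /* keep only target weight coefficients  */
            w_all[kept] = w_all[i];
            f_all[kept] = f_all[i];   /* keep the feature data aligned with it */
            kept++;
        }
    }
    return kept; /* length of the target synchronous data */
}
```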
Example 3:
on the basis of the foregoing embodiments, in the embodiment of the present invention, before generating, for each convolution kernel, target weight coefficient synchronization data and target feature synchronization data corresponding to the target weight coefficient synchronization data according to an input feature map to be convolved and a position of each target weight coefficient in the convolution kernel, the method further includes:
A. the method comprises the steps of serially sequencing the weight coefficients of a convolution kernel according to a raster sequence, determining a sequenced weight coefficient sequence, and taking the weight coefficient corresponding to the start bit of the weight coefficient sequence as a first weight coefficient;
B. identifying whether the first weight coefficient is a target weight coefficient, if so, performing C, and if not, performing D;
C. pressing the first weight coefficient into a data code stream, judging whether a next weight coefficient adjacent to the first weight coefficient exists in the weight coefficient sequence, if so, taking the adjacent next weight coefficient as the first weight coefficient, performing B, and otherwise, performing E;
D. taking the first weight coefficient as an initial weight coefficient in the weight coefficient sequence, identifying the number of consecutive weight coefficients that are zero, pressing a preset characteristic value and the number into a data code stream, and judging whether a second weight coefficient, separated from the first weight coefficient by (the number minus 1) weight coefficients, exists in the weight coefficient sequence after the first weight coefficient; if so, taking the second weight coefficient as the first weight coefficient and carrying out B, otherwise, carrying out E;
E. adding a code stream header to the data code stream to generate a compressed code stream, wherein the code stream header contains information of the size of the convolution kernel;
F. reading a code stream head and a data code stream in a compressed code stream, identifying the size of a convolution kernel in the code stream head, and determining the position of each target weight coefficient in each convolution kernel in the convolution kernel according to the size of the convolution kernel, each target weight coefficient in the data code stream, each preset characteristic value and the corresponding number of the preset characteristic values.
Specifically, after the compressed code stream is generated, the electronic device reads the code stream header and the data code stream in the compressed code stream, identifies the size of the convolution kernel in the code stream header, and determines the convolution kernel to which each target weight coefficient belongs and its position in that convolution kernel according to the size of the convolution kernel, the position of each target weight coefficient in the data code stream, and the position of each preset characteristic value and its corresponding number in the data code stream.
As shown in FIG. 8, assume there are 3 convolution kernels in a convolutional layer, each convolution kernel has a size of 3×3, and the preset characteristic value is T, where the weight coefficients in the shaded positions of the convolution kernels are 0. The weight coefficients of convolution kernel C0, convolution kernel C1 and convolution kernel C2 are serially sorted in raster order, and the sorted weight coefficient sequence C0(0,0), C0(0,1), …, C2(2,2) is determined. C0(0,0) is identified as a target weight coefficient and is pressed into the data code stream. The next weight coefficient after C0(0,0) is C0(0,1); C0(0,1) is identified as a non-target weight coefficient, the number of consecutive zero weight coefficients starting from C0(0,1) is 1, so the characteristic value T and the number "1" are pressed into the data code stream. Whether the subsequent weight coefficients are target weight coefficients is then determined in the same way, until the whole weight coefficient sequence has been encoded. A code stream header containing information of the size of the convolution kernel is added to the generated data code stream to generate a compressed code stream; the compressed code stream generated from the convolution kernels shown in FIG. 8 is shown in FIG. 9, where N is the size of the convolution kernel.
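A possible software rendering of steps B-D is sketched below, assuming 8-bit weight coefficients already serialized in raster order and a byte-oriented data code stream in which the preset characteristic value T is encoded as a reserved byte; the value chosen for T, the field widths, and the assumption that a run of zeros fits in one byte are all illustrative, not the patent's exact format. Packing the code stream header (step E) is sketched separately under Example 4 below.

```c
#include <stdint.h>
#include <stddef.h>

#define T_MARKER 0xFF /* assumed byte encoding of the preset characteristic value T */

/* Steps B-D: copy target (non-zero) weight coefficients through unchanged and
 * replace each run of consecutive zero coefficients by T followed by the run
 * length (assumed to fit in one byte). Returns the data code stream length. */
size_t encode_weight_sequence(const uint8_t *seq, size_t len, uint8_t *stream) {
    size_t pos = 0;
    for (size_t i = 0; i < len; ) {
        if (seq[i] != 0) {
            stream[pos++] = seq[i++];     /* step C: push the target coefficient */
        } else {
            size_t run = 0;
            while (i + run < len && seq[i + run] == 0)
                run++;                    /* count the consecutive zeros          */
            stream[pos++] = T_MARKER;     /* step D: push T ...                   */
            stream[pos++] = (uint8_t)run; /* ... and the number of zero weights   */
            i += run;
        }
    }
    return pos;
}
```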
Still taking the compressed code stream shown in FIG. 9 as an example, the size of the convolution kernel is 3×3, so there are 9 weight coefficients in the convolution kernel; the position coordinates (P_COE) of the weight coefficients in the convolution kernel, sorted in raster order, are P0, P1, P2, …, P8 in sequence; the position coordinate P_COE is initialized to P0, and since there are 9 weight coefficients in the convolution kernel, the counting range is determined to be 0-8. Each data in the data code stream is then identified in sequence. The first data is not the characteristic value T, so the first data C0(0,0) is extracted as the 1st non-zero weight coefficient, its position in the first convolution kernel is P0, and P_COE is updated to P1. The second data is T, so the number 1 corresponding to the third data is identified, and P_COE is updated to P2 according to the number 1. The fourth data is not the characteristic value T, so the fourth data C0(0,2) is extracted as the 2nd non-zero weight coefficient, its position in the first convolution kernel is P2, and P_COE is updated to P3. The fifth data is T, so the number 1 corresponding to the sixth data is identified, and P_COE is updated to P4 according to the number 1. The seventh data is not the characteristic value T, so the seventh data C0(1,1) is extracted as the 3rd non-zero weight coefficient, its position in the first convolution kernel is P4, and P_COE is updated to P5. The eighth data is T, so the number 4 corresponding to the ninth data is identified, and P_COE is updated to P9 according to the number 4. Since P_COE is now 9 and the counting range is 0-8, the decoding of the non-zero weight coefficients of the current convolution kernel is completed and the position of each target weight coefficient in the current convolution kernel has been determined; P_COE is updated to 0 (the current value 9 minus the number of convolution kernel coefficients, 9), and the position of the first target weight coefficient in the next convolution kernel is determined, and so on until all data in the data code stream have been identified and the position of each target weight coefficient in each convolution kernel has been determined.
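The position decoding walked through above can be sketched as follows, reusing the assumed T encoding from the encoder sketch and taking the kernel size and data code stream length as plain parameters in place of the packed header fields; the P_COE bookkeeping mirrors the text.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define T_MARKER 0xFF /* same assumed encoding of T as in the encoder sketch */

/* A sketch of step F: scan the data code stream, treating any value other than
 * T as a target weight coefficient at the current position P_COE, and T followed
 * by a count as a run of zero positions to skip. When P_COE reaches
 * kernel_size * kernel_size, the current kernel is fully decoded and decoding
 * continues with the next kernel, as in the walk-through above. */
void decode_weight_positions(const uint8_t *stream, size_t data_len,
                             unsigned kernel_size) {
    size_t coeffs_per_kernel = (size_t)kernel_size * kernel_size;
    size_t p_coe = 0;      /* position coordinate inside the current kernel */
    size_t kernel_idx = 0; /* index of the current convolution kernel       */

    for (size_t i = 0; i < data_len; ) {
        if (stream[i] != T_MARKER) {
            /* target weight coefficient at position P(p_coe) of this kernel */
            printf("kernel %zu, position P%zu, weight 0x%02X\n",
                   kernel_idx, p_coe, stream[i]);
            p_coe += 1;
            i += 1;
        } else {
            p_coe += stream[i + 1]; /* skip the recorded run of zero coefficients */
            i += 2;
        }
        while (p_coe >= coeffs_per_kernel) { /* current kernel fully decoded   */
            p_coe -= coeffs_per_kernel;      /* e.g. 9 - 9 = 0 in the example  */
            kernel_idx += 1;                 /* continue with the next kernel  */
        }
    }
}
```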
Example 4:
in order to further reduce storage and bandwidth consumption, before adding a bitstream header to the data bitstream and generating a compressed bitstream, the method further includes:
identifying a first length of the data code stream, and judging whether the first length is smaller than a second length of the weight coefficient sequence;
if yes, the subsequent steps are carried out.
If the first length is not less than the second length of the sequence of weight coefficients, the method further comprises:
and taking the weight coefficient sequence as a data code stream.
Specifically, before adding the code stream header to the data code stream to generate the compressed code stream, the electronic device identifies a first length of the data code stream and judges whether the first length is smaller than a second length of the weight coefficient sequence; if so, the length of the data code stream is smaller than the length of the weight coefficient sequence, and the code stream header is added to the data code stream to generate the compressed code stream.
If the first length is not smaller than the second length of the weight coefficient sequence, then in order to ensure that the storage and bandwidth occupied by the generated compressed code stream are minimal, the weight coefficient sequence is taken as the data code stream, and the code stream header is added to the weight coefficient sequence to generate the compressed code stream.
In addition, in order to facilitate the electronic device to identify the data code stream, the code stream header further includes information on whether the data code stream is compressed and information on the length of the data code stream.
Specifically, when the data code stream in the compressed code stream is a weight coefficient sequence, the uncompressed information of the data code stream is recorded in the data code stream header, and when the data code stream in the compressed code stream is not a weight coefficient sequence, the compressed information of the data code stream is recorded in the data code stream header.
FIG. 10 is a schematic diagram of a compressed code stream provided in an embodiment of the present invention. The generated compressed code stream is divided into two parts: a code stream header and a data code stream. The code stream header has a fixed number of bytes and comprises a compressed flag bit (F), a convolution kernel size field (N) and a code stream data length (L); the data code stream contains all of the convolution kernel weight coefficient information. The shaded part in FIG. 10 represents the code stream header, which occupies 3 bytes: the compressed flag bit F (occupying 1 bit) is 1, indicating compression; the convolution kernel size field N (occupying 7 bits) is 3; and the code stream data length L (occupying 2 bytes) is 31 bytes. The characteristic value T in the data code stream can be replaced by 0.
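The 3-byte header above and the Example 4 length check can be combined into one small packing routine; the exact bit ordering (F in the top bit of byte 0, N in its low 7 bits, L big-endian in bytes 1-2) is an assumed reading of FIG. 10, not the patent's guaranteed layout.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Build the final compressed code stream: a 3-byte header (F, N, L) followed by
 * either the encoded data code stream or, if encoding did not actually shorten
 * it, the raw raster-ordered weight coefficient sequence (Example 4). Bit
 * layout as described in the lead-in; buffer sizing is the caller's concern. */
size_t build_compressed_stream(const uint8_t *encoded, size_t encoded_len,
                               const uint8_t *raw, size_t raw_len,
                               uint8_t kernel_size, uint8_t *out) {
    int compressed = encoded_len < raw_len;  /* Example 4: keep the shorter form */
    const uint8_t *body = compressed ? encoded : raw;
    size_t body_len     = compressed ? encoded_len : raw_len;

    out[0] = (uint8_t)((compressed ? 0x80 : 0x00) | (kernel_size & 0x7F)); /* F | N */
    out[1] = (uint8_t)(body_len >> 8);   /* L, high byte */
    out[2] = (uint8_t)(body_len & 0xFF); /* L, low byte  */
    memcpy(out + 3, body, body_len);     /* data code stream */
    return 3 + body_len;                 /* total compressed code stream length */
}
```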
Example 5:
fig. 11 is a schematic diagram of a convolution operation device according to an embodiment of the present invention, in which the weight coefficient soft compression is an offline software processing portion and the rest is a chip hardware processing portion. Specifically, the weight coefficient soft compression compresses the convolution kernels in the manner described in embodiment 3 and stores the compressed weight coefficients, that is, the compressed code stream, in a memory (DDR); the image data, which corresponds to the feature map to be convolved, is also stored in the DDR. The device comprises a weight coefficient reading and caching device, a weight coefficient decompressing device and an image data reading and caching device; the weight coefficients are obtained through a bus (BUS), read and decompressed, and the position of each target weight coefficient in the convolution kernel is determined. The convolution result processing device is used for determining the target convolution result according to the convolution result of each convolution kernel and writing the target convolution result into the DDR through the BUS, that is, storing the obtained convolution result in the DDR.
Example 6:
fig. 12 is a schematic structural diagram of a convolution operation apparatus according to an embodiment of the present invention, where the apparatus includes:
a generating module 121, configured to generate, for each convolution kernel, target weight coefficient synchronization data and target feature synchronization data corresponding to the target weight coefficient synchronization data according to an input feature map to be convolved and a position of each target weight coefficient in the convolution kernel, where the target weight coefficient is a non-zero weight coefficient in the convolution kernel;
a first determining module 122, configured to determine, for each convolution kernel, a convolution result of the convolution kernel according to a target weight coefficient in the convolution kernel, and target weight coefficient synchronization data and target feature synchronization data corresponding to the convolution kernel;
and a second determining module 123, configured to determine a target convolution result according to the convolution result of each convolution kernel.
The generating module 121 is specifically configured to generate, for each convolution kernel, weight coefficient synchronization data and feature synchronization data corresponding to the convolution kernel according to an input feature map to be convolved; and for each convolution kernel, according to the position of each target weight coefficient in the convolution kernel, removing the non-target weight coefficient in the weight coefficient synchronous data corresponding to the convolution kernel, determining target weight coefficient synchronous data, removing the characteristic data corresponding to the non-target weight coefficient in the characteristic synchronous data corresponding to the convolution kernel, and determining target characteristic synchronous data, wherein the non-target weight coefficient is a zero weight coefficient.
The device further comprises: a compression determination module 124, wherein the compression determination module 124 comprises a sorting unit 1241, an identification unit 1242, a first compression unit 1243, a second compression unit 1244, a generation unit 1245, and an identification determination unit 1246; wherein,
the sorting unit 1241 is configured to serially sort the weight coefficients of the convolution kernel according to a raster order, determine a sorted weight coefficient sequence, and use a weight coefficient corresponding to a start bit of the weight coefficient sequence as a first weight coefficient;
the identifying unit 1242 is configured to identify whether the first weight coefficient is a target weight coefficient, if so, trigger the first compressing unit, and if not, trigger the second compressing unit;
the first compression unit 1243 is configured to press the first weight coefficient into a data code stream, determine whether a next weight coefficient adjacent to the first weight coefficient exists in the weight coefficient sequence, if so, use the adjacent next weight coefficient as the first weight coefficient, trigger an identification unit, and if not, trigger a generation unit;
the second compression unit 1244 is configured to, taking the first weight coefficient as an initial weight coefficient in the weight coefficient sequence, identify the number of consecutive weight coefficients that are zero, press a preset characteristic value and the number into a data code stream, and determine whether a second weight coefficient, separated from the first weight coefficient by (the number minus 1) weight coefficients, exists in the weight coefficient sequence after the first weight coefficient; if so, use the second weight coefficient as the first weight coefficient and trigger the identification unit, and if not, trigger the generation unit;
the generating unit 1245 is configured to add a bitstream header to the data bitstream, and generate a compressed bitstream, where the bitstream header includes information of a size of a convolution kernel;
the identification determining unit 1246 is configured to read a bitstream header and a data bitstream in a compressed bitstream, identify the size of a convolution kernel in the bitstream header, and determine the position of each target weight coefficient in each convolution kernel in the convolution kernel according to the size of the convolution kernel, each target weight coefficient in the data bitstream, each preset feature value, and the number corresponding to each preset feature value.
The compression determination module 124 further includes:
a determining unit 1247, configured to, before the code stream header is added to the data code stream to generate the compressed code stream, identify a first length of the data code stream and determine whether the first length is smaller than a second length of the weight coefficient sequence; if yes, trigger the generation unit.
The compression determination module 124 further includes:
an updating unit 1248, configured to, if the determination result of the determining unit is negative, take the weight coefficient sequence as a data code stream.
The code stream header also comprises information whether the data code stream is compressed and information of the length of the data code stream.
The invention discloses a convolution operation method and a device, wherein the method comprises the following steps: aiming at each convolution kernel, generating target weight coefficient synchronous data and target feature synchronous data corresponding to the target weight coefficient synchronous data according to an input feature graph to be convolved and the position of each target weight coefficient in the convolution kernel, wherein the target weight coefficient is a non-zero weight coefficient in the convolution kernel; for each convolution kernel, determining the convolution result of the convolution kernel according to the target weight coefficient in the convolution kernel, and the target weight coefficient synchronous data and the target characteristic synchronous data corresponding to the convolution kernel; and determining a target convolution result according to the convolution result of each convolution kernel. In the embodiment of the invention, the position of each target weight coefficient in the convolution kernel is identified for each convolution kernel, and the target weight coefficient synchronous data and the target feature synchronous data corresponding to the target weight coefficient synchronous data are generated according to the input feature map to be convolved, wherein the target weight coefficient is a non-zero weight coefficient, the generated target weight coefficient synchronous data does not contain a weight coefficient which is zero, and the generated target feature synchronous data does not contain feature data corresponding to the zero weight coefficient, so that when the convolution operation is performed, a multiplier in the electronic equipment does not need to perform operation on the weight coefficient which is zero in the convolution kernel and the feature data corresponding to the weight coefficient which is zero, and the efficiency of the convolution operation is improved.
For the system/apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A method of convolution operation, the method comprising:
for each convolution kernel, generating target weight coefficient synchronous data and target feature synchronous data corresponding to the target weight coefficient synchronous data according to a feature map to be convolved of an input image and the position of each target weight coefficient in the convolution kernel, wherein the target weight coefficient is a non-zero weight coefficient in the convolution kernel;
for each convolution kernel, determining the convolution result of the convolution kernel according to the target weight coefficient in the convolution kernel, and the target weight coefficient synchronous data and the target characteristic synchronous data corresponding to the convolution kernel;
determining a target convolution result according to the convolution result of each convolution kernel;
before generating, for each convolution kernel, target weight coefficient synchronization data and target feature synchronization data corresponding to the target weight coefficient synchronization data according to a feature map to be convolved of an input image and a position of each target weight coefficient in the convolution kernel, the method further includes:
A. the method comprises the steps of serially sequencing the weight coefficients of a convolution kernel according to a raster sequence, determining a sequenced weight coefficient sequence, and taking the weight coefficient corresponding to the start bit of the weight coefficient sequence as a first weight coefficient;
B. identifying whether the first weight coefficient is a target weight coefficient, if so, performing C, and if not, performing D;
C. pressing the first weight coefficient into a data code stream, judging whether a next weight coefficient adjacent to the first weight coefficient exists in the weight coefficient sequence, if so, taking the adjacent next weight coefficient as the first weight coefficient, performing B, and if not, performing E;
D. taking the first weight coefficient as an initial weight coefficient in the weight coefficient sequence, identifying the number of consecutive weight coefficients that are zero, pressing a preset characteristic value and the number into a data code stream, and judging whether a second weight coefficient, separated from the first weight coefficient by (the number minus 1) weight coefficients, exists in the weight coefficient sequence after the first weight coefficient; if so, taking the second weight coefficient as the first weight coefficient and carrying out B, and if not, carrying out E;
E. adding a code stream header to the data code stream to generate a compressed code stream, wherein the code stream header contains information of the size of the convolution kernel;
F. reading the code stream header and the data code stream in the compressed code stream, identifying the size of the convolution kernel in the code stream header, and determining the position, in the convolution kernel, of each target weight coefficient according to the size of the convolution kernel, each target weight coefficient in the data code stream, each preset characteristic value and the number corresponding to each preset characteristic value.
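Editorial note: steps A–F of claim 1 amount to run-length coding of the zero weights followed by position recovery. The Python below is a minimal sketch under stated assumptions, not the patented implementation: the names compress_kernel and recover_positions, the dictionary header layout, and the use of float("inf") as the preset characteristic value marking a zero run are all illustrative choices.

```python
import numpy as np

# Assumed "preset characteristic value" flagging a run of zero weight coefficients.
ZERO_MARKER = float("inf")


def compress_kernel(kernel: np.ndarray):
    """Steps A-E: raster-order serialization plus run-length coding of zero runs."""
    sequence = kernel.flatten(order="C")          # step A: raster-order weight sequence
    stream = []
    i = 0
    while i < len(sequence):
        if sequence[i] != 0:                      # steps B-C: target (non-zero) weight
            stream.append(float(sequence[i]))     # push it into the data code stream
            i += 1
        else:                                     # step D: count the consecutive zeros
            run = 0
            while i + run < len(sequence) and sequence[i + run] == 0:
                run += 1
            stream.extend([ZERO_MARKER, float(run)])  # push marker + run length
            i += run                              # resume right after the zero run
    header = {"kernel_shape": kernel.shape}       # step E: header carries the kernel size
    return header, stream


def recover_positions(header, stream):
    """Step F: rebuild the position of each target weight inside the kernel."""
    positions, raster_idx, j = [], 0, 0
    while j < len(stream):
        if stream[j] == ZERO_MARKER:
            raster_idx += int(stream[j + 1])      # skip the recorded zero run
            j += 2
        else:
            row, col = np.unravel_index(raster_idx, header["kernel_shape"])
            positions.append(((int(row), int(col)), stream[j]))
            raster_idx += 1
            j += 1
    return positions
```

For the 3x3 kernel [[0.5, 0, 0], [0, 0, -1.2], [0, 0, 0.3]], compress_kernel yields the stream [0.5, inf, 4.0, -1.2, inf, 2.0, 0.3] (7 entries instead of 9), and recover_positions maps it back to target weights at positions (0, 0), (1, 2) and (2, 2).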
2. The method according to claim 1, wherein generating, for each convolution kernel, target weight coefficient synchronous data and target feature synchronous data corresponding to the target weight coefficient synchronous data according to the to-be-convolved feature map of the input image and the position of each target weight coefficient in the convolution kernel comprises:
for each convolution kernel, generating weight coefficient synchronous data and feature synchronous data corresponding to the convolution kernel according to the to-be-convolved feature map of the input image;
and for each convolution kernel, according to the position of each target weight coefficient in the convolution kernel, removing non-target weight coefficients from the weight coefficient synchronous data corresponding to the convolution kernel to determine the target weight coefficient synchronous data, and removing the feature data corresponding to the non-target weight coefficients from the feature synchronous data corresponding to the convolution kernel to determine the target feature synchronous data, wherein a non-target weight coefficient is a zero weight coefficient.
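Editorial note: to make claim 2 concrete, the following is a minimal sketch of a stride-1, no-padding convolution (in the CNN cross-correlation sense) that touches only the non-zero weights and the feature data aligned with them. The function name sparse_convolve and the plain nested-loop formulation are assumptions made for readability, not the claimed apparatus.

```python
import numpy as np


def sparse_convolve(feature_map: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Stride-1, valid 2-D convolution that skips zero weights entirely."""
    kh, kw = kernel.shape
    fh, fw = feature_map.shape
    oh, ow = fh - kh + 1, fw - kw + 1

    # Target weight coefficient synchronous data: positions and values of the
    # non-zero weights; zero weights are removed up front.
    target_positions = [(r, c) for r in range(kh) for c in range(kw) if kernel[r, c] != 0]
    target_weights = np.array([kernel[r, c] for r, c in target_positions])

    out = np.zeros((oh, ow), dtype=float)
    for i in range(oh):
        for j in range(ow):
            # Target feature synchronous data: only the feature values aligned
            # with non-zero weights enter the multiply-accumulate.
            target_features = np.array([feature_map[i + r, j + c] for r, c in target_positions])
            out[i, j] = float(np.dot(target_weights, target_features))
    return out
```

The multiply-accumulate count per output element equals the number of non-zero weights rather than the kernel area, which is where the gain from skipping zero coefficients comes from.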
3. The method of claim 1, wherein before adding a code stream header to the data code stream to generate a compressed code stream, the method further comprises:
identifying a first length of the data code stream, and judging whether the first length is smaller than a second length of the weight coefficient sequence;
if yes, the subsequent steps are carried out.
4. The method of claim 3, wherein if the first length is not less than the second length of the sequence of weight coefficients, the method further comprises:
and taking the weight coefficient sequence as a data code stream.
5. The method of claim 1, wherein the code stream header further includes information on whether the data code stream is compressed and information on the length of the data code stream.
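Editorial note: claims 3–5 together say that the compressed stream is kept only when it is actually shorter than the raw weight sequence, and that the header records this decision and the stream length. A minimal sketch of that selection logic, with an assumed function name and dictionary header layout:

```python
def build_code_stream(weight_sequence, rle_stream, kernel_shape):
    """Keep whichever representation is shorter and describe it in the header."""
    if len(rle_stream) < len(weight_sequence):   # claim 3: first length < second length
        data, compressed = list(rle_stream), True
    else:                                        # claim 4: fall back to the raw sequence
        data, compressed = list(weight_sequence), False
    header = {                                   # claim 5: compression flag and length
        "kernel_shape": kernel_shape,
        "compressed": compressed,
        "stream_length": len(data),
    }
    return header, data
```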
6. A convolution operation apparatus, the apparatus comprising:
the generating module is used for generating, for each convolution kernel, target weight coefficient synchronous data and target feature synchronous data corresponding to the target weight coefficient synchronous data according to a to-be-convolved feature map of an input image and the position of each target weight coefficient in the convolution kernel, wherein a target weight coefficient is a non-zero weight coefficient in the convolution kernel;
the first determining module is used for determining, for each convolution kernel, the convolution result of the convolution kernel according to the target weight coefficients in the convolution kernel, and the target weight coefficient synchronous data and the target feature synchronous data corresponding to the convolution kernel;
the second determining module is used for determining a target convolution result according to the convolution result of each convolution kernel;
the device further comprises a compression determining module, wherein the compression determining module comprises a sorting unit, an identification unit, a first compression unit, a second compression unit, a generation unit and an identification determining unit; wherein,
the sorting unit is used for serially sorting the weight coefficients of the convolution kernel in raster order, determining a sorted weight coefficient sequence, and taking the weight coefficient at the start of the weight coefficient sequence as a first weight coefficient;
the identification unit is used for identifying whether the first weight coefficient is a target weight coefficient, if so, triggering the first compression unit, and if not, triggering the second compression unit;
the first compression unit is used for pushing the first weight coefficient into a data code stream, judging whether a next weight coefficient adjacent to the first weight coefficient exists in the weight coefficient sequence, if so, taking the adjacent next weight coefficient as the first weight coefficient and triggering the identification unit, and if not, triggering the generation unit;
the second compression unit is configured to take the first weight coefficient as an initial weight coefficient, identify the number of consecutive zero weight coefficients in the weight coefficient sequence, push a preset characteristic value and the number into the data code stream, and judge whether a second weight coefficient, separated from the first weight coefficient by (the number minus 1) weight coefficients, exists in the weight coefficient sequence; if so, take the second weight coefficient as the first weight coefficient and trigger the identification unit, and if not, trigger the generation unit;
the generation unit is configured to add a code stream header to the data code stream to generate a compressed code stream, wherein the code stream header includes information on the size of the convolution kernel;
the identification determining unit is used for reading the code stream header and the data code stream in the compressed code stream, identifying the size of the convolution kernel in the code stream header, and determining the position, in the convolution kernel, of each target weight coefficient according to the size of the convolution kernel, each target weight coefficient in the data code stream, each preset characteristic value and the number corresponding to each preset characteristic value.
7. The apparatus according to claim 6, wherein the generating module is specifically configured to generate, for each convolution kernel, weight coefficient synchronous data and feature synchronous data corresponding to the convolution kernel according to the to-be-convolved feature map of the input image; and for each convolution kernel, according to the position of each target weight coefficient in the convolution kernel, remove non-target weight coefficients from the weight coefficient synchronous data corresponding to the convolution kernel to determine the target weight coefficient synchronous data, and remove the feature data corresponding to the non-target weight coefficients from the feature synchronous data corresponding to the convolution kernel to determine the target feature synchronous data, wherein a non-target weight coefficient is a zero weight coefficient.
8. The apparatus of claim 6, wherein the compression determination module further comprises:
a judging unit, configured to, before the generation unit adds a code stream header to the data code stream to generate a compressed code stream, identify a first length of the data code stream and judge whether the first length is smaller than a second length of the weight coefficient sequence; and if yes, trigger the generation unit.
9. The apparatus of claim 8, wherein the compression determination module further comprises:
and the updating unit is used for taking the weight coefficient sequence as a data code stream if the judgment result of the judging unit is negative.
10. The apparatus of claim 6, wherein the code stream header further comprises information on whether the data code stream is compressed and information on the length of the data code stream.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810105945.8A CN108416425B (en) | 2018-02-02 | 2018-02-02 | Convolution operation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108416425A (en) | 2018-08-17 |
CN108416425B (en) | 2020-09-29 |
Family
ID=63126784
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810105945.8A Active CN108416425B (en) | 2018-02-02 | 2018-02-02 | Convolution operation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108416425B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110928576A (en) * | 2018-09-20 | 2020-03-27 | 中兴通讯股份有限公司 | Convolution processing method and device of convolutional neural network and storage medium |
CN111416743B (en) * | 2020-03-19 | 2021-09-03 | 华中科技大学 | Convolutional network accelerator, configuration method and computer readable storage medium |
CN112200295B (en) * | 2020-07-31 | 2023-07-18 | 星宸科技股份有限公司 | Ordering method, operation method, device and equipment of sparse convolutional neural network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106355210A (en) * | 2016-09-14 | 2017-01-25 | 华北电力大学(保定) | Method for expressing infrared image features of insulators on basis of depth neuron response modes |
CN106412579A (en) * | 2015-07-30 | 2017-02-15 | 浙江大华技术股份有限公司 | Image coding method and apparatus, and image decoding method and apparatus |
CN106447034A (en) * | 2016-10-27 | 2017-02-22 | 中国科学院计算技术研究所 | Neutral network processor based on data compression, design method and chip |
CN106529670A (en) * | 2016-10-27 | 2017-03-22 | 中国科学院计算技术研究所 | Neural network processor based on weight compression, design method, and chip |
CN107392305A (en) * | 2016-05-13 | 2017-11-24 | 三星电子株式会社 | Realize and perform the method and computer-readable medium of neutral net |
CN107491811A (en) * | 2017-09-01 | 2017-12-19 | 中国科学院计算技术研究所 | Method and system and neural network processor for accelerans network processing unit |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160358069A1 (en) * | 2015-06-03 | 2016-12-08 | Samsung Electronics Co., Ltd. | Neural network suppression |
Application event: 2018-02-02 — application CN201810105945.8A filed in CN, granted as CN108416425B (status: Active).
Also Published As
Publication number | Publication date |
---|---|
CN108416425A (en) | 2018-08-17 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| TR01 | Transfer of patent right | Effective date of registration: 2020-09-25. Patentee after: Zhejiang Xinsheng Electronic Technology Co., Ltd., Room 1201, Building A, 1181 Bin'an Road, Changhe Street, Binjiang District, Hangzhou City, Zhejiang Province. Patentee before: ZHEJIANG DAHUA TECHNOLOGY Co., Ltd., 1187 Bin'an Road, Binjiang District, Hangzhou City, Zhejiang Province, 310053. |