CN114239814B - Training method of convolution neural network model for image processing - Google Patents

Training method of convolution neural network model for image processing

Info

Publication number
CN114239814B
Authority
CN
China
Prior art keywords
convolution kernel
convolution
original
derivative
kernels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210174146.2A
Other languages
Chinese (zh)
Other versions
CN114239814A
Inventor
艾国
杨作兴
房汝明
向志宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yanji Microelectronics Co ltd
Original Assignee
Hangzhou Yanji Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yanji Microelectronics Co ltd filed Critical Hangzhou Yanji Microelectronics Co ltd
Priority to CN202210174146.2A
Publication of CN114239814A
Application granted
Publication of CN114239814B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The present disclosure relates to a training method of a convolutional neural network model for image processing. A method of training a convolutional neural network model for image processing, the method comprising: constructing a convolutional neural network model to be trained, wherein the parameters of the convolutional neural network model comprise one or more original convolutional kernels and one or more groups of convolutional kernel generation parameters; training the one or more original convolution kernels and the one or more sets of convolution kernel generation parameters using a training set image, wherein in the training, one derivative convolution kernel is generated based on at least a portion of the original convolution kernels using each set of convolution kernel generation parameters, and image features of the training set image are convolved using the one or more original convolution kernels and the generated one or more derivative convolution kernels.

Description

Training method of convolution neural network model for image processing
Technical Field
The present disclosure relates to image processing on the smart device side.
More particularly, the present disclosure relates to a training method of a convolutional neural network model for image processing on the smart device side, an image processing method using the convolutional neural network model thus trained, a computer storage medium storing the above methods, an image processing apparatus implementing the above methods, and a smart device including the image processing apparatus.
Background
Currently, there is a wide need for image processing techniques in smart devices (e.g., cell phones, tablet computers, smart cameras, smart gates, etc.). For example, in a smart camera, image processing is required to realize functions such as face recognition and beautification.
The convolutional neural network models adopted by existing image processing methods are large, and a large number of parameters need to be stored in memory. However, the memory space of a smart device is usually tight, and it is therefore desirable to occupy less memory space when performing image processing on the smart device side.
Therefore, there is a need to improve the training method of the convolutional neural network model and the corresponding image processing method, so as to miniaturize the trained convolutional neural network model and reduce the memory space and computing resources it occupies on the smart device side.
Disclosure of Invention
It is an object of the present disclosure to provide a method of training a convolutional neural network model for image processing.
According to one aspect of the present disclosure, there is provided a training method of a convolutional neural network model for image processing, the method comprising: constructing a convolutional neural network model to be trained, wherein the parameters of the convolutional neural network model comprise one or more original convolutional kernels and one or more groups of convolutional kernel generation parameters; training the one or more original convolution kernels and the one or more sets of convolution kernel generation parameters using a training set image, wherein in the training, one derivative convolution kernel is generated based on at least a portion of the original convolution kernels using each set of convolution kernel generation parameters, and image features of the training set image are convolved using the one or more original convolution kernels and the generated one or more derivative convolution kernels.
According to another aspect of the present disclosure, there is provided an image processing method, characterized in that the method includes: obtaining a convolutional neural network model trained according to the method; generating a corresponding derivative convolution kernel based on each set of convolution kernel generation parameters in the trained convolution neural network model and at least a portion of the one or more original convolution kernels; and performing convolution processing on the image characteristics of the image to be processed by using the one or more original convolution kernels and the generated one or more derivative convolution kernels.
According to another aspect of the present disclosure, there is provided a computer storage medium having stored thereon executable instructions that, when executed, are capable of implementing the above-described method.
According to another aspect of the present disclosure, there is provided an image processing apparatus characterized in that the apparatus is capable of implementing the above method.
According to another aspect of the present disclosure, a smart device is provided, wherein the smart device includes the above apparatus.
Other features of the present disclosure and advantages thereof will become more apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
fig. 1 shows a schematic diagram of a convolutional neural network model for image processing in the prior art.
Fig. 2 illustrates a flow diagram of a method of training a convolutional neural network model for image processing in accordance with at least one embodiment of the present disclosure.
Fig. 3 illustrates a schematic diagram of a convolutional neural network model for training in accordance with at least one embodiment of the present disclosure.
Fig. 4 illustrates a flow diagram of an image processing method according to at least one embodiment of the present disclosure.
Note that in the embodiments described below, the same reference numerals are used in common between different drawings to denote the same portions or portions having the same functions, and a repetitive description thereof will be omitted. In some cases, similar reference numbers and letters are used to denote similar items, and thus, once an item is defined in one figure, it need not be discussed further in subsequent figures.
For convenience of understanding, the positions, sizes, ranges, and the like of the respective structures shown in the drawings sometimes do not indicate the actual positions, sizes, ranges, and the like. Therefore, the present disclosure is not limited to the positions, dimensions, ranges, and the like disclosed in the drawings and the like.
Detailed Description
Various exemplary embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. It should be noted that: the relative arrangement of parts and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. That is, the structures and methods herein are shown by way of example to illustrate different embodiments of the structures and methods of the present disclosure. Those skilled in the art will understand, however, that they are merely illustrative of exemplary ways in which the disclosure may be practiced and not exhaustive. Furthermore, the figures are not necessarily to scale, some features may be exaggerated to show details of particular components.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
Fig. 1 shows a schematic diagram of a prior art convolutional neural network model 100 for image processing.
The convolutional neural network model 100 generally comprises a plurality of convolutional layers, one level of which is schematically shown in fig. 1. Each level of convolutional layer includes convolution kernels 121, 122, 123 for performing convolution processing on the image features 110 input to that level of convolutional layer to generate output image features 130. The input image features 110 are input to this level of convolutional layer from the previous level of convolutional layer (or, for the first convolutional layer, are the input features of the image to be processed), and the output image features 130 are output to the next level of convolutional layer (or output as the output features of the image to be processed).
In the example of fig. 1, the input image features 110 include 4 channels with a resolution of 6 × 6; the output image features 130 include 3 channels with a resolution of 4 x 4; the convolution kernels 121, 122, 123 each include 4 channels with a resolution of 3 x 3. The number of channels of the convolution kernels 121, 122 and 123 is the same as the number of channels of the input image feature 110, and the number of the convolution kernels 121, 122 and 123 is the same as the number of channels of the output image feature 130.
When the convolution kernels 121, 122, 123 are used to perform convolution processing on the input image features 110, each convolution kernel is used to perform convolution processing on the input image features 110 to obtain the feature values of a corresponding channel of the output image features 130. For example, the feature values (aa₁, ab₁, …, dd₁) of the 1st channel of the output image features 130 are obtained by performing convolution processing on the input image features 110 using the convolution kernel 121.
More specifically, a multiply-add operation is performed between a convolution kernel and a portion of the feature values of the image features 110 to obtain one feature value of the corresponding channel of the image features 130. A similar multiply-add operation is then performed between the convolution kernel and another portion of the feature values of the image features 110 to obtain another feature value of that channel. Proceeding in this way, the convolution of the image features 110 with that convolution kernel is completed. A multiply-add operation, as referred to herein, means multiplying corresponding values and then adding the products together.
For example, when the convolution kernel 121 is used to perform convolution processing on the image features 110, a multiply-add operation may first be performed between the convolution kernel 121 and the feature values aa₁, ab₁, ac₁, ba₁, bb₁, bc₁, ca₁, cb₁, cc₁; aa₂, ab₂, ac₂, …, ca₂, cb₂, cc₂; aa₃, ab₃, ac₃, …; …, ca₄, cb₄, cc₄ of the image features 110, obtaining the feature value aa₁ of the 1st channel of the image features 130; then a multiply-add operation is performed between the convolution kernel 121 and the feature values ab₁, ac₁, ad₁, bb₁, bc₁, bd₁, cb₁, cc₁, cd₁; ab₂, ac₂, ad₂, …; …, cb₄, cc₄, cd₄ of the image features 110, obtaining the feature value ab₁ of the 1st channel of the image features 130; and so on, until a multiply-add operation is performed between the convolution kernel 121 and the feature values dd₁, de₁, df₁, ed₁, ee₁, ef₁, fd₁, fe₁, ff₁; dd₂, de₂, df₂, …; …, fd₄, fe₄, ff₄ of the image features 110, obtaining the feature value dd₁ of the 1st channel of the image features 130. All feature values of the 1st channel of the image features 130 are thereby obtained.
For descriptive convenience, a vector composed of the values at the same coordinate in the different channels of the image features 110, 130 or of the convolution kernels 121, 122, 123 (each of which includes a plurality of channels) will hereinafter be referred to as a channel vector. For example, the vector formed by the feature values aa₁, aa₂, aa₃, aa₄ at coordinate (a, a) (i.e., at row 1, column 1) in the different channels of the image features 110 is called the channel vector u_aa, the vector formed by the parameters AA₁, AA₂, AA₃, AA₄ at coordinate (A, A) (i.e., at row 1, column 1) in the different channels of the convolution kernel 121 is called the channel vector v_AA, and so on.
When a multiply-add operation is performed between a convolution kernel and a portion of the feature values of the image features 110, the dot product of each channel vector of the convolution kernel with the corresponding channel vector of the image features 110 is computed, and the sum of these dot products is used as the feature value at the corresponding coordinate of the corresponding channel of the output image features 130.
For example, when the multiply-add operation between the convolution kernel 121 and the feature values aa₁, ab₁, ac₁, …, ca₁, cb₁, cc₁; aa₂, ab₂, ac₂, …; …, ca₄, cb₄, cc₄ of the image features 110 is performed, for each channel vector v_AA = [AA₁, AA₂, AA₃, AA₄], v_AB = [AB₁, AB₂, AB₃, AB₄], v_AC = [AC₁, AC₂, AC₃, AC₄], …, v_CC = [CC₁, CC₂, CC₃, CC₄] of the convolution kernel 121, its dot product with the corresponding channel vector u_aa = [aa₁, aa₂, aa₃, aa₄], u_ab = [ab₁, ab₂, ab₃, ab₄], u_ac = [ac₁, ac₂, ac₃, ac₄], …, u_cc = [cc₁, cc₂, cc₃, cc₄] of the image features 110 is computed; that is, u_aa·v_AA, u_ab·v_AB, u_ac·v_AC, …, u_cc·v_CC are computed. The sum of these dot products is then taken as the feature value aa₁ at coordinate (a, a) of the 1st channel of the image features 130. Similarly, when the multiply-add operation between the convolution kernel 121 and the feature values ab₁, ac₁, ad₁, …, cb₁, cc₁, cd₁; ab₂, ac₂, ad₂, …; …, cb₄, cc₄, cd₄ of the image features 110 is performed, the dot products u_ab·v_AA, u_ac·v_AB, u_ad·v_AC, …, u_cd·v_CC are computed and their sum is taken as the feature value ab₁ at coordinate (a, b) of the 1st channel of the image features 130. And so on, until the dot products u_dd·v_AA, u_de·v_AB, u_df·v_AC, …, u_ff·v_CC are computed and their sum is taken as the feature value dd₁ at coordinate (d, d) of the 1st channel of the image features 130, thereby obtaining all feature values of the 1st channel of the image features 130.
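The multiply-add procedure just described amounts to a sliding window of element-wise products and sums. The following is a minimal NumPy sketch of it (an illustration only, not part of the patent; the variable names are hypothetical), using the shapes of the example of fig. 1:

```python
import numpy as np

# Hypothetical example matching the shapes of Fig. 1: a 4-channel 6x6 input
# feature map and one 4-channel 3x3 kernel produce one 4x4 output channel.
x = np.random.randn(4, 6, 6)   # image features 110 (channels, rows, cols)
k = np.random.randn(4, 3, 3)   # convolution kernel 121

out = np.zeros((4, 4))         # 1st channel of image features 130
for i in range(4):             # output row index
    for j in range(4):         # output column index
        # multiply-add operation: element-wise products of the kernel with the
        # 4-channel 3x3 patch of the input, summed over channels and positions
        # (equivalently, the sum of dot products of corresponding channel vectors)
        out[i, j] = np.sum(x[:, i:i+3, j:j+3] * k)
```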
It can be seen that the parameters of each level of convolutional layer of the prior art convolutional neural network model 100 include the parameters of the convolution kernels 121, 122, 123, and the number of parameters of each level of convolutional layer is equal to the product of the number of convolution kernels and the number of channels and resolution of each convolution kernel. In the example of fig. 1, this level of convolutional layer includes 108 parameters, equal to the product of the number of convolution kernels (3), the number of channels per convolution kernel (4), and the resolution (3 × 3 = 9). During training, these parameters need to be trained; during subsequent image processing, they need to be stored in memory.
It should be noted that fig. 1 only schematically illustrates the structure of the convolutional neural network model 100. In practical applications, the number of convolution kernels and the number of channels in each level of convolution layer of the convolutional neural network model 100, as well as the number of channels and resolution of the image features processed by it, are generally much larger than in the example of fig. 1. Thus, each convolutional layer of the prior art convolutional neural network model 100 includes a large number of parameters. In image processing, these parameters require a large amount of memory space.
However, on the smart device side, memory space is often tight. Therefore, there is a need for an improved image processing method that achieves a similar effect using less memory space when performing image processing on the smart device side; that is, at the same accuracy, the number of model parameters is reduced, achieving the goal of model miniaturization.
Fig. 2 illustrates a flow diagram of a method 200 of training a convolutional neural network model for image processing in accordance with at least one embodiment of the present disclosure. The method 200 may be used to train an improved convolutional neural network model that is particularly suited for image processing on the smart device side.
At step 201, the method 200 begins.
At step 202, a convolutional neural network model to be trained is constructed. Wherein the parameters of the convolutional neural network model include one or more original convolution kernels and one or more sets of convolution kernel generation parameters. For example, the number of parameters for each set of convolution kernel generation parameters may be the same as the number of original convolution kernels.
At step 204, one or more original convolution kernels and one or more sets of convolution kernel generation parameters are trained using a training set image. Wherein at step 206, a derivative convolution kernel is generated based on at least a portion of the original convolution kernels using each set of convolution kernel generation parameters. For example, each convolution kernel generation parameter in each set of convolution kernel generation parameters may correspond to an original convolution kernel representing a weight of the original convolution kernel at the time the corresponding derivative convolution kernel was generated. At step 208, the image features of the training set images are convolved with the one or more original convolution kernels and the generated one or more derivative convolution kernels.
At step 210, the method 200 ends.
The various steps in method 200 are described in more detail below in conjunction with fig. 3.
Fig. 3 illustrates a schematic diagram of a convolutional neural network model 300 for training in accordance with at least one embodiment of the present disclosure. Convolutional neural network model 300 may include multiple convolutional layers, one of which is schematically illustrated in fig. 3. In the illustrated convolutional layer, the input image features 310 are convolved with the original convolution kernels 321, 322 and a generated derivative convolution kernel 331, thereby generating the output image features 340, where the derivative convolution kernel 331 is generated based on the original convolution kernels 321, 322 using a set of convolution kernel generation parameters 323.
It can be seen that, similar to the prior art convolutional neural network model 100 in fig. 1, the input image features 310 are also convolved with each of 3 convolution kernels in the convolutional layer shown in fig. 3. However, the parameters of this convolutional layer of the convolutional neural network model 300 trained according to the aforementioned training method 200 include only the original convolution kernels 321, 322 and the convolution kernel generation parameters 323; they do not include the derivative convolution kernel 331, because the derivative convolution kernel 331 can be generated from the original convolution kernels 321, 322 and the convolution kernel generation parameters 323. Accordingly, as shown in fig. 2, at step 204 of method 200, training is performed only on the original convolution kernels 321, 322 and the convolution kernel generation parameters 323.
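As an illustration of how such a convolutional layer could be parameterized and trained in practice, the sketch below uses hypothetical names, PyTorch chosen only as an example framework, and is simplified to the preferred embodiment in which one derivative convolution kernel is a weighted sum of the original convolution kernels; only the original kernels and the generation parameters are trainable weights, and the derivative kernel is rebuilt in every forward pass:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DerivedKernelConv(nn.Module):
    """Convolutional layer whose trainable parameters are a few original
    kernels plus one set of generation parameters; one derivative kernel
    is synthesized as a weighted sum of the originals (a sketch only)."""

    def __init__(self, in_channels, num_original, kernel_size=3):
        super().__init__()
        # original convolution kernels (e.g. 321, 322)
        self.original = nn.Parameter(
            torch.randn(num_original, in_channels, kernel_size, kernel_size))
        # one generation parameter per original kernel (alpha_1, alpha_2, ...)
        self.gen = nn.Parameter(torch.randn(num_original))

    def forward(self, x):
        # derivative kernel 331 = sum_i alpha_i * original_i, rebuilt on the fly
        derived = (self.gen.view(-1, 1, 1, 1) * self.original).sum(dim=0, keepdim=True)
        weight = torch.cat([self.original, derived], dim=0)
        return F.conv2d(x, weight)

# Example matching Fig. 3: 4-channel input, 2 original kernels -> 3 output channels
layer = DerivedKernelConv(in_channels=4, num_original=2)
y = layer(torch.randn(1, 4, 6, 6))   # y.shape == (1, 3, 4, 4)
```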
Thus, the number of parameters included in this convolutional layer of the convolutional neural network model 300 trained according to the method 200 is equal to the number of convolution kernel generation parameters 323 plus the number of parameters of the original convolution kernels 321 and 322 (i.e., the product of the number of original convolution kernels and the number of channels and resolution of each original convolution kernel). In the example of fig. 3, this level of convolutional layer includes 74 parameters, equal to the number of convolution kernel generation parameters 323 (2) plus the number of parameters of the original convolution kernels 321, 322 (72), where the latter is equal to the product of the number of original convolution kernels (2), the number of channels of each kernel (4), and the resolution (3 × 3 = 9).
In contrast, in the prior art example shown in fig. 1, the corresponding convolutional layer of the convolutional neural network model 100 includes 108 parameters. With the same 3 convolution kernels used to perform convolution processing on the input image features, the number of parameters included in this convolutional layer of the convolutional neural network model 300 trained by the training method 200 of the present invention is therefore about 2/3 of that in the prior art. The training method and the image processing method thus greatly reduce the memory space occupied by the convolutional neural network model used in image processing, and are particularly suitable for image processing on the smart device side.
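The two parameter counts can be checked with a few lines of arithmetic (illustration only):

```python
# Prior-art layer of Fig. 1: 3 kernels, each with 4 channels of 3x3 parameters
prior_art = 3 * 4 * 3 * 3              # 108

# Layer of Fig. 3: 2 original kernels of the same size, plus 2 generation parameters
proposed = 2 * 4 * 3 * 3 + 2           # 74

print(prior_art, proposed, round(proposed / prior_art, 2))   # 108 74 0.69 (about 2/3)
```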
In the example of fig. 3, a derivative convolution kernel 331 is generated based on the original convolution kernels 321, 322 using a set of convolution kernel generation parameters 323. In other embodiments, a plurality of derivative convolution kernels may be generated based on the original convolution kernels 321, 322 using a plurality of sets of convolution kernel generation parameters, respectively.
In a preferred embodiment, the number of parameters of each set of convolution kernel generation parameters may be the same as the number of original convolution kernels used to generate the corresponding derivative convolution kernel. Each convolution kernel generation parameter may correspond to one original convolution kernel and represent the weight of that original convolution kernel when the corresponding derivative convolution kernel is generated. For example, as shown in fig. 3, the set of convolution kernel generation parameters 323 includes 2 parameters α1 and α2, representing the weights of the original convolution kernels 321 and 322, respectively, in generating the derivative convolution kernel 331.
In a further preferred embodiment, each derivative convolution kernel may be generated from at least a portion of the original convolution kernels by a linear transform, wherein each parameter of the corresponding set of convolution kernel generation parameters may represent the linear transform coefficient of the corresponding original convolution kernel to the derivative convolution kernel. For example, the derivative convolution kernel 331 may be generated by a linear transform of the original convolution kernels 321 and 322, where the parameters α1 and α2 represent the linear transform coefficients of the original convolution kernels 321 and 322 to the derivative convolution kernel 331, respectively.
Specifically, the parameters at the respective coordinates of the respective channels of the derivative convolution kernel 331 may be obtained by a linear transform of the parameters at the same coordinates of the same channels of the original convolution kernels 321 and 322, where the linear transform coefficients for the parameters of the original convolution kernels 321 and 322 are α1 and α2, respectively. For example, the parameter AA₁ at coordinate (A, A) of the 1st channel of the derivative convolution kernel 331 may be equal to the parameter AA₁ at coordinate (A, A) of the 1st channel of the original convolution kernel 321 multiplied by α1, plus the parameter AA₁ at coordinate (A, A) of the 1st channel of the original convolution kernel 322 multiplied by α2.
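In code, this whole-kernel linear transform is a single weighted sum (a sketch with hypothetical example values for α1 and α2):

```python
import numpy as np

k321 = np.random.randn(4, 3, 3)    # original convolution kernel 321 (4 channels, 3x3)
k322 = np.random.randn(4, 3, 3)    # original convolution kernel 322
alpha1, alpha2 = 0.6, 0.4          # convolution kernel generation parameters 323 (example values)

# every parameter of the derivative convolution kernel 331 is the same
# linear combination of the corresponding parameters of 321 and 322
k331 = alpha1 * k321 + alpha2 * k322
```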
In another preferred embodiment, the number of parameters of each set of convolution kernel generation parameters may be the same as the product of the number of original convolution kernels used to generate the corresponding derivative convolution kernel and their number of channels. Each convolution kernel generation parameter may correspond to one channel of one original convolution kernel and represent the weight of that channel when the corresponding derivative convolution kernel is generated. For example, the set of convolution kernel generation parameters 323 may include 8 parameters α1, α2, …, α8 (not shown in fig. 3), representing the weights of the 4 channels of the original convolution kernel 321 and the 4 channels of the original convolution kernel 322, respectively, in generating the derivative convolution kernel 331.
In a further preferred embodiment, each channel of each derivative convolution kernel may be generated from the corresponding channels of at least a portion of the original convolution kernels by a linear transform, wherein each parameter of the corresponding set of convolution kernel generation parameters may represent the linear transform coefficient of the corresponding channel of the corresponding original convolution kernel to that channel of the derivative convolution kernel. For example, the 4 channels of the derivative convolution kernel 331 may be generated by linear transforms of the 4 channels of the original convolution kernel 321 and the 4 channels of the original convolution kernel 322, where the parameters α1 and α2 represent the linear transform coefficients of the 1st channel of the original convolution kernel 321 and the 1st channel of the original convolution kernel 322 to the 1st channel of the derivative convolution kernel 331, respectively, the parameters α3 and α4 represent the linear transform coefficients of the 2nd channel of the original convolution kernel 321 and the 2nd channel of the original convolution kernel 322 to the 2nd channel of the derivative convolution kernel 331, respectively, and so on.
Specifically, the parameter AA₁ at coordinate (A, A) of the 1st channel of the derivative convolution kernel 331 may be equal to the parameter AA₁ at coordinate (A, A) of the 1st channel of the original convolution kernel 321 multiplied by α1, plus the parameter AA₁ at coordinate (A, A) of the 1st channel of the original convolution kernel 322 multiplied by α2; the parameter AA₂ at coordinate (A, A) of the 2nd channel of the derivative convolution kernel 331 may be equal to the parameter AA₂ at coordinate (A, A) of the 2nd channel of the original convolution kernel 321 multiplied by α3, plus the parameter AA₂ at coordinate (A, A) of the 2nd channel of the original convolution kernel 322 multiplied by α4; and so on.
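A sketch of this per-channel variant (hypothetical names; the 8 coefficients are ordered α1…α8 as in the example above, one pair per channel):

```python
import numpy as np

k321 = np.random.randn(4, 3, 3)            # original convolution kernel 321
k322 = np.random.randn(4, 3, 3)            # original convolution kernel 322
alphas = np.array([0.7, 0.3, 0.5, 0.5,     # alpha1..alpha8: one pair of linear
                   0.4, 0.6, 0.8, 0.2])    # transform coefficients per channel

k331 = np.empty_like(k321)                 # derivative convolution kernel 331
for c in range(4):
    # channel c of 331 is its own linear combination of channel c of 321 and 322
    k331[c] = alphas[2 * c] * k321[c] + alphas[2 * c + 1] * k322[c]
```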
In some embodiments, the number of parameters for each set of convolution kernel generation parameters may be the same as the product of the number of original convolution kernels and the number of parameters included therein used to generate the corresponding derivative convolution kernel. Each convolution kernel generation parameter may correspond to a parameter of an original convolution kernel, and represents a weight of the parameter of the original convolution kernel when the corresponding derivative convolution kernel is generated. In some embodiments, each parameter of each derived convolution kernel may be generated from a corresponding parametric linear transform of at least a portion of the original convolution kernel, wherein each of the corresponding set of convolution kernel generation parameters may represent a linear transform coefficient of the corresponding parameter of the original convolution kernel to the parameter of the derived convolution kernel.
In a preferred embodiment, a normalization constraint may be applied to each set of convolution kernel generation parameters during training so that the parameters within each set are of approximately the same magnitude. In the foregoing preferred embodiments, this means that the weights of the original convolution kernels (or of the individual channels of the original convolution kernels) to which the parameters of a set correspond are of approximately the same order when the corresponding derivative convolution kernel is generated. In a preferred embodiment, the normalization constraint may include constraining the sum of each set of convolution kernel generation parameters (for example, to 1 or 1.2). Further, in some embodiments, the normalization constraint may include constraining the minimum value of the parameters in each set of convolution kernel generation parameters (for example, to 0.1).
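One simple way such a normalization constraint could be enforced (a sketch; the disclosure does not prescribe a particular mechanism) is to rescale each set of generation parameters so that it sums to the chosen value:

```python
import numpy as np

def normalize_gen_params(alphas, total=1.0):
    """Rescale one set of convolution kernel generation parameters so that
    they sum to `total` (a simple sketch of the sum constraint; a minimum-value
    constraint could be layered on top of this)."""
    return alphas * (total / alphas.sum())

print(normalize_gen_params(np.array([0.8, 0.6])))   # [0.571..., 0.428...], sums to 1.0
```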
In some embodiments, a derivative convolution kernel may be generated in a non-linear manner based on at least a portion of the original convolution kernels such that the generated derivative convolution kernel is linearly independent of the original convolution kernels. Non-linear approaches have a higher computational complexity in image processing than linear approaches, but can achieve better image processing results because the corresponding channels of the output features generated based on the derivative convolution kernel are linearly independent of the corresponding channels of the output features generated based on the original convolution kernel.
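The disclosure does not specify a particular non-linear generation rule; purely as an illustration of the idea, the hypothetical sketch below passes a weighted sum of the original kernels through a tanh non-linearity, so the resulting derivative kernel is no longer a linear combination of the originals:

```python
import numpy as np

k321 = np.random.randn(4, 3, 3)     # original convolution kernel 321
k322 = np.random.randn(4, 3, 3)     # original convolution kernel 322
alpha1, alpha2 = 0.6, 0.4           # generation parameters (example values)

# hypothetical non-linear generation: the derivative kernel (and hence the
# output channel it produces) is not a linear combination of the originals
k_derived = np.tanh(alpha1 * k321 + alpha2 * k322)
```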
In a preferred embodiment, a derivative convolution kernel may be generated in the training based on every two original convolution kernels. Thus, the number of sets of convolution kernel generation parameters may be half the number of original convolution kernels. For example, in the embodiment shown in FIG. 3, a derivative convolution kernel 331 may be generated based on the original convolution kernels 321, 322. Accordingly, in a preferred embodiment, each set of convolution kernel generation parameters may include two parameters, each representing a weight of a respective original convolution kernel at the time of generation of a respective derived convolution kernel.
In some embodiments, one derivative convolution kernel may be generated in the training based on at least a portion of every three or more original convolution kernels, i.e., the number of sets of convolution kernel generation parameters may be less than half (e.g., one third or less) of the number of original convolution kernels. For the same total number of convolution kernels (original convolution kernels plus derivative convolution kernels), generating one derivative convolution kernel based on more original convolution kernels requires more storage space for the trained convolutional neural network model, but the image processing effect may be better. Furthermore, in some embodiments, one derivative convolution kernel may be generated in the training based on a single original convolution kernel, i.e., the number of sets of convolution kernel generation parameters may also be more than half the number of original convolution kernels.
Further, in some embodiments, in addition to generating the derivative convolution kernel based on the original convolution kernel, a new derivative convolution kernel may be generated based on at least a portion of the generated derivative convolution kernel.
In a preferred embodiment, the parameters of each level of convolutional layer in the convolutional neural network model 300 include one or more original convolution kernels and one or more sets of convolution kernel generation parameters. In other embodiments, the parameters of some of the convolutional layers in the convolutional neural network model 300 may not include convolution kernel generation parameters; that is, in training, those convolutional layers may perform convolution processing using only the original convolution kernels, without generating derivative convolution kernels.
In a preferred embodiment, the manner in which the convolution kernel generation parameters are used to generate derivative convolution kernels from the original convolution kernels may be the same in each level of convolutional layer of the convolutional neural network model 300. In a particularly preferred embodiment, the number of sets of convolution kernel generation parameters in each level of convolutional layer of the convolutional neural network model 300 may be half the number of original convolution kernels, i.e., one derivative convolution kernel may be generated based on every two original convolution kernels in the training. Thus, where each convolutional layer performs convolution processing on the input image features using the same number of convolution kernels (original convolution kernels plus derivative convolution kernels) as in the prior art, the number of parameters included in each convolutional layer of a convolutional neural network model trained according to this preferred embodiment of the invention is about 2/3 of that in the prior art. This greatly reduces the memory space occupied by the convolutional neural network model used in image processing and is particularly suitable for image processing on the smart device side.
Fig. 4 illustrates a flow diagram of an image processing method 400 in accordance with at least one embodiment of the present disclosure.
At step 401, the method 400 begins.
At step 402, a trained convolutional neural network model 300, schematically illustrated in FIG. 3, trained in accordance with the method 200 illustrated in FIG. 2 is obtained. The parameters of the convolutional neural network model 300 include, among other things, one or more original convolution kernels 321, 322 and one or more sets of convolution kernel generation parameters 323.
At step 404, a corresponding derivative convolution kernel 331 is generated based on each set of convolution kernel generation parameters 323 and at least a portion of the one or more original convolution kernels 321, 322 in the trained convolutional neural network model 300. The derivative convolution kernel 331 is generated here from the convolution kernel generation parameters 323 and the original convolution kernels 321, 322 in the same manner as in training at step 206 of the method 200.
At step 406, the image features 310 of the image to be processed are convolved with the one or more original convolution kernels 321, 322 and the generated one or more derivative convolution kernels 331. The image features 310 of the image to be processed are convolved here using the original convolution kernels 321, 322 and the derivative convolution kernel 331 in the same manner as the image features of the training set images are convolved in training at step 208 of the method 200.
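A compact sketch of steps 404 and 406 together (NumPy, hypothetical names, assuming the whole-kernel linear-combination variant): only the original kernels and the generation parameters are stored; the derivative kernel is rebuilt before convolution:

```python
import numpy as np

# stored parameters of one convolutional layer of the trained model 300
originals = np.random.randn(2, 4, 3, 3)         # original convolution kernels 321, 322
gen = np.array([0.6, 0.4])                      # convolution kernel generation parameters 323

# step 404: regenerate the derivative convolution kernel 331
derived = np.tensordot(gen, originals, axes=1)  # weighted sum, shape (4, 3, 3)
kernels = np.concatenate([originals, derived[None]], axis=0)   # 3 kernels in total

# step 406: convolve the image features 310 (4 channels, 6x6) with all 3 kernels
x = np.random.randn(4, 6, 6)
out = np.zeros((3, 4, 4))                       # output image features, 3 channels of 4x4
for n in range(3):
    for i in range(4):
        for j in range(4):
            out[n, i, j] = np.sum(x[:, i:i+3, j:j+3] * kernels[n])
```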
At step 408, the method 400 ends.
The methods according to the present disclosure may be implemented in various suitable manners, such as in software, hardware, a combination of software and hardware, and the like.
In another aspect, a computer storage medium may be implemented having executable instructions stored thereon that, when executed, are capable of implementing the above-described method. In another aspect, the present invention also includes an image processing apparatus capable of implementing the above-described image processing method. The invention also comprises an intelligent device which comprises the image processing device. For example, the smart device may be a cell phone, a tablet, a camera, a smart camera, and the like.
The terms "front," "back," "top," "bottom," "over," "under," and the like in the description and in the claims, if any, are used for descriptive purposes and not necessarily for describing permanent relative positions. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are, for example, capable of operation in other orientations than those illustrated or otherwise described herein.
As used herein, the word "exemplary" means "serving as an example, instance, or illustration," and not as a "model" that is to be replicated accurately. Any implementation exemplarily described herein is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, the disclosure is not limited by any expressed or implied theory presented in the preceding technical field, background, brief summary or the detailed description.
As used herein, the term "substantially" is intended to encompass any minor variation resulting from design or manufacturing imperfections, device or component tolerances, environmental influences, and/or other factors. The word "substantially" also allows for differences from a perfect or ideal situation due to parasitics, noise, and other practical considerations that may exist in a practical implementation.
In addition, the foregoing description may refer to elements or nodes or features being "connected" or "coupled" together. As used herein, unless expressly stated otherwise, "connected" means that one element/node/feature is directly connected to (or directly communicates with) another element/node/feature, either electrically, mechanically, logically, or otherwise. Similarly, unless expressly stated otherwise, "coupled" means that one element/node/feature may be mechanically, electrically, logically, or otherwise joined to another element/node/feature in a direct or indirect manner to allow for interaction, even though the two features may not be directly connected. That is, to "couple" is intended to include both direct and indirect joining of elements or other features, including connection with one or more intermediate elements.
In addition, "first," "second," and like terms may also be used herein for reference purposes only, and thus are not intended to be limiting. For example, the terms "first," "second," and other such numerical terms referring to structures or elements do not imply a sequence or order unless clearly indicated by the context.
It will be further understood that the terms "comprises/comprising," "includes" and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In the present disclosure, the term "providing" is used broadly to encompass all ways of obtaining an object, and thus "providing an object" includes, but is not limited to, "purchasing," "preparing/manufacturing," "arranging/setting," "installing/assembling," and/or "ordering" the object, and the like.
Those skilled in the art will appreciate that the boundaries between the above-described operations are merely illustrative. Multiple operations may be combined into a single operation, single operations may be distributed among additional operations, and operations may be performed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments. However, other modifications, variations, and alternatives are also possible. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. The various embodiments disclosed herein may be combined in any combination without departing from the spirit and scope of the present disclosure. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (15)

1. A method of training a convolutional neural network model for image processing, the method comprising:
constructing a convolutional neural network model to be trained, wherein the convolutional neural network model comprises a plurality of convolutional layers, and parameters of the convolutional neural network model comprise one or more original convolutional kernels and one or more groups of convolutional kernel generation parameters;
training the one or more original convolution kernels and the one or more sets of convolution kernel generation parameters using a training set image, wherein,
in the training, one derivative convolution kernel is generated based on at least a portion of the original convolution kernels using each set of convolution kernel generation parameters, and image features of training set images are convolved using the one or more original convolution kernels and the generated one or more derivative convolution kernels, wherein each derivative convolution kernel is generated from at least a portion of the original convolution kernels by a linear transform or in a non-linear manner, and each derivative convolution kernel, as well as each of the one or more original convolution kernels used to generate it, is used to perform convolution processing on the input image features in the same level of convolutional layer to obtain the feature values of a corresponding one of the channels of the output image features.
2. The method of claim 1, wherein
The number of parameters for each set of convolution kernel generation parameters is the same as the number of original convolution kernels used to generate the corresponding derivative convolution kernels, and
each convolution kernel generation parameter corresponds to an original convolution kernel representing a weight of the original convolution kernel at the time the corresponding derivative convolution kernel was generated.
3. The method of claim 2, wherein
Each of the corresponding set of convolution kernel generation parameters represents a linear transform coefficient of the corresponding original convolution kernel to the one derivative convolution kernel.
4. The method of claim 1, wherein
The number of parameters of each set of convolution kernel generation parameters is the same as the product of the number of original convolution kernels and the number of channels thereof used to generate the corresponding derivative convolution kernels, and
each convolution kernel generation parameter corresponds to a channel of an original convolution kernel and represents a weight of the channel of the original convolution kernel at the time of generating the corresponding derivative convolution kernel.
5. The method of claim 4, wherein
Each channel of each derivative convolution kernel is generated from a corresponding channel linear transformation of at least a portion of the original convolution kernel, and
each of the corresponding set of convolution kernel generation parameters represents a linear transform coefficient of a corresponding channel of the corresponding original convolution kernel to the one channel of the one derivative convolution kernel.
6. The method of claim 1, wherein
The number of parameters of each group of convolution kernel generation parameters is the same as the product of the number of original convolution kernels used for generating the corresponding derivative convolution kernels and the number of parameters contained in the original convolution kernels.
7. The method of claim 1, wherein
The number of sets of convolution kernel generation parameters is half the number of original convolution kernels, and a derivative convolution kernel is generated in the training based on every two original convolution kernels.
8. The method of claim 1, further comprising
In training, a new derivative convolution kernel is generated based on at least a portion of the generated derivative convolution kernels.
9. The method of claim 1, wherein the method further comprises employing a normalized constraint on each set of convolution kernel generation parameters.
10. The method of claim 9, wherein the normalization constraint includes defining a sum of each set of convolution kernel generation parameters.
11. The method of claim 1, wherein
The parameters for each level of convolution layer include one or more original convolution kernels and one or more sets of convolution kernel generation parameters.
12. An image processing method, characterized in that the method comprises:
obtaining a convolutional neural network model trained according to the method of any one of claims 1-11;
generating a corresponding derivative convolution kernel based on each set of convolution kernel generation parameters in the trained convolutional neural network model and at least a portion of the one or more original convolution kernels; and
and performing convolution processing on the image characteristics of the image to be processed by utilizing the one or more original convolution kernels and the generated one or more derivative convolution kernels.
13. A computer storage medium having stored thereon executable instructions, which when executed are capable of implementing the method of any one of claims 1-12.
14. An image processing apparatus, characterized in that said apparatus is capable of implementing the method according to claim 12.
15. A smart device, characterized in that it comprises the apparatus according to claim 14.
CN202210174146.2A 2022-02-25 2022-02-25 Training method of convolution neural network model for image processing Active CN114239814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210174146.2A CN114239814B (en) 2022-02-25 2022-02-25 Training method of convolution neural network model for image processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210174146.2A CN114239814B (en) 2022-02-25 2022-02-25 Training method of convolution neural network model for image processing

Publications (2)

Publication Number Publication Date
CN114239814A CN114239814A (en) 2022-03-25
CN114239814B true CN114239814B (en) 2022-07-08

Family

ID=80748138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210174146.2A Active CN114239814B (en) 2022-02-25 2022-02-25 Training method of convolution neural network model for image processing

Country Status (1)

Country Link
CN (1) CN114239814B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071359B (en) * 2023-03-08 2023-06-23 中汽研新能源汽车检验中心(天津)有限公司 Battery aging degree detection method, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133960A (en) * 2017-04-21 2017-09-05 武汉大学 Image crack dividing method based on depth convolutional neural networks
CN111079905A (en) * 2019-12-27 2020-04-28 北京迈格威科技有限公司 Convolutional neural network processing method, device and electronic system
CN111401524A (en) * 2020-03-17 2020-07-10 深圳市物语智联科技有限公司 Convolutional neural network processing method, device, equipment, storage medium and model
CN112102281A (en) * 2020-09-11 2020-12-18 哈尔滨市科佳通用机电股份有限公司 Truck brake cylinder fault detection method based on improved Faster Rcnn
CN113269765A (en) * 2021-06-04 2021-08-17 重庆大学 Expandable convolutional neural network training method and CT image segmentation model construction method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943750A (en) * 2017-11-14 2018-04-20 华南理工大学 A kind of decomposition convolution method based on WGAN models
CN107886162A (en) * 2017-11-14 2018-04-06 华南理工大学 A kind of deformable convolution kernel method based on WGAN models
CN107886164A (en) * 2017-12-20 2018-04-06 东软集团股份有限公司 A kind of convolutional neural networks training, method of testing and training, test device
CN110971901B (en) * 2018-09-29 2022-08-02 杭州海康威视数字技术股份有限公司 Processing method, device and equipment of convolutional neural network and storage medium
CN110188795B (en) * 2019-04-24 2023-05-09 华为技术有限公司 Image classification method, data processing method and device
CN110807480A (en) * 2019-10-25 2020-02-18 广州思德医疗科技有限公司 Convolution kernel storage method and device in convolution neural network
CN111461135B (en) * 2020-03-31 2022-11-08 上海大学 Digital image local filtering evidence obtaining method integrated by convolutional neural network
CN111582454B (en) * 2020-05-09 2023-08-25 北京百度网讯科技有限公司 Method and device for generating neural network model
CN111562612B (en) * 2020-05-20 2021-03-19 大连理工大学 Deep learning microseismic event identification method and system based on attention mechanism
CN111814347A (en) * 2020-07-20 2020-10-23 中国石油大学(华东) Method and system for predicting gas channeling channel in oil reservoir

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133960A (en) * 2017-04-21 2017-09-05 武汉大学 Image crack dividing method based on depth convolutional neural networks
CN111079905A (en) * 2019-12-27 2020-04-28 北京迈格威科技有限公司 Convolutional neural network processing method, device and electronic system
CN111401524A (en) * 2020-03-17 2020-07-10 深圳市物语智联科技有限公司 Convolutional neural network processing method, device, equipment, storage medium and model
CN112102281A (en) * 2020-09-11 2020-12-18 哈尔滨市科佳通用机电股份有限公司 Truck brake cylinder fault detection method based on improved Faster Rcnn
CN113269765A (en) * 2021-06-04 2021-08-17 重庆大学 Expandable convolutional neural network training method and CT image segmentation model construction method

Also Published As

Publication number Publication date
CN114239814A (en) 2022-03-25

Similar Documents

Publication Publication Date Title
US10977001B2 (en) Asymmetric quantization of multiple-and-accumulate operations in deep learning processing
US20210365710A1 (en) Image processing method, apparatus, equipment, and storage medium
JP6365258B2 (en) Arithmetic processing unit
Bruinier et al. Traces of CM values of modular functions
CN114239814B (en) Training method of convolution neural network model for image processing
WO2018027584A1 (en) Method and system for restoring image using target attribute assisted compression perception
US20110274368A1 (en) Image processing device, image processing method, and program
CN110399591B (en) Data processing method and device based on convolutional neural network
US20180005113A1 (en) Information processing apparatus, non-transitory computer-readable storage medium, and learning-network learning value computing method
CN112488923A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
JPH04184686A (en) Pattern recognizing device
US11900577B2 (en) Processing apparatus for performing processing using a convolutional neural network
CN112784951B (en) Winograd convolution operation method and related products
CN114399828B (en) Training method of convolution neural network model for image processing
KR102153167B1 (en) Matrix operator and matrix operation method for artificial neural network
US8380773B2 (en) System and method for adaptive nonlinear filtering
CN114548378A (en) Neural network generation method and device, computer equipment and storage medium
Pugh et al. Equivalence and reduction of 2-D systems
CN114611700A (en) Model reasoning speed improving method and device based on structural parameterization
Bakic et al. CNN paradigm based multilevel halftoning of digital images
CN113902107A (en) Data processing method, readable medium and electronic device for neural network model full connection layer
US20040083193A1 (en) Expandable on-chip back propagation learning neural network with 4-neuron 16-synapse
CN111461987B (en) Network construction method, image super-resolution reconstruction method and system
CN111179175A (en) Image processing method and device based on convolutional neural network and storage medium
Conde et al. Nilut: Conditional neural implicit 3d lookup tables for image enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant