CN107229968B - Gradient parameter determination method, gradient parameter determination device and computer-readable storage medium - Google Patents


Info

Publication number
CN107229968B
CN107229968B (application CN201710373287.6A)
Authority
CN
China
Prior art keywords
gradient, neural network, network model, convolutional neural, layer
Prior art date
Legal status
Active
Application number
CN201710373287.6A
Other languages
Chinese (zh)
Other versions
CN107229968A (en)
Inventor
万韶华
Current Assignee
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201710373287.6A
Publication of CN107229968A
Application granted
Publication of CN107229968B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent


Abstract

The disclosure relates to a gradient parameter determination method, a gradient parameter determination device, and a computer-readable storage medium in the field of image processing. The method comprises: receiving, through a designated fully-connected layer in a convolutional neural network model to be trained, a first gradient passed back by the next convolutional layer of that fully-connected layer, where the designated fully-connected layer is located at a designated position among the plurality of convolutional layers included in the model; determining a second gradient through the designated fully-connected layer; summing the first gradient and the second gradient to obtain a third gradient; and determining the third gradient as a gradient parameter for training the convolutional neural network model. Because the first and second gradients are summed, the gradient parameter is strengthened, so the determined gradient parameter can be propagated to deeper (earlier) layers and the convergence speed of the algorithm is increased.

Description

Gradient parameter determination method, gradient parameter determination device and computer-readable storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and an apparatus for determining gradient parameters, and a computer-readable storage medium.
Background
With the rapid development of image processing technology, convolutional neural network models are widely used in image recognition: an image to be recognized is input into a trained convolutional neural network model, and the model identifies the class of the image. For example, if an image of a cat is input into a trained convolutional neural network model, the model can identify the image as a cat.
In order to realize image recognition, a convolutional neural network model is usually trained in advance on training images. Such a model is generally composed of a plurality of convolutional layers, activation layers, pooling layers, and fully-connected layers connected in series. The training process is as follows: a training image is input at the input layer of the convolutional neural network model, the image is recognized by the model to be trained, and a prediction class probability is output from the output layer. Then, based on the class probability error between the prediction class probability and the initial class probability, the gradient parameter of each layer is determined, and the initial model parameters of each layer are adjusted based on those gradient parameters. In practice, to increase recognition accuracy, the convolutional neural network model usually needs deep training, and the common approach is to increase the number of convolutional layers in the model.
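The difficulty this creates, described in the application scenario below, is that the gradient shrinks as it propagates back through more layers. A minimal sketch (not from the patent; the per-layer factor of 0.5 is invented for illustration) shows the effect: backpropagation multiplies one local derivative factor per layer, so stacking more convolutional layers shrinks the gradient that reaches the earliest layers.

```python
def backpropagated_gradient(num_layers, layer_factor=0.5):
    """Magnitude of a unit output-layer gradient after it has passed
    through `num_layers` layers that each scale it by `layer_factor`
    (an illustrative per-layer derivative factor, assumed < 1)."""
    grad = 1.0
    for _ in range(num_layers):
        grad *= layer_factor
    return grad

shallow = backpropagated_gradient(5)    # 0.5**5 = 0.03125
deep = backpropagated_gradient(20)      # 0.5**20, below 1e-6
```

With 20 layers the gradient is already more than 30,000 times smaller than with 5 layers, which is why the lower-layer parameters barely move.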
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a gradient parameter determination method, apparatus, and computer-readable storage medium.
In a first aspect, a gradient parameter determination method is provided, the method comprising:
receiving a first gradient transmitted by a next convolutional layer of a specified fully-connected layer through the specified fully-connected layer in a convolutional neural network model to be trained, wherein the specified fully-connected layer is located at a specified position among a plurality of convolutional layers included in the convolutional neural network model, and the next convolutional layer of the specified fully-connected layer is close to an output layer of the convolutional neural network model;
determining a second gradient through the designated fully-connected layer, wherein the second gradient is determined based on a first class probability error, the first class probability error is an error between a first prediction class probability and an initial class probability, and the first prediction class probability is obtained by performing recognition processing on a training image through a plurality of layers above the designated fully-connected layer in the convolutional neural network model;
carrying out summation operation on the first gradient and the second gradient to obtain a third gradient;
determining the third gradient as a gradient parameter for training the convolutional neural network model.
Optionally, the determining a second gradient through the specified fully-connected layer comprises:
identifying the training image through a plurality of layers above the specified fully-connected layer in the convolutional neural network model to obtain the first prediction class probability;
determining a difference between the first prediction class probability and the initial class probability to obtain the first class probability error;
and determining the second gradient by adopting a specified gradient descent method through the specified fully-connected layer based on the first-class probability error.
Optionally, before receiving, by a designated fully-connected layer in the convolutional neural network model to be trained, a first gradient delivered by a next convolutional layer of the designated fully-connected layer, the method further includes:
identifying the training image through all layers included in the convolutional neural network model to obtain a second prediction class probability;
determining a difference between the second prediction class probability and the initial class probability to obtain a second class probability error;
and determining the first gradient by adopting a specified gradient descent method through a convolution layer positioned next to the specified fully-connected layer in the convolution neural network model based on the second category probability error.
Optionally, after determining the third gradient as a gradient parameter for training the convolutional neural network model, the method further includes:
determining the product of the gradient length of the third gradient and a designated coefficient to obtain a moving step length, and moving the model parameter of the designated fully-connected layer in the gradient direction of the third gradient by the moving step length, wherein the designated coefficient is any preset coefficient;
passing the third gradient to the previous convolutional layer of the designated fully-connected layer (the one closer to the input layer), so as to pass the gradient parameter on.
Optionally, when the model parameters included in the convolutional neural network model are initial model parameters, the initial model parameters are any preset parameters.
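The optional parameter-update step above can be sketched as follows. The descent sign (subtracting along the gradient) and the example coefficient value are assumptions; the patent only states that the parameters are moved by coefficient times the gradient length in the gradient's direction.

```python
import numpy as np

def update_parameters(weights, third_gradient, coefficient=0.01):
    """Move the designated fully-connected layer's parameters by a step of
    coefficient * |gradient| along the (descent) direction of the third
    gradient; `coefficient` plays the role of the 'designated coefficient'."""
    length = np.linalg.norm(third_gradient)
    if length == 0.0:
        return weights                     # nothing to do for a zero gradient
    step = coefficient * length            # the moving step length
    direction = third_gradient / length    # unit vector in the gradient direction
    return weights - step * direction      # step against the gradient to descend
```

Note that `step * direction` equals `coefficient * third_gradient`, so under these assumptions the rule reduces to ordinary gradient descent with the designated coefficient as the learning rate.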
In a second aspect, there is provided a gradient parameter determination apparatus, the apparatus comprising:
a receiving module, configured to receive, through a designated fully-connected layer in the convolutional neural network model to be trained, a first gradient passed by the next convolutional layer of the designated fully-connected layer, wherein the designated fully-connected layer is located at a designated position among a plurality of convolutional layers included in the convolutional neural network model, and the next convolutional layer of the designated fully-connected layer is the one closer to an output layer of the convolutional neural network model;
a first determining module, configured to determine a second gradient through the designated fully-connected layer, where the second gradient is determined based on a first class probability error, where the first class probability error is an error between a first prediction class probability and an initial class probability, and the first prediction class probability is obtained by performing recognition processing on a training image through multiple layers located above the designated fully-connected layer in the convolutional neural network model;
the operation module is used for carrying out summation operation on the first gradient received by the receiving module and the second gradient determined by the first determination module to obtain a third gradient;
and the second determining module is used for determining the third gradient obtained by the operation module as a gradient parameter for training the convolutional neural network model.
Optionally, the first determining module is configured to:
identifying the training image through a plurality of layers above the specified fully-connected layer in the convolutional neural network model to obtain the first prediction class probability;
determining a difference between the first prediction class probability and the initial class probability to obtain the first class probability error;
and determining the second gradient by adopting a specified gradient descent method through the specified fully-connected layer based on the first-class probability error.
Optionally, the apparatus further comprises:
a recognition processing module, configured to recognize the training image through all layers included in the convolutional neural network model to obtain a second prediction class probability;
a third determining module, configured to determine a difference between the second prediction class probability and the initial class probability to obtain a second class probability error;
and a fourth determining module, configured to determine the first gradient by using a specified gradient descent method, through the next convolutional layer of the specified fully-connected layer in the convolutional neural network model, based on the second class probability error.
Optionally, the apparatus further comprises:
a fifth determining module, configured to determine a product between a gradient length of the third gradient and a specified coefficient, to obtain a moving step length, and move the model parameter of the specified fully-connected layer by the moving step length in a gradient direction of the third gradient, where the specified coefficient is any preset coefficient;
and a transfer module, configured to transfer the third gradient to the previous convolutional layer of the specified fully-connected layer (the one closer to the input layer), so as to pass the gradient parameter on.
Optionally, when the model parameters included in the convolutional neural network model are initial model parameters, the initial model parameters are any preset parameters.
In a third aspect, there is provided a gradient parameter determination apparatus, the apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
receiving a first gradient transmitted by a next convolutional layer of a specified fully-connected layer through the specified fully-connected layer in a convolutional neural network model to be trained, wherein the specified fully-connected layer is located at a specified position among a plurality of convolutional layers included in the convolutional neural network model, and the next convolutional layer of the specified fully-connected layer is close to an output layer of the convolutional neural network model;
determining a second gradient through the designated fully-connected layer, wherein the second gradient is determined based on a first class probability error, the first class probability error is an error between a first prediction class probability and an initial class probability, and the first prediction class probability is obtained by performing recognition processing on a training image through a plurality of layers above the designated fully-connected layer in the convolutional neural network model;
carrying out summation operation on the first gradient and the second gradient to obtain a third gradient;
determining the third gradient as a gradient parameter for training the convolutional neural network model.
In a fourth aspect, a computer-readable storage medium having instructions stored thereon is provided, wherein the instructions when executed by a processor implement the steps of:
receiving a first gradient transmitted by a next convolutional layer of a specified fully-connected layer through the specified fully-connected layer in a convolutional neural network model to be trained, wherein the specified fully-connected layer is located at a specified position among a plurality of convolutional layers included in the convolutional neural network model, and the next convolutional layer of the specified fully-connected layer is close to an output layer of the convolutional neural network model;
determining a second gradient through the designated fully-connected layer, wherein the second gradient is determined based on a first class probability error, the first class probability error is an error between a first prediction class probability and an initial class probability, and the first prediction class probability is obtained by performing recognition processing on a training image through a plurality of layers above the designated fully-connected layer in the convolutional neural network model;
carrying out summation operation on the first gradient and the second gradient to obtain a third gradient;
determining the third gradient as a gradient parameter for training the convolutional neural network model.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
the disclosed embodiments add a designated fully-connected layer between the plurality of convolutional layers included in the convolutional neural network model. And identifying the training image through a plurality of layers above the appointed fully-connected layer to obtain a first prediction class probability, determining an error between the first prediction class probability and the initial class probability through the appointed fully-connected layer, and determining a second gradient based on the error. And when the appointed full-connection layer receives a first gradient transmitted by a next convolutional layer close to an output layer of the convolutional neural network model, summing the first gradient and a second gradient to obtain a third gradient, and determining the third gradient as a gradient parameter for training the convolutional neural network model. After the first gradient and the second gradient are subjected to summation operation, the gradient parameter is enhanced, so that the determined gradient parameter can be transmitted more deeply, and the convergence speed of the algorithm is increased.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart illustrating a gradient parameter determination method according to an exemplary embodiment.
FIG. 2A is a flow diagram illustrating a gradient parameter determination method according to another exemplary embodiment.
Fig. 2B is a schematic diagram of the connection relationship between layers in a convolutional neural network model according to the embodiment of fig. 2A.
Fig. 3A is a block diagram illustrating a gradient parameter determination apparatus according to an exemplary embodiment.
Fig. 3B is a block diagram illustrating another gradient parameter determination apparatus according to an example embodiment.
Fig. 3C is a block diagram illustrating another gradient parameter determination apparatus according to an example embodiment.
Fig. 4 is a block diagram illustrating a gradient parameter determination apparatus 400 according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Before explaining the embodiments of the present disclosure in detail, terms referred to in the embodiments of the present disclosure are briefly described:
A convolutional neural network model: a feedforward neural network, generally composed of a plurality of convolutional layers and a plurality of fully-connected layers; it also contains a plurality of activation layers and a plurality of pooling layers. In a specific implementation, a back-propagation algorithm may be used to train the convolutional neural network model.
Prediction class probability: the probability that the training image belongs to each preset category, obtained by performing recognition processing on the training image through the convolutional neural network model to be trained. The preset categories can be customized by technicians according to actual demand; for example, they may include "cat", "dog", "bear", "lion", "tiger", and so on.
Initial class probability: the true class probability of the training image, which can generally be customized by technicians according to actual requirements.
Model parameters: generally include the convolution kernels of the convolutional layers, the weight matrices of the fully-connected layers, and so on; they are mainly used in the recognition processing of training images.
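These terms can be made concrete with a small sketch. The softmax output, the category names, and the one-hot encoding of the initial class probability are illustrative assumptions, not mandated by the patent.

```python
import numpy as np

def prediction_class_probability(logits):
    """Softmax over the output layer's scores: the probability that the
    training image belongs to each preset category."""
    shifted = np.exp(logits - np.max(logits))   # subtract max for stability
    return shifted / shifted.sum()

# Hypothetical preset categories, and the initial (true) class probability
# of a training image labeled "cat", encoded one-hot.
categories = ["cat", "dog", "bear", "lion", "tiger"]
initial_probability = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

predicted = prediction_class_probability(np.array([2.0, 0.5, 0.1, 0.0, -1.0]))
class_probability_error = predicted - initial_probability
```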
Next, an application scenario of the embodiments of the present disclosure is described. At present, to improve the accuracy of a convolutional neural network model in image recognition, the number of convolutional layers is generally increased so that the model can be trained deeply. However, as the number of convolutional layers grows, the gradient parameters become smaller and smaller during back-propagation, so the model parameters of the earlier (lower) layers are updated slowly and may even fail to converge. Therefore, the embodiments of the present disclosure provide a gradient parameter determination method that adds a designated fully-connected layer at a designated position among the plurality of convolutional layers and strengthens the gradient parameter through that layer, so that the gradient parameter can be propagated farther, the convergence speed of the algorithm is increased, and the failure to converge caused by deep training is avoided. The method provided by the embodiments of the present disclosure may be executed by a terminal, which may be a device such as a tablet computer or a computer; the embodiments of the present disclosure are not limited in this respect.
Fig. 1 is a flowchart illustrating a gradient parameter determining method according to an exemplary embodiment, which is used in a terminal, as shown in fig. 1, and includes the following steps:
in step 101, a first gradient passed by a next convolutional layer of a designated fully-connected layer is received through the designated fully-connected layer in a convolutional neural network model to be trained, the designated fully-connected layer is located at a designated position between a plurality of convolutional layers included in the convolutional neural network model, and the next convolutional layer of the designated fully-connected layer is close to an output layer of the convolutional neural network model.
In step 102, a second gradient is determined through the designated fully-connected layer, where the second gradient is determined based on a first class probability error, where the first class probability error is an error between a first prediction class probability and an initial class probability, and the first prediction class probability is obtained by performing recognition processing on a training image through multiple layers located above the designated fully-connected layer in the convolutional neural network model.
In step 103, the first gradient and the second gradient are summed to obtain a third gradient.
In step 104, the third gradient is determined as a gradient parameter for training the convolutional neural network model.
In the disclosed embodiment, a designated fully-connected layer is added between the plurality of convolutional layers included in the convolutional neural network model. And identifying the training image through a plurality of layers above the appointed fully-connected layer to obtain a first prediction class probability, determining an error between the first prediction class probability and the initial class probability through the appointed fully-connected layer, and determining a second gradient based on the error. And when the appointed full-connection layer receives a first gradient transmitted by a next convolutional layer close to an output layer of the convolutional neural network model, summing the first gradient and a second gradient to obtain a third gradient, and determining the third gradient as a gradient parameter for training the convolutional neural network model. After the first gradient and the second gradient are subjected to summation operation, the gradient parameter is enhanced, so that the determined gradient parameter can be transmitted more deeply, and the convergence speed of the algorithm is increased.
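Steps 101 to 104 can be condensed into a small numeric sketch; the gradient values below are invented for illustration.

```python
import numpy as np

def third_gradient(first_gradient, second_gradient):
    """Step 103: sum the first gradient (passed back by the next
    convolutional layer) and the second gradient (computed locally by the
    designated fully-connected layer) to obtain the third gradient."""
    return first_gradient + second_gradient

first = np.array([1e-4, -2e-4])   # attenuated after many convolutional layers
second = np.array([5e-2, 3e-2])   # fresh gradient from the first class probability error
third = third_gradient(first, second)   # step 104: the training gradient parameter
```

Because the norm of `third` is far larger than that of `first`, the gradient signal that continues on toward the input layer is strengthened, which is the stated reason the algorithm converges faster.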
Optionally, determining a second gradient through the specified fully-connected layer includes:
identifying the training image through a plurality of layers above the specified fully-connected layer in the convolutional neural network model to obtain the first prediction class probability;
determining a difference between the first prediction class probability and the initial class probability to obtain a first class probability error;
and determining the second gradient by adopting a specified gradient descent method through the specified fully-connected layer based on the first-class probability error.
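The patent does not fix a particular "specified gradient descent method". A common concrete choice, shown here purely as an assumption, is the softmax/cross-entropy rule for a fully-connected layer, under which the weight gradient is the outer product of the class probability error and the layer's input features.

```python
import numpy as np

def second_gradient(features, first_predicted, initial):
    """Gradient of the designated fully-connected layer's weight matrix,
    assuming a softmax output with cross-entropy loss: d(loss)/dW is the
    outer product of the first class probability error and the input."""
    class_probability_error = first_predicted - initial
    return np.outer(class_probability_error, features)

grad = second_gradient(np.array([1.0, 2.0]),    # layer input features
                       np.array([0.7, 0.3]),    # first prediction class probability
                       np.array([1.0, 0.0]))    # initial (true) class probability
```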
Optionally, before receiving, by a designated fully-connected layer in the convolutional neural network model to be trained, a first gradient delivered by a next convolutional layer of the designated fully-connected layer, the method further includes:
identifying the training image through all layers included in the convolutional neural network model to obtain a second prediction class probability;
determining a difference between the second prediction class probability and the initial class probability to obtain a second class probability error;
and determining the first gradient by adopting a specified gradient descent method through a convolution layer next to the specified fully-connected layer in the convolution neural network model based on the second class probability error.
Optionally, after determining the third gradient as a gradient parameter for training the convolutional neural network model, the method further includes:
determining the product of the gradient length of the third gradient and a designated coefficient to obtain a moving step length, and moving the model parameter of the designated fully-connected layer in the gradient direction of the third gradient by the moving step length, wherein the designated coefficient is any preset coefficient;
passing the third gradient to a convolutional layer that is immediately above the designated fully-connected layer to pass gradient parameters.
Optionally, when the model parameters included in the convolutional neural network model are initial model parameters, the initial model parameters are any preset parameters.
All the above optional technical solutions can be combined arbitrarily to form optional embodiments of the present disclosure, and the embodiments of the present disclosure are not described in detail again.
Fig. 2A is a flowchart illustrating a gradient parameter determining method according to another exemplary embodiment. As shown in fig. 2A, the method is applied in a terminal and may include the following implementation steps:
in step 201, the training image is identified through all layers included in the convolutional neural network model, so as to obtain a second prediction class probability.
The training image may be stored in the terminal in advance by a technician. When the convolutional neural network model needs to be trained, the terminal acquires the training image locally. As described above, the convolutional neural network model mainly includes a plurality of convolutional layers and a plurality of fully-connected layers. Referring to fig. 2B, during training the terminal inputs the acquired training image at the input layer and performs recognition processing on it through all layers included in the convolutional neural network model to obtain a second prediction class probability; in a specific implementation, the output layer of the model outputs the second prediction class probability. Here, "all layers" includes all convolutional layers and all fully-connected layers.
All the layers also include the activation layers and the pooling layers; since those layers contain only fixed constants and no trainable model parameters, they are not discussed further here.
The implementation process of performing recognition processing on the training image through all layers included in the convolutional neural network model may refer to related technologies, which is not limited in the embodiment of the present disclosure.
In addition, in practice, the input layer may be regarded as the first convolutional layer of the convolutional neural network model, and the output layer may be regarded as its last fully-connected layer. To distinguish them from the intermediate layers, they are generally referred to as the input layer and the output layer.
In step 202, the difference between the second predicted class probability and the initial class probability is determined, resulting in a second class probability error.
In practical implementation, after determining the second prediction class probability, the terminal may compare it with the initial class probability to decide whether training of the convolutional neural network model needs to continue. For example, when the terminal determines that the difference between the second prediction class probability and the initial class probability is greater than or equal to a preset threshold, this indicates that the recognition capability of the convolutional neural network model does not yet meet the actual requirement; in this case, the terminal continues to adjust the model parameters through an iterative process based on the obtained second class probability error, as described below.
On the contrary, if the difference between the second prediction class probability and the initial class probability is smaller than the preset threshold, this indicates that the recognition capability of the convolutional neural network model meets the actual requirement, that is, the model can accurately identify the class of an image; in this case, it can be determined that training of the convolutional neural network model is complete.
The preset threshold may be set by a technician in a user-defined manner according to actual needs, or may be set by a terminal in a default manner, which is not limited in the embodiment of the present disclosure.
It should be noted that determining whether training is complete according to the difference between the second prediction class probability and the initial class probability is only an example. In another embodiment, completion may also be decided by the number of iterations: for example, when the number of iterations reaches a preset number, training of the convolutional neural network model is determined to be complete; otherwise, training continues.
The preset times may be set by a technician in a user-defined manner according to actual needs, or may also be set by the terminal in a default manner, which is not limited in the embodiment of the present disclosure.
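The two stopping criteria above can be sketched in a few lines of Python. All names and default values here are illustrative, not taken from the disclosure; the threshold test compares a scalar probability difference, and the iteration test compares a loop counter against a preset count.

```python
def training_finished(second_pred_prob, initial_prob, iteration,
                      preset_threshold=0.01, preset_iterations=10000):
    """Return True when either stopping criterion described above is met.

    Illustrative sketch: `preset_threshold` and `preset_iterations` stand
    in for the preset threshold and preset number of iterations.
    """
    # Criterion 1: the second class probability error falls below the threshold.
    if abs(second_pred_prob - initial_prob) < preset_threshold:
        return True
    # Criterion 2: the number of iterations reaches the preset count.
    if iteration >= preset_iterations:
        return True
    return False
```

In a real training loop, one of these checks would run after each forward pass over the training image.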
In step 203, the first gradient is determined by a specified gradient descent method through a convolutional layer next to the specified fully-connected layer in the convolutional neural network model based on the second class probability error.
If it is determined according to the above method that training of the convolutional neural network model is not complete, the terminal needs to adjust the model parameters of the convolutional neural network model by a specified gradient descent method based on the second class probability error.
That is, after the terminal determines the second class probability error through the convolutional neural network model, it propagates the second class probability error backward to the output layer of the convolutional neural network model and determines the gradient parameter of the output layer there by using the specified gradient descent method. The output layer then adjusts its model parameters using this gradient parameter and passes the determined gradient parameter to the convolutional layer immediately preceding the output layer.
It should be noted that, in a specific implementation, the specified Gradient Descent method may be the SGD (Stochastic Gradient Descent) method. Of course, the specified gradient descent method may also be another gradient descent method, and the embodiment of the present disclosure is not limited thereto.
After the convolutional layer immediately preceding the output layer receives the gradient parameter, it again applies the specified gradient descent method based on that gradient parameter to determine a new gradient parameter, and adjusts its own model parameters based on the newly determined gradient parameter. Thereafter, this convolutional layer passes the determined gradient parameter on to the convolutional layer preceding it.
This process continues until the gradient parameter is passed to the convolutional layer next to the designated fully-connected layer. That convolutional layer determines a first gradient using the specified gradient descent method based on the gradient parameter passed to it, and passes the first gradient to the designated fully-connected layer.
For example, referring to fig. 2B, after the convolutional layer next to the designated fully-connected layer, i.e., the 20th convolutional layer, receives the gradient parameter passed by the 21st convolutional layer, it determines the first gradient by the specified gradient descent method based on the passed gradient parameter. The 20th convolutional layer then adjusts its model parameters based on the first gradient and passes the first gradient to the designated fully-connected layer.
In a practical implementation, a technician may add a designated fully-connected layer at a designated position between the plurality of convolutional layers according to actual needs, for example, between the 20th convolutional layer and the 19th convolutional layer. Generally, the designated position is a position where the gradient parameter falls below a certain threshold; that is, so that the small gradient parameter can continue to be passed to the lower layers, a designated fully-connected layer may be added at that position to enhance it. In practical applications, this designated fully-connected layer is also commonly referred to as a branch supervision layer.
It should be noted that the number of designated positions is not limited in the embodiment of the present disclosure; that is, in an actual implementation, designated fully-connected layers may be added at multiple designated positions between the plurality of convolutional layers, for example, between the 100th and 99th convolutional layers and between the 20th and 19th convolutional layers.
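The text says the designated positions are where the backpropagated gradient has fallen below a threshold, and that there may be several of them. A minimal sketch of how such positions could be located (the disclosure does not prescribe this procedure; the function name, the per-layer gradient norms, and the threshold value are all hypothetical):

```python
def find_insertion_points(layer_gradient_norms, threshold=1e-4):
    """Return indices of convolutional layers whose backpropagated gradient
    norm has dropped below `threshold` -- candidate positions for inserting
    a designated fully-connected (branch supervision) layer.

    `layer_gradient_norms[i]` is assumed to be the gradient norm observed
    at the i-th convolutional layer; purely illustrative.
    """
    return [i for i, g in enumerate(layer_gradient_norms) if g < threshold]
```

Each returned index marks one candidate designated position; per the text, a designated fully-connected layer could be added at any or all of them.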
In addition, the number of designated fully-connected layers added at a designated position is also not limited in the embodiment of the present disclosure; for example, 3 designated fully-connected layers may be connected in series at the designated position.
Referring to fig. 2B, fig. 2B exemplarily shows the connection relationship between the layers included in a convolutional neural network model containing a plurality of convolutional layers, assuming that designated fully-connected layers are added between the 20th convolutional layer and the 19th convolutional layer and that the number of designated fully-connected layers 21a is 3.
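The layer ordering just described, a branch of three designated fully-connected layers spliced between the 19th and 20th convolutional layers, can be written out explicitly. The total of 21 convolutional layers is an assumption made only to match the 21st layer mentioned in the example above; the layer names are placeholders.

```python
NUM_CONV_LAYERS = 21   # assumed total, matching the 21st layer in the example
BRANCH_POSITION = 19   # branch inserted between conv19 and conv20 (per fig. 2B)
BRANCH_DEPTH = 3       # three designated fully-connected layers in series

def build_layer_sequence():
    """Return the ordered layer names of the sketched network, input to output."""
    layers = ["input"]
    for i in range(1, NUM_CONV_LAYERS + 1):
        layers.append(f"conv{i}")
        if i == BRANCH_POSITION:
            # splice the designated fully-connected branch in after conv19
            layers += [f"branch_fc{j}" for j in range(1, BRANCH_DEPTH + 1)]
    layers.append("output_fc")
    return layers
```

Walking this list forward gives the recognition pass; walking it backward gives the gradient-parameter passing order described in steps 203 through 209.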
In step 204, the first gradient passed by the next convolutional layer of the designated fully-connected layer in the convolutional neural network model to be trained is received.
As described above, the designated fully-connected layer is located at a designated position between the convolutional layers included in the convolutional neural network model, and the next convolutional layer of the designated fully-connected layer is the one on the side closer to the output layer of the convolutional neural network model.
In step 205, a second gradient is determined through the designated fully-connected layer. The second gradient is determined based on a first class probability error, which is the error between a first prediction class probability and the initial class probability; the first prediction class probability is obtained by performing recognition processing on the training image through the multiple layers located above the designated fully-connected layer in the convolutional neural network model.
Since the first gradient has been passed through many layers, it is usually small. If it were passed downward unchanged, the algorithm might fail to converge; therefore, the first gradient needs to be adjusted to re-determine the gradient parameter to be passed downward.
To this end, upon receiving the first gradient, the designated fully-connected layer determines a second gradient. In a specific implementation, the terminal performs recognition processing on the training image through the multiple layers above the designated fully-connected layer in the convolutional neural network model to obtain the first prediction class probability, determines the difference between the first prediction class probability and the initial class probability to obtain the first class probability error, and determines the second gradient through the designated fully-connected layer by using the specified gradient descent method based on the first class probability error.
For example, referring to fig. 2B, the terminal performs recognition processing on the training image through the multiple layers above the designated fully-connected layer 21a, i.e., those close to the input layer, to obtain the first prediction class probability. The first class probability error between the first prediction class probability and the initial class probability is calculated through the designated fully-connected layer 21a, the error is propagated backward, and the designated fully-connected layer 21a determines the second gradient based on the first class probability error by using the specified gradient descent method.
In fact, as can be seen from the above description, the designated fully-connected layer is equivalent to an output layer of the convolutional neural network model. That is, in the embodiment of the present disclosure, two class probability errors need to be determined: the first class probability error, determined by the designated fully-connected layer disposed between the plurality of convolutional layers, and the second class probability error, determined by the output layer of the convolutional neural network model.
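A toy numeric version of the second-gradient computation may help. A single scalar weight passed through a sigmoid stands in for the whole designated fully-connected branch, and a squared-error surrogate loss stands in for the class probability error; none of this is the disclosure's exact computation, only an illustration of "error between predicted and initial probability, then gradient descent on the branch".

```python
import math

def second_gradient(branch_weight, feature, initial_prob):
    """Illustrative scalar stand-in for the designated fully-connected layer.

    p = sigmoid(w * x) plays the role of the first prediction class
    probability; e = p - initial_prob is the first class probability error;
    the returned value is dL/dw for the surrogate loss L = 0.5 * e**2.
    """
    p = 1.0 / (1.0 + math.exp(-branch_weight * feature))  # first prediction class probability
    e = p - initial_prob                                  # first class probability error
    dp_dw = p * (1.0 - p) * feature                       # chain rule through the sigmoid
    return e * dp_dw                                      # dL/dw = e * dp/dw
```

When the branch already predicts the initial probability exactly, the error and hence the second gradient are zero, so the branch contributes nothing to the sum in step 206.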
In step 206, the first gradient and the second gradient are summed to obtain a third gradient.
Summing the first gradient and the second gradient enhances the first gradient; that is, the designated fully-connected layer enhances the backward-propagated gradient parameter, thereby alleviating the problem of the gradient parameter shrinking as it is propagated through a deep network during training.
It should be noted that, for the implementation of summing the first gradient and the second gradient, reference may be made to related gradient algorithms, which is not limited by the embodiment of the present disclosure.
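Step 206 itself is an element-wise sum. A minimal sketch, representing each gradient as a plain list of floats (any tensor representation would do the same thing component by component):

```python
def third_gradient(first_gradient, second_gradient):
    """Element-wise sum of the two gradients (step 206).

    `first_gradient` is the attenuated gradient backpropagated from the
    output layer; `second_gradient` comes from the designated
    fully-connected layer's own class probability error.
    """
    return [a + b for a, b in zip(first_gradient, second_gradient)]
```

The effect described in the text is visible in the numbers: when the first gradient has shrunk to the order of 1e-6, the sum is dominated by the branch's second gradient, so a usable gradient magnitude continues downward.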
In step 207, the third gradient is determined as a gradient parameter for training the convolutional neural network model.
That is, in the subsequent gradient parameter passing process, the terminal uses the third gradient as the gradient parameter to be passed. In this way, the gradient parameter can be propagated well to the lower layers of the network, which speeds up the convergence of the convolutional neural network model.
This completes the gradient parameter determination method provided by the embodiment of the present disclosure. Further, to facilitate a deeper understanding, the embodiment of the present disclosure also provides the following step 208 and step 209.
In step 208, the product of the gradient length of the third gradient and a specified coefficient is determined to obtain a moving step, and the model parameter of the designated fully-connected layer is moved by the moving step in the gradient direction of the third gradient, where the specified coefficient is any preset coefficient.
As previously described, the third gradient is passed on during the subsequent gradient parameter passes. In a practical implementation, the model parameter of the designated fully-connected layer may be adjusted based on the third gradient: after the third gradient is determined, its gradient length is multiplied by the specified coefficient to obtain the moving step, and the model parameter of the designated fully-connected layer is then moved by the moving step in the gradient direction of the third gradient, thereby adjusting the model parameter of the designated fully-connected layer.
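The update rule of step 208 can be written directly from its definition. Note that step * (g / ||g||) equals coeff * g, so the rule reduces algebraically to an SGD-style update with the specified coefficient acting as the learning rate; the sign below follows the wording of the text ("toward the gradient direction"), whereas conventional gradient descent moves in the negative gradient direction. The function and variable names are illustrative.

```python
import math

def update_parameters(params, third_gradient, coeff=0.01):
    """Step 208 as described: moving step = coeff * ||g||, then move the
    model parameters by that step along the direction of g.

    `coeff` is the 'specified coefficient' (any preset value, per the text).
    """
    norm = math.sqrt(sum(g * g for g in third_gradient))
    if norm == 0.0:
        return list(params)               # zero gradient: nothing to move
    step = coeff * norm                   # the "moving step"
    unit = [g / norm for g in third_gradient]
    return [p + step * u for p, u in zip(params, unit)]
```

Because step * unit collapses to coeff * g, implementations would normally skip the norm computation entirely and apply coeff * g (with the sign chosen for descent) in one pass.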
It should be noted that, when the model parameters included in the convolutional neural network model are initial model parameters, the initial model parameters are any preset parameters.
That is, in the embodiment of the present disclosure, since a designated fully-connected layer may be added at a designated position between the convolutional layers included in the convolutional neural network model, the model can be trained deeply without limiting the number of convolutional layers it contains. Therefore, the initial model parameters of the convolutional neural network model need not be restricted here and may be any parameters.
Similarly, the specified coefficient is not limited in the embodiment of the present disclosure; that is, the specified coefficient may be any coefficient set in advance.
In step 209, the third gradient is passed to the convolutional layer that is immediately above the designated fully-connected layer to pass gradient parameters.
After the first gradient is adjusted through the designated fully-connected layer to obtain the third gradient, the third gradient may continue to be passed to the lower layers of the network. For example, referring again to fig. 2B, the designated fully-connected layer 21a may pass the third gradient to the convolutional layer immediately preceding it, i.e., to the 19th convolutional layer.
Further, after the 19th convolutional layer receives the third gradient, it determines a fourth gradient by the specified gradient descent method based on the third gradient. It then adjusts its model parameters based on the fourth gradient and passes the fourth gradient on to the 18th convolutional layer. One adjustment of the parameters of the convolutional neural network model is complete when the gradient parameter has been passed all the way to the input layer.
Further, the convolutional neural network model continues to perform recognition processing on the training image based on the adjusted model parameters, and its model parameters are adjusted again according to the process described above. As stated earlier, training of the convolutional neural network model is determined to be complete when the difference between the second prediction class probability and the initial class probability is smaller than the preset threshold, or when the number of iterations reaches the preset number.
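The recognize-adjust-repeat cycle just described can be condensed into a toy loop. A single scalar weight standing in for the whole model (so the "second prediction class probability" is just the weight itself) shows the iteration structure and the threshold-based stopping rule; the descent direction and all numeric values are illustrative, not the disclosure's.

```python
def train(initial_prob, preset_threshold=1e-3, preset_iterations=1000, coeff=0.5):
    """Toy end-to-end loop: repeat recognition and parameter adjustment
    until either stopping criterion from the text holds.

    Returns the trained parameter and the iteration at which training stopped.
    """
    w = 0.0                                    # initial model parameter (arbitrary)
    for it in range(1, preset_iterations + 1):
        p = w                                  # stand-in 'second prediction class probability'
        error = p - initial_prob               # second class probability error
        if abs(error) < preset_threshold:      # stopping criterion 1
            return w, it
        grad = error                           # d(0.5 * error**2)/dw for p = w
        w -= coeff * grad                      # adjust parameter (descent direction)
    return w, preset_iterations                # stopping criterion 2: iteration cap
```

Each pass halves the error here (coeff = 0.5), so the loop reaches the threshold in a handful of iterations, the behavior the enhanced gradient parameter is meant to preserve in a deep network.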
In the embodiment of the present disclosure, a designated fully-connected layer is added between the plurality of convolutional layers included in the convolutional neural network model. The training image is recognized through the multiple layers above the designated fully-connected layer to obtain a first prediction class probability, the error between the first prediction class probability and the initial class probability is determined through the designated fully-connected layer, and a second gradient is determined based on this error. When the designated fully-connected layer receives a first gradient passed by its next convolutional layer, the one closer to the output layer of the convolutional neural network model, the first gradient and the second gradient are summed to obtain a third gradient, and the third gradient is determined as the gradient parameter for training the convolutional neural network model. Summing the first gradient and the second gradient enhances the gradient parameter, so that the determined gradient parameter can be propagated deeper into the network and the convergence of the algorithm is accelerated.
Fig. 3A is a block diagram illustrating a gradient parameter determination apparatus according to an exemplary embodiment. Referring to fig. 3A, the apparatus includes a receiving module 310, a first determining module 312, an operation module 314 and a second determining module 316.
The receiving module 310 is configured to receive, through a designated fully-connected layer in the convolutional neural network model to be trained, a first gradient transferred by a next convolutional layer of the designated fully-connected layer, where the designated fully-connected layer is located at a designated position between a plurality of convolutional layers included in the convolutional neural network model, and the next convolutional layer of the designated fully-connected layer is close to an output layer of the convolutional neural network model.
The first determining module 312 is configured to determine a second gradient through the designated fully-connected layer, where the second gradient is determined based on a first class probability error, where the first class probability error is an error between a first prediction class probability and an initial class probability, and the first prediction class probability is obtained by performing recognition processing on a training image through multiple layers of the convolutional neural network model located above the designated fully-connected layer.
The operation module 314 is configured to perform a summation operation on the first gradient received by the receiving module 310 and the second gradient determined by the first determining module 312 to obtain a third gradient.
A second determining module 316, configured to determine the third gradient obtained by the operation module 314 as a gradient parameter for training the convolutional neural network model.
Optionally, the first determining module 312 is configured to:
identifying the training image through a plurality of layers above the specified fully-connected layer in the convolutional neural network model to obtain the first prediction class probability;
determining a difference between the first prediction class probability and the initial class probability to obtain a first class probability error;
and determining the second gradient by adopting a specified gradient descent method through the specified fully-connected layer based on the first-class probability error.
Optionally, referring to fig. 3B, the apparatus further includes:
the recognition processing module 318 is configured to perform recognition processing on the training image through all layers included in the convolutional neural network model to obtain a second prediction class probability;
a third determining module 320, configured to determine a difference between the second prediction class probability and the initial class probability to obtain a second class probability error;
a fourth determining module 322, configured to determine the first gradient by using a specified gradient descent method through a convolutional layer next to the specified fully-connected layer in the convolutional neural network model based on the second class probability error.
Optionally, referring to fig. 3C, the apparatus further includes:
a fifth determining module 324, configured to determine a product between the gradient length of the third gradient and a specified coefficient, to obtain a moving step length, and move the model parameter of the specified fully-connected layer by the moving step length in the gradient direction of the third gradient, where the specified coefficient is any preset coefficient;
a passing module 326 for passing the third gradient to a convolutional layer above the designated fully-connected layer to pass gradient parameters.
Optionally, when the model parameters included in the convolutional neural network model are initial model parameters, the initial model parameters are any preset parameters.
In the embodiment of the present disclosure, a designated fully-connected layer is added between the plurality of convolutional layers included in the convolutional neural network model. The training image is recognized through the multiple layers above the designated fully-connected layer to obtain a first prediction class probability, the error between the first prediction class probability and the initial class probability is determined through the designated fully-connected layer, and a second gradient is determined based on this error. When the designated fully-connected layer receives a first gradient passed by its next convolutional layer, the one closer to the output layer of the convolutional neural network model, the first gradient and the second gradient are summed to obtain a third gradient, and the third gradient is determined as the gradient parameter for training the convolutional neural network model. Summing the first gradient and the second gradient enhances the gradient parameter, so that the determined gradient parameter can be propagated deeper into the network and the convergence of the algorithm is accelerated.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 4 is a block diagram illustrating a gradient parameter determination apparatus 400 according to an example embodiment. For example, the apparatus 400 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 4, the apparatus 400 may include one or more of the following components: processing components 402, memory 404, power components 406, multimedia components 408, audio components 410, input/output (I/O) interfaces 412, sensor components 414, and communication components 416.
The processing component 402 generally controls overall operation of the apparatus 400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 402 can include one or more modules that facilitate interaction between the processing component 402 and other components. For example, the processing component 402 can include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.
The memory 404 is configured to store various types of data to support operations at the apparatus 400. Examples of such data include instructions for any application or method operating on the device 400, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 404 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power supply components 406 provide power to the various components of device 400. The power components 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power supplies for the apparatus 400.
The multimedia component 408 includes a screen that provides an output interface between the device 400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 408 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 400 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 410 is configured to output and/or input audio signals. For example, audio component 410 includes a Microphone (MIC) configured to receive external audio signals when apparatus 400 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 404 or transmitted via the communication component 416. In some embodiments, audio component 410 also includes a speaker for outputting audio signals.
The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 414 includes one or more sensors for providing various aspects of status assessment for the apparatus 400. For example, the sensor component 414 may detect the open/closed state of the apparatus 400 and the relative positioning of components, such as the display and keypad of the apparatus 400; it may also detect a change in the position of the apparatus 400 or of one of its components, the presence or absence of user contact with the apparatus 400, the orientation or acceleration/deceleration of the apparatus 400, and a change in the temperature of the apparatus 400. The sensor component 414 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 416 is configured to facilitate wired or wireless communication between the apparatus 400 and other devices. The apparatus 400 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the methods provided by the embodiments illustrated in fig. 1 or fig. 2A and described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 404 comprising instructions, executable by the processor 420 of the apparatus 400 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a gradient parameter determination method, the method comprising:
receiving a first gradient transmitted by a next convolutional layer of a specified fully-connected layer through the specified fully-connected layer in the convolutional neural network model to be trained, wherein the specified fully-connected layer is located at a specified position among a plurality of convolutional layers included in the convolutional neural network model, and the next convolutional layer of the specified fully-connected layer is close to an output layer of the convolutional neural network model;
determining a second gradient through the designated fully-connected layer, wherein the second gradient is determined based on a first class probability error, the first class probability error is an error between a first prediction class probability and an initial class probability, and the first prediction class probability is obtained by performing recognition processing on a training image through a plurality of layers positioned above the designated fully-connected layer in the convolutional neural network model;
summing the first gradient and the second gradient to obtain a third gradient;
the third gradient is determined as a gradient parameter for training the convolutional neural network model.
Optionally, the determining a second gradient through the specified fully-connected layer includes:
identifying the training image through a plurality of layers above the specified fully-connected layer in the convolutional neural network model to obtain the first prediction class probability;
determining a difference between the first prediction class probability and the initial class probability to obtain a first class probability error;
and determining the second gradient by adopting a specified gradient descent method through the specified fully-connected layer based on the first-class probability error.
Optionally, before receiving, by a designated fully-connected layer in the convolutional neural network model to be trained, a first gradient delivered by a next convolutional layer of the designated fully-connected layer, the method further includes:
identifying the training image through all layers included in the convolutional neural network model to obtain a second prediction type probability;
determining a difference between the second prediction class probability and the initial class probability to obtain a second class probability error;
and determining the first gradient by adopting a specified gradient descent method through a convolution layer next to the specified fully-connected layer in the convolution neural network model based on the second class probability error.
Optionally, after determining the third gradient as the gradient parameter for training the convolutional neural network model, the method further includes:
determining the product of the gradient length of the third gradient and a designated coefficient to obtain a moving step length, and moving the model parameter of the designated full-connected layer to the gradient direction of the third gradient by the moving step length, wherein the designated coefficient is any preset coefficient;
passing the third gradient to a convolutional layer that is immediately above the designated fully-connected layer to pass gradient parameters.
Optionally, when the model parameters included in the convolutional neural network model are initial model parameters, the initial model parameters are any preset parameters.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for image recognition based on a convolutional neural network model, applied to a terminal, the method comprising:
performing recognition processing on a training image through all layers of the convolutional neural network model to be trained to obtain a second prediction class probability;
determining the difference between the second prediction class probability and an initial class probability to obtain a second class probability error;
determining, based on the second class probability error, a first gradient through the convolutional layer following a designated fully-connected layer in the convolutional neural network model by using a designated gradient descent method, wherein the designated fully-connected layer is located at a designated position among a plurality of convolutional layers included in the convolutional neural network model, and the convolutional layer following the designated fully-connected layer is closer to an output layer of the convolutional neural network model;
receiving, through the designated fully-connected layer, the first gradient delivered by the following convolutional layer;
determining a second gradient through the designated fully-connected layer, wherein the second gradient is determined based on a first class probability error, the first class probability error is the error between a first prediction class probability and the initial class probability, and the first prediction class probability is obtained by performing recognition processing on the training image through the layers of the convolutional neural network model preceding the designated fully-connected layer;
performing a summation operation on the first gradient and the second gradient to obtain a third gradient;
determining the third gradient as a gradient parameter for training the convolutional neural network model;
and training the convolutional neural network model based on the third gradient, and performing image recognition on an image to be recognized based on the trained convolutional neural network model.
2. The method of claim 1, wherein the determining a second gradient through the designated fully-connected layer comprises:
performing recognition processing on the training image through the layers of the convolutional neural network model preceding the designated fully-connected layer to obtain the first prediction class probability;
determining the difference between the first prediction class probability and the initial class probability to obtain the first class probability error;
and determining the second gradient through the designated fully-connected layer by using a designated gradient descent method based on the first class probability error.
3. The method of claim 1, wherein after determining the third gradient as the gradient parameter for training the convolutional neural network model, the method further comprises:
determining the product of the gradient length of the third gradient and a designated coefficient to obtain a moving step length, and moving the model parameters of the designated fully-connected layer by the moving step length along the gradient direction of the third gradient, the designated coefficient being any preset coefficient;
and passing the third gradient to the convolutional layer preceding the designated fully-connected layer so as to pass on the gradient parameter.
4. The method of claim 3, wherein when the model parameters included in the convolutional neural network model are the initial model parameters, the initial model parameters are any preset parameters.
5. An apparatus for image recognition based on a convolutional neural network model, the apparatus being a terminal, the apparatus comprising:
a recognition processing module, configured to perform recognition processing on a training image through all layers of the convolutional neural network model to be trained to obtain a second prediction class probability;
a third determining module, configured to determine the difference between the second prediction class probability and an initial class probability to obtain a second class probability error;
a fourth determining module, configured to determine, based on the second class probability error, a first gradient through the convolutional layer following a designated fully-connected layer in the convolutional neural network model by using a designated gradient descent method, wherein the designated fully-connected layer is located at a designated position among a plurality of convolutional layers included in the convolutional neural network model, and the convolutional layer following the designated fully-connected layer is closer to an output layer of the convolutional neural network model;
a receiving module, configured to receive, through the designated fully-connected layer, the first gradient delivered by the following convolutional layer;
a first determining module, configured to determine a second gradient through the designated fully-connected layer, wherein the second gradient is determined based on a first class probability error, the first class probability error is the error between a first prediction class probability and the initial class probability, and the first prediction class probability is obtained by performing recognition processing on the training image through the layers of the convolutional neural network model preceding the designated fully-connected layer;
an operation module, configured to perform a summation operation on the first gradient received by the receiving module and the second gradient determined by the first determining module to obtain a third gradient;
a second determining module, configured to determine the third gradient obtained by the operation module as a gradient parameter for training the convolutional neural network model;
the apparatus further comprising a module configured to perform the following steps:
training the convolutional neural network model based on the third gradient, and performing image recognition on an image to be recognized based on the trained convolutional neural network model.
6. The apparatus of claim 5, wherein the first determining module is configured to:
perform recognition processing on the training image through the layers of the convolutional neural network model preceding the designated fully-connected layer to obtain the first prediction class probability;
determine the difference between the first prediction class probability and the initial class probability to obtain the first class probability error;
and determine the second gradient through the designated fully-connected layer by using a designated gradient descent method based on the first class probability error.
7. The apparatus of claim 5, further comprising:
a fifth determining module, configured to determine the product of the gradient length of the third gradient and a designated coefficient to obtain a moving step length, and to move the model parameters of the designated fully-connected layer by the moving step length along the gradient direction of the third gradient, the designated coefficient being any preset coefficient;
and a transfer module, configured to pass the third gradient to the convolutional layer preceding the designated fully-connected layer so as to pass on the gradient parameter.
8. The apparatus of claim 7, wherein when the model parameters included in the convolutional neural network model are the initial model parameters, the initial model parameters are any preset parameters.
9. An apparatus for image recognition based on a convolutional neural network model, the apparatus being a terminal, the apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1-4.
10. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the method of any of claims 1-4.
CN201710373287.6A 2017-05-24 2017-05-24 Gradient parameter determination method, gradient parameter determination device and computer-readable storage medium Active CN107229968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710373287.6A CN107229968B (en) 2017-05-24 2017-05-24 Gradient parameter determination method, gradient parameter determination device and computer-readable storage medium


Publications (2)

Publication Number Publication Date
CN107229968A (en) 2017-10-03
CN107229968B (en) 2021-06-29

Family

ID=59933968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710373287.6A Active CN107229968B (en) 2017-05-24 2017-05-24 Gradient parameter determination method, gradient parameter determination device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN107229968B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590534B (en) * 2017-10-17 2021-02-09 北京小米移动软件有限公司 Method and device for training deep convolutional neural network model and storage medium
CN110033019B (en) * 2019-03-06 2021-07-27 腾讯科技(深圳)有限公司 Method and device for detecting abnormality of human body part and storage medium
CN111506104B (en) * 2020-04-03 2021-10-01 北京邮电大学 Method and device for planning position of unmanned aerial vehicle

Citations (9)

Publication number Priority date Publication date Assignee Title
CN104102919A (en) * 2014-07-14 2014-10-15 同济大学 Image classification method capable of effectively preventing convolutional neural network from being overfit
CN104794527A (en) * 2014-01-20 2015-07-22 富士通株式会社 Method and equipment for constructing classification model based on convolutional neural network
CN105069413A (en) * 2015-07-27 2015-11-18 电子科技大学 Human body gesture identification method based on depth convolution neural network
CN105469041A (en) * 2015-11-19 2016-04-06 上海交通大学 Facial point detection system based on multi-task regularization and layer-by-layer supervision neural network
CN106156807A (en) * 2015-04-02 2016-11-23 华中科技大学 The training method of convolutional neural networks model and device
CN106250931A (en) * 2016-08-03 2016-12-21 武汉大学 A kind of high-definition picture scene classification method based on random convolutional neural networks
CN106548201A (en) * 2016-10-31 2017-03-29 北京小米移动软件有限公司 The training method of convolutional neural networks, image-recognizing method and device
WO2017058479A1 (en) * 2015-09-29 2017-04-06 Qualcomm Incorporated Selective backpropagation
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9730643B2 (en) * 2013-10-17 2017-08-15 Siemens Healthcare Gmbh Method and system for anatomical object detection using marginal space deep neural networks


Non-Patent Citations (5)

Title
Deep Residual Networks with Exponential Linear Unit; Anish Shah et al.; arXiv; 20161006; pp. 1-7 *
On the difficulty of training Recurrent Neural Networks; Razvan Pascanu et al.; arXiv; 20130219; pp. 1-12 *
Supervised learning for multilayer spiking neural networks based on convolution computation; Zhang Yuping et al.; Computer Engineering and Science; 20150228; Vol. 37, No. 2; pp. 348-353 *
Gradient-based multi-input convolutional neural network; Fei Jianchao et al.; Opto-Electronic Engineering; 20150331; Vol. 42, No. 3; pp. 33-38 *
Unsupervised feature learning combined with neural networks for image recognition; Ao Daogan; China Master's Theses Full-text Database, Information Science and Technology; 20150115; Vol. 2015, No. 01; pp. I138-1010 *


Similar Documents

Publication Publication Date Title
CN109800737B (en) Face recognition method and device, electronic equipment and storage medium
CN107194464B (en) Training method and device of convolutional neural network model
CN109697734B (en) Pose estimation method and device, electronic equipment and storage medium
CN107590534B (en) Method and device for training deep convolutional neural network model and storage medium
CN107945133B (en) Image processing method and device
CN111461304B (en) Training method of classified neural network, text classification method, device and equipment
CN107967459B (en) Convolution processing method, convolution processing device and storage medium
CN107665354B (en) Method and device for identifying identity card
CN107229968B (en) Gradient parameter determination method, gradient parameter determination device and computer-readable storage medium
US10248855B2 (en) Method and apparatus for identifying gesture
US11606397B2 (en) Server and operating method thereof
CN110598504A (en) Image recognition method and device, electronic equipment and storage medium
CN109635920B (en) Neural network optimization method and device, electronic device and storage medium
CN109858614B (en) Neural network training method and device, electronic equipment and storage medium
KR20210090691A (en) Data processing method and apparatus, electronic device and storage medium
CN111783898B (en) Training method of image recognition model, image recognition method, device and equipment
CN111898018A (en) Virtual resource sending method and device, electronic equipment and storage medium
CN109784537B (en) advertisement click rate estimation method and device, server and storage medium
CN112116095B (en) Method and related device for training multi-task learning model
CN112202962A (en) Screen brightness adjusting method and device and storage medium
CN112947490B (en) Path smoothing method, path smoothing device, path smoothing equipment, path smoothing storage medium and path smoothing product
CN107992894B (en) Image recognition method, image recognition device and computer-readable storage medium
CN110533006B (en) Target tracking method, device and medium
CN112259122A (en) Audio type identification method and device and storage medium
CN110297970B (en) Information recommendation model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant