CN107194464B

CN107194464B - Training method and device of convolutional neural network model

Info

Publication number: CN107194464B
Application number: CN201710274339.4A
Authority: CN
Inventors: 万韶华
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2017-04-25
Filing date: 2017-04-25
Publication date: 2021-06-01
Anticipated expiration: 2037-04-25
Also published as: CN107194464A

Abstract

The disclosure relates to a training method and device for a convolutional neural network model, belonging to the technical field of image processing. The method comprises: respectively identifying a plurality of stored training images through a convolutional neural network model to be trained to obtain a plurality of prediction class probability vectors; for each training image in the plurality of training images, determining the difference between the prediction class probability vector and the initial class probability vector of the training image to obtain the class probability error vector of the training image; and training the convolutional neural network model based on the class probability error vectors of the training images and the training images. Because the initial class probability vector comprises a plurality of initial class probabilities determined based on the preset disturbance probability, the real class of the training image and the class proportion of the corresponding class, the algorithm convergence speed can be accelerated when the convolutional neural network model is trained in this way, and the accuracy of the trained convolutional neural network model for image recognition is ensured.

Description

Training method and device of convolutional neural network model

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to a training method and apparatus for a convolutional neural network model.

Background

With the rapid development of image processing technology, convolutional neural network models are widely used in image recognition: when an image to be recognized is input into a trained convolutional neural network model, the model can recognize the class of the image. For example, when an image of a cat is input into a trained convolutional neural network model, the model can identify the image as a cat. To implement image recognition successfully, the convolutional neural network model usually needs to be trained in advance on a large number of training images, for example a large number of animal training images. At present, how to train the convolutional neural network model so as to ensure the accuracy of image recognition has become a research hotspot in the field.

Disclosure of Invention

In order to overcome the problems in the related art, the present disclosure provides a training method and apparatus for a convolutional neural network model.

In a first aspect, a method for training a convolutional neural network model is provided, the method including:

respectively identifying a plurality of stored training images through a convolutional neural network model to be trained to obtain a plurality of prediction category probability vectors, wherein each prediction category probability vector comprises a plurality of prediction category probabilities, and each prediction category probability is the probability that the corresponding training image belongs to each preset category in a plurality of preset categories;

for each training image in the plurality of training images, determining a difference value between a prediction class probability vector and an initial class probability vector of the training image to obtain a class probability error vector of the training image, wherein the initial class probability vector comprises a plurality of initial class probabilities, and the plurality of initial class probabilities are determined based on a preset disturbance probability, a real class of the training image and a class proportion of a corresponding class;

training the convolutional neural network model based on the class probability error vectors of the plurality of training images and the plurality of training images.

Optionally, before determining the difference between the prediction class probability vector and the initial class probability vector of the training image, the method further includes:

for each preset category in the plurality of preset categories, determining a category proportion of the preset category;

determining an initial category probability corresponding to the preset category through the following formula based on the preset disturbance probability, the real category of the training image and the category proportion of the preset category;

P(k) = λ*δ_y(k) + (1-λ)*p₀(k);

wherein P(k) represents the initial class probability corresponding to the preset class, λ represents the preset disturbance probability, and p₀(k) represents the class proportion of the preset class; δ_y(k) is 1 when the real class of the training image is the same as the preset class, and δ_y(k) is 0 when the real class of the training image is different from the preset class.

Optionally, training the convolutional neural network model based on the class probability error vectors of the plurality of training images and the plurality of training images comprises:

determining an average class probability error vector of the class probability error vectors for the plurality of training images;

training the convolutional neural network model based on the average class probability error vector and the plurality of training images.

Optionally, training the convolutional neural network model based on the mean class probability error vector and the plurality of training images comprises:

determining the square sum of each element in the category probability error vector of each training image to obtain a plurality of square sums;

when the average value of the square sums is larger than or equal to a preset threshold value, adjusting model parameters included in the convolutional neural network model based on the average class probability error vector;

and respectively carrying out recognition processing on the training images through the convolutional neural network after model parameters are adjusted to obtain a plurality of prediction type probability vectors, returning to the step of determining the difference value between the prediction type probability vector and the initial type probability vector of the training image for each training image in the training images to obtain the type probability error vector of the training image until the average value of a plurality of square sums determined through the type probability error vector of each training image is smaller than the preset threshold value.

In a second aspect, there is provided an apparatus for training a convolutional neural network model, the apparatus comprising:

the recognition processing module is used for respectively recognizing the stored training images through a convolutional neural network model to be trained to obtain a plurality of prediction category probability vectors, each prediction category probability vector comprises a plurality of prediction category probabilities, and each prediction category probability is the probability that the corresponding training image belongs to each preset category in a plurality of preset categories;

a first determining module, configured to determine, for each training image in the multiple training images, a difference between a prediction class probability vector and an initial class probability vector of the training image identified by the identifying and processing module to obtain a class probability error vector of the training image, where the initial class probability vector includes multiple initial class probabilities, and the multiple initial class probabilities are determined based on a preset perturbation probability, a true class of the training image, and a class proportion of a corresponding class;

a training module, configured to train the convolutional neural network model based on the class probability error vectors of the plurality of training images determined by the first determining module and the plurality of training images.

Optionally, the apparatus further comprises:

a second determining module, configured to determine, for each of the multiple preset categories, a category proportion of the preset category;

a third determining module, configured to determine, based on the preset disturbance probability, the real category of the training image, and the category proportion of the preset category, an initial category probability corresponding to the preset category according to the following formula;

P(k) = λ*δ_y(k) + (1-λ)*p₀(k);

wherein P(k) represents the initial class probability corresponding to the preset class, λ represents the preset disturbance probability, and p₀(k) represents the class proportion of the preset class; δ_y(k) is 1 when the real class of the training image is the same as the preset class, and δ_y(k) is 0 when the real class of the training image is different from the preset class.

Optionally, the training module comprises:

a determining sub-module for determining an average class probability error vector of the class probability error vectors of the plurality of training images;

and the training submodule is used for training the convolutional neural network model based on the average class probability error vector and the training images.

Optionally, the training submodule is configured to:

In a third aspect, an apparatus for training a convolutional neural network model is provided, the apparatus comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects: after the stored training images are respectively identified through a convolutional neural network model to be trained to obtain a plurality of prediction class probability vectors, the difference between the prediction class probability vector and the initial class probability vector of each training image is determined to obtain the class probability error vector of each training image. Because the initial class probabilities included in the initial class probability vector of each training image are determined based on the preset disturbance probability, the real class of the training image and the class proportion of the corresponding class, the algorithm convergence speed can be increased when the convolutional neural network model is trained based on the class probability error vectors of the training images and the training images, and the accuracy of the trained convolutional neural network model for image recognition is ensured.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

FIG. 1 is a flow diagram illustrating a method of training a convolutional neural network model in accordance with an exemplary embodiment;

FIG. 2 is a flow diagram illustrating a method of training a convolutional neural network model in accordance with another exemplary embodiment;

FIG. 3A is a block diagram illustrating a convolutional neural network model training apparatus in accordance with an exemplary embodiment;

FIG. 3B is a block diagram illustrating a convolutional neural network model training apparatus in accordance with another exemplary embodiment;

FIG. 4 is a block diagram illustrating a convolutional neural network model training apparatus 400, according to an example embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Before the embodiments of the present disclosure are explained in detail, terms related to the embodiments are briefly described:

Convolutional neural network model: a feed-forward neural network, generally consisting of one or more convolutional layers and a fully-connected layer at the top; in addition, the convolutional neural network model also includes associated weights and pooling layers. In a particular implementation, a back propagation algorithm may be used to train the convolutional neural network model.

Prediction class probability vector: a vector comprising a plurality of prediction class probabilities, i.e., each element of the prediction class probability vector is one prediction class probability. Each prediction class probability is the probability that the corresponding training image belongs to one of a plurality of preset classes. The plurality of preset classes can be customized by technicians according to actual needs; for example, the plurality of preset classes can include "cat", "dog", "bear", "lion", "tiger" and the like, in which case a given prediction class probability can be the probability that the corresponding training image belongs to the class "cat". Each prediction class probability vector is obtained by identifying the corresponding training image through the convolutional neural network model to be trained.

Initial class probability vector: a vector comprising a plurality of initial class probabilities, i.e., each element of the initial class probability vector is one initial class probability. In the embodiment of the present disclosure, the plurality of initial class probabilities are determined based on the preset disturbance probability, the real class of the training image, and the class proportion of the corresponding class; the specific implementation process is described in step 202 in the embodiment of Fig. 2 below.

Class proportion: the proportion of each preset class among the plurality of preset classes.

Model parameters: the model parameters of the convolutional neural network model generally include the convolution kernels of the convolutional layers, the weight matrices of the fully-connected layers, and the like, and are mainly used for the recognition processing of training images.

Next, an application scenario of the embodiment of the present disclosure is explained. At present, in order for the class predicted by the trained convolutional neural network model during image recognition to be the same as the real class of the image, the training process usually sets, for each training image in a plurality of training images, the initial class probability corresponding to the real class in the initial class probability vector of the training image to 1 and all other initial class probabilities to 0. However, when the convolutional neural network model is trained in depth with this setting, the algorithm tends to converge slowly, and after training is finished the generalization capability of the convolutional neural network model tends to be impaired, which affects the recognition capability of the trained model. Therefore, the embodiment of the disclosure provides a training method for a convolutional neural network model that can accelerate the convergence of the algorithm and improve the generalization capability of the convolutional neural network model, thereby ensuring the accuracy of the trained convolutional neural network model for image recognition. The training method is executed by a terminal, and the terminal can train the convolutional neural network model to be trained based on a plurality of training images. The terminal may be a computer, a tablet computer, or the like, which is not limited in this disclosure.

Fig. 1 is a flowchart illustrating a method for training a convolutional neural network model according to an exemplary embodiment, where the method for training the convolutional neural network model is used in a terminal, as shown in fig. 1, and includes the following steps.

In step 101, a plurality of stored training images are respectively identified through a convolutional neural network model to be trained to obtain a plurality of prediction category probability vectors, each prediction category probability vector includes a plurality of prediction category probabilities, and each prediction category probability is a probability that a corresponding training image belongs to each preset category in a plurality of preset categories.

In step 102, for each training image in the plurality of training images, determining a difference between a prediction class probability vector and an initial class probability vector of the training image to obtain a class probability error vector of the training image, where the initial class probability vector includes a plurality of initial class probabilities, and the plurality of initial class probabilities are determined based on a preset perturbation probability, a true class of the training image, and a class ratio of a corresponding class.

In step 103, the convolutional neural network model is trained based on the class probability error vectors of the training images and the training images.

In the embodiment of the disclosure, after the stored training images are respectively identified by the convolutional neural network model to be trained to obtain a plurality of prediction class probability vectors, the difference between the prediction class probability vector and the initial class probability vector of each training image is determined to obtain the class probability error vector of each training image. Because the initial class probabilities included in the initial class probability vector of each training image are determined based on the preset disturbance probability, the real class of the training image and the class proportion of the corresponding class, the algorithm convergence speed can be increased when the convolutional neural network model is trained based on the class probability error vectors of the training images and the training images, and the accuracy of the trained convolutional neural network model for image recognition is ensured.

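Optionally, before determining the difference between the prediction class probability vector and the initial class probability vector of the training image, the method further includes:

for each preset category in the plurality of preset categories, determining the category proportion of the preset category;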

based on the preset disturbance probability, the real category of the training image and the category proportion of the preset category, determining an initial category probability corresponding to the preset category through the following formula;

P(k) = λ*δ_y(k) + (1-λ)*p₀(k);

wherein P(k) represents the initial class probability corresponding to the preset class, λ represents the preset disturbance probability, and p₀(k) represents the class proportion of the preset class; δ_y(k) is 1 when the real class of the training image is the same as the preset class, and δ_y(k) is 0 when the real class of the training image is different from the preset class.

Optionally, training the convolutional neural network model based on the class probability error vectors of the plurality of training images and the plurality of training images includes:

Optionally, training the convolutional neural network model based on the average class probability error vector and the plurality of training images includes:

All the above optional technical solutions can be combined arbitrarily to form optional embodiments of the present disclosure, and the embodiments of the present disclosure are not described in detail again.

Fig. 2 is a flowchart illustrating a training method of a convolutional neural network model according to another exemplary embodiment, where the training method of the convolutional neural network model is used in a terminal, as shown in fig. 2, and the training method of the convolutional neural network model includes the following steps:

In step 201, a plurality of stored training images are respectively identified by a convolutional neural network model to be trained, so as to obtain a plurality of prediction class probability vectors.

In an actual implementation process, a technician may store a plurality of training images in the terminal according to actual requirements, where the training images are used for training a convolutional neural network model to be trained, for example, the training images may be animal training images, and the number of the training images is usually several million.

In a specific implementation, the terminal may store the plurality of training images in a folder or a list. Taking storage in a folder as an example, the plurality of training images may be stored in one folder that includes a plurality of subfolders, with each training image corresponding to one subfolder.

In a specific implementation, each subfolder may be named by an image name of a corresponding training image, and the subfolder may further store image information of the corresponding training image, for example, the image information may be information such as a real category of the training image. Therefore, the terminal can acquire the training image and the real category of the training image from the corresponding subfolder according to the image name of the training image.

Of course, it should be noted that the manner in which the terminal stores the plurality of training images is merely exemplary, and in another embodiment, the terminal may also store the plurality of training images in other manners, which is not limited in this disclosure.
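The folder-based storage described above can be sketched as follows. This is a minimal illustration only: the file names `image.jpg` and `label.txt` and the function name `load_training_index` are hypothetical, since the disclosure states only that each subfolder is named after its training image and stores image information such as the real class.

```python
from pathlib import Path


def load_training_index(root):
    """Return a list of (image_path, true_class) pairs found under root.

    Assumed layout (hypothetical, not specified by the disclosure):
        root/<image_name>/image.jpg   the training image
        root/<image_name>/label.txt   the real class, e.g. "cat"
    """
    index = []
    for subfolder in sorted(Path(root).iterdir()):
        if not subfolder.is_dir():
            continue  # ignore stray files next to the subfolders
        image_path = subfolder / "image.jpg"
        label_path = subfolder / "label.txt"
        if image_path.exists() and label_path.exists():
            true_class = label_path.read_text(encoding="utf-8").strip()
            index.append((image_path, true_class))
    return index
```

Subfolders that lack either file are skipped rather than treated as errors, which is a design choice of this sketch and not something the disclosure prescribes.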

When a convolutional neural network model to be trained needs to be trained, the terminal initializes the model parameters of the model. The terminal then acquires the stored training images, inputs them into the convolutional neural network model to be trained, and identifies them through the model to obtain a plurality of prediction class probability vectors. The plurality of prediction class probability vectors correspond one-to-one to the plurality of training images.

It should be noted that the recognition processing performed by the terminal on the training images through the convolutional neural network model to be trained may include operations such as convolution, pooling and activation; for the specific implementation process, reference may be made to the related art, which is not described in detail in this disclosure.

As described above, each prediction class probability vector includes a plurality of prediction class probabilities, where each prediction class probability is the probability that the corresponding training image belongs to one of the plurality of preset classes.

For example, the plurality of preset categories may be 1000 preset categories, and the 1000 preset categories may include "cat", "dog", "bear", "lion", "tiger", and the like. In this case, after the stored training images are respectively identified by the convolutional neural network model to be trained, each of the obtained prediction class probability vectors includes 1000 prediction class probabilities; for example, the prediction class probability vector of a training image obtained by the identification processing is {x₁, x₂, x₃, x₄, …, x₁₀₀₀}, where x₁, x₂, x₃, x₄, …, x₁₀₀₀ each represent a prediction class probability.
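As an illustration of what a prediction class probability vector looks like, the sketch below turns raw class scores into probabilities with a softmax. The softmax output layer is an assumption made here for illustration; the disclosure does not state how the model normalizes its outputs. The scores and the five-class list are likewise hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for five preset classes (the text uses 1000).
preset_classes = ["cat", "dog", "bear", "lion", "tiger"]
prediction = softmax([2.0, 1.0, 0.5, 0.1, -1.0])
```

Each element of `prediction` plays the role of one xᵢ in the vector above, and the largest element corresponds to the class with the largest raw score.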

In step 202, for each of the plurality of training images, a difference between the prediction class probability vector and the initial class probability vector of the training image is determined, and a class probability error vector of the training image is obtained.

It should be noted that, before determining the difference between the prediction class probability vector and the initial class probability vector of the training image, the initial class probability vector of the training image needs to be determined. As described above, the initial class probability vector includes a plurality of initial class probabilities, and the plurality of initial class probabilities are determined based on a preset disturbance probability, a true class of the training image, and a class ratio of a corresponding class.

The preset disturbance probability may be customized by a technician according to actual needs; for example, the preset disturbance probability may be set to 0.05. That is, in the embodiment of the present disclosure, the class of the training image is disturbed by the preset disturbance probability, and the convolutional neural network model is trained based on the disturbed initial class probabilities, so that slow convergence of the algorithm and insufficient generalization capability of the convolutional neural network model can be avoided.

In addition, as described above, the terminal may acquire the real category of the training image from the subfolder corresponding to the training image according to the image name of the training image.

In a specific implementation, the determining, by the terminal, the initial class probability of the training image based on the preset disturbance probability, the real class of the training image, and the class proportion of the corresponding class may include the following implementation processes (1) to (2):

(1) for each preset category in the plurality of preset categories, determining a category proportion of the preset category.

For example, if the preset class is "cat" and the training data include 500 "cat" images that make up half of the data, the class proportion of the preset class "cat" is 0.5.

In a specific implementation, the class ratio may also be referred to as a prior probability. That is, in the embodiment of the present disclosure, the terminal determines the initial class probability vector of the training image by calculating the prior probabilities of 1000 preset classes in the training data, so as to improve the positive effect of class disturbance to the greatest extent.
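Reading the class proportion as the prior probability of each preset class in the training data, it can be computed by counting real classes, roughly as sketched below. The function name `class_proportions` and the toy labels are hypothetical.

```python
from collections import Counter


def class_proportions(true_classes, preset_classes):
    """Return p0(k) for every preset class k as a dict.

    true_classes: the real class of every training image.
    preset_classes: all preset classes, including ones with no images.
    """
    if not true_classes:
        raise ValueError("at least one training image is required")
    counts = Counter(true_classes)
    total = len(true_classes)
    return {k: counts[k] / total for k in preset_classes}


# Toy data: 4 training images over 3 preset classes.
labels = ["cat", "cat", "dog", "bear"]
p0 = class_proportions(labels, ["cat", "dog", "bear"])
```

A preset class that never occurs in the training data receives a proportion of 0, because `Counter` returns 0 for missing keys.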

(2) Determining an initial class probability corresponding to the preset class through the following formula (1) based on the preset disturbance probability, the real class of the training image and the class proportion of the preset class;

P(k) = λ*δ_y(k) + (1-λ)*p₀(k); (1)

Further, before the initial class probability corresponding to the preset class is determined through formula (1) based on the preset disturbance probability, the real class of the training image and the class proportion of the preset class, the value of δ_y(k) needs to be determined according to the real class of the training image. In practical implementation, the value of δ_y(k) can be determined by the following formula (2):

δ_y(k) = 1, if k = y; δ_y(k) = 0, if k ≠ y; (2)

wherein y represents the real class of the training image and k represents the preset class; that is, δ_y(k) is 1 when the real class of the training image is the same as the preset class, and δ_y(k) is 0 when the real class of the training image is different from the preset class.

For example, if the real class of the training image is "dog", δ_y(k) is 1 when the preset class is "dog", and δ_y(k) is 0 when the preset class is "cat".

The terminal may determine the initial class probability corresponding to each preset class through the above implementation process, so as to obtain a plurality of initial class probabilities, and may then combine the obtained initial class probabilities into the initial class probability vector of the training image, for example {y₁, y₂, y₃, y₄, …, y₁₀₀₀}.
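The implementation processes (1) to (2) above can be sketched as a single function that applies formula (1) to every preset class. The function name and the three-class proportions are hypothetical; λ = 0.05 is the example value given in the text.

```python
def initial_class_probabilities(true_class, proportions, lam):
    """Build the initial class probability vector per formula (1).

    true_class: the real class y of the training image.
    proportions: dict mapping each preset class k to its proportion p0(k).
    lam: the preset disturbance probability (lambda in formula (1)).
    """
    vector = {}
    for k, p0_k in proportions.items():
        delta = 1.0 if k == true_class else 0.0  # formula (2)
        vector[k] = lam * delta + (1.0 - lam) * p0_k  # formula (1)
    return vector


# Hypothetical proportions for three preset classes.
proportions = {"cat": 0.5, "dog": 0.25, "bear": 0.25}
target = initial_class_probabilities("dog", proportions, lam=0.05)
```

Whenever the class proportions sum to 1, the resulting vector also sums to 1, since λ + (1-λ)*Σp₀(k) = 1.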

The terminal then determines the difference between the prediction class probability vector and the initial class probability vector of the training image to obtain the class probability error vector of the training image. For example, if the prediction class probability vector of the training image is {x₁, x₂, x₃, x₄, …, x₁₀₀₀} and the initial class probability vector is {y₁, y₂, y₃, y₄, …, y₁₀₀₀}, the class probability error vector of the training image is {y₁-x₁, y₂-x₂, y₃-x₃, y₄-x₄, …, y₁₀₀₀-x₁₀₀₀}.
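The element-wise difference in this example can be sketched as follows, using three classes instead of 1000. The function name and the numeric vectors are hypothetical.

```python
def class_probability_error(initial, predicted):
    """Return the element-wise difference y_k - x_k."""
    if len(initial) != len(predicted):
        raise ValueError("vectors must have the same length")
    return [y - x for y, x in zip(initial, predicted)]


initial = [0.7, 0.2, 0.1]    # initial class probability vector y
predicted = [0.5, 0.3, 0.2]  # prediction class probability vector x
error = class_probability_error(initial, predicted)
```

When both input vectors sum to 1, the elements of the error vector sum to 0 up to floating-point rounding.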

In step 203, an average class probability error vector of the class probability error vectors for the plurality of training images is determined.

After obtaining the class probability error vector of each training image, the terminal may train the convolutional neural network model according to the class probability error vectors of the training images and the training images; the specific implementation process may include steps 203 and 204.

That is, the terminal first determines an average class probability error vector of the class probability error vectors of the plurality of training images. In a specific implementation, the terminal may add the elements at corresponding positions in the plurality of class probability error vectors and then take the average of each sum, so as to obtain the average class probability error vector of the class probability error vectors of the plurality of training images.
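The position-wise averaging described above can be sketched as follows. The function name is hypothetical, and the sketch assumes that every error vector has the same length.

```python
def average_error_vector(error_vectors):
    """Average the error vectors position by position."""
    if not error_vectors:
        raise ValueError("at least one error vector is required")
    count = len(error_vectors)
    # zip(*...) groups the elements that share the same position.
    return [sum(column) / count for column in zip(*error_vectors)]


errors = [[0.2, -0.1, -0.1], [0.0, 0.1, -0.1]]
avg_error = average_error_vector(errors)
```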

In step 204, the convolutional neural network model is trained based on the average class probability error vector and the plurality of training images.

In a specific implementation, the training of the convolutional neural network model by the terminal based on the average class probability error vector and the training images may include the following implementation steps 2041 to 2043:

2041: determining the sum of squares of the elements in the class probability error vector of each training image to obtain a plurality of sums of squares.

For example, if the class probability error vector of a training image is {y₁-x₁, y₂-x₂, y₃-x₃, y₄-x₄, …, y₁₀₀₀-x₁₀₀₀}, the sum of squares of the elements in the class probability error vector of the training image is (y₁-x₁)² + (y₂-x₂)² + (y₃-x₃)² + (y₄-x₄)² + … + (y₁₀₀₀-x₁₀₀₀)². Similarly, a sum of squares may be determined in this way for each of the plurality of training images.

It should be noted that determining the sum of squares of the elements in the class probability error vector of each training image is described here only as an example; in another embodiment, after the sum of squares of the elements in the class probability error vector of each training image is determined, the sum of squares may be further processed.
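Step 2041 can be sketched as follows, together with the comparison against a preset threshold that the method applies to the average of the sums of squares. The function names, the sample error vectors and the threshold value 0.01 are hypothetical.

```python
def sum_of_squares(error_vector):
    """Sum of the squared elements of one class probability error vector."""
    return sum(e * e for e in error_vector)


def mean_sum_of_squares(error_vectors):
    """Average of the per-image sums of squares."""
    sums = [sum_of_squares(v) for v in error_vectors]
    return sum(sums) / len(sums)


errors = [[0.2, -0.1, -0.1], [0.0, 0.1, -0.1]]
threshold = 0.01  # hypothetical preset threshold
needs_adjustment = mean_sum_of_squares(errors) >= threshold
```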

2042: When the average value of the plurality of sums of squares is greater than or equal to a preset threshold, adjust the model parameters of the convolutional neural network model based on the average class probability error vector.

The preset threshold may be customized by a technician according to actual needs, or may be set by the terminal by default, which is not limited in the embodiments of the present disclosure.

If the average value of the plurality of sums of squares is greater than or equal to the preset threshold, the prediction class probabilities output for the training images differ substantially from the initial class probabilities. In this case, the average class probability error vector needs to be back-propagated through the convolutional neural network model so that the model parameters of the convolutional neural network model are adjusted based on the average class probability error vector.

In a specific implementation, the terminal may use a stochastic gradient descent (SGD) algorithm to adjust the model parameters of the convolutional neural network model based on the average class probability error vector. For the specific implementation process, reference may be made to the related art, which is not described in detail herein.

It should be noted that using the SGD algorithm to adjust the model parameters of the convolutional neural network model based on the average class probability error vector is described here only as an example. In another embodiment, the terminal may use another specified gradient descent algorithm to adjust the model parameters of the convolutional neural network model based on the average class probability error vector, which is not limited in the embodiments of the present disclosure.
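The disclosure leaves the details of the SGD update to the related art, so the following is only a sketch under stated assumptions. It shows the generic update rule θ ← θ − η·g, applied to a hypothetical output-layer bias. The assumptions are that each error is the prediction minus the target, that the loss is half the sum of squared errors, and that the output is linear in the bias; under these assumptions the gradient with respect to the bias equals the average class probability error vector. A real convolutional network would obtain the gradients of all other parameters by back-propagation.

```python
from typing import List, Sequence


def sgd_step(params: Sequence[float], grads: Sequence[float], lr: float) -> List[float]:
    """One plain gradient descent update: theta <- theta - lr * grad."""
    if len(params) != len(grads):
        raise ValueError("params and grads must have the same length")
    return [p - lr * g for p, g in zip(params, grads)]


# Hypothetical output-layer bias for three classes.
bias = [0.0, 0.0, 0.0]
# Assumption: for a linear output and a half-squared-error loss, the gradient
# with respect to the bias is the average class probability error vector.
avg_error = [0.0, 0.25, -0.25]
new_bias = sgd_step(bias, avg_error, lr=0.5)
print(new_bias)  # [0.0, -0.125, 0.125]
```

The learning rate `lr=0.5` is an arbitrary illustrative value; the source does not specify one.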

2043: Perform recognition processing on the plurality of training images through the convolutional neural network model with the adjusted model parameters to obtain a plurality of prediction class probability vectors, and return to the step of determining, for each training image in the plurality of training images, the difference between the prediction class probability vector and the initial class probability vector of the training image to obtain the class probability error vector of the training image, until the average value of the plurality of sums of squares determined from the class probability error vectors of the training images is smaller than the preset threshold.

After the terminal adjusts the model parameters of the convolutional neural network model based on the average class probability error vector, it again performs recognition processing on the plurality of training images through the convolutional neural network model with the adjusted model parameters; that is, the training process is iterated.

During training, when the average value of the plurality of sums of squares determined from the class probability error vectors of the training images is smaller than the preset threshold, the output prediction class probabilities are very close to the initial class probabilities; that is, the current convolutional neural network model can be considered capable of accurately recognizing the training images. The terminal therefore ends the training of the convolutional neural network model and determines the model parameters at this point as the model parameters of the trained convolutional neural network model.

For example, if the preset threshold is 0.1 and the average value of the plurality of sums of squares determined from the class probability error vectors of the training images is 0.08, the plurality of prediction class probability vectors obtained by the convolutional neural network model for the plurality of training images are very close to the corresponding initial class probability vectors. The training of the convolutional neural network model is therefore ended, and the model parameters at this point are determined as the model parameters of the trained convolutional neural network model.

It should be noted that determining whether the training of the convolutional neural network model is completed according to the average value of the plurality of sums of squares is described here only as an example. In another embodiment, the terminal may determine whether the training is completed according to the number of iterations. For example, the terminal may count the number of iterations in the training process and, when the number of iterations reaches a preset number, end the training process and determine that the training of the convolutional neural network model is completed. The preset number may be customized by a technician according to actual needs.
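The iteration of steps 2041 to 2043 and the two stopping criteria can be sketched as a single loop. This is an illustration under assumptions rather than the disclosed implementation: the source presents the threshold test and the iteration count as alternative embodiments, while this sketch combines both for convenience. The names `train`, `compute_errors`, and `adjust_parameters` are hypothetical, and the "model" is a toy trainable vector rather than a convolutional network.

```python
from typing import Callable, List, Sequence, Tuple

ErrorVectors = List[List[float]]


def mean_sum_of_squares(error_vectors: Sequence[Sequence[float]]) -> float:
    """Average, over all training images, of the per-image sum of squared errors."""
    return sum(sum(e * e for e in v) for v in error_vectors) / len(error_vectors)


def train(
    compute_errors: Callable[[], ErrorVectors],
    adjust_parameters: Callable[[ErrorVectors], None],
    threshold: float,
    max_iterations: int,
) -> Tuple[int, bool]:
    """Iterate until the mean sum of squares is below threshold or the cap is hit.

    Returns (number of parameter adjustments, whether the threshold was met).
    """
    iterations = 0
    while True:
        errors = compute_errors()
        if mean_sum_of_squares(errors) < threshold:
            return iterations, True
        if iterations >= max_iterations:
            return iterations, False
        adjust_parameters(errors)
        iterations += 1


# Toy stand-in for a model (hypothetical): one image, two classes, and the
# "prediction" is simply a trainable vector.
prediction = [0.0, 0.0]
target = [1.0, 0.0]


def compute_errors() -> ErrorVectors:
    # Assumption: error = prediction minus target.
    return [[p - t for p, t in zip(prediction, target)]]


def adjust_parameters(errors: ErrorVectors) -> None:
    learning_rate = 0.5  # arbitrary illustrative value
    for k, e in enumerate(errors[0]):
        prediction[k] -= learning_rate * e


result = train(compute_errors, adjust_parameters, threshold=0.1, max_iterations=100)
print(result)  # (2, True)
```

With the threshold of 0.1 used in the example above, the toy loop stops after two adjustments, when the mean sum of squares has fallen from 1.0 to 0.25 and then to 0.0625.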

It should be noted that steps 203 and 204 above together implement: training the convolutional neural network model based on the class probability error vectors of the plurality of training images and the plurality of training images.

FIG. 3A is a block diagram illustrating a training apparatus for a convolutional neural network model, according to an example embodiment. Referring to FIG. 3A, the apparatus includes a recognition processing module 310, a first determining module 320, and a training module 330:

a recognition processing module 310, configured to perform recognition processing on the stored plurality of training images through a convolutional neural network model to be trained, so as to obtain a plurality of prediction class probability vectors, where each prediction class probability vector includes a plurality of prediction class probabilities, and each prediction class probability is the probability that the corresponding training image belongs to a respective preset class among a plurality of preset classes;

a first determining module 320, configured to determine, for each training image in the plurality of training images, a difference between the prediction class probability vector of the training image obtained by the recognition processing module 310 and an initial class probability vector to obtain a class probability error vector of the training image, where the initial class probability vector includes a plurality of initial class probabilities, and the plurality of initial class probabilities are determined based on a preset perturbation probability, a true class of the training image, and a class proportion of a corresponding class;

a training module 330, configured to train the convolutional neural network model based on the class probability error vectors of the plurality of training images determined by the first determining module 320 and the plurality of training images.
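The computation attributed to the first determining module can be sketched as an element-wise difference. The sketch assumes the difference is taken as the prediction minus the initial value; the source does not state the sign convention explicitly, and the function name and sample vectors are hypothetical.

```python
from typing import List, Sequence


def class_probability_error_vector(
    predicted: Sequence[float], initial: Sequence[float]
) -> List[float]:
    """Element-wise difference between prediction and initial class probabilities."""
    if len(predicted) != len(initial):
        raise ValueError("both vectors must cover the same preset classes")
    # Assumption: error = predicted probability minus initial probability.
    return [p - q for p, q in zip(predicted, initial)]


# Hypothetical vectors for one training image and three preset classes.
predicted = [0.5, 0.25, 0.25]
initial = [0.125, 0.8125, 0.0625]
error = class_probability_error_vector(predicted, initial)
print(error)  # [0.375, -0.5625, 0.1875]
```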

Optionally, referring to FIG. 3B, the apparatus further includes:

a second determining module 340, configured to determine, for each preset category in the plurality of preset categories, a category proportion of the preset category;

a third determining module 350, configured to determine, based on the preset disturbance probability, the real category of the training image, and the category proportion of the preset category, an initial category probability corresponding to the preset category according to the following formula;

P(k)＝λ*δ_y(k)+(1-λ)*p₀(k)；

wherein P(k) represents the initial class probability corresponding to the preset class, λ represents the preset perturbation probability, and p₀(k) represents the class proportion of the preset class; δ_y(k) is 1 when the real class of the training image is the same as the preset class, and δ_y(k) is 0 when the real class of the training image is different from the preset class.
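The formula P(k) = λ·δ_y(k) + (1-λ)·p₀(k) can be illustrated with a short sketch. The function name, the value λ = 0.75, and the three class proportions are hypothetical; the class proportions are taken as given inputs that are assumed to sum to 1.

```python
from typing import List, Sequence


def initial_class_probabilities(
    true_class: int, class_proportions: Sequence[float], lam: float
) -> List[float]:
    """Compute P(k) = lam * delta_y(k) + (1 - lam) * p0(k) for every preset class k."""
    return [
        # delta_y(k) is 1 only for the real class of the training image.
        lam * (1.0 if k == true_class else 0.0) + (1.0 - lam) * p0
        for k, p0 in enumerate(class_proportions)
    ]


# Hypothetical setup: three preset classes, the image belongs to class 1.
proportions = [0.5, 0.25, 0.25]
probs = initial_class_probabilities(true_class=1, class_proportions=proportions, lam=0.75)
print(probs)       # [0.125, 0.8125, 0.0625]
print(sum(probs))  # 1.0
```

Because δ_y(k) and p₀(k) each sum to 1 over the classes, the resulting initial class probabilities also sum to 1 for any λ between 0 and 1.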

Optionally, the training module 330 includes:

Optionally, the training submodule is configured to:

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

FIG. 4 is a block diagram illustrating a convolutional neural network model training apparatus 400, according to an example embodiment. For example, the apparatus 400 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 4, the apparatus 400 may include one or more of the following components: processing components 402, memory 404, power components 406, multimedia components 408, audio components 410, input/output (I/O) interfaces 412, sensor components 414, and communication components 416.

The processing component 402 generally controls overall operation of the apparatus 400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 402 can include one or more modules that facilitate interaction between the processing component 402 and other components. For example, the processing component 402 can include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.

The memory 404 is configured to store various types of data to support operations at the apparatus 400. Examples of such data include instructions for any application or method operating on the apparatus 400, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 404 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.

The power component 406 provides power to the various components of the apparatus 400. The power component 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 400.

The multimedia component 408 includes a screen that provides an output interface between the apparatus 400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 408 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 400 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 410 is configured to output and/or input audio signals. For example, audio component 410 includes a Microphone (MIC) configured to receive external audio signals when apparatus 400 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 404 or transmitted via the communication component 416. In some embodiments, audio component 410 also includes a speaker for outputting audio signals.

The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 414 includes one or more sensors for providing status assessments of various aspects of the apparatus 400. For example, the sensor component 414 may detect the open/closed state of the apparatus 400 and the relative positioning of components, such as the display and keypad of the apparatus 400. The sensor component 414 may also detect a change in the position of the apparatus 400 or of a component of the apparatus 400, the presence or absence of user contact with the apparatus 400, the orientation or acceleration/deceleration of the apparatus 400, and a change in the temperature of the apparatus 400. The sensor component 414 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 416 is configured to facilitate wired or wireless communication between the apparatus 400 and other devices. The apparatus 400 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 404 comprising instructions, where the instructions are executable by the processor 420 of the apparatus 400 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

A non-transitory computer readable storage medium, wherein instructions of the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the above training method of the convolutional neural network model, the method comprising:

training the convolutional neural network model based on the class probability error vectors of the training images and the training images.

P(k)＝λ*δ_y(k)+(1-λ)*p₀(k)；

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for image recognition based on a convolutional neural network model, the method comprising:

respectively identifying a plurality of stored training images through a convolutional neural network model to be trained to obtain a prediction category probability vector corresponding to each training image in the plurality of training images, wherein each prediction category probability vector comprises a plurality of prediction category probabilities, and each prediction category probability is the probability that the corresponding training image belongs to each preset category in a plurality of preset categories;

for each training image in the plurality of training images, determining a difference value between a prediction class probability vector and an initial class probability vector of the training image to obtain a class probability error vector of the training image, wherein the initial class probability vector comprises a plurality of initial class probabilities, the plurality of initial class probabilities are determined based on a preset disturbance probability, a real class of the training image and a class proportion of a corresponding class, and the preset disturbance probability is used for disturbing the class of the training image;

training the convolutional neural network model based on class probability error vectors of the plurality of training images and the plurality of training images;

and carrying out image recognition on the image to be recognized based on the trained convolutional neural network model.

2. The method of claim 1, wherein prior to determining the difference between the prediction class probability vector and the initial class probability vector for the training image, further comprising:

P(k)＝λ*δ_y(k)+(1-λ)*p₀(k)；

3. The method of claim 1, wherein training the convolutional neural network model based on the class probability error vectors for the plurality of training images and the plurality of training images comprises:

4. The method of claim 3, wherein training the convolutional neural network model based on the mean class probability error vector and the plurality of training images comprises:

5. An apparatus for image recognition based on a convolutional neural network model, the apparatus comprising:

the recognition processing module is used for respectively recognizing a plurality of stored training images through a convolutional neural network model to be trained to obtain a prediction category probability vector corresponding to each training image in the plurality of training images, each prediction category probability vector comprises a plurality of prediction category probabilities, and each prediction category probability is the probability that the corresponding training image belongs to each preset category in a plurality of preset categories;

a first determining module, configured to determine, for each training image in the plurality of training images, a difference between a prediction class probability vector of the training image obtained by the recognition processing module and an initial class probability vector to obtain a class probability error vector of the training image, where the initial class probability vector includes a plurality of initial class probabilities, the plurality of initial class probabilities are determined based on a preset perturbation probability, a true class of the training image, and a class proportion of a corresponding class, and the preset perturbation probability is used to perturb the class of the training image;

a training module, configured to train the convolutional neural network model based on the class probability error vectors of the plurality of training images determined by the first determining module and the plurality of training images; and carrying out image recognition on the image to be recognized based on the trained convolutional neural network model.

6. The apparatus of claim 5, wherein the apparatus further comprises:

P(k)＝λ*δ_y(k)+(1-λ)*p₀(k)；

7. The apparatus of claim 5, wherein the training module comprises:

8. The apparatus of claim 7, wherein the training submodule is to:

9. An apparatus for image recognition based on a convolutional neural network model, the apparatus comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to: