CN110909784B - Training method and device of image recognition model and electronic equipment - Google Patents

Training method and device of image recognition model and electronic equipment

Info

Publication number
CN110909784B
CN110909784B
Authority
CN
China
Prior art keywords
image sample
trained
cnn model
loss
labeled
Prior art date
Legal status
Active
Application number
CN201911122421.0A
Other languages
Chinese (zh)
Other versions
CN110909784A (en)
Inventor
范音
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201911122421.0A
Publication of CN110909784A
Application granted
Publication of CN110909784B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

Embodiments of the invention provide a training method and apparatus for an image recognition model, and an electronic device, wherein the method comprises the following steps: inputting the obtained labeled image sample and the obtained unlabeled image sample into a feature extraction layer of the CNN model to be trained, and respectively extracting the features of the labeled image sample and the features of the unlabeled image sample; inputting the features of the labeled image sample and the features of the unlabeled image sample into a loss layer of the CNN model to be trained, to obtain a loss value of a loss function in the CNN model to be trained; and adjusting parameters in the CNN model to be trained based on the loss value of the loss function, to obtain the trained CNN model. In this way, the CNN model is trained with a part of labeled image samples and a part of unlabeled image samples, rather than with a large number of labeled image samples, so that the number of manually labeled images can be reduced, the workload of manual labeling is lowered, and the efficiency of model training is improved.

Description

Training method and device for image recognition model and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a training method and apparatus for an image recognition model, and an electronic device.
Background
The development of deep learning technology has greatly advanced the application of machine vision technology; for example, applying deep learning to the field of machine vision can remarkably improve the performance of machine vision tasks such as image recognition, object detection, and instance segmentation.
When machine vision technology based on image recognition is applied, an image recognition model is generally trained with labeled images as training samples, and the trained image recognition model is then used to process machine vision tasks. Generally, the more labeled image samples there are, the better the trained image recognition model performs, and the higher the accuracy of machine vision task processing.
However, the inventor has found, in the process of implementing the invention, that the prior art has at least the following problem:
in the prior art, in order to train an image recognition model, a large number of images often need to be labeled manually to obtain image samples. Manual labeling is laborious, so the workload of manual labeling is large and the efficiency of model training is low.
Disclosure of Invention
The embodiment of the invention aims to provide a training method and device for an image recognition model and electronic equipment, so as to reduce the number of manually marked images, reduce the workload of manual marking and improve the efficiency of model training. The specific technical scheme is as follows:
in one aspect of the embodiments of the present invention, an embodiment of the present invention provides a training method for an image recognition model, where the method includes:
acquiring a marked image sample and an unmarked image sample;
inputting the labeled image sample and the unlabeled image sample into a feature extraction layer of a CNN (Convolutional Neural Network) model to be trained, and respectively extracting the features of the labeled image sample and the features of the unlabeled image sample;
inputting the characteristics of the marked image sample and the characteristics of the unmarked image sample into a loss layer of the CNN model to be trained to obtain a loss value of a loss function in the CNN model to be trained;
and adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model.
Optionally, the labeled image sample is an image sample labeled with a category, and the labeled object in the labeled image sample is different from the unlabeled image sample;
before obtaining a loss value of a loss function of the CNN model to be trained by inputting the features of the labeled image sample and the features of the unlabeled image sample into a loss layer of the CNN model to be trained, the training method of the image recognition model further comprises:
determining the weight of the category to which the labeled image sample belongs in a weight vector corresponding to the CNN model to be trained based on the category of the labeled image sample of each category, wherein the weight vector corresponding to the CNN model to be trained comprises the weight corresponding to each category, and the weight vector corresponding to the CNN model to be trained is initially a preset initial weight vector;
obtaining a loss value of a loss function of the CNN model to be trained by inputting the features of the labeled image sample and the features of the unlabeled image sample into a loss layer of the CNN model to be trained, including:
and inputting the characteristics of the labeled image sample of each category, the weight of the category to which the labeled image sample belongs, the weights of the categories and the characteristics of the unlabeled image sample into a loss layer of the CNN model to be trained to obtain a loss value of a loss function of the CNN model to be trained.
Optionally, the obtaining a loss value of a loss function in the CNN model to be trained by inputting the feature of the labeled image sample of each category, the weight of the category to which the labeled image sample belongs, the weights of the categories, and the feature of the unlabeled image sample into a loss layer of the CNN model to be trained includes:
inputting the feature f_i of the i-th labeled image sample, the weight w_li of the class li to which the i-th labeled image sample belongs, the weight w_j of the j-th category, and the feature f_u of the u-th unlabeled image sample among the unlabeled image samples into the loss layer of the CNN model to be trained, so that the loss layer, through the following formula:

L = -\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{f_i \cdot w_{li}}}{\sum_{j=1}^{K} e^{f_i \cdot w_j} + \sum_{u=1}^{U} e^{f_i \cdot f_u}}

calculates the loss value L of the loss function in the CNN model to be trained, wherein n is the total number of the labeled image samples, K is the total number of the categories of the labeled image samples, U is the total number of the unlabeled image samples, w_li and w_j respectively denote the li-th column vector and the j-th column vector of the preset weight matrix W, and 1 ≤ li ≤ K.
Optionally, before adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model, the method further includes:
judging whether the CNN model to be trained is converged or not based on the loss value of the loss function;
if yes, taking the CNN model to be trained as the CNN model after training;
and if not, adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain a trained CNN model, taking the trained CNN model as the CNN model to be trained, and returning to the step of inputting the labeled image sample and the unlabeled image sample into the feature extraction layer of the CNN model to be trained and respectively extracting the features of the labeled image sample and the features of the unlabeled image sample.
In another aspect of the present invention, an embodiment of the present invention further provides an apparatus for training an image recognition model, where the apparatus includes:
the sample acquisition module is used for acquiring an annotated image sample and an unlabeled image sample;
the characteristic identification module is used for inputting the marked image sample and the unmarked image sample into a characteristic extraction layer of the CNN model to be trained and respectively extracting the characteristic of the marked image sample and the characteristic of the unmarked image sample;
the loss value calculation module is used for inputting the features of the labeled image sample and the features of the unlabeled image sample into a loss layer of the CNN model to be trained to obtain the loss value of the loss function in the CNN model to be trained;
and the parameter adjusting module is used for adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model.
Optionally, the annotated image sample is an image sample annotated with a category, and the annotated object in the annotated image sample is different from the unlabeled image sample;
optionally, the training apparatus for image recognition model further includes: a weight determination module to:
determining the weight of the category to which the labeled image sample belongs in a weight vector corresponding to the CNN model to be trained based on the category of the labeled image sample of each category, wherein the weight vector corresponding to the CNN model to be trained comprises the weight corresponding to each category, and the weight vector corresponding to the CNN model to be trained is initially a preset initial weight vector;
optionally, the loss value calculating module is specifically configured to:
and inputting the characteristics of the labeled image sample of each category, the weight of the category to which the labeled image sample belongs, the weights of the categories and the characteristics of the unlabeled image sample into a loss layer of the CNN model to be trained to obtain a loss value of a loss function in the CNN model to be trained.
Optionally, the loss value calculating module is specifically configured to:
input the feature f_i of the i-th labeled image sample, the weight w_li of the class li to which the i-th labeled image sample belongs, the weight w_j of the j-th category, and the feature f_u of the u-th unlabeled image sample among the unlabeled image samples into the loss layer of the CNN model to be trained, so that the loss layer, through the following formula:

L = -\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{f_i \cdot w_{li}}}{\sum_{j=1}^{K} e^{f_i \cdot w_j} + \sum_{u=1}^{U} e^{f_i \cdot f_u}}

calculates the loss value L of the loss function in the CNN model to be trained, wherein n is the total number of the labeled image samples, K is the total number of the categories of the labeled image samples, U is the total number of the unlabeled image samples, w_li and w_j respectively denote the li-th column vector and the j-th column vector of the preset weight matrix W, and 1 ≤ li ≤ K.
Optionally, the training apparatus for image recognition model further includes:
the judging module is used for judging whether the CNN model to be trained is converged or not based on the loss value of the loss function; if yes, triggering the model determining module, and if not, triggering the parameter adjusting module;
the model determining module is used for taking the CNN model to be trained as the trained CNN model;
and the parameter adjusting module is also used for taking the trained CNN model as the CNN model to be trained and triggering the characteristic identifying module.
In another aspect of the present invention, an embodiment of the present invention further provides an electronic device, which includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and a processor for implementing the steps of the method for training the image recognition model when executing the program stored in the memory.
In yet another aspect of the present invention, the present invention further provides a computer-readable storage medium, in which instructions are stored, and when the instructions are executed on a computer, the computer is caused to execute any one of the above-mentioned training methods for an image recognition model.
In yet another aspect of the present invention, the embodiment of the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute any one of the above-mentioned training methods for image recognition models.
According to the training method and device for the image recognition model and the electronic equipment, after the marked image sample and the unmarked image sample are obtained, the marked image sample and the unmarked image sample are input into the feature extraction layer of the CNN model to be trained, so that the features of the marked image sample and the features of the unmarked image sample can be respectively extracted; then inputting the characteristics of the marked image sample and the characteristics of the unmarked image sample into a loss layer of the CNN model to be trained, so as to obtain a loss value of a loss function in the CNN model; and then adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model.
Therefore, the CNN model can be trained by adopting a part of labeled image samples and a part of unlabeled image samples without adopting a large number of labeled image samples, so that the number of manually labeled images can be reduced, the workload of manual labeling is reduced, the efficiency of model training is improved, the difficulty in acquiring the labeled image samples can be reduced, and the difficulty in training the CNN model is reduced. Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of a first implementation manner of a training method for an image recognition model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a training process of the training method of the image recognition model shown in FIG. 1;
FIG. 3 is a flowchart of a second implementation manner of a training method for an image recognition model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for training an image recognition model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In order to solve the problems in the prior art, embodiments of the present invention provide a training method and apparatus for an image recognition model, and an electronic device, so as to reduce the number of manually labeled images, reduce the workload of manual labeling, and improve the efficiency of model training.
First, a training method of an image recognition model according to an embodiment of the present invention is described below, as shown in fig. 1, which is a flowchart of a first implementation manner of the training method of an image recognition model according to an embodiment of the present invention, and the method may include:
and S110, acquiring marked image samples and unmarked image samples.
In some examples, a training method of an image recognition model according to an embodiment of the present invention may be applied to an electronic device, which may include a personal computer, a smart phone, a server, and the like.
Before using the electronic device to perform image recognition, a user may first train a CNN model set in the electronic device to obtain a trained CNN model, and then may use the trained CNN model to perform image recognition.
When a user trains a CNN model preset in the electronic device, an annotated image sample and an unlabelled image sample may be set first, so that the electronic device may acquire the annotated image sample and the unlabelled image sample.
In still other examples, in order to ensure the recognition accuracy of the trained CNN model, when the labeled image sample and the unlabeled image sample are set, the labeled object in the labeled image sample and the unlabeled image sample may be made different, so that the content of the labeled image sample and the content of the unlabeled image sample may be completely different.
It is understood that the CNN model to be trained herein may be a CNN model that is trained for the first time, or may be a CNN model that is trained again after being trained for multiple times.
And S120, inputting the labeled image sample and the unlabeled image sample into the feature extraction layer of the CNN model to be trained, and respectively extracting the features of the labeled image sample and the features of the unlabeled image sample.
After the marked image sample and the unmarked image sample are obtained, the marked image sample and the unmarked image sample can be input into a feature extraction layer of a to-be-trained CNN model, and the to-be-trained CNN model can respectively perform feature recognition on the marked image sample and the unmarked image sample, so that the features of the marked image sample and the features of the unmarked image sample can be obtained.
It can be understood that the CNN model to be trained may be a preset CNN model that has not been trained, or may be a CNN model that has undergone at least one training.
In still other examples, the feature extraction layer may be one layer or multiple layers, and the number of the feature extraction layers may be set according to actual needs, which is not limited herein.
S130, inputting the characteristics of the marked image samples and the characteristics of the unmarked image samples into a loss layer of the CNN model to be trained, and obtaining the loss value of the loss function in the CNN model to be trained.
And S140, adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model.
After obtaining the features of the labeled image sample and the features of the unlabeled image sample, the features of the labeled image sample and the features of the unlabeled image sample may be input to a loss layer of the CNN model to be trained, so that the loss layer may calculate a loss value of a loss function in the CNN model to be trained based on the features of the labeled image sample and the features of the unlabeled image sample. After the loss layer outputs the loss value of the loss function of the to-be-trained CNN model, the parameters in the to-be-trained CNN model may be adjusted based on the loss value of the loss function, so as to achieve the purpose of training the to-be-trained CNN model.
In some examples, the loss value of the loss function of the CNN model to be trained may be calculated from the features of the labeled image sample and the features of the unlabeled image sample through the formula of any of various loss functions. The formula of the loss function is not limited here, as long as the loss value of the loss function of the CNN model to be trained can be calculated based on the features of the labeled image sample and the features of the unlabeled image sample; it is not described in detail herein.
To describe the embodiment of the present invention more clearly, reference is made to the training process shown in Fig. 2. As shown in Fig. 2, the labeled image sample 210 and the unlabeled image sample 220 may be input into the CNN model 230 to be trained, and the feature extraction layer of the CNN model 230 to be trained may output the features 240 of the labeled image sample and the features 250 of the unlabeled image sample, respectively. The features 240 of the labeled image sample and the features 250 of the unlabeled image sample may then be input to the loss layer of the CNN model to be trained, which may calculate the loss value 260 of the loss function in the CNN model 230 to be trained based on these features. Finally, after the loss value 260 is obtained, it may be used to adjust the parameters in the CNN model 230 to be trained, so as to obtain the trained CNN model.
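As an illustration, the training step of Fig. 2 can be sketched in a few lines of PyTorch. The backbone, batch sizes, feature dimension, learning rate, and the loss_layer helper below are illustrative assumptions rather than the patent's exact configuration; faithful versions of the patent's loss formulas are sketched later in the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K, D = 10, 128                                  # assumed category count and feature dimension

# Feature extraction layer of the CNN model to be trained (230); this tiny
# backbone is a stand-in for any real CNN.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, D),
)
W = nn.Parameter(torch.randn(D, K))             # weight matrix W, one column per category
optimizer = torch.optim.SGD(list(backbone.parameters()) + [W], lr=0.01)

def loss_layer(f_l, labels, f_u, W):
    # Stand-in loss layer: class logits plus similarities to unlabeled features
    # form a single softmax, so unlabeled samples act as extra negatives.
    logits = torch.cat([f_l @ W, f_l @ f_u.t()], dim=1)
    return F.cross_entropy(logits, labels)

x_l = torch.randn(8, 3, 32, 32)                 # labeled image samples (210)
y_l = torch.randint(0, K, (8,))                 # their category labels
x_u = torch.randn(16, 3, 32, 32)                # unlabeled image samples (220)

f_l, f_u = backbone(x_l), backbone(x_u)         # features (240) and (250)
loss = loss_layer(f_l, y_l, f_u, W)             # loss value (260)
optimizer.zero_grad()
loss.backward()
optimizer.step()                                # adjust parameters of the model (230)
```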
According to the training method for the image recognition model, provided by the embodiment of the invention, after the marked image sample and the unmarked image sample are obtained, the marked image sample and the unmarked image sample are input into the feature extraction layer of the CNN model to be trained, so that the features of the marked image sample and the features of the unmarked image sample can be respectively extracted; then inputting the characteristics of the marked image sample and the characteristics of the unmarked image sample into a loss layer of the CNN model to be trained to obtain a loss value of a loss function in the CNN model; and then adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model.
Therefore, the CNN model can be trained by adopting a part of labeled image samples and a part of unlabeled image samples without adopting a large number of labeled image samples, so that the number of artificially labeled images can be reduced, the workload of artificial labeling is reduced, the efficiency of model training is improved, the difficulty in obtaining the labeled image samples can be reduced, and the difficulty in training the CNN model is reduced.
On the basis of the training method for the image recognition model shown in fig. 1, an embodiment of the present invention further provides a possible implementation manner, as shown in fig. 3, which is a flowchart of a second implementation manner of the training method for the image recognition model according to the embodiment of the present invention, where the method may include:
and S110, acquiring marked image samples and unmarked image samples.
Wherein, the marked objects in the marked image samples are different from the unmarked image samples.
And S120, inputting the labeled image sample and the unlabeled image sample into the feature extraction layer of the CNN model to be trained, and respectively extracting the features of the labeled image sample and the features of the unlabeled image sample.
In some examples, if the CNN model to be trained is a multi-classification model that identifies target objects of different classes in the input image to be recognized, the labeled image samples are image samples in which the classes of the objects in the image are labeled. For example: if an image sample contains both people and vehicles, the target objects of the person category and the vehicle category in that image sample are labeled.
If the CNN model to be trained is a binary classification model that identifies whether the image to be recognized contains a vehicle, the labeled image sample is an image sample labeled as to whether it contains a vehicle. For example: if an image sample contains a vehicle, it is labeled 1, meaning it contains a vehicle; if another image sample does not contain a vehicle, it is labeled 0, meaning it contains no vehicle.
S131, determining the weight of the labeled image sample category in the weight vector corresponding to the to-be-trained CNN model based on the category of the labeled image sample of each category, wherein the weight vector corresponding to the to-be-trained CNN model comprises the weight corresponding to each category, and the weight vector corresponding to the to-be-trained CNN model is at least a preset initial weight vector;
s132, inputting the characteristics of the labeled image sample of each category, the weight of the category to which the labeled image sample belongs, the weights of the categories and the characteristics of the unlabeled image sample into a loss layer of the CNN model to be trained, and obtaining the loss value of the loss function in the CNN model to be trained.
S150, judging whether the CNN model to be trained is converged or not based on the loss value of the loss function; if yes, step S160 is performed, and if no, step S140 is performed.
S160, taking the CNN model to be trained as the trained CNN model;
and S140, adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model. And performs step S170.
And S170, taking the trained CNN model as a CNN model to be trained, and returning to execute the step S120.
In the embodiment shown in fig. 3, after obtaining the labeled image samples of different categories, the labeled image samples and the unlabeled image samples may be input into the feature extraction layer of the CNN model to be trained, so that the features of each labeled image sample and the features of the unlabeled image samples may be obtained.
The features of the labeled image sample and the unlabeled image sample based on each category may then be input to the loss layer of the CNN model to be trained to obtain the loss value of the loss function in the CNN model to be trained.
In some examples, when calculating the loss value of the loss function of the to-be-trained CNN model, the weight of the labeled image sample may be determined in the weight vector corresponding to the to-be-trained CNN model based on the category of the labeled image sample of each category, and then the loss value of the loss function of the to-be-trained CNN model is calculated based on the feature of the labeled image sample of each category, the weight of the category to which the labeled image sample belongs, the weights of multiple categories, and the feature of the unlabeled image sample.
In some examples, when the CNN model to be trained is the set initial CNN model, the CNN model to be trained is not yet trained, and the weight vector corresponding to the CNN model to be trained may be the preset initial weight vector.
When the CNN model to be trained is a CNN model that has already been trained, the weight vector corresponding to the CNN model to be trained may be the weight vector obtained by applying a gradient descent step to the weight vector corresponding to the CNN model before its last training, based on the loss value obtained in that last training.
For example, assuming that the CNN model to be trained is the CNN model about to undergo its 10th training, the weight vector corresponding to it may be the weight vector obtained by applying a gradient descent step, based on the loss value of the 9th training, to the weight vector the CNN model had before the 9th training.
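A minimal sketch of this weight update follows; the learning rate and the stand-in loss are illustrative assumptions. It only shows that the W entering the current training is the previous W moved one gradient-descent step along the derivative of the previous loss.

```python
import torch
import torch.nn.functional as F

lr = 0.1                                             # assumed learning rate
W_prev = torch.randn(128, 10, requires_grad=True)    # W before the previous training
f = torch.randn(8, 128)                              # labeled features from that training
labels = torch.randint(0, 10, (8,))

loss_prev = F.cross_entropy(f @ W_prev, labels)      # stand-in for the previous loss value
loss_prev.backward()                                 # derivative of the loss w.r.t. W_prev
with torch.no_grad():
    W_curr = W_prev - lr * W_prev.grad               # weight vector for the current training
```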
In some examples, when the loss value of the loss function in the CNN model to be trained is obtained by inputting the features of the labeled image sample of each category, the weight of the category to which the labeled image sample belongs, the weights of the categories, and the features of the unlabeled image sample into the loss layer of the CNN model to be trained, the feature f_i of the i-th labeled image sample, the weight w_li of the class li to which the i-th labeled image sample belongs, the weight w_j of the j-th category, and the feature f_u of the u-th unlabeled image sample among the unlabeled image samples may be input to the loss layer of the CNN model to be trained, so that the loss layer, through the following formula:

L = -\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{f_i \cdot w_{li}}}{\sum_{j=1}^{K} e^{f_i \cdot w_j} + \sum_{u=1}^{U} e^{f_i \cdot f_u}}

calculates the loss value L of the loss function in the CNN model to be trained, wherein n is the total number of the labeled image samples, K is the total number of the categories of the labeled image samples, U is the total number of the unlabeled image samples, w_li and w_j respectively denote the li-th column vector and the j-th column vector of the preset weight matrix W, and 1 ≤ li ≤ K.
It will be understood that f_i and f_u, the feature of a labeled image sample and the feature of an unlabeled image sample, can be represented in the form of vectors. The term f_i · f_u in the formula therefore reflects the degree of similarity between the i-th labeled image sample and the u-th unlabeled image sample: the smaller f_i · f_u is, the less similar the two samples are and the smaller the loss value is, so training pushes the unlabeled features away from the labeled features.
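As a concrete illustration, the following is a minimal sketch of this first formula, assuming row-wise feature matrices and a (D, K) weight matrix; the function name and the log-sum-exp rewriting, used only for numerical stability, are not from the patent.

```python
import torch

def semi_supervised_softmax_loss(f_l, labels, f_u, W):
    """f_l: (n, D) labeled features; labels: (n,) with values in [0, K);
    f_u: (U, D) unlabeled features; W: (D, K) weight matrix."""
    class_logits = f_l @ W                        # (n, K): f_i . w_j for every category j
    unlab_logits = f_l @ f_u.t()                  # (n, U): f_i . f_u for every unlabeled sample
    all_logits = torch.cat([class_logits, unlab_logits], dim=1)
    target = class_logits.gather(1, labels.unsqueeze(1)).squeeze(1)   # f_i . w_li
    # L = -(1/n) sum_i [ f_i.w_li - log( sum_j e^{f_i.w_j} + sum_u e^{f_i.f_u} ) ]
    return -(target - torch.logsumexp(all_logits, dim=1)).mean()

# Illustrative sizes: n = 4 labeled, U = 6 unlabeled, D = 128, K = 10.
loss = semi_supervised_softmax_loss(
    torch.randn(4, 128), torch.randint(0, 10, (4,)),
    torch.randn(6, 128), torch.randn(128, 10))
```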
In still other examples, when the loss value of the loss function of the CNN model to be trained is obtained by inputting the features of the labeled image sample of each category, the weight of the category to which the labeled image sample belongs, the weights of the categories, and the features of the unlabeled image sample into the loss layer of the CNN model to be trained, the feature f_i of the i-th labeled image sample, the weight w_yi of the class yi to which the i-th labeled image sample belongs, the weight w_j of the j-th category, and the feature f_u of the u-th unlabeled image sample among the unlabeled image samples may be input to the loss layer of the CNN model to be trained, so that the loss layer, through the following formula:

L = -\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{s\cos(\theta_{yi}+m)}}{e^{s\cos(\theta_{yi}+m)} + \sum_{j=1, j \neq yi}^{K} e^{s\cos\theta_j} + \sum_{u=1}^{U} e^{s\cos\theta_u}}

calculates the loss value L of the loss function in the CNN model to be trained, wherein θ_yi = arccos(f_i · w_yi), s and m are respectively preset hyperparameters in the loss function (for example, s may take a value of 32 and m may take a value of 0.5), yi denotes the category to which the i-th labeled image sample belongs with 1 ≤ yi ≤ K, θ_j = arccos(f_i · w_j), and θ_u = arccos(f_i · f_u).
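For illustration, a sketch of this margin-based variant follows. It assumes features and weight columns are L2-normalized so that dot products equal the cosines the arccos terms imply; the clamping epsilon and the function name are assumptions, not part of the patent.

```python
import torch
import torch.nn.functional as F

def margin_semi_supervised_loss(f_l, labels, f_u, W, s=32.0, m=0.5):
    f_l = F.normalize(f_l, dim=1)                        # unit norms: f_i . w_j = cos(theta_j)
    f_u = F.normalize(f_u, dim=1)
    W = F.normalize(W, dim=0)
    cos_j = (f_l @ W).clamp(-1 + 1e-7, 1 - 1e-7)         # (n, K): cos(theta_j)
    cos_u = (f_l @ f_u.t()).clamp(-1 + 1e-7, 1 - 1e-7)   # (n, U): cos(theta_u)
    idx = torch.arange(f_l.size(0))
    theta_yi = torch.acos(cos_j[idx, labels])            # theta_yi = arccos(f_i . w_yi)
    target = s * torch.cos(theta_yi + m)                 # s * cos(theta_yi + m)
    others = s * cos_j
    others[idx, labels] = float('-inf')                  # exclude j = yi from the plain terms
    denom_terms = torch.cat([target.unsqueeze(1), others, s * cos_u], dim=1)
    return -(target - torch.logsumexp(denom_terms, dim=1)).mean()
```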
It will be appreciated that the two loss function formulas above are merely examples of how the loss value of the loss function of the CNN model to be trained may be calculated; they are not limiting. Other loss function formulas may also be used when the loss value is calculated based on the features of the labeled image samples of each category, the weights of the categories to which the labeled image samples belong, the weights of the multiple categories, and the features of the unlabeled image samples.
In some examples, the CNN model to be trained may be a CNN model that is trained for the first time, or may be a CNN model that is to be trained again after being trained for multiple times, and after being trained for multiple times, in order to determine whether the CNN model after being trained for multiple times is a CNN model that is trained completely, whether the CNN model to be trained converges may be determined based on a loss value of the loss function.
When the CNN model to be trained is a CNN model to be trained again after being trained for multiple times, the CNN model to be trained may be the CNN model obtained by training through the image recognition model training method of the embodiment of the present invention, that is, the CNN model to be trained is the CNN model obtained by performing parameter adjustment on the CNN model to be trained last time.
In still other examples, when determining whether the CNN model to be trained converges based on the loss value of the loss function, the loss value of the loss function of the last CNN model to be trained may be obtained first;
and then comparing the loss value of the loss function of the CNN model to be trained last time with the loss value of the loss function of the CNN model to be trained.
When the difference between the loss value of the loss function of the last CNN model to be trained and the loss value of the loss function of the current CNN model to be trained is small, for example, smaller than a preset threshold, the CNN model to be trained can be said to converge: the recognition result it produces when performing image recognition is substantially the same as that of the last CNN model to be trained, and therefore the CNN model to be trained may be taken as the trained CNN model.
When the difference between the loss value of the loss function of the CNN model to be trained last time and the loss value of the loss function of the CNN model to be trained is large, it is indicated that the CNN model to be trained does not converge, and training needs to be continued, so that the CNN model to be trained can be subjected to parameter adjustment based on the loss value of the loss function of the CNN model to be trained, and the trained CNN model is obtained. And then, taking the trained CNN model as the CNN model to be trained, and returning to execute the step S120.
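A short sketch of this convergence test follows; the threshold value and the stand-in loss values are illustrative assumptions.

```python
def has_converged(prev_loss, curr_loss, threshold=1e-4):
    """True when the loss change between two consecutive trainings is small
    enough for the CNN model to be trained to be taken as trained."""
    return prev_loss is not None and abs(prev_loss - curr_loss) < threshold

prev = None
for step in range(100):                  # stand-in training loop
    curr = 1.0 / (step + 1)              # stand-in loss values
    if has_converged(prev, curr):
        break                            # S160: take the model as the trained CNN model
    prev = curr                          # S140/S170: adjust parameters and train again
```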
With this embodiment of the invention, whether the CNN model to be trained can serve as the trained CNN model can be determined in time, which prevents the model from undergoing many further trainings with negligible effect and prevents the training time from becoming too long, thereby saving time overhead in the training process.
In some examples, when selecting the unlabeled image samples and the labeled image samples, the categories of the labeled image samples should be far apart from one another, for example, the distance between the categories of the labeled image samples is greater than a preset distance threshold, and the unlabeled image samples should likewise be far from the labeled image samples, for example, the distance between an unlabeled image sample and the categories of the labeled image samples is greater than a preset distance threshold. In this way, the trained CNN model can reduce the intra-class distance and enlarge the inter-class distance.
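The patent does not define the distance measure; a sketch under the assumption that "distance between categories" means Euclidean distance between per-category mean feature vectors follows, with an illustrative threshold.

```python
import torch

def categories_far_enough(features, labels, threshold=1.0):
    """features: (n, D); labels: (n,). True if every pair of per-category
    mean feature vectors is farther apart than the preset threshold."""
    means = torch.stack([features[labels == c].mean(0) for c in labels.unique()])
    dists = torch.cdist(means, means)                   # pairwise category distances
    off_diag = dists[~torch.eye(len(means), dtype=torch.bool)]
    return bool((off_diag > threshold).all())
```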
Corresponding to the above method embodiment, an embodiment of the present invention further provides a training apparatus for an image recognition model, as shown in fig. 4, which is a schematic structural diagram of the training apparatus for an image recognition model according to the embodiment of the present invention, and the training apparatus for an image recognition model may include:
a sample obtaining module 410, configured to obtain an annotated image sample and an unannotated image sample;
the feature identification module 420 is configured to input the labeled image sample and the unlabeled image sample into a feature extraction layer of the CNN model to be trained, and respectively extract features of the labeled image sample and features of the unlabeled image sample;
a loss value calculating module 430, configured to obtain a loss value of a loss function in the CNN model to be trained by inputting the features of the labeled image sample and the features of the unlabeled image sample into a loss layer of the CNN model to be trained;
and a parameter adjusting module 440, configured to adjust parameters in the CNN model to be trained based on the loss value of the loss function, to obtain the trained CNN model.
According to the training device for the image recognition model, provided by the embodiment of the invention, after the marked image sample and the unmarked image sample are obtained, the marked image sample and the unmarked image sample are input into the feature extraction layer of the CNN model to be trained, so that the features of the marked image sample and the features of the unmarked image sample can be respectively extracted; then inputting the characteristics of the marked image sample and the characteristics of the unmarked image sample into a loss layer of the CNN model to be trained, so as to obtain a loss value of a loss function in the CNN model; and then adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model.
Therefore, the CNN model can be trained by adopting a part of labeled image samples and a part of unlabeled image samples without adopting a large number of labeled image samples, so that the number of artificially labeled images can be reduced, the workload of artificial labeling is reduced, the efficiency of model training is improved, the difficulty in obtaining the labeled image samples can be reduced, and the difficulty in training the CNN model is reduced.
In some examples, the annotated image sample is an image sample annotated with a category, and the annotated object in the annotated image sample is different from the unlabeled image sample;
in some examples, the training device for the image recognition model may further include: a weight determination module to:
determining the weight of the category to which the labeled image sample belongs in a weight vector corresponding to the CNN model to be trained based on the category of the labeled image sample of each category, wherein the weight vector corresponding to the CNN model to be trained comprises the weight corresponding to each category, and the weight vector corresponding to the CNN model to be trained is initially a preset initial weight vector;
in some examples, the loss value calculation module 430 is specifically configured to:
and inputting the characteristics of the labeled image sample of each category, the weight of the category to which the labeled image sample belongs, the weights of the categories and the characteristics of the unlabeled image sample into a loss layer of the CNN model to be trained to obtain a loss value of a loss function in the CNN model to be trained.
In some examples, the loss value calculation module 430 is specifically configured to:
input the feature f_i of the i-th labeled image sample, the weight w_li of the class li to which the i-th labeled image sample belongs, the weight w_j of the j-th category, and the feature f_u of the u-th unlabeled image sample among the unlabeled image samples into the loss layer of the CNN model to be trained, so that the loss layer, through the following formula:

L = -\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{f_i \cdot w_{li}}}{\sum_{j=1}^{K} e^{f_i \cdot w_j} + \sum_{u=1}^{U} e^{f_i \cdot f_u}}

calculates the loss value L of the loss function in the CNN model to be trained, wherein n is the total number of the labeled image samples, K is the total number of the categories of the labeled image samples, U is the total number of the unlabeled image samples, w_li and w_j respectively denote the li-th column vector and the j-th column vector of the preset weight matrix W, and 1 ≤ li ≤ K.
In some examples, the training device for the image recognition model may further include:
the judging module is used for judging whether the CNN model to be trained is converged or not based on the loss value of the loss function; if yes, triggering the model determining module, and if not, triggering the parameter adjusting module;
the model determining module is used for taking the CNN model to be trained as the trained CNN model;
the parameter adjusting module 440 is further configured to use the trained CNN model as a CNN model to be trained, and trigger the feature identifying module 420.
An embodiment of the present invention further provides an electronic device, as shown in fig. 5, which includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 communicate with each other through the communication bus 504;
a memory 503 for storing a computer program;
the processor 501, when executing the program stored in the memory 503, is configured to execute the training method of the image recognition model in any of the above embodiments, for example, the following steps may be implemented:
acquiring an annotated image sample and an unlabelled image sample, wherein an annotated object in the annotated image sample is different from the unlabelled image sample;
inputting the labeled image sample and the unlabeled image sample into the feature extraction layer of the CNN model to be trained, and respectively extracting the features of the labeled image sample and the features of the unlabeled image sample;
inputting the characteristics of the marked image sample and the characteristics of the unmarked image sample into a loss layer of the CNN model to be trained to obtain a loss value of a loss function in the CNN model to be trained;
and adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model.
According to the electronic equipment provided by the embodiment of the invention, after the marked image sample and the unmarked image sample are obtained, the marked image sample and the unmarked image sample are input into the feature extraction layer of the CNN model to be trained, so that the features of the marked image sample and the features of the unmarked image sample can be respectively extracted; then inputting the characteristics of the marked image sample and the characteristics of the unmarked image sample into a loss layer of the CNN model to be trained, so as to obtain a loss value of a loss function in the CNN model; and then adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model.
Therefore, the CNN model can be trained by adopting a part of labeled image samples and a part of unlabeled image samples without adopting a large number of labeled image samples, so that the number of artificially labeled images can be reduced, the workload of artificial labeling is reduced, the efficiency of model training is improved, the difficulty in obtaining the labeled image samples can be reduced, and the difficulty in training the CNN model is reduced.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In another embodiment of the present invention, there is also provided a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to execute the training method of the image recognition model in any one of the above embodiments, for example, the following steps can be implemented:
acquiring an annotated image sample and an unlabelled image sample, wherein an annotated object in the annotated image sample is different from the unlabelled image sample;
inputting the labeled image sample and the unlabeled image sample into the feature extraction layer of the CNN model to be trained, and respectively extracting the features of the labeled image sample and the features of the unlabeled image sample;
inputting the characteristics of the marked image sample and the characteristics of the unmarked image sample into a loss layer of the CNN model to be trained to obtain a loss value of a loss function in the CNN model to be trained;
and adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model.
The computer-readable storage medium provided by the embodiment of the invention can be used for inputting the labeled image sample and the unlabeled image sample into the feature extraction layer of the CNN model to be trained after the labeled image sample and the unlabeled image sample are obtained, so that the features of the labeled image sample and the features of the unlabeled image sample can be respectively extracted; then inputting the characteristics of the marked image sample and the characteristics of the unmarked image sample into a loss layer of the CNN model to be trained, so as to obtain a loss value of a loss function in the CNN model; and then adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model.
Therefore, the CNN model can be trained by adopting a part of labeled image samples and a part of unlabeled image samples without adopting a large number of labeled image samples, so that the number of manually labeled images can be reduced, the workload of manual labeling is reduced, the efficiency of model training is improved, the difficulty in acquiring the labeled image samples can be reduced, and the difficulty in training the CNN model is reduced.
In another embodiment of the present invention, there is also provided a computer program product containing instructions, which when run on a computer, causes the computer to execute the training method of the image recognition model in any one of the above embodiments, for example, the following steps can be implemented:
acquiring an annotated image sample and an unlabelled image sample, wherein an annotated object in the annotated image sample is different from the unlabelled image sample;
inputting the labeled image sample and the unlabeled image sample into the feature extraction layer of the CNN model to be trained, and respectively extracting the features of the labeled image sample and the features of the unlabeled image sample;
inputting the characteristics of the marked image sample and the characteristics of the unmarked image sample into a loss layer of the CNN model to be trained to obtain a loss value of a loss function in the CNN model to be trained;
and adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model.
According to the computer program product containing the instructions, after the marked image sample and the unmarked image sample are obtained, the marked image sample and the unmarked image sample are input into the feature extraction layer of the CNN model to be trained, so that the features of the marked image sample and the features of the unmarked image sample can be respectively extracted; then inputting the characteristics of the marked image sample and the characteristics of the unmarked image sample into a loss layer of the CNN model to be trained, so as to obtain a loss value of a loss function in the CNN model; and then adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain the trained CNN model.
Therefore, the CNN model can be trained by adopting a part of labeled image samples and a part of unlabeled image samples without adopting a large number of labeled image samples, so that the number of manually labeled images can be reduced, the workload of manual labeling is reduced, the efficiency of model training is improved, the difficulty in acquiring the labeled image samples can be reduced, and the difficulty in training the CNN model is reduced.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the system embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (7)

1. A method for training an image recognition model, the method comprising:
acquiring a marked image sample and an unmarked image sample;
inputting the marked image sample and the unmarked image sample into a feature extraction layer of a Convolutional Neural Network (CNN) model to be trained, and respectively extracting the features of the marked image sample and the features of the unmarked image sample;
inputting the characteristics of the labeled image sample of each category, the weight of the category to which the labeled image sample belongs, the weights of a plurality of categories and the characteristics of the unlabeled image sample into a loss layer of the CNN model to be trained to obtain a loss value of a loss function in the CNN model to be trained;
adjusting parameters in the CNN model to be trained based on the loss value of the loss function to obtain a trained CNN model;
the obtaining a loss value of a loss function in the to-be-trained CNN model by inputting the features of the labeled image sample of each category, the weight of the category to which the labeled image sample belongs, the weights of a plurality of categories, and the features of the unlabeled image sample into a loss layer of the to-be-trained CNN model includes:
inputting the feature f_i of the i-th labeled image sample, the weight w_li of the class li to which the i-th labeled image sample belongs, the weight w_j of the j-th category, and the feature f_u of the u-th unlabeled image sample among the unlabeled image samples into the loss layer of the CNN model to be trained, so that the loss layer, through the following formula:

L = -\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{f_i \cdot w_{li}}}{\sum_{j=1}^{K} e^{f_i \cdot w_j} + \sum_{u=1}^{U} e^{f_i \cdot f_u}}

calculates the loss value L of the loss function in the CNN model to be trained, wherein n is the total number of the labeled image samples, K is the total number of the categories of the labeled image samples, U is the total number of the unlabeled image samples, w_li and w_j are respectively the li-th column vector and the j-th column vector in the preset weight matrix W, and 1 ≤ li ≤ K;
alternatively,
the obtaining of the loss value of the loss function in the CNN model to be trained by inputting the features of the labeled image sample of each category, the weight of the category to which the labeled image sample belongs, the weights of the plurality of categories, and the features of the unlabeled image sample into the loss layer of the CNN model to be trained includes:
inputting the feature f_i of the i-th labeled image sample, the weight w_li of the category to which the i-th labeled image sample belongs, the weight w_j of the j-th category, and the feature f_u of the u-th unlabeled image sample among the unlabeled image samples into the loss layer of the CNN model to be trained, so that the loss layer computes, by the following formula:

(formula image FDA0003741201060000021, not reproduced in the text)

the loss value L of the loss function in the CNN model to be trained, where θ_yi = arccos(f_i · w_yi), s and m are preset hyper-parameters of the loss function, yi denotes the category of the i-th labeled image sample with 1 ≤ yi ≤ K, θ_j = arccos(f_i · w_j), and θ_u = arccos(f_i · f_u).
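For illustration only, and not as part of the claim language: below is a minimal numpy sketch of the second loss variant. Because the formula images are not reproduced, the sketch rests on assumptions: it applies the ArcFace-style additive angular margin s·cos(θ_yi + m) to each labeled sample's own category, and treats each unlabeled feature as an extra negative term s·cos(θ_u) in the softmax denominator, which is one reading consistent with the θ_u = arccos(f_i · f_u) definition above. All function and variable names are hypothetical.

import numpy as np

def arcface_unlabeled_loss(F, labels, W, F_u, s=64.0, m=0.5):
    # F:      (n, d) L2-normalized features f_i of the labeled samples
    # labels: (n,)   category index yi of each labeled sample, 0 <= yi < K
    # W:      (d, K) L2-normalized weight matrix whose columns are w_j
    # F_u:    (U, d) L2-normalized features f_u of the unlabeled samples
    # s, m:   the preset hyper-parameters of the loss function
    n = F.shape[0]
    cos_j = np.clip(F @ W, -1.0, 1.0)      # cos(theta_j) = f_i . w_j, shape (n, K)
    cos_u = np.clip(F @ F_u.T, -1.0, 1.0)  # cos(theta_u) = f_i . f_u, shape (n, U)
    theta_y = np.arccos(cos_j[np.arange(n), labels])         # theta_yi
    logits = np.concatenate([s * cos_j, s * cos_u], axis=1)  # (n, K + U)
    logits[np.arange(n), labels] = s * np.cos(theta_y + m)   # additive angular margin
    # numerically stable cross-entropy against the yi entry
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z[np.arange(n), labels] - np.log(np.exp(z).sum(axis=1))
    return -log_p.mean()

# toy call with random, L2-normalized inputs
rng = np.random.default_rng(0)
unit = lambda a, ax: a / np.linalg.norm(a, axis=ax, keepdims=True)
F, W, F_u = (unit(rng.normal(size=(8, 128)), 1),
             unit(rng.normal(size=(128, 10)), 0),
             unit(rng.normal(size=(4, 128)), 1))
print(arcface_unlabeled_loss(F, rng.integers(0, 10, size=8), W, F_u))

The first loss variant can be read the same way with the margin m removed (a plain softmax over the K category logits plus the U unlabeled logits), but its exact form likewise sits in an unreproduced formula image.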
2. The method of claim 1, wherein the labeled image sample is an image sample annotated with a category, and the object annotated in the labeled image sample is different from the object in the unlabeled image sample;
before the inputting of the features of the labeled image sample and the features of the unlabeled image sample into the loss layer of the CNN model to be trained to obtain the loss value of the loss function in the CNN model to be trained, the method further includes:
determining, based on the category to which each labeled image sample belongs, the weight of that category in the weight vector corresponding to the CNN model to be trained, wherein the weight vector corresponding to the CNN model to be trained includes the weight corresponding to each category and is initially a preset initial weight vector.
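As an illustration of this weight lookup (hypothetical names; it assumes, as in the sketch above, a (d, K) weight matrix W whose li-th column is the weight of category li):

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 10))                 # preset initial weights, one column per category
W /= np.linalg.norm(W, axis=0, keepdims=True)  # column-normalize
labels = rng.integers(0, 10, size=8)           # category li of each labeled sample
W_li = W[:, labels]                            # (128, 8): the weight column of each sample's own category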
3. The method according to claim 1, wherein before the adjusting parameters in the CNN model to be trained based on the loss values of the loss function to obtain the trained CNN model, the method further comprises:
judging, based on the loss value of the loss function, whether the CNN model to be trained has converged;
if yes, taking the CNN model to be trained as a trained CNN model;
and if not, executing the step of adjusting the parameters in the CNN model to be trained based on the loss value of the loss function, taking the adjusted CNN model as the CNN model to be trained, and returning to the step of inputting the labeled image sample and the unlabeled image sample into the feature extraction layer of the CNN model to be trained and respectively extracting the features of the labeled image sample and the features of the unlabeled image sample.
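Claim 3 does not fix a convergence criterion. Below is a minimal sketch under the common assumption that training stops once the loss value plateaus over a sliding window; all names are hypothetical and the loss curve is synthetic.

import numpy as np

def has_converged(loss_history, tol=1e-4, window=5):
    # plateau test: the loss moved by less than `tol` over the last `window` + 1 values
    if len(loss_history) <= window:
        return False
    recent = loss_history[-(window + 1):]
    return max(recent) - min(recent) < tol

history = []
for step, loss in enumerate(2.0 * np.exp(-0.5 * np.arange(40))):  # toy, flattening loss curve
    history.append(float(loss))
    if has_converged(history):
        print(f"convergence at step {step}: take the current CNN model as the trained model")
        break
    # not converged: adjust the parameters and return to the feature-extraction step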
4. An apparatus for training an image recognition model, the apparatus comprising:
the sample acquisition module is configured to acquire a labeled image sample and an unlabeled image sample;
the feature recognition module is configured to input the labeled image sample and the unlabeled image sample into a feature extraction layer of a CNN model to be trained, and to respectively extract the features of the labeled image sample and the features of the unlabeled image sample;
a loss value calculation module, configured to input the features of the labeled image sample of each category, the weight of the category to which the labeled image sample belongs, the weights of a plurality of categories, and the features of the unlabeled image sample into a loss layer of the CNN model to be trained, to obtain a loss value of a loss function in the CNN model to be trained;
a parameter adjusting module, configured to adjust parameters in the CNN model to be trained based on the loss value of the loss function, to obtain a trained CNN model;
the loss value calculation module is specifically configured to:
inputting the feature f_i of the i-th labeled image sample, the weight w_li of the category to which the i-th labeled image sample belongs, the weight w_j of the j-th category, and the feature f_u of the u-th unlabeled image sample among the unlabeled image samples into the loss layer of the CNN model to be trained, so that the loss layer computes, by the following formula:

(formula image FDA0003741201060000031, not reproduced in the text)

the loss value L of the loss function in the CNN model to be trained, where n is the total number of labeled image samples, K is the total number of categories of the labeled image samples, U is the total number of unlabeled image samples, and w_li and w_j are the li-th and the j-th column vectors, respectively, of the preset weight matrix W, with 1 ≤ li ≤ K;
alternatively,
the loss value calculation module is specifically configured to:
inputting the feature f_i of the i-th labeled image sample, the weight w_li of the category to which the i-th labeled image sample belongs, the weight w_j of the j-th category, and the feature f_u of the u-th unlabeled image sample among the unlabeled image samples into the loss layer of the CNN model to be trained, so that the loss layer computes, by the following formula:

(formula image FDA0003741201060000041, not reproduced in the text)

the loss value L of the loss function in the CNN model to be trained, where θ_yi = arccos(f_i · w_yi), s and m are preset hyper-parameters of the loss function, yi denotes the category of the i-th labeled image sample with 1 ≤ yi ≤ K, θ_j = arccos(f_i · w_j), and θ_u = arccos(f_i · f_u).
5. The apparatus of claim 4, wherein the labeled image sample is an image sample annotated with a category, and the object annotated in the labeled image sample is different from the object in the unlabeled image sample;
the device further comprises: a weight determination module to:
determine, based on the category to which each labeled image sample belongs, the weight of that category in the weight vector corresponding to the CNN model to be trained, wherein the weight vector corresponding to the CNN model to be trained includes the weight corresponding to each category and is initially a preset initial weight vector.
6. The apparatus of claim 4, further comprising:
the judging module is configured to judge, based on the loss value of the loss function, whether the CNN model to be trained has converged; if yes, to trigger the model determination module, and if not, to trigger the parameter adjusting module;
the model determination module is configured to take the CNN model to be trained as the trained CNN model;
the parameter adjusting module is further configured to take the trained CNN model as the CNN model to be trained and to trigger the feature recognition module.
7. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1 to 3 when executing the computer program stored in the memory.
CN201911122421.0A 2019-11-15 2019-11-15 Training method and device of image recognition model and electronic equipment Active CN110909784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911122421.0A CN110909784B (en) 2019-11-15 2019-11-15 Training method and device of image recognition model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911122421.0A CN110909784B (en) 2019-11-15 2019-11-15 Training method and device of image recognition model and electronic equipment

Publications (2)

Publication Number Publication Date
CN110909784A CN110909784A (en) 2020-03-24
CN110909784B true CN110909784B (en) 2022-09-02

Family

ID=69817521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911122421.0A Active CN110909784B (en) 2019-11-15 2019-11-15 Training method and device of image recognition model and electronic equipment

Country Status (1)

Country Link
CN (1) CN110909784B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626098B (en) * 2020-04-09 2023-04-18 北京迈格威科技有限公司 Method, device, equipment and medium for updating parameter values of model
CN111783844A (en) * 2020-06-10 2020-10-16 东莞正扬电子机械有限公司 Target detection model training method and device based on deep learning and storage medium
CN112200218B (en) * 2020-09-10 2023-06-20 浙江大华技术股份有限公司 Model training method and device and electronic equipment
CN111931865B (en) * 2020-09-17 2021-01-26 平安科技(深圳)有限公司 Training method and device of image classification model, computer equipment and storage medium
CN112668675B (en) * 2021-03-22 2021-06-22 腾讯科技(深圳)有限公司 Image processing method and device, computer equipment and storage medium
CN114330592B (en) * 2022-01-06 2023-06-02 北京百度网讯科技有限公司 Model generation method, device, electronic equipment and computer storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2017101803A4 (en) * 2017-12-24 2018-02-15 Chen, Mufei MS Deep learning based image classification of dangerous goods of gun type
CN108288051A (en) * 2018-02-14 2018-07-17 北京市商汤科技开发有限公司 Pedestrian identification model training method and device, electronic equipment and storage medium again
CN109034205A (en) * 2018-06-29 2018-12-18 西安交通大学 Image classification method based on the semi-supervised deep learning of direct-push
CN108896857A (en) * 2018-07-06 2018-11-27 北京四方继保自动化股份有限公司 A kind of transformer complex working condition recognition methods based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ArcFace: Additive Angular Margin Loss for Deep Face Recognition; Jiankang Deng et al.; arXiv; 2019-02-09; see pages 1-11 *
Unknown Identity Rejection Loss: Utilizing Unlabeled Data for Face Recognition; Haiming Yu et al.; arXiv; 2019-10-24; see Section 3 and Fig. 2 *

Also Published As

Publication number Publication date
CN110909784A (en) 2020-03-24

Similar Documents

Publication Publication Date Title
CN110909784B (en) Training method and device of image recognition model and electronic equipment
CN112990432B (en) Target recognition model training method and device and electronic equipment
CN110163300B (en) Image classification method and device, electronic equipment and storage medium
WO2020077895A1 (en) Signing intention determining method and apparatus, computer device, and storage medium
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
CN110956615B (en) Image quality evaluation model training method and device, electronic equipment and storage medium
CN110019790B (en) Text recognition, text monitoring, data object recognition and data processing method
TWI712980B (en) Claim information extraction method and device, and electronic equipment
CN111079186B (en) Data analysis method, device, equipment and storage medium
US9977950B2 (en) Decoy-based matching system for facial recognition
CN110647916B (en) Pornographic picture identification method and device based on convolutional neural network
CN111460250A (en) Image data cleaning method, image data cleaning device, image data cleaning medium, and electronic apparatus
CN113221918B (en) Target detection method, training method and device of target detection model
CN113449704B (en) Face recognition model training method and device, electronic equipment and storage medium
CN111401343B (en) Method for identifying attributes of people in image and training method and device for identification model
CN111915595A (en) Image quality evaluation method, and training method and device of image quality evaluation model
CN111178364A (en) Image identification method and device
CN107944363A (en) Face image processing process, system and server
CN109101984B (en) Image identification method and device based on convolutional neural network
CN111967383A (en) Age estimation method, and training method and device of age estimation model
CN112241470A (en) Video classification method and system
WO2022237065A1 (en) Classification model training method, video classification method, and related device
CN116310474A (en) End-to-end relationship identification method, model training method, device, equipment and medium
CN111460206B (en) Image processing method, apparatus, electronic device, and computer-readable storage medium
CN113139379B (en) Information identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant