CN114091594A - Model training method and device, equipment and storage medium

Info

Publication number
CN114091594A
Authority
CN (China)
Prior art keywords
class, training, confidence, negative, positive
Legal status
Pending
Application number
CN202111347391.0A
Other languages
Chinese (zh)
Inventor
蔡晓聪, 侯军, 伊帅
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN202111347391.0A
Publication of CN114091594A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a model training method, apparatus, device, and storage medium, wherein the method comprises the following steps: acquiring a training sample set, where the training sample set comprises at least hard samples whose labels are an unknown class, the unknown class being a class other than the positive class and the negative class; performing supervised training on a model to be trained by using a quantized cross-entropy loss function based on the training sample set, where the quantized cross-entropy loss function comprises loss functions corresponding respectively to the positive class, the negative class, and the unknown class; and obtaining a trained target model under the condition that the training result shows that the predicted confidence of the hard samples lies between two preset confidence thresholds.

Description

Model training method and device, equipment and storage medium
Technical Field
The present application relates to the field of computer vision, and relates to, but is not limited to, a model training method, apparatus, device, and storage medium.
Background
Image classification is a fundamental problem in deep learning and in computer vision, and underlies common image recognition tasks.
In actual scenarios, the input of the model may not only be a clearly labeled negative sample or a clearly labeled positive sample; a sample that cannot be clearly labeled as either the negative class or the positive class may also appear as input. Since the model learns in the training stage from negative and positive samples with definite labels, when it encounters a sample with an ambiguous label as input in actual use, it cannot effectively predict that sample and may randomly classify it into the positive class or the negative class, which affects the precision of the final negative and positive classes.
Disclosure of Invention
The embodiment of the application provides a model training method, a model training device, model training equipment and a storage medium.
The technical scheme of the embodiment of the application is realized as follows:
in a first aspect, an embodiment of the present application provides a model training method, including:
acquiring a training sample set; the training sample set comprises at least hard samples whose labels are an unknown class; the unknown class is a class other than the positive class and the negative class;
performing supervised training on a model to be trained by using a quantized cross-entropy loss function based on the training sample set; wherein the quantized cross-entropy loss function comprises loss functions corresponding respectively to the positive class, the negative class, and the unknown class;
and obtaining a trained target model under the condition that the training result shows that the predicted confidence of the hard samples lies between two preset confidence thresholds.
In some possible embodiments, the two confidence thresholds include a first threshold and a second threshold greater than the first threshold, and the training sample set further includes positive samples labeled as the positive class and negative samples labeled as the negative class; the method further includes: obtaining the trained target model under the condition that the training result shows that the predicted confidence of the negative samples satisfies the first threshold and/or the predicted confidence of the positive samples satisfies the second threshold.
Therefore, by setting the first threshold and the second threshold, the difficult samples with undefined labels can be identified, and the accuracy of the positive and negative samples can be ensured.
In some possible embodiments, the supervised training of the model to be trained using the quantized cross-entropy loss function based on the training sample set includes: inputting the training sample set into the model to be trained to obtain the prediction confidence of each sample in the training sample set; determining cross-entropy losses for the set of training samples using the quantized cross-entropy loss function based on the prediction confidence and corresponding label for each of the samples; and carrying out back propagation training on the model to be trained on the basis of the cross entropy loss.
Then, the model with the updated parameters continues to predict the classification result and confidence of each sample until the model parameters converge, which completes the whole training process.
In some possible embodiments, the quantized cross-entropy loss function comprises a first function corresponding to the positive class, a second function corresponding to the negative class, and a third function corresponding to the unknown class; determining the cross-entropy loss of the training sample set using the quantized cross-entropy loss function based on the prediction confidence and the corresponding label of each sample comprises: for the samples labeled as the positive class in the training sample set, determining a first loss between the prediction confidence of the respective samples and the positive class by using the first function; for the samples labeled as the negative class, determining a second loss between the prediction confidence of the respective samples and the negative class by using the second function; for the samples labeled as the unknown class, determining a third loss between the prediction confidence of the respective samples and the unknown class by using the third function; and determining the cross-entropy loss of the training sample set based on the first loss, the second loss, and the third loss.
In this way, the first loss between the prediction confidence of the model for the positive sample output and the positive category is determined through the first function, the second loss between the prediction confidence of the model for the negative sample output and the negative category is determined through the second function, and the third loss between the prediction confidence of the model for the difficult sample output and the unknown category is determined through the third function, so that the cross entropy loss of the training sample set can be further accurately calculated, and the parameters of the model can be adjusted based on the cross entropy loss.
In some possible embodiments, the method further comprises: acquiring an image to be processed; carrying out classification prediction on the image to be processed by utilizing the trained target model to obtain the confidence coefficient of the image to be processed; and determining a classification result of the image to be processed based on the confidence of the image to be processed and the two confidence thresholds.
Therefore, the confidence coefficient output by the trained target model is combined with the two confidence coefficient thresholds, so that whether the classification result of the image to be processed is a positive type, a negative type or an unknown type can be accurately judged, and the influence on the accuracy of the final negative type and the accuracy of the final positive type caused by chaotic prediction of the image of the unknown type is avoided.
In some possible embodiments, the two confidence thresholds include a first threshold and a second threshold greater than the first threshold, and the determining the classification result of the image to be processed based on the confidence of the image to be processed and two preset confidence thresholds includes: determining that the classification result of the image to be processed is the negative category when the confidence of the image to be processed is smaller than the first threshold; determining that the classification result of the image to be processed is the unknown class if the confidence of the image to be processed is greater than the first threshold and less than the second threshold; and determining that the classification result of the image to be processed is the positive class under the condition that the confidence coefficient of the image to be processed is greater than a second threshold value.
In this way, by setting two confidence threshold values and simultaneously combining the confidence output by the target model obtained by training based on the negative sample, the positive sample and the difficult sample together, whether the classification result of the image to be processed is a positive type, a negative type or an unknown type can be accurately judged.
In some possible embodiments, the method further comprises: determining the first function based on the quantized positive category and the confidence degree predicted value of the model to be trained on the positive category; determining the second function based on the quantized negative category and the confidence degree predicted value of the model to be trained on the negative category; determining the third function based on the quantized unknown classes and the respective confidence degree predicted values of the model to be trained on the positive class and the negative class; wherein the quantized unknown class is between the quantized negative class and the quantized positive class.
In this way, the trained target model can predict the confidence of negative samples to a value close to 0, predict the confidence of positive samples to a value close to 1, and predict the confidence of samples labeled as the unknown class, that is, samples whose initial label is ambiguous, to an intermediate value.
In a second aspect, an embodiment of the present application provides a model training apparatus, including a first obtaining module, a training module, and a first determining module, where:
the first acquisition module is used for acquiring a training sample set; the training sample set comprises at least hard samples whose labels are an unknown class; the unknown class is a class other than the positive class and the negative class;
the training module is used for performing supervised training on a model to be trained by using a quantized cross-entropy loss function based on the training sample set; wherein the quantized cross-entropy loss function comprises loss functions corresponding respectively to the positive class, the negative class, and the unknown class;
the first determining module is used for obtaining a trained target model under the condition that the training result shows that the predicted confidence of the hard samples lies between two preset confidence thresholds.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor implements the steps in the model training method when executing the program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the model training method described above.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
in the embodiment of the application, a training sample set is first acquired; then, based on the training sample set, supervised training is performed on a model to be trained by using a quantized cross-entropy loss function; finally, a trained target model is obtained under the condition that the training result shows that the predicted confidence of the hard samples lies between two preset confidence thresholds. In this way, hard samples whose labels are neither the positive class nor the negative class are merged into the training sample set, and a quantized cross-entropy loss function is set, so that the trained target model predicts the confidence of hard samples between the confidences of positive and negative samples. The trained target model is therefore robust to samples with ambiguous labels that may be encountered in actual use: such samples will not be predicted as positive or negative samples, and chaotic prediction of hard samples in the prediction stage is avoided.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without inventive efforts, wherein:
fig. 1 is a schematic flowchart of a model training method according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a model provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating a model training method according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart illustrating a model training method according to an embodiment of the present disclosure;
FIG. 5 is a schematic flow chart illustrating a model training method according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
fig. 7 is a hardware entity diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
It should be noted that the terms "first/second/third" referred to in the embodiments of the present application are only used for distinguishing similar objects and do not represent a specific ordering of the objects; it should be understood that "first/second/third" may be interchanged in a particular order or sequence where permitted, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which embodiments of the present application belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The scheme provided by the embodiment of the application relates to the technical field of deep learning, and for facilitating understanding of the scheme of the embodiment of the application, terms related to the related technology are briefly explained at first:
artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science, attempting to understand the essence of intelligence and producing a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like. The embodiment of the application relates to a machine learning technology.
Machine Learning (ML) is a multi-domain cross discipline, and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The special research on how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning.
Binary classification: the classification task contains exactly two categories, e.g., identifying whether a picture shows a cat. That is, a classifier is trained that takes as input a picture, represented by a feature vector x, and outputs whether a cat is present, represented by y being 0 or 1. Binary classification assumes that each sample carries one and only one label, 0 or 1.
Confidence interval: the range of values, centered on a measured value, within which the true value lies at a given confidence level. The range of values of the true value under a given probability is called the confidence interval, and that probability is called the confidence probability or confidence level.
Confidence level: the probability, centered on the measured value, that the true value occurs within a certain range. In normal cases, 95% is commonly used as the confidence level.
In the related art, in the prediction process of binary image classification based on deep learning, the classification network model outputs a confidence representing how confident the model is that the sample is positive (class 1), with a value usually between 0 and 1. In actual projects, a classification confidence threshold is set and combined with the confidence output by the classification network model to judge whether the prediction result for an image is the positive class or the negative class.
However, when a sample with an ambiguous label (ground truth) is input during actual use, the model cannot effectively predict the sample and may randomly classify it into the positive class or the negative class; that is, in the prediction stage the classification network model predicts erratically on samples with ambiguous labels, which affects the precision of the final negative and positive classes.
The embodiment of the application provides a model training method which is applied to electronic equipment. The electronic device includes, but is not limited to, a mobile phone, a laptop, a tablet and a web-enabled device, a multimedia device, a streaming media device, a mobile internet device, a wearable device, or other types of devices. The functions implemented by the method can be implemented by calling program code by a processor in an electronic device, and the program code can be stored in a computer storage medium. The processor may be configured to perform the processing of the image classification process and the memory may be configured to store data required and data generated during the image classification process.
Fig. 1 is a schematic flow chart of a model training method provided in an embodiment of the present application, and as shown in fig. 1, the method at least includes the following steps:
step S110, acquiring a training sample set;
here, the training sample set includes at least hard samples whose labels are an unknown class; the unknown class is a class other than the positive class and the negative class. The positive class is the label of positive samples and the negative class is the label of negative samples. A hard sample is an intermediate sample that cannot be determined to be either a positive sample or a negative sample.
It should be noted that each sample in the training sample set is labeled with a label. In practical application, target images meeting task requirements can be screened out from a training set according to classification tasks to be executed by a model to be trained and used as a training sample set, and each target image in the training sample set is labeled with a label of an image category to which the target image belongs.
For example, if the model to be trained is used to perform a cat and dog classification task, the training set may include a cat, a dog, and a target image that resembles both a cat and a dog as samples, and each sample is labeled as one of a cat class, a dog class, or an unknown class.
In some possible embodiments, the method further comprises: in the case where the positive class is 1 and the negative class is 0, it is determined that the unknown class is 0.5. That is, in the case where the label of the positive sample is class 1 and the label of the negative sample is class 0 in the training sample set, it is determined that the label of the difficult sample is class 0.5. In this way, the trained target model can predict the confidence of the negative sample class to a value close to 0, predict the confidence of the positive sample to a value close to 1, and predict the confidence of the sample labeled as the unknown class to a value close to 0.5, that is, predict the confidence of the sample labeled as the unknown class at an intermediate position.
Step S120, based on the training sample set, performing supervised training on a model to be trained by utilizing a quantized cross entropy loss function;
here, the model to be trained is a neural network model composed of a deep network and a binary classifier.
In one possible embodiment, the model to be trained is used for a two-classification task, also referred to as a classification model. In a possible embodiment, the network structure of the model is as shown in fig. 2, and the model 20 includes a feature extraction network 21, a fully-connected layer 22 and an output layer 23, where the feature extraction network 21 is configured to extract features of a sample image 201 to obtain a feature map corresponding to the sample image 201, the fully-connected layer 22 is configured to obtain a classification result and a confidence 202 of the prediction based on the feature map, and the output layer 23 is configured to output the classification result and the confidence 202 of the prediction.
In implementation, the fully-connected layer 22 may process the feature distribution of the sample image 201 to obtain a predicted classification distribution corresponding to that feature distribution, and determine the mean of the predicted classification distribution as the classification result. The confidence can be determined from the variance of the prediction results, that is, the variance of the prediction results is calculated and used as the confidence output by the model.
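By way of illustration, the following is a minimal PyTorch sketch of such a structure; the backbone, layer sizes, and all names are assumptions made for illustration and are not prescribed by the embodiment:

```python
import torch
import torch.nn as nn

class BinaryClassifier(nn.Module):
    """Sketch of the structure in Fig. 2: a feature extraction network,
    a fully-connected layer, and an output producing one confidence."""
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        # Stand-in feature extraction network; any CNN backbone could be used.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, feature_dim),
            nn.ReLU(),
        )
        self.fc = nn.Linear(feature_dim, 1)  # fully-connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sigmoid maps the logit to a confidence in (0, 1).
        return torch.sigmoid(self.fc(self.features(x))).squeeze(-1)
```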
Here, the quantized cross-entropy loss function includes loss functions corresponding respectively to the positive class, the negative class, and the unknown class; that is, the improved cross-entropy loss function can support the training process on hard samples labeled as the unknown class, while the model to be trained can still be trained in the same manner as with the original cross-entropy loss function.
It should be noted that the general form of the cross-entropy loss function is shown in formula (1):

L = -[ y·log(ŷ₁) + (1 - y)·log(ŷ₀) ]    (1)

where L is the cross-entropy loss, y denotes the label of the training sample, log is the logarithmic function, ŷ₁ is the confidence on the positive class predicted by the model for this training sample, and ŷ₀ is the confidence on the negative class predicted by the model for this training sample.
In the case where the label of the positive sample is set to class 1 (i.e., the positive class) and the label of the negative sample is set to class 0 (i.e., the negative class), formula (1) above is equivalent to formula (2) below, where ŷ denotes the predicted confidence on the positive class and ŷ₀ = 1 - ŷ:

L = -[ y·log(ŷ) + (1 - y)·log(1 - ŷ) ]    (2)
since the general form of the cross-entropy loss function described above only supports training of positive and negative classes, and not difficult samples with ambiguous labels. Therefore, in the training process, the label of the normal positive and negative samples is kept unchanged, and for the difficult samples with ambiguous labels, the labels of the difficult samples are set to be unknown classes and used for training the model together with the positive and negative samples. Meanwhile, the cross entropy loss function of the quantization is set as follows:
Figure BDA0003354696200000101
and when the value of the label y of the training sample is M (a numerical value larger than 0 and smaller than 1), the classification loss of the difficult sample can be calculated.
And step S130, obtaining a trained target model under the condition that the training result shows that the confidence of the predicted difficult sample is between two preset confidence threshold values.
Here, the two preset confidence thresholds respectively represent the maximum confidence at which a sample is judged to be the negative class and the minimum confidence at which a sample is judged to be the positive class. Taking a positive class of 1 and a negative class of 0 as an example, two confidence thresholds of 0.3 and 0.7, respectively, may be set.
The trained target model can predict the confidence coefficient of the difficult sample between the confidence coefficient of the negative sample and the confidence coefficient of the positive sample, so that the classification result of the difficult sample is predicted into unknown classes except for the positive class and the negative class, and the final prediction accuracy on the negative class and the positive class is prevented from being influenced.
In the embodiment of the application, hard samples whose labels are neither the positive class nor the negative class are trained together with the positive and negative samples, and a quantized cross-entropy loss function is set, so that the trained target model predicts the confidence of hard samples between the confidences of the positive and negative samples. The trained target model is therefore robust to samples with ambiguous labels that may be encountered in actual use: such samples will not be predicted as positive or negative samples, and chaotic prediction of hard samples in the prediction stage is avoided.
Fig. 3 is a schematic flow chart of a model training method provided in an embodiment of the present application, and as shown in fig. 3, the method at least includes the following steps:
step S310, acquiring a training sample set;
here, the training sample set includes at least a hard sample whose label is an unknown class; the unknown class is a class other than a positive class or a negative class.
Step S320, performing supervised training on a model to be trained by utilizing a quantized cross entropy loss function based on the training sample set;
here, the quantized cross-entropy loss function comprises loss functions corresponding respectively to the positive class, the negative class, and the unknown class.
Step S330, under the condition that the training result shows that the confidence degree of the difficult sample is between a first threshold value and a second threshold value, obtaining a trained target model;
here, taking a positive class of 1 and a negative class of 0 as an example, the first threshold and the second threshold may be set to 0.3 and 0.7, respectively. The trained target model can predict the confidence of hard samples between the confidences of the positive and negative samples, and can therefore withstand the samples with ambiguous labels that may be encountered in actual use.
Step S340, obtaining the trained target model when the training result indicates that the predicted confidence of the negative sample satisfies the first threshold, and/or the predicted confidence of the positive sample satisfies the second threshold.
Here, through the training process of steps S310 to S330, the final trained target model can predict the confidence distribution of negative samples to 0 to 0.1 as much as possible, predict the confidence distribution of hard samples with unclear labels to 0.45 to 0.55 as much as possible, and predict the confidence distribution of positive samples to 0.9 to 1.0 as much as possible.
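As a sketch of how the stopping condition of steps S330 and S340 might be checked on a validation pass, assuming the example thresholds of 0.3 and 0.7 above (the function name and the strictness of the comparisons are assumptions):

```python
import torch

def training_converged(neg_conf: torch.Tensor,
                       pos_conf: torch.Tensor,
                       hard_conf: torch.Tensor,
                       first_threshold: float = 0.3,
                       second_threshold: float = 0.7) -> bool:
    """Steps S330/S340: hard-sample confidences must fall between the two
    thresholds, negative samples must satisfy the first threshold, and
    positive samples must satisfy the second threshold."""
    hard_ok = ((hard_conf > first_threshold) &
               (hard_conf < second_threshold)).all()
    neg_ok = (neg_conf < first_threshold).all()
    pos_ok = (pos_conf > second_threshold).all()
    return bool(hard_ok and neg_ok and pos_ok)
```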
In the embodiment of the application, by setting the first threshold and the second threshold, hard samples with ambiguous labels can be identified, so the accuracy of the positive and negative samples can be ensured. This effectively handles the samples with unclear labels that the model may encounter in actual scenarios: learning of the original positive and negative classes is not affected, and at the same time the samples with unclear labels do not degrade the accuracy on positive and negative samples.
Based on fig. 1, fig. 4 is a schematic flowchart of a model training method provided in an embodiment of the present application. As shown in fig. 4, step S120 or step S320, "performing supervised training on a model to be trained by using a quantized cross-entropy loss function based on the training sample set", may be implemented by the following steps:
step S410, inputting the training sample set into the model to be trained to obtain the prediction confidence of each sample in the training sample set;
here, the prediction confidence of each sample is the confidence that the corresponding sample predicted by the model to be trained belongs to a certain class.
Step S420, determining cross entropy loss of the training sample set by using the quantized cross entropy loss function based on the prediction confidence and the corresponding label of each sample;
here, the cross-entropy penalty is a mathematical representation used in the training process to measure the degree of error in the classifier output. The larger the prediction error degree is, the larger the value of the cross entropy loss is, so that the magnitude of the cross entropy loss can be used for representing the difference degree between the prediction result and the real value, and the performance of the model for executing the classification task can be measured. In one embodiment, the cross-entropy loss may be calculated based on the predicted classification results, the confidence, the label, and a cross-entropy loss function. The cross entropy loss function is a quantized cross entropy loss function, and can support a training process of a hard sample with an unknown label.
In some possible embodiments, the quantized cross-entropy loss function comprises a first function corresponding to the positive class, a second function corresponding to the negative class, and a third function corresponding to the unknown class; the cross-entropy loss of the training sample set may be obtained by the following steps:
step 4201, for a sample labeled as the positive class in the training sample set, determining a first loss between a prediction confidence of the corresponding sample and the positive class by using the first function;
here, the positive class is quantized to 1, and the first function may be determined based on the quantized positive class and the confidence prediction value of the model to be trained on the positive class; in formula (3) above, the first function corresponds to the term -log(ŷ₁), where ŷ₁ is the confidence prediction value on the positive class predicted by the model for the training sample.
Step 4202, for the sample labeled as the negative class in the training sample set, determining a second loss between the prediction confidence of the corresponding sample and the negative class using the second function;
here, the negative class is quantized to 0, and the second function may be determined based on the quantized negative class and the confidence prediction value of the model to be trained on the negative class; in formula (3) above, the second function corresponds to the term -log(ŷ₀), where ŷ₀ is the confidence on the negative class predicted by the model for the training sample.
Step 4203, for the sample labeled as the unknown class in the training sample set, determining a third loss between the prediction confidence of the corresponding sample and the unknown class using the third function;
here, the unknown class is quantized to M, a number greater than 0 and smaller than 1, e.g., 0.5. The third function may be determined based on the quantized unknown class and the respective confidence prediction values of the model to be trained on the positive class and the negative class; in formula (3) above, the third function corresponds to the term -[ M·log(ŷ₁) + (1 - M)·log(ŷ₀) ].
Step 4204, determining the cross-entropy loss of the training sample set based on the first loss, the second loss, and the third loss.
In this way, the first loss between the prediction confidence of the model for the positive sample output and the positive class is determined through the first function, the second loss between the prediction confidence of the model for the negative sample output and the negative class is determined through the second function, and the third loss between the prediction confidence of the model for the difficult sample output and the unknown class is determined through the third function, so that the cross entropy loss of the training sample set can be further accurately calculated.
In the implementation, the prediction confidence of each sample output by the model to be trained in one training process is input into the loss function corresponding to the class to which the corresponding sample belongs, the classification loss of the corresponding sample is calculated, and then the classification losses calculated by each sample in the training sample set are summed, so that the cross entropy loss of the training sample set can be obtained.
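A minimal sketch of this per-class decomposition (steps S4201 to S4204) follows; the function name and the combination by averaging are assumptions, and the result matches the unified formula (3) when the hard-sample label equals M:

```python
import torch

def cross_entropy_by_class(confidence: torch.Tensor,
                           labels: torch.Tensor,
                           m: float = 0.5) -> torch.Tensor:
    """Steps S4201-S4204: apply the first, second, and third functions to
    the positive, negative, and unknown-class samples respectively, then
    combine. Equivalent to formula (3) when the hard-sample label is M."""
    eps = 1e-7
    p = confidence.clamp(eps, 1.0 - eps)
    # Exact comparison is safe because labels are set to exactly 0.0, m, or 1.0.
    first_loss = -torch.log(p[labels == 1.0])          # positive samples
    second_loss = -torch.log(1.0 - p[labels == 0.0])   # negative samples
    p_hard = p[labels == m]                            # hard samples
    third_loss = -(m * torch.log(p_hard) + (1.0 - m) * torch.log(1.0 - p_hard))
    # Combine the per-class losses into the set-level cross-entropy loss.
    return torch.cat([first_loss, second_loss, third_loss]).mean()
```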
And step S430, carrying out back propagation training on the model to be trained based on the cross entropy loss.
And updating the parameters of the model to be trained based on the cross entropy loss, and continuously predicting the classification result and the confidence coefficient of each sample by using the model with the updated parameters, thereby realizing the back propagation training process.
In practical application, the partial derivatives of the cross-entropy loss with respect to the model parameters can be computed and used, through algorithms such as gradient descent, to adjust the parameters in the model, such as the parameters of the feature extraction network and of the fully-connected layer; training stops when the model parameters converge, yielding the trained model.
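Putting the pieces together, a back-propagation training loop corresponding to steps S410 to S430 might look like the following sketch, which reuses the BinaryClassifier and quantized_cross_entropy sketches above; the optimizer, learning rate, epoch count, and toy data are assumptions:

```python
import torch

# Toy data: labels 0.0 (negative), 0.5 (unknown/hard), 1.0 (positive).
images = torch.randn(30, 3, 32, 32)
labels = torch.tensor([0.0, 0.5, 1.0]).repeat(10)
loader = [(images[i:i + 10], labels[i:i + 10]) for i in range(0, 30, 10)]

model = BinaryClassifier()  # from the sketch above
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
    for batch_images, batch_labels in loader:
        confidence = model(batch_images)    # step S410: prediction confidences
        loss = quantized_cross_entropy(confidence, batch_labels)  # step S420
        optimizer.zero_grad()
        loss.backward()                     # step S430: back-propagation
        optimizer.step()                    # gradient-descent parameter update
```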
In the embodiment of the application, the classification loss of each sample predicted by the model to be trained in one training pass is calculated through the cross-entropy function, and the classification losses of the samples are then summed to obtain the cross-entropy loss of the training sample set; the parameters of the model to be trained are then updated based on the cross-entropy loss, and training stops when the model parameters converge. In this way, the trained target model can withstand the samples with ambiguous labels that may be encountered in actual use; learning of the original positive and negative classes is not affected, and at the same time samples with ambiguous labels do not degrade the accuracy on positive and negative samples.
Based on fig. 1, fig. 5 is a schematic flowchart of a model training method provided in an embodiment of the present application, and as shown in fig. 5, after a trained target model is obtained, the method further includes the following steps:
step S140, acquiring an image to be processed;
here, in some implementations, the image to be processed may be an image acquired in real time by an image acquisition device provided on the electronic device, such as a camera module; in other implementations, the image to be processed may be an image transmitted to the electronic device by another device through instant messaging for classification prediction; in still other implementations, the electronic device may, in response to a task processing instruction, call the local album and obtain the image to be processed from it, which is not limited in this embodiment of the application.
S150, carrying out classification prediction on the image to be processed by using the trained target model to obtain the confidence coefficient of the image to be processed;
here, the target model is trained based on a negative sample, a positive sample and a hard sample; the label of the negative sample is a negative class, the label of the positive sample is a positive class, and the label of the hard sample is an unknown class between the positive class and the negative class, for example, the positive class is 1, the negative class is 0, and the unknown class is a class between 0 and 1, that is, the label of the hard sample is neither the positive class nor the negative class.
The target model may adopt a neural network model for performing a classification task, such as a CNN (Convolutional Neural Network), a VGG (Visual Geometry Group) network, a ResNet (Residual Neural Network), or a GoogLeNet model; the structure of the model is not limited in the embodiments of the present application.
In implementation, the embodiment of the application realizes image classification prediction with a model composed of a deep network and a binary classifier. In the prediction process, the model outputs a confidence for the image to be processed, which represents how confident the model is that the image to be processed is the positive class; the value of the confidence is usually between 0 and 1.
It should be noted that, in the related art, since the model learns in the training stage from explicitly labeled negative and positive samples, when a sample that is not explicitly labeled is encountered as input during actual use, the model cannot effectively predict the sample and may randomly classify it into the positive class or the negative class.
In the embodiment of the application, unknown classes between the positive classes and the negative classes are set for the labels of the difficult samples, and the difficult samples and the positive and negative samples are used as training data in the training process, so that the confidence degree of the difficult samples can be predicted between the confidence degrees of the positive and negative samples by a target model which is finally trained.
Step S160, determining a classification result of the image to be processed based on the confidence of the image to be processed and two preset confidence thresholds.
Here, the two preset confidence thresholds respectively represent a minimum confidence for determining that the image to be processed is in a positive category and a maximum confidence for determining that the image to be processed is in a negative category.
In an actual item, the two confidence thresholds include a first threshold and a second threshold that is greater than the first threshold. Taking the classification labels as 0 and 1, i.e. a positive class of 1 and a negative class of 0, a first threshold of 0.3 and a second threshold of 0.7, or a first threshold of 0.4 and a second threshold of 0.6, may be set.
In some embodiments, in a case that the confidence of the image to be processed is smaller than a first threshold, determining that the classification result of the image to be processed is the negative class; in some embodiments, in a case that the confidence of the image to be processed is greater than a first threshold and less than a second threshold, determining that the classification result of the image to be processed is the unknown class; in some embodiments, in a case that the confidence of the image to be processed is greater than a second threshold, the classification result of the image to be processed is determined to be the positive class.
Therefore, by setting two confidence threshold values and combining the confidence output by the model obtained by training based on the negative sample, the positive sample and the difficult sample, whether the classification result of the image to be processed is a positive type, a negative type or an unknown type can be accurately judged, the chaotic prediction of the image to be processed of the unknown type in a prediction stage is avoided, and the accuracy rate of image classification can be ensured.
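As an illustration, the decision rule of step S160 might be wrapped as follows, again assuming the example thresholds of 0.3 and 0.7 and a model like the BinaryClassifier sketch above:

```python
import torch

def classify(model: torch.nn.Module, image: torch.Tensor,
             first_threshold: float = 0.3,
             second_threshold: float = 0.7) -> str:
    """Step S160: map the predicted confidence to the negative, unknown,
    or positive class using the two preset confidence thresholds."""
    model.eval()
    with torch.no_grad():
        confidence = model(image.unsqueeze(0)).item()
    if confidence < first_threshold:
        return "negative"
    if confidence > second_threshold:
        return "positive"
    return "unknown"
```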
In the embodiment of the application, firstly, an image to be processed is obtained; then, carrying out classification prediction on the image to be processed by using the trained target model to obtain the confidence of the image to be processed; finally, determining a classification result of the image to be processed based on the confidence coefficient of the image to be processed and two preset confidence coefficient thresholds; therefore, by setting two confidence threshold values and combining the confidence output by the target model obtained by training based on the negative sample, the positive sample and the difficult sample, whether the classification result of the image to be processed is a positive type, a negative type or an unknown type can be accurately judged, and the influence on the accuracy of the final negative type and the accuracy of the final positive type caused by chaotic prediction of the image of the unknown type is avoided.
Based on the foregoing embodiments, a model training apparatus is further provided in an embodiment of the present application; the apparatus includes modules, and units included in the modules, which may be implemented by a processor in an electronic device, or of course by specific logic circuits. In the implementation process, the processor may be a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 6 is a schematic structural diagram of a model training apparatus provided in an embodiment of the present application, and as shown in fig. 6, the apparatus 600 includes a first obtaining module 610, a training module 620, and a first determining module 630, where:
the first obtaining module 610 is configured to obtain a training sample set; the training sample set comprises at least hard samples whose labels are an unknown class; the unknown class is a class other than the positive class and the negative class;
the training module 620 is configured to perform supervised training on a model to be trained by using a quantized cross-entropy loss function based on the training sample set; wherein the quantized cross-entropy loss function comprises loss functions corresponding respectively to the positive class, the negative class, and the unknown class;
the second determining module 630 is configured to obtain a trained target model when the training result indicates that the confidence of the predicted hard sample is between two preset confidence thresholds.
In some possible embodiments, the two confidence thresholds include a first threshold and a second threshold greater than the first threshold, the training sample set further includes positive samples labeled as positive classes and negative samples labeled as negative classes, and the apparatus further includes a second determining module for obtaining the trained target model if the training result indicates that the predicted confidence of the negative samples satisfies the first threshold and/or the predicted confidence of the positive samples satisfies the second threshold.
In some possible embodiments, the training module includes a first training submodule, a first determination submodule, and a second training submodule, wherein: the first training submodule is used for inputting the training sample set into the model to be trained to obtain the prediction confidence of each sample in the training sample set; the first determining submodule is used for determining cross entropy loss of the training sample set by using the quantized cross entropy loss function based on the prediction confidence coefficient and the corresponding label of each sample; and the second training submodule is used for carrying out back propagation training on the model to be trained on the basis of the cross entropy loss.
In some possible embodiments, the quantized cross-entropy loss function comprises a first function for predicting the positive class correspondence, a second function for the negative class correspondence, and a third function for the unknown class correspondence; the first determining submodule comprises a first determining unit, configured to determine, for a sample labeled as the positive class in the training sample set, a first loss between a prediction confidence of the corresponding sample and the positive class by using the first function; a second determining unit, configured to determine, for the samples labeled as the negative class in the training sample set, a second loss between the prediction confidence of the corresponding sample and the negative class by using the second function; a third determining unit, configured to determine, for a sample labeled as the unknown class in the training sample set, a third loss between the prediction confidence of the corresponding sample and the unknown class by using the third function; a fourth determining unit, configured to determine a cross-entropy loss of the training sample set based on the first loss, the second loss, and the third loss.
In some possible embodiments, the apparatus further includes a second acquiring module, configured to acquire an image to be processed; the prediction module is used for carrying out classification prediction on the image to be processed by utilizing the trained target model to obtain the confidence coefficient of the image to be processed; and the third determining module is used for determining the classification result of the image to be processed based on the confidence coefficient of the image to be processed and the two preset confidence coefficient thresholds.
In some possible embodiments, the two confidence thresholds include a first threshold and a second threshold greater than the first threshold, and the third determination module includes: a fifth determining unit, configured to determine that the classification result of the to-be-processed image is the negative category when the confidence of the to-be-processed image is smaller than the first threshold; a sixth determining unit, configured to determine that the classification result of the to-be-processed image is the unknown class if the confidence of the to-be-processed image is greater than the first threshold and smaller than the second threshold; a seventh determining unit, configured to determine that the classification result of the to-be-processed image is the positive class if the confidence of the to-be-processed image is greater than the second threshold.
In some possible embodiments, the apparatus further includes a fourth determining module, configured to determine the first function based on the quantized positive class and a confidence prediction value of the model to be trained on the positive class; a fifth determining module, configured to determine the second function based on the quantized negative category and a confidence prediction value of the model to be trained on the negative category; a sixth determining module, configured to determine the third function based on the quantized unknown class and respective confidence prediction values of the model to be trained on the positive class and the negative class; wherein the quantized unknown class is between the quantized negative class and the quantized positive class.
Here, it should be noted that: the above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be noted that, in the embodiment of the present application, if the above-mentioned software function module is implemented and sold or used as a stand-alone product, it may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling an electronic device (which may be a smartphone with a camera, a tablet computer, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps in any of the model training methods in the foregoing embodiments. Correspondingly, in an embodiment of the present application, a chip is further provided, where the chip includes a programmable logic circuit and/or program instructions, and when the chip runs, the chip is configured to implement the steps in any of the model training methods in the foregoing embodiments. Correspondingly, in an embodiment of the present application, there is also provided a computer program product, which is used to implement the steps in the model training method in any one of the above embodiments when the computer program product is executed by a processor of an electronic device.
Based on the same technical concept, an embodiment of the present application provides an electronic device for implementing the model training method described in the above method embodiments. Fig. 7 is a hardware entity diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 7, the electronic device 700 includes a memory 710 and a processor 720; the memory 710 stores a computer program executable on the processor 720, and the processor 720, when executing the computer program, implements the steps of any of the model training methods of the embodiments of the present application.
The memory 710 is configured to store instructions and applications executable by the processor 720, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 720 and by modules in the electronic device. The memory 710 may be implemented by a flash memory (FLASH) or a Random Access Memory (RAM).
The processor 720, when executing the program, performs the steps of any of the model training methods described above, and generally controls the overall operation of the electronic device 700.
The processor may be at least one of an Application-Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field-Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It is understood that other electronic devices may also implement the functions of the above-described processor, and the embodiments of the present application are not particularly limited in this regard.
The computer storage medium/memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); it may also belong to various electronic devices that include one or any combination of the above memories, such as mobile phones, computers, tablet devices, and personal digital assistants.
Here, it should be noted that the above description of the storage medium and device embodiments is similar to the description of the method embodiments above and has beneficial effects similar to those of the method embodiments. For technical details not disclosed in the storage medium and device embodiments of the present application, reference is made to the description of the method embodiments of the present application.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The above serial numbers of the embodiments of the present application are merely for description and do not represent the relative merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a/an" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in actual implementation, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical, or in other forms. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of hardware plus a software functional unit. Alternatively, if the above integrated units of the present application are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing an electronic device to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc. The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily, without conflict, to obtain new method embodiments. The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method or apparatus embodiments. The above description covers only the embodiments of the present application, but the protection scope of the present application is not limited thereto; any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of model training, the method comprising:
acquiring a training sample set; wherein the training sample set at least comprises difficult samples labeled as an unknown class, and the unknown class is a class other than a positive class and a negative class;
performing supervised training on a model to be trained by using a quantized cross-entropy loss function based on the training sample set; wherein the quantized cross-entropy loss function comprises loss functions respectively corresponding to prediction of the positive class, the negative class, and the unknown class; and
obtaining a trained target model in a case where a training result indicates that a predicted confidence of the difficult samples lies between two preset confidence thresholds.
2. The method of claim 1, wherein the two confidence thresholds comprise a first threshold and a second threshold greater than the first threshold, the training sample set further comprises positive samples labeled as the positive class and negative samples labeled as the negative class, and the method further comprises:
obtaining the trained target model in a case where the training result indicates that a predicted confidence of the negative samples satisfies the first threshold and/or a predicted confidence of the positive samples satisfies the second threshold.
3. The method of claim 1 or 2, wherein performing supervised training on the model to be trained by using the quantized cross-entropy loss function based on the training sample set comprises:
inputting the training sample set into the model to be trained to obtain a prediction confidence of each sample in the training sample set;
determining a cross-entropy loss of the training sample set by using the quantized cross-entropy loss function based on the prediction confidence and the corresponding label of each sample; and
performing back-propagation training on the model to be trained based on the cross-entropy loss (a minimal illustrative sketch of this training step follows the claims).
4. The method of claim 3, wherein the quantized cross-entropy loss function comprises a first function corresponding to prediction of the positive class, a second function corresponding to prediction of the negative class, and a third function corresponding to prediction of the unknown class; and
determining the cross-entropy loss of the training sample set by using the quantized cross-entropy loss function based on the prediction confidence and the corresponding label of each sample comprises:
for samples in the training sample set labeled as the positive class, determining a first loss between the prediction confidence of the respective samples and the positive class by using the first function;
for samples in the training sample set labeled as the negative class, determining a second loss between the prediction confidence of the respective samples and the negative class by using the second function;
for samples in the training sample set labeled as the unknown class, determining a third loss between the prediction confidence of the respective samples and the unknown class by using the third function; and
determining the cross-entropy loss of the training sample set based on the first loss, the second loss, and the third loss.
5. The method of any of claims 1 to 4, further comprising:
acquiring an image to be processed;
performing classification prediction on the image to be processed by using the trained target model to obtain a confidence of the image to be processed;
and determining a classification result of the image to be processed based on the confidence of the image to be processed and the two confidence thresholds.
6. The method of claim 5, wherein the two confidence thresholds include a first threshold and a second threshold that is greater than the first threshold, and wherein determining the classification result for the image to be processed based on the confidence of the image to be processed and the two confidence thresholds comprises:
determining that the classification result of the image to be processed is the negative class when the confidence of the image to be processed is less than the first threshold;
determining that the classification result of the image to be processed is the unknown class when the confidence of the image to be processed is greater than the first threshold and less than the second threshold; and
determining that the classification result of the image to be processed is the positive class when the confidence of the image to be processed is greater than the second threshold.
7. The method of any of claims 1 to 6, further comprising:
determining the first function based on the quantized positive class and a confidence prediction value of the model to be trained for the positive class;
determining the second function based on the quantized negative class and a confidence prediction value of the model to be trained for the negative class; and
determining the third function based on the quantized unknown class and respective confidence prediction values of the model to be trained for the positive class and the negative class; wherein the quantized unknown class lies between the quantized negative class and the quantized positive class.
8. A model training apparatus, comprising a first acquisition module, a training module, and a first determining module, wherein:
the first acquisition module is configured to acquire a training sample set; wherein the training sample set at least comprises difficult samples labeled as an unknown class, and the unknown class is a class other than a positive class and a negative class;
the training module is configured to perform supervised training on a model to be trained by using a quantized cross-entropy loss function based on the training sample set; wherein the quantized cross-entropy loss function comprises loss functions respectively corresponding to prediction of the positive class, the negative class, and the unknown class; and
the first determining module is configured to obtain a trained target model in a case where a training result indicates that a predicted confidence of the difficult samples lies between two preset confidence thresholds.
9. An electronic device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
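The following is the minimal illustrative sketch referenced from claim 3 above of how the claimed training step might be realized. It assumes PyTorch, a model whose output is a single sigmoid confidence per image, and a data loader that yields targets already quantized to 0.0 (negative), 0.5 (unknown), and 1.0 (positive) — all of which are assumptions beyond what the claims specify:

import torch

def train_epoch(model: torch.nn.Module, loader, optimizer) -> None:
    # One epoch of the supervised training of claim 3: predict confidences,
    # compute the quantized cross-entropy loss, then back-propagate.
    model.train()
    for images, targets in loader:
        optimizer.zero_grad()
        p = model(images).squeeze(1).clamp(1e-7, 1.0 - 1e-7)
        # Soft-target cross entropy; for targets 1.0 / 0.0 / 0.5 this reduces
        # to the first / second / third functions of claim 4, respectively.
        loss = -(targets * p.log() + (1.0 - targets) * (1.0 - p).log()).mean()
        loss.backward()  # back-propagation training of the model to be trained
        optimizer.step()

Training would then be stopped, per claim 1, once the predicted confidences of the difficult samples fall between the two preset confidence thresholds.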
CN202111347391.0A 2021-11-15 2021-11-15 Model training method and device, equipment and storage medium Pending CN114091594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111347391.0A CN114091594A (en) 2021-11-15 2021-11-15 Model training method and device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111347391.0A CN114091594A (en) 2021-11-15 2021-11-15 Model training method and device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114091594A (en) 2022-02-25

Family

ID=80300656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111347391.0A Pending CN114091594A (en) 2021-11-15 2021-11-15 Model training method and device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114091594A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115269844A (en) * 2022-08-01 2022-11-01 腾讯科技(深圳)有限公司 Model processing method and device, electronic equipment and storage medium
CN115269844B (en) * 2022-08-01 2024-03-29 腾讯科技(深圳)有限公司 Model processing method, device, electronic equipment and storage medium
CN116737607A (en) * 2023-08-16 2023-09-12 之江实验室 Sample data caching method, system, computer device and storage medium
CN116737607B (en) * 2023-08-16 2023-11-21 之江实验室 Sample data caching method, system, computer device and storage medium
CN116863278A (en) * 2023-08-25 2023-10-10 摩尔线程智能科技(北京)有限责任公司 Model training method, image classification method, device, equipment and storage medium
CN116863278B (en) * 2023-08-25 2024-01-26 摩尔线程智能科技(北京)有限责任公司 Model training method, image classification method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111523621B (en) Image recognition method and device, computer equipment and storage medium
Bi et al. APDC-Net: Attention pooling-based convolutional network for aerial scene classification
CN114091594A (en) Model training method and device, equipment and storage medium
CN109840531A (en) The method and apparatus of training multi-tag disaggregated model
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
CN110825969B (en) Data processing method, device, terminal and storage medium
CN111950692B (en) Robust output coding based on hamming distance for improved generalization
CN112905997B (en) Method, device and system for detecting poisoning attack facing deep learning model
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
WO2023179429A1 (en) Video data processing method and apparatus, electronic device, and storage medium
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN112347361A (en) Method for recommending object, neural network and training method, equipment and medium thereof
CN115439708A (en) Image data processing method and device
CN115062779A (en) Event prediction method and device based on dynamic knowledge graph
CN111652320B (en) Sample classification method and device, electronic equipment and storage medium
CN112115996B (en) Image data processing method, device, equipment and storage medium
CN115705706A (en) Video processing method, video processing device, computer equipment and storage medium
CN113609337A (en) Pre-training method, device, equipment and medium of graph neural network
CN111626098A (en) Method, device, equipment and medium for updating parameter values of model
CN117523218A (en) Label generation, training of image classification model and image classification method and device
CN115205573A (en) Image processing method, device and equipment
CN112507912A (en) Method and device for identifying illegal picture
CN113033212B (en) Text data processing method and device
CN116912920B (en) Expression recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination