CN114332529A - Training method and device for image classification model, electronic equipment and storage medium


Info

Publication number
CN114332529A
Authority
CN
China
Prior art keywords
recognized
picture
classification
pictures
loss
Prior art date
Legal status
Pending
Application number
CN202111575684.4A
Other languages
Chinese (zh)
Inventor
黄海斌
曾子琦
马重阳
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111575684.4A
Publication of CN114332529A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a training method and apparatus for an image classification model, an electronic device and a storage medium, and relates to the field of computer technologies. The method comprises the following steps: acquiring a plurality of pictures to be recognized; respectively inputting the plurality of pictures to be recognized into an initial image classification model to obtain classification results corresponding to the plurality of pictures to be recognized; obtaining, based on the classification results, classification judgment information of each of the plurality of pictures to be recognized; determining a target loss based on the classification judgment information and the initial features of at least two pictures to be recognized that have the same classification result; and iteratively updating parameters of the initial image classification model based on the target loss to obtain a target image classification model. According to the disclosed scheme, a target image classification model with higher prediction accuracy can be obtained, and classifying images based on this target image classification model can improve the accuracy of image classification.

Description

Training method and device for image classification model, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for training an image classification model, an electronic device, and a storage medium.
Background
At present, neural networks are widely applied in fields such as image classification and image recognition. Specifically, a plurality of pictures and the classification labels corresponding to those pictures can be obtained to train a neural network model; a picture to be recognized is then input into the trained neural network model to obtain the category corresponding to that picture.
However, the above-mentioned process of training the neural network model may require acquiring a large number of different types of pictures, and may require manually assigning corresponding classification labels to the large number of different types of pictures. Thus, when the number of training samples is insufficient or the accuracy of manually assigning the classification labels is low, the training efficiency of the neural network model may be affected, and the accuracy of image classification may be affected.
Disclosure of Invention
The present disclosure provides a training method, an apparatus, an electronic device and a storage medium for an image classification model, which solve the technical problem that in the training process of a neural network model, when the number of training samples is insufficient or the accuracy of manually assigning classification labels is low, the training efficiency of the neural network model may be affected, and thus the accuracy of image classification is affected.
The technical scheme of the embodiment of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, a training method of an image classification model is provided. The method can comprise the following steps: acquiring a plurality of pictures to be identified; respectively inputting the multiple pictures to be recognized into the initial image classification model to obtain classification results corresponding to the multiple pictures to be recognized; based on the classification result, acquiring respective classification judgment information of the multiple pictures to be recognized, wherein the classification judgment information of each picture to be recognized is used for representing whether the classification result corresponding to each picture to be recognized is correct or not; determining target loss based on respective classification judgment information of at least two pictures to be recognized with the same classification result and respective initial characteristics of the at least two pictures to be recognized, wherein the initial characteristics of each picture to be recognized in the at least two pictures to be recognized are obtained by inputting each picture to be recognized into the initial image classification model and then performing characteristic recognition; and iteratively updating the parameters of the initial image classification model based on the target loss to obtain a target image classification model.
Optionally, the obtaining of the respective classification discrimination information of the multiple pictures to be recognized based on the classification result specifically includes: inputting a target picture into the initial image classification model, and performing feature recognition to obtain initial features of the target picture, wherein the target picture is a picture with the corresponding real result identical to the classification result corresponding to a first picture to be recognized, and the first picture to be recognized is one of the pictures to be recognized; and when the similarity between the initial feature of the first to-be-identified picture and the initial feature of the target picture is greater than or equal to a similarity threshold value, acquiring first classification judgment information, wherein the first classification judgment information is used for representing that the classification judgment information of the first to-be-identified picture is correct.
Optionally, the training method of the image classification model further includes: and when the similarity between the initial feature of the first to-be-identified picture and the initial feature of the target picture is smaller than the similarity threshold, acquiring second classification judgment information, wherein the second classification judgment information is used for representing that the classification judgment information of the first to-be-identified picture is a classification error.
Optionally, the training method of the image classification model further includes: obtaining a first loss, wherein the first loss is used for representing the inconsistency degree between the real result of each identified picture in at least one identified picture and the predicted result of each identified picture in the initial image classification model; the determining the target loss based on the respective classification judgment information of the at least two pictures to be recognized with the same classification result and the respective initial features of the at least two pictures to be recognized specifically includes: determining a second loss according to the respective classification judgment information of the at least two pictures to be recognized and the respective initial characteristics of the at least two pictures to be recognized; determining a third loss according to the respective classification judgment information of the at least two pictures to be recognized; determining a sum of the first loss, the second loss, and the third loss as the target loss.
Optionally, the determining the second loss according to the classification and discrimination information of the at least two pictures to be recognized and the initial features of the at least two pictures to be recognized specifically includes: determining a distance function between the initial features of a first to-be-identified picture and the initial features of a second to-be-identified picture, wherein the distance function is used for representing the degree of inconsistency between the initial features of the first to-be-identified picture and the initial features of the second to-be-identified picture, the first to-be-identified picture is one of the at least two to-be-identified pictures, and the second to-be-identified picture is a picture of the at least two to-be-identified pictures except the first to-be-identified picture; and when the classification judgment information of the first picture to be recognized is the same as the classification judgment information of the second picture to be recognized, determining the distance function as the second loss.
Optionally, the training method of the image classification model further includes: and when the classification discrimination information of the first picture to be recognized is different from the classification discrimination information of the second picture to be recognized, determining a difference value between a preset constant and the distance function as a second loss.
Optionally, the determining, according to the respective classification and judgment information of the at least two pictures to be recognized, the third loss specifically includes: when the classification discrimination information of a first to-be-recognized picture is the same as the classification discrimination information of a second to-be-recognized picture, determining a first loss threshold as the third loss, wherein the first to-be-recognized picture is one of the at least two to-be-recognized pictures, and the second to-be-recognized picture is a picture other than the first to-be-recognized picture in the at least two to-be-recognized pictures; and when the classification judgment information of the first picture to be recognized is different from the classification judgment information of the second picture to be recognized, determining a second loss threshold as a third loss, wherein the second loss threshold is greater than the first loss threshold.
Optionally, the obtaining the first loss specifically includes: acquiring the at least one identified picture and a real result of each identified picture in the at least one identified picture; inputting a target identified picture into the initial image classification model to determine a target probability, the target probability being a probability that the target identified picture is predicted to be a target true result, the target identified picture being one of the at least one identified picture, the target true result being a true result of the target identified picture; determining the loss corresponding to the target recognized picture based on the target probability; determining the sum of losses corresponding to the at least one identified picture as the first loss, and acquiring the first loss.
According to a second aspect of the embodiments of the present disclosure, a training apparatus for an image classification model is provided. The apparatus may include: the device comprises an acquisition module, a processing module and a determination module; the acquisition module is configured to acquire a plurality of pictures to be identified; the processing module is configured to input the multiple pictures to be recognized into an initial image classification model respectively so as to obtain classification results corresponding to the multiple pictures to be recognized; the obtaining module is further configured to obtain respective classification judgment information of the multiple pictures to be recognized based on the classification result, wherein the classification judgment information of each picture to be recognized is used for representing whether the classification result corresponding to each picture to be recognized is correct or not; the determining module is further configured to determine a target loss based on respective classification discrimination information of at least two pictures to be recognized with the same classification result and respective initial features of the at least two pictures to be recognized, wherein the initial features of each picture to be recognized in the at least two pictures to be recognized are obtained by performing feature recognition after each picture to be recognized is input into the initial image classification model; the processing module is further configured to iteratively update parameters of the initial image classification model based on the target loss, resulting in a target image classification model.
Optionally, the classification discrimination information of each picture to be recognized includes classification correctness or classification mistake; the processing module is specifically configured to input a target picture into the initial image classification model, perform feature recognition, and obtain an initial feature of the target picture, where the target picture is a picture whose corresponding real result is the same as a classification result corresponding to a first to-be-recognized picture, and the first to-be-recognized picture is one of the multiple to-be-recognized pictures; the determining module is further configured to obtain first classification judgment information when the similarity between the initial feature of the first to-be-recognized picture and the initial feature of the target picture is greater than or equal to a similarity threshold, wherein the first classification judgment information is used for representing that the classification judgment information of the first to-be-recognized picture is classified correctly.
Optionally, the determining module is further configured to, when the similarity between the initial feature of the first to-be-identified picture and the initial feature of the target picture is smaller than the similarity threshold, obtain second classification judgment information, where the second classification judgment information is used to characterize that the classification judgment information of the first to-be-identified picture is a classification error.
Optionally, the obtaining module is further configured to obtain a first loss, where the first loss is used to characterize a degree of inconsistency between a real result of each of the at least one identified picture and a predicted result of each of the identified pictures in the initial image classification model; the determining module is specifically configured to determine a second loss according to the classification discrimination information of each of the at least two pictures to be recognized and the initial features of each of the at least two pictures to be recognized; the determining module is specifically configured to determine a third loss according to the respective classification discrimination information of the at least two pictures to be recognized; the determination module is specifically further configured to determine a sum of the first loss, the second loss, and the third loss as the target loss.
Optionally, the determining module is further specifically configured to determine a distance function between an initial feature of a first to-be-identified picture and an initial feature of a second to-be-identified picture, where the distance function is used to characterize a degree of inconsistency between the initial feature of the first to-be-identified picture and the initial feature of the second to-be-identified picture, the first to-be-identified picture is one of the at least two to-be-identified pictures, and the second to-be-identified picture is a picture of the at least two to-be-identified pictures other than the first to-be-identified picture; the determining module is further specifically configured to determine the distance function as the second loss when the classification discrimination information of the first to-be-recognized picture is the same as the classification discrimination information of the second to-be-recognized picture.
Optionally, the determining module is further specifically configured to determine, when the classification discrimination information of the first to-be-recognized picture is different from the classification discrimination information of the second to-be-recognized picture, a difference between a preset constant and the distance function as a second loss.
Optionally, the determining module is further configured to determine, when the classification discrimination information of a first to-be-recognized picture is the same as the classification discrimination information of a second to-be-recognized picture, a first loss threshold as the third loss, where the first to-be-recognized picture is one of the at least two to-be-recognized pictures, and the second to-be-recognized picture is a picture other than the first to-be-recognized picture of the at least two to-be-recognized pictures; the determining module is specifically configured to determine a second loss threshold as a third loss when the classification discrimination information of the first to-be-recognized picture is different from the classification discrimination information of the second to-be-recognized picture, where the second loss threshold is greater than the first loss threshold.
Optionally, the obtaining module is specifically configured to obtain the at least one identified picture and a true result of each identified picture in the at least one identified picture; the determination module is further configured to input a target identified picture to the initial image classification model to determine a target probability, the target probability being a probability that the target identified picture is predicted as a target true result, the target identified picture being one of the at least one identified picture, the target true result being a true result of the target identified picture; the determining module is further configured to determine a loss corresponding to the target recognized picture based on the target probability; the determining module is further configured to determine a sum of losses corresponding to the at least one identified picture as the first loss; the obtaining module is specifically further configured to obtain the first loss.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, which may include: a processor and a memory configured to store processor-executable instructions; wherein the processor is configured to execute the instructions to implement the method of training an optional image classification model according to any one of the first aspect above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having instructions stored thereon, which, when executed by an electronic device, enable the electronic device to perform the method for training an optional image classification model according to any one of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when run on an electronic device, cause the electronic device to perform a method of training an optional image classification model as in any of the first aspects.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
based on any one of the above aspects, in the present disclosure, the electronic device may acquire a plurality of pictures to be recognized, and input the plurality of pictures to be recognized into the initial image classification model respectively, so as to obtain classification results corresponding to the plurality of pictures to be recognized; then, the electronic device obtains the classification discrimination information of the multiple pictures to be recognized based on the classification result, that is, whether the classification result corresponding to the multiple pictures to be recognized is correct is represented. The electronic device may determine a target loss based on the classification discrimination information of each of the at least two to-be-recognized pictures having the same classification result and the initial features of each of the at least two to-be-recognized pictures, which may be understood as determining a difference between the at least two to-be-recognized pictures based on the classification discrimination information and the initial features, and further determining a current loss of the initial image classification model; and then, iteratively updating the parameters of the initial image classification model based on the target loss to obtain a target image classification model. In the embodiment of the disclosure, the electronic device may obtain respective classification discrimination information of a plurality of pictures to be recognized to which no classification label is added, and since the classification discrimination information of each picture to be recognized in the plurality of pictures to be recognized is used to represent whether a classification result corresponding to each picture to be recognized is correct, and further, in combination with respective initial features of the at least two pictures to be recognized, the electronic device may determine a current loss (i.e., a target loss) of an initial image classification model, and iteratively update parameters of the initial image classification model based on the target loss, so as to obtain a target image classification model with higher prediction accuracy. The training efficiency of the image classification model can be improved, and the accuracy of image classification is further improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flowchart illustrating a training method of an image classification model according to an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating a method for training an image classification model according to another embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a method for training an image classification model according to another embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a method for training an image classification model according to another embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating a method for training an image classification model according to another embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating a method for training an image classification model according to an embodiment of the present disclosure;
FIG. 7 is a flowchart illustrating a method for training an image classification model according to an embodiment of the present disclosure;
FIG. 8 is a flowchart illustrating a method for training an image classification model according to another embodiment of the present disclosure;
fig. 9 is a schematic structural diagram illustrating a training apparatus for an image classification model according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram illustrating a training apparatus for a further image classification model provided in an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components.
The data to which the present disclosure relates may be data that is authorized by the user or fully authorized by all parties.
As described in the background art, when the number of training samples is insufficient or the accuracy of assigning classification labels manually is low, the training efficiency of the neural network model may be affected, and thus the accuracy of image classification may be affected.
Based on this, an embodiment of the present disclosure provides a training method for an image classification model, where an electronic device may obtain respective classification determination information of multiple to-be-recognized pictures to which classification labels are not added, and since the classification determination information of each to-be-recognized picture in the multiple to-be-recognized pictures is used to characterize whether a classification result corresponding to each to-be-recognized picture is correct, and then, in combination with respective initial features of the at least two to-be-recognized pictures, the electronic device may determine a current loss (i.e., a target loss) of an initial image classification model, and iteratively update parameters of the initial image classification model based on the target loss, so as to obtain a target image classification model with higher prediction accuracy. The training efficiency of the image classification model can be improved, and the accuracy of image classification is further improved.
The embodiment of the disclosure provides a training method and device for an image classification model, an electronic device and a storage medium, which are applied to an image classification scene. When the electronic device acquires a plurality of pictures to be identified, parameters of the initial image classification model can be updated according to the method provided by the embodiment of the disclosure, so that the target image classification model is obtained.
The following describes an exemplary training method of an image classification model according to an embodiment of the present disclosure with reference to the accompanying drawings:
it is understood that the electronic device performing the training method of the image classification model provided in the embodiments of the present disclosure may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a Personal Digital Assistant (PDA), an Augmented Reality (AR) Virtual Reality (VR) device, and other devices that can install and use a content community application, and the present disclosure does not particularly limit the specific form of the electronic device. The system can be used for man-machine interaction with a user through one or more modes of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction or handwriting equipment and the like.
As shown in fig. 1, a training method of an image classification model provided by an embodiment of the present disclosure may include S101-S105.
S101, obtaining a plurality of pictures to be identified.
It should be understood that the multiple pictures to be recognized are training data to which no label is assigned, that is, when the electronic device acquires the multiple pictures to be recognized, the real results corresponding to the multiple pictures to be recognized are not acquired.
S102, respectively inputting the multiple pictures to be recognized into the initial image classification model to obtain classification results corresponding to the multiple pictures to be recognized.
It should be understood that the initial image classification model may be an image classification model that the electronic device has already trained based on at least one recognized picture and the real result (or classification label) corresponding to each of the at least one recognized picture. The classification result corresponding to each of the multiple pictures to be recognized is the prediction result of the initial image classification model for that picture.
S103, obtaining respective classification judgment information of the multiple pictures to be recognized based on the classification result.
The classification judgment information of each picture to be recognized is used for representing whether the classification result corresponding to each picture to be recognized is correct or not.
It can be understood that, since the plurality of pictures to be recognized are training data without real results (or classification labels), the electronic device cannot determine the real results corresponding to the plurality of pictures to be recognized respectively. By obtaining the classification discrimination information of the multiple pictures to be recognized, it can be determined whether the classification result (i.e., prediction result) obtained after each picture to be recognized in the multiple pictures to be recognized is predicted by the initial image classification model is correct.
In an implementation manner of the embodiment of the present disclosure, the classification judgment information of each of the multiple to-be-recognized pictures may be obtained after manual judgment, and the electronic device may obtain the classification judgment information of each of the multiple to-be-recognized pictures after manual judgment.
In another implementation manner of the embodiment of the present disclosure, for each picture to be recognized, the electronic device may further compare the picture to be recognized with a target picture (a real result corresponding to the target picture is the same as a classification result corresponding to the picture to be recognized), and then determine and add classification judgment information to the picture to be recognized based on the comparison result.
And S104, determining target loss based on the classification discrimination information of the at least two pictures to be recognized with the same classification result and the initial characteristics of the at least two pictures to be recognized.
And the initial characteristic of each picture to be recognized in the at least two pictures to be recognized is obtained by performing characteristic recognition after each picture to be recognized is input into the initial image classification model.
Specifically, the initial image classification model may include an initial feature extractor and an initial classifier. The electronic equipment inputs each picture to be identified into the initial feature extractor to obtain the initial feature of each picture to be identified; and then inputting the initial characteristic of each picture to be recognized into the initial classifier to obtain a classification result corresponding to each picture to be recognized.
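The disclosure does not fix a particular network architecture; the following PyTorch-style sketch only illustrates the two-part structure described above (an initial feature extractor followed by an initial classifier), with an illustrative backbone and layer sizes that are assumptions, not taken from the embodiments:

```python
import torch
import torch.nn as nn

class ImageClassificationModel(nn.Module):
    """Initial image classification model: an initial feature extractor followed by an initial classifier."""

    def __init__(self, feature_dim: int = 128, num_classes: int = 10):
        super().__init__()
        # Initial feature extractor: maps a picture to its initial feature (backbone is illustrative).
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, feature_dim),
        )
        # Initial classifier: maps the initial feature to per-class scores.
        self.classifier = nn.Linear(feature_dim, num_classes)

    def forward(self, x: torch.Tensor):
        feature = self.feature_extractor(x)   # initial feature of each picture to be recognized
        logits = self.classifier(feature)     # scores whose largest entry gives the classification result
        return feature, logits
```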
It can be understood that, among the above-mentioned multiple pictures to be recognized, there may be pictures to be recognized with the same classification result; when the classification results corresponding to two pictures to be recognized are the same, it indicates that the two pictures are determined or classified into the same category by the initial image classification model. In the embodiment of the disclosure, the electronic device may select or determine the at least two pictures to be recognized from any one of a plurality of categories.
It should be understood that although the classification results of the at least two pictures to be recognized are the same, the classification determination information of the at least two pictures to be recognized may be the same or different. When the classification judgment information corresponding to the at least two pictures to be recognized is the same, the initial features of the at least two pictures to be recognized may have a certain similarity; when the classification discrimination information corresponding to the at least two pictures to be recognized is different, the initial features corresponding to the at least two pictures to be recognized may have a certain difference therebetween.
In the embodiment of the disclosure, the electronic device may determine, based on the respective classification discrimination information of the at least two pictures to be recognized and the respective initial features of the at least two pictures to be recognized, a difference between the at least two pictures to be recognized, and further determine the target loss, that is, determine the current loss of the initial image classification model.
And S105, iteratively updating parameters of the initial image classification model based on the target loss to obtain a target image classification model.
In one implementation of the embodiment of the disclosure, the electronic device iteratively updates the parameters of the initial image classification model based on the target loss until the prediction accuracy of the current image classification model is greater than or equal to the accuracy threshold, at which time the current image classification model may be determined as the target image classification model.
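A schematic sketch of this iterative update, assuming the model sketched above and a compute_target_loss helper such as the one outlined later for the first, second and third losses; the accuracy threshold, optimizer and data loader are illustrative assumptions:

```python
import torch

def train_until_threshold(model, optimizer, unlabeled_loader,
                          compute_target_loss, evaluate_accuracy,
                          accuracy_threshold: float = 0.95):
    """Iteratively update the model parameters on the target loss until the current
    model's prediction accuracy reaches the accuracy threshold."""
    while evaluate_accuracy(model) < accuracy_threshold:
        for pictures in unlabeled_loader:                  # batch of pictures to be recognized
            features, logits = model(pictures)             # initial features and class scores
            loss = compute_target_loss(features, logits)   # first + second + third loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```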
The technical scheme provided by the embodiment can at least bring the following beneficial effects: S101-S105 show that the electronic equipment can acquire a plurality of pictures to be recognized and respectively input the pictures to be recognized into the initial image classification model to obtain classification results corresponding to the pictures to be recognized; then, the electronic device obtains the classification discrimination information of the multiple pictures to be recognized based on the classification result, that is, whether the classification result corresponding to the multiple pictures to be recognized is correct is represented. The electronic device may determine a target loss based on the classification discrimination information of each of the at least two to-be-recognized pictures having the same classification result and the initial features of each of the at least two to-be-recognized pictures, which may be understood as determining a difference between the at least two to-be-recognized pictures based on the classification discrimination information and the initial features, and further determining a current loss of the initial image classification model; and then, iteratively updating the parameters of the initial image classification model based on the target loss to obtain a target image classification model. In the embodiment of the disclosure, the electronic device may obtain respective classification discrimination information of a plurality of pictures to be recognized to which no classification label is added, and since the classification discrimination information of each picture to be recognized in the plurality of pictures to be recognized is used to represent whether a classification result corresponding to each picture to be recognized is correct, and further, in combination with respective initial features of the at least two pictures to be recognized, the electronic device may determine a current loss (i.e., a target loss) of an initial image classification model, and iteratively update parameters of the initial image classification model based on the target loss, so as to obtain a target image classification model with higher prediction accuracy. The training efficiency of the image classification model can be improved, and the accuracy of image classification is further improved.
With reference to fig. 1, as shown in fig. 2, in an implementation manner of the embodiment of the present disclosure, the classification judgment information of each to-be-recognized picture includes a correct classification or a wrong classification, and the obtaining of the classification judgment information of each of the to-be-recognized pictures based on the classification result includes S1031 to S1032.
And S1031, inputting the target picture into the initial image classification model, and performing feature recognition to obtain initial features of the target picture.
The target picture is a picture with the same corresponding real result as the classification result corresponding to the first picture to be recognized, and the first picture to be recognized is one of the pictures to be recognized.
It should be understood that the target picture may be one of the at least one identified pictures, i.e. the target picture has a corresponding real result (or classification label). In this embodiment, the electronic device may determine, from the at least one identified picture, a picture whose true result is the same as the classification result corresponding to the first to-be-identified picture, and determine the picture as the target picture, thereby obtaining the initial feature of the target picture.
S1032, when the similarity between the initial feature of the first to-be-identified picture and the initial feature of the target picture is larger than or equal to a similarity threshold, acquiring first classification judgment information.
The first classification judgment information is used for representing that the classification judgment information of the first picture to be recognized is correct.
With reference to the description of the above embodiments, it should be understood that the electronic device may input the first to-be-recognized picture into the initial image classification model, and perform feature recognition to obtain the initial features of the first to-be-recognized picture. When the similarity between the initial feature of the first to-be-recognized picture and the initial feature of the target picture is greater than or equal to the similarity threshold, it is indicated that the initial feature of the first to-be-recognized picture is similar to the initial feature of the target picture, that is, the first to-be-recognized picture is similar to the target picture, so that the electronic device can determine that the classification result corresponding to the first to-be-recognized picture is correct, and then obtain the first classification judgment information.
The technical scheme provided by the embodiment can at least bring the following beneficial effects: from S1031 to S1032, the electronic device may input the target picture (i.e., the picture whose corresponding real result is the same as the classification result corresponding to the first to-be-recognized picture) into the initial image classification model, perform feature recognition, and obtain the initial feature of the target picture; when the similarity between the initial feature of the first to-be-recognized picture and the initial feature of the target picture is greater than or equal to the similarity threshold, it is indicated that the initial feature of the first to-be-recognized picture is similar to the initial feature of the target picture, that is, the first to-be-recognized picture is similar to the target picture, so that the electronic device can determine that the classification result corresponding to the first to-be-recognized picture is correct and acquire first classification judgment information representing that the first to-be-recognized picture is correctly classified. In the embodiment of the disclosure, the electronic device may determine whether the classification result of the first to-be-recognized picture is correct based on the similarity between the target picture and the first to-be-recognized picture, and acquire the first classification judgment information when the classification result of the first to-be-recognized picture is correct. The method can accurately and effectively acquire the classification discrimination information of each picture to be recognized, improve the training efficiency of the image classification model and improve the prediction accuracy of the target image classification model.
With reference to fig. 2, as shown in fig. 3, the training method for an image classification model provided in the embodiment of the present disclosure further includes S1033.
And S1033, when the similarity between the initial feature of the first to-be-identified picture and the initial feature of the target picture is smaller than a similarity threshold, acquiring second classification judgment information.
And the second classification judgment information is used for representing that the classification judgment information of the first picture to be recognized is a classification error.
It can be understood that, when the similarity between the initial feature of the first to-be-recognized picture and the initial feature of the target picture is smaller than the similarity threshold, it indicates that the initial feature of the first to-be-recognized picture is not similar to the initial feature of the target picture, specifically, the first to-be-recognized picture is not similar to the target picture, so that the electronic device can determine that the classification result corresponding to the first to-be-recognized picture is incorrect, i.e., acquire the second classification judgment information.
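A sketch of this comparison covering both branches (S1032 and S1033), assuming cosine similarity as the similarity measure (the embodiments do not fix a specific measure) and an illustrative threshold; it returns whether the classification result is deemed correct (first classification judgment information) or wrong (second classification judgment information):

```python
import torch
import torch.nn.functional as F

def classification_discrimination(feature_to_recognize: torch.Tensor,
                                  target_feature: torch.Tensor,
                                  similarity_threshold: float = 0.8) -> bool:
    """Compare the initial feature of a picture to be recognized with the initial feature
    of a target picture whose real result equals the predicted classification result."""
    similarity = F.cosine_similarity(feature_to_recognize, target_feature, dim=0)
    # similarity >= threshold -> classification result deemed correct (first judgment information)
    # similarity <  threshold -> classification result deemed wrong (second judgment information)
    return bool(similarity >= similarity_threshold)
```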
The technical scheme provided by the embodiment can at least bring the following beneficial effects: S1033 shows that, when the similarity between the initial feature of the first to-be-recognized picture and the initial feature of the target picture is smaller than the similarity threshold, it indicates that the initial feature of the first to-be-recognized picture is not similar to the initial feature of the target picture, that is, the first to-be-recognized picture is not similar to the target picture, so that the electronic device can determine that the classification result corresponding to the first to-be-recognized picture is incorrect, that is, obtain the second classification judgment information representing that the first to-be-recognized picture is misclassified. In the embodiment of the disclosure, the electronic device may determine whether the classification result of the first to-be-recognized picture is correct based on the similarity between the target picture and the first to-be-recognized picture, and acquire the second classification judgment information when the classification result of the first to-be-recognized picture is wrong. The method can accurately and effectively acquire the classification discrimination information of each picture to be recognized, improve the training efficiency of the image classification model and improve the prediction accuracy of the target image classification model.
With reference to fig. 1, as shown in fig. 4, the training method for an image classification model provided by the embodiment of the present disclosure further includes S106.
And S106, obtaining a first loss.
And the first loss is used for representing the inconsistency degree between the real result of each identified picture in the at least one identified picture and the predicted result of each identified picture in the initial image classification model.
Continuing with fig. 4, determining the target loss based on the classification discrimination information of the at least two pictures to be recognized having the same classification result and the initial features of the at least two pictures to be recognized includes S1041-S1043.
S1041, determining a second loss according to the respective classification judgment information of the at least two pictures to be recognized and the respective initial characteristics of the at least two pictures to be recognized.
It should be understood that the second loss is used to characterize the degree of inconsistency between the features of the picture to be recognized, the classification discrimination information of which is correctly classified, and the features of the picture to be recognized, the classification discrimination information of which is wrongly classified.
S1042, determining a third loss according to the respective classification judgment information of the at least two pictures to be recognized.
It will be appreciated that this third loss is used to characterize the loss caused by the classification discrimination information.
It should be noted that the execution order of S1041 and S1042 is not limited in the embodiments of the present disclosure. For example, S1041 may be executed first and then S1042, S1042 may be executed first and then S1041, or S1041 and S1042 may be executed simultaneously; for convenience of description, in fig. 4, S1041 is executed first and then S1042 is executed.
And S1043, determining the sum of the first loss, the second loss and the third loss as a target loss.
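A sketch of the combination in S1043, assuming the second and third losses are accumulated over pairs of pictures to be recognized that share a classification result (how the pairs are enumerated is an assumption; only the summation itself is stated above), with second_loss and third_loss as sketched in the sections that follow:

```python
import torch

def target_loss(first_loss: torch.Tensor, pairs) -> torch.Tensor:
    """pairs: iterable of (feature_a, feature_b, disc_a, disc_b) tuples, one per pair of
    pictures to be recognized that received the same classification result."""
    total = first_loss
    for feature_a, feature_b, disc_a, disc_b in pairs:
        total = total + second_loss(feature_a, feature_b, disc_a, disc_b)  # feature-level term
        total = total + third_loss(disc_a, disc_b)                         # discrimination term
    return total
```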
The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from S106 and S1041 to S1043, the electronic device may obtain a first loss; then determining a second loss according to the respective classification and judgment information of at least two pictures to be recognized and the respective initial characteristics of the at least two pictures to be recognized, and determining a third loss according to the respective classification and judgment information of the at least two pictures to be recognized; then, the electronic device determines a sum of the first loss, the second loss, and the third loss as a target loss. Target loss can be completely and effectively determined, and then the training efficiency of the image classification model is improved.
With reference to fig. 4, as shown in fig. 5, in an implementation manner of the embodiment of the present disclosure, the determining the second loss according to the classification judgment information of each of the at least two pictures to be recognized and the initial feature of each of the at least two pictures to be recognized includes S1041a-S1041 b.
And S1041a, determining a distance function between the initial feature of the first picture to be recognized and the initial feature of the second picture to be recognized.
The distance function is used for representing the degree of inconsistency between the initial feature of the first to-be-identified picture and the initial feature of the second to-be-identified picture, the first to-be-identified picture is one of the at least two to-be-identified pictures, and the second to-be-identified picture is a picture, except the first to-be-identified picture, of the at least two to-be-identified pictures.
And S1041b, when the classification judgment information of the first picture to be recognized is the same as the classification judgment information of the second picture to be recognized, determining the distance function as a second loss.
With reference to the above description of the embodiments, it should be understood that when the classification determination information of the first to-be-recognized picture is the same as the classification determination information of the second to-be-recognized picture, it indicates that there is a certain similarity between the initial feature of the first to-be-recognized picture and the initial feature of the second to-be-recognized picture. In this way, the electronic device may determine a distance function between the initial feature of the first to-be-recognized picture and the initial feature of the second to-be-recognized picture as a second loss.
In an optional implementation manner, the determining the distance function as the second loss specifically may include: determining that the second loss satisfies the following equation:
Loss_contrastive = dist(F_A, F_B)
wherein Loss_contrastive represents the second loss, F_A represents the initial feature of the first picture to be recognized, F_B represents the initial feature of the second picture to be recognized, and dist(F_A, F_B) represents a distance function between the initial feature of the first picture to be recognized and the initial feature of the second picture to be recognized.
In particular, dist() can be understood as a distance function that characterizes the difference between two elements (e.g., F_A and F_B). When the value of dist(F_A, F_B), i.e., the second loss Loss_contrastive, is larger, the difference between the initial feature of the first picture to be recognized (F_A) and the initial feature of the second picture to be recognized (F_B) is larger; accordingly, when the second loss is smaller, the difference between the initial feature of the first picture to be recognized and the initial feature of the second picture to be recognized is smaller (i.e., the two initial features are more similar).
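The concrete form of dist() is left open in the embodiments; a minimal sketch assuming Euclidean (L2) distance between the two initial features:

```python
import torch

def dist(feature_a: torch.Tensor, feature_b: torch.Tensor) -> torch.Tensor:
    # Assumed instantiation of dist(): Euclidean distance between the two initial features.
    return torch.norm(feature_a - feature_b, p=2)
```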
The technical scheme provided by the embodiment can at least bring the following beneficial effects: as known from S1041a-S1041b, the electronic device may determine a distance function between the initial feature of the first picture to be recognized and the initial feature of the second picture to be recognized; when the classification judgment information of the first picture to be recognized is the same as the classification judgment information of the second picture to be recognized, the initial characteristic of the first picture to be recognized and the initial characteristic of the second picture to be recognized are similar to each other; since the distance function is used for characterizing the degree of inconsistency between the initial feature of the first to-be-recognized picture and the initial feature of the second to-be-recognized picture, the electronic device may determine the distance function as the second loss. In the embodiment of the disclosure, when the initial feature of the first to-be-recognized picture and the initial feature of the second to-be-recognized picture have a certain similarity, the electronic device may determine the distance function between the initial feature of the first to-be-recognized picture and the initial feature of the second to-be-recognized picture as the second loss, and may accurately and effectively determine the second loss, thereby being capable of improving the accuracy of determining the target loss.
With reference to fig. 5, as shown in fig. 6, the training method for an image classification model provided by the embodiment of the present disclosure further includes S1041 c.
S1041c, when the classification discrimination information of the first to-be-recognized picture is different from the classification discrimination information of the second to-be-recognized picture, determining a difference between a preset constant and the distance function as a second loss.
It should be understood that the distance function is a distance function between the initial feature of the first to-be-identified picture and the initial feature of the second to-be-identified picture. When the classification discrimination information of the first to-be-recognized picture is different from the classification discrimination information of the second to-be-recognized picture, it is indicated that the initial feature of the first to-be-recognized picture and the initial feature of the second to-be-recognized picture have certain difference. In this manner, the electronic device may determine the difference between the preset constant and the distance function as the second loss.
In an optional implementation manner, the determining, as the second loss, a difference between the preset constant and the distance function may specifically include: determining that the second loss satisfies the following equation:
Loss_contrastive = ε - dist(F_A, F_B)
wherein Loss_contrastive represents the second loss, F_A represents the initial feature of the first picture to be recognized, F_B represents the initial feature of the second picture to be recognized, dist(F_A, F_B) represents a distance function between the initial feature of the first picture to be recognized and the initial feature of the second picture to be recognized, and ε represents a preset constant greater than 0.
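Combining the two branches (S1041b and S1041c) into one helper, using the dist() sketch above; epsilon stands for the preset constant ε > 0, and its value here is an illustrative assumption:

```python
import torch

def second_loss(feature_a: torch.Tensor, feature_b: torch.Tensor,
                disc_a: bool, disc_b: bool, epsilon: float = 1.0) -> torch.Tensor:
    d = dist(feature_a, feature_b)
    if disc_a == disc_b:
        # Same classification judgment information: Loss_contrastive = dist(F_A, F_B)
        return d
    # Different classification judgment information: Loss_contrastive = epsilon - dist(F_A, F_B)
    return epsilon - d
```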
The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from S1041c, when the classification determination information of the first to-be-recognized picture is different from the classification determination information of the second to-be-recognized picture, it indicates that there is a certain difference between the initial feature of the first to-be-recognized picture and the initial feature of the second to-be-recognized picture. Therefore, the electronic device can determine the second loss according to the difference between the preset constant and the distance function (specifically, the distance function between the initial feature of the first to-be-identified picture and the initial feature of the second to-be-identified picture), and can accurately and effectively determine the second loss, so that the accuracy of determining the target loss can be improved.
With reference to fig. 4, as shown in fig. 7, in an implementation manner of the embodiment of the present disclosure, the determining the third loss according to the classification judgment information of each of the at least two pictures to be recognized specifically includes S1042a-S1042 b.
S1042a, determining the first loss threshold as a third loss when the classification discrimination information of the first to-be-recognized picture is the same as the classification discrimination information of the second to-be-recognized picture.
In combination with the description of the above embodiment, it should be understood that the first to-be-identified picture is one of the at least two to-be-identified pictures, and the second to-be-identified picture is a picture other than the first to-be-identified picture in the at least two to-be-identified pictures.
S1042b, when the classification determination information of the first to-be-recognized picture is different from the classification determination information of the second to-be-recognized picture, determining the second loss threshold as a third loss.
Wherein the second loss threshold is greater than the first loss threshold.
In conjunction with the above description of the embodiments, it should be understood that the classification result corresponding to the first to-be-recognized picture is the same as the classification result corresponding to the second to-be-recognized picture. When the classification judgment information of the first picture to be recognized is the same as the classification judgment information of the second picture to be recognized, the initial characteristic of the first picture to be recognized and the initial characteristic of the second picture to be recognized are similar to each other; when the classification discrimination information of the first to-be-recognized picture is different from the classification discrimination information of the second to-be-recognized picture, it is indicated that a certain difference exists between the initial feature of the first to-be-recognized picture and the initial feature of the second to-be-recognized picture.
In the embodiment of the disclosure, when there is a certain difference between the initial feature of the first to-be-recognized picture and the initial feature of the second to-be-recognized picture, the third loss may be larger (corresponding to the second loss threshold); when the initial feature of the first to-be-recognized picture is similar to the initial feature of the second to-be-recognized picture, the third loss may be smaller (corresponding to the first loss threshold).
Optionally, the first loss threshold may be 0, and the second loss threshold may be 1.
The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from S1042a-S1042b, when the classification discrimination information of the first to-be-recognized picture is the same as the classification discrimination information of the second to-be-recognized picture, the electronic device may determine the first loss threshold as a third loss; otherwise, that is, when the classification discrimination information of the first to-be-recognized picture is different from the classification discrimination information of the second to-be-recognized picture, the electronic device may determine a second loss threshold as a third loss, where the second loss threshold is greater than the first loss threshold. In the embodiment of the disclosure, the electronic device may assign different values (i.e., the first loss threshold or the second loss threshold) to the third loss based on the respective classification discrimination information of the at least two pictures to be recognized, and may quickly and effectively determine the third loss, thereby improving the training efficiency of the image classification model.
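For completeness, a minimal sketch of S1042a-S1042b follows; the concrete threshold values 0 and 1 simply mirror the optional values mentioned above and are not mandated by the embodiment:

```python
def third_loss(same_discrimination: bool,
               first_loss_threshold: float = 0.0,
               second_loss_threshold: float = 1.0) -> float:
    """Illustrative third loss: pick one of two preset thresholds.

    The only constraint from the embodiment is
    second_loss_threshold > first_loss_threshold.
    """
    # Same discrimination information -> the smaller first loss threshold;
    # different discrimination information -> the larger second loss threshold.
    return first_loss_threshold if same_discrimination else second_loss_threshold
```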
In one implementation of the embodiment of the present disclosure, as shown in fig. 8 in conjunction with fig. 4, the obtaining of the first loss includes S1061-S1064.
S1061, acquiring at least one identified picture and a real result of each identified picture in the at least one identified picture.
In conjunction with the above description of the embodiment, it should be understood that the at least one recognized picture is a picture with a real result (i.e., a classification label), and the initial image classification model is trained by the electronic device based on the at least one recognized picture and the real result corresponding to the at least one recognized picture.
S1062, inputting the target recognized picture into the initial image classification model to determine the target probability.
Wherein the target probability is a probability that the target identified picture is predicted as a target true result, the target identified picture is one of the at least one identified picture, and the target true result is a true result of the target identified picture.
It should be understood that the target probability is the probability that the target recognized picture is predicted by the initial image classification model as the real result corresponding to the target recognized picture. In the embodiment of the disclosure, the electronic device may input each recognized picture in the at least one recognized picture into the initial image classification model to determine the probability corresponding to each recognized picture.
It can be understood that, for the target recognized picture, the electronic device inputs the target recognized picture into the initial image classification model, and multiple classifications and respective probabilities of the multiple classifications can be obtained; then, the electronic device may determine a maximum value of the probabilities of the multiple classifications as the target probability, and determine the classification corresponding to the target probability as a classification result corresponding to the target recognized picture, where the classification result is the target real result.
And S1063, determining the loss corresponding to the target recognized picture based on the target probability.
It is to be appreciated that the electronic device can determine the loss corresponding to each identified picture based on the probability corresponding to each identified picture.
S1064, determining the sum of the losses corresponding to at least one identified picture as a first loss, and acquiring the first loss.
The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from S1061-S1064, the electronic device may obtain at least one recognized picture and the real result of each recognized picture in the at least one recognized picture, and input the target recognized picture (i.e., one of the at least one recognized picture) into the initial image classification model to determine the target probability (i.e., the probability that the target recognized picture is predicted as its corresponding real result). The electronic device then determines, based on the target probability, the loss corresponding to the target recognized picture; in this way, the electronic device may determine the loss corresponding to each recognized picture in the at least one recognized picture. The electronic device may then determine the sum of the losses corresponding to the at least one recognized picture as the first loss, and thus obtain the first loss. In this way, the electronic device can accurately and effectively obtain the first loss, which improves the obtaining efficiency of the first loss, further improves the determining efficiency of the target loss, and improves the prediction efficiency of the target image classification model.
In an implementation manner of the embodiment of the present disclosure, the determining a loss corresponding to the target recognized picture based on the target probability includes step a.
Step A, determining that the loss corresponding to the target identified picture meets the following formula:
[The loss corresponding to the target recognized picture is given in the original filing as an equation image (BDA0003424741570000151), which is not reproduced in this text; it defines a cross-entropy-style loss Loss_ce.]
wherein Loss_ce represents the loss corresponding to the target recognized picture, c represents the total number of real results, and y_i represents the target probability.
In this way, the electronic device may determine the loss corresponding to each recognized picture in the at least one recognized picture based on the above formula.
The technical scheme provided by the embodiment can at least bring the following beneficial effects: as can be seen from step A, the electronic device may accurately and effectively determine the loss corresponding to the target recognized picture based on a specific formula, that is, may accurately determine the loss corresponding to each recognized picture in the at least one recognized picture. This further improves the determining efficiency of the first loss and the training efficiency of the image classification model.
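Since the equation image is not reproduced here, the following sketch assumes the conventional cross-entropy form that the variable names (Loss_ce, the c real results, the target probability y_i) suggest; PyTorch and the batch layout are likewise assumptions of this example:

```python
import torch
import torch.nn.functional as F

def first_loss(model: torch.nn.Module,
               recognized_pictures: torch.Tensor,
               true_labels: torch.Tensor) -> torch.Tensor:
    """Illustrative first loss (S1061-S1064): sum of per-picture cross-entropy losses.

    recognized_pictures: batch of recognized pictures with known real results.
    true_labels: class index of the real result for each recognized picture.
    """
    logits = model(recognized_pictures)            # class scores for each recognized picture
    # Cross-entropy of each picture w.r.t. its real result (S1062-S1063);
    # -log of the target probability is one standard instantiation of Loss_ce.
    per_picture = F.cross_entropy(logits, true_labels, reduction="none")
    return per_picture.sum()                       # S1064: sum over all recognized pictures
```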
It is understood that, in practical implementation, the electronic device according to the embodiments of the present disclosure may include one or more hardware structures and/or software modules for implementing the training method of the corresponding image classification model, and these hardware structures and/or software modules may constitute an electronic device. Those skilled in the art will readily appreciate that the present disclosure can be implemented in hardware, or in a combination of hardware and computer software, for the exemplary algorithm steps described in connection with the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
Based on such understanding, the embodiment of the present disclosure further provides a training apparatus for an image classification model, and fig. 9 illustrates a schematic structural diagram of the training apparatus for an image classification model provided by the embodiment of the present disclosure. As shown in fig. 9, the training device 10 for the image classification model may include: an acquisition module 101, a processing module 102 and a determination module 103.
The acquiring module 101 is configured to acquire a plurality of pictures to be recognized.
The processing module 102 is configured to input the multiple pictures to be recognized into the initial image classification model respectively, so as to obtain classification results corresponding to the multiple pictures to be recognized respectively.
The obtaining module 101 is further configured to obtain, based on the classification result, classification judgment information of each of the multiple pictures to be recognized, where the classification judgment information of each picture to be recognized is used to represent whether the classification result corresponding to each picture to be recognized is correct.
The determining module 103 is configured to determine a target loss based on the classification discrimination information of each of the at least two pictures to be recognized having the same classification result and the initial features of each of the at least two pictures to be recognized, where the initial features of each of the at least two pictures to be recognized are obtained by performing feature recognition after each of the at least two pictures to be recognized is input into the initial image classification model.
The processing module 102 is further configured to iteratively update parameters of the initial image classification model based on the target loss, resulting in a target image classification model.
Optionally, the classification discrimination information of each picture to be recognized includes classification correctness or classification error.
The processing module 102 is specifically configured to input a target picture to the initial image classification model, perform feature recognition, and obtain an initial feature of the target picture, where the target picture is a picture whose corresponding real result is the same as a classification result corresponding to a first to-be-recognized picture, and the first to-be-recognized picture is one of the multiple to-be-recognized pictures.
The determining module 103 is further configured to, when the similarity between the initial feature of the first to-be-recognized picture and the initial feature of the target picture is greater than or equal to a similarity threshold, obtain first classification judgment information, where the first classification judgment information is used to characterize that the classification judgment information of the first to-be-recognized picture is correctly classified.
Optionally, the determining module 103 is further configured to, when the similarity between the initial feature of the first to-be-identified picture and the initial feature of the target picture is smaller than the similarity threshold, obtain second classification judgment information, where the second classification judgment information is used to characterize that the classification judgment information of the first to-be-identified picture is a classification error.
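A minimal sketch of how the classification discrimination information could be derived from the similarity comparison described above is given below; cosine similarity and the 0.5 threshold are assumptions of this example, since the embodiment only requires some similarity measure and a similarity threshold:

```python
import torch
import torch.nn.functional as F

def classification_discrimination(feat_to_recognize: torch.Tensor,
                                  feat_target: torch.Tensor,
                                  similarity_threshold: float = 0.5) -> bool:
    """Return True for "classification correct", False for "classification error".

    feat_to_recognize: initial feature of the first picture to be recognized.
    feat_target: initial feature of the target picture whose real result equals
    the classification result of the first picture to be recognized.
    """
    similarity = F.cosine_similarity(feat_to_recognize.flatten(),
                                     feat_target.flatten(), dim=0)
    # Similarity at or above the threshold -> first classification discrimination
    # information (classification correct); below -> second (classification error).
    return bool(similarity >= similarity_threshold)
```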
Optionally, the obtaining module 101 is further configured to obtain a first loss, where the first loss is used to characterize a degree of inconsistency between a real result of each of the at least one identified picture and a predicted result of each of the identified pictures in the initial image classification model.
The determining module 103 is specifically configured to determine the second loss according to the classification and discrimination information of each of the at least two pictures to be recognized and the initial features of each of the at least two pictures to be recognized.
The determining module 103 is specifically configured to determine a third loss according to the respective classification and discrimination information of the at least two pictures to be recognized.
The determining module 103 is further configured to determine a sum of the first loss, the second loss, and the third loss as the target loss.
Optionally, the determining module 103 is further specifically configured to determine a distance function between an initial feature of a first to-be-identified picture and an initial feature of a second to-be-identified picture, where the distance function is used to characterize a degree of inconsistency between the initial feature of the first to-be-identified picture and the initial feature of the second to-be-identified picture, the first to-be-identified picture is one of the at least two to-be-identified pictures, and the second to-be-identified picture is a picture of the at least two to-be-identified pictures other than the first to-be-identified picture.
The determining module 103 is further configured to determine the distance function as the second loss when the classification discrimination information of the first to-be-recognized picture is the same as the classification discrimination information of the second to-be-recognized picture.
Optionally, the determining module is further specifically configured to determine, when the classification discrimination information of the first to-be-recognized picture is different from the classification discrimination information of the second to-be-recognized picture, a difference between a preset constant and the distance function as a second loss.
Optionally, the determining module 103 is further specifically configured to determine, when the classification determination information of the first to-be-identified picture is the same as the classification determination information of the second to-be-identified picture, the first loss threshold as the third loss, where the first to-be-identified picture is one of the at least two to-be-identified pictures, and the second to-be-identified picture is a picture other than the first to-be-identified picture in the at least two to-be-identified pictures.
The determining module 103 is further specifically configured to determine a second loss threshold as a third loss when the classification determination information of the first to-be-recognized picture is different from the classification determination information of the second to-be-recognized picture, where the second loss threshold is greater than the first loss threshold.
Optionally, the obtaining module 101 is specifically configured to obtain the at least one identified picture and a real result of each identified picture in the at least one identified picture.
The determining module 103 is further configured to input a target identified picture to the initial image classification model to determine a target probability, the target probability being a probability that the target identified picture is predicted as a target true result, the target identified picture being one of the at least one identified picture, the target true result being a true result of the target identified picture.
The determining module 103 is further configured to determine a loss corresponding to the target recognized picture based on the target probability.
The determining module 103 is further configured to determine a sum of losses corresponding to the at least one identified picture as the first loss.
The obtaining module 101 is further specifically configured to obtain the first loss.
As described above, the embodiment of the present disclosure may perform functional module division on the training apparatus of the image classification model according to the above method example. The integrated module can be realized in a hardware form, and can also be realized in a software functional module form. In addition, it should be further noted that the division of the modules in the embodiments of the present disclosure is schematic, and is only a logic function division, and there may be another division manner in actual implementation. For example, the functional blocks may be divided for the respective functions, or two or more functions may be integrated into one processing block.
The specific way in which each module executes the operation and the beneficial effects of the training apparatus for the image classification model in the foregoing embodiment have been described in detail in the foregoing method embodiment, and are not described again here.
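Purely as an illustration of how the modules above cooperate within one iteration, a single parameter update could look like the following sketch; the optimizer choice and all function names are assumptions of this example, not part of the disclosure:

```python
import torch

def training_iteration(model: torch.nn.Module,
                       optimizer: torch.optim.Optimizer,
                       first: torch.Tensor,
                       second: torch.Tensor,
                       third: float) -> None:
    """One illustrative update of the initial image classification model's parameters."""
    target_loss = first + second + third   # the target loss is the sum of the three losses
    optimizer.zero_grad()
    target_loss.backward()                 # gradients of the target loss w.r.t. the parameters
    optimizer.step()                       # iterative update toward the target image classification model
```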
Fig. 10 is a schematic structural diagram of another training apparatus for an image classification model provided by the present disclosure. As shown in fig. 10, the training apparatus 20 for the image classification model may include at least one processor 201 and a memory 203 configured to store processor-executable instructions. The processor 201 is configured to execute the instructions in the memory 203 to implement the training method of the image classification model in the above embodiments.
In addition, the training apparatus 20 for image classification models may further include a communication bus 202 and at least one communication interface 204.
The processor 201 may be a Central Processing Unit (CPU), a micro-processing unit, an ASIC, or one or more integrated circuits for controlling the execution of programs according to the present disclosure.
The communication bus 202 may include a path that conveys information between the aforementioned components.
The communication interface 204 may be any device, such as a transceiver, for communicating with other devices or communication networks, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc.
The memory 203 may be a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these. The memory may be self-contained and connected to the processing unit by a bus. The memory may also be integrated with the processing unit.
The memory 203 is used for storing instructions for executing the disclosed solution, and is controlled by the processor 201. The processor 201 is configured to execute instructions stored in the memory 203 to implement the functions of the disclosed method.
In particular implementations, processor 201 may include one or more CPUs such as CPU0 and CPU1 in fig. 10 for one embodiment.
In one embodiment, the training device 20 for image classification models may include a plurality of processors, such as the processor 201 and the processor 207 in fig. 10. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In one embodiment, the training apparatus 20 for image classification model may further include an output device 205 and an input device 206. The output device 205 is in communication with the processor 201 and may display information in a variety of ways. For example, the output device 205 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 206 is in communication with the processor 201 and can accept user input in a variety of ways. For example, the input device 206 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
Those skilled in the art will appreciate that the configuration shown in fig. 10 does not constitute a limitation of the training apparatus 20 for the image classification model, and may include more or fewer components than those shown, or combine certain components, or adopt a different arrangement of components.
In addition, the present disclosure also provides a computer-readable storage medium including instructions, which when executed by an electronic device, cause the electronic device to perform the training method of the image classification model provided in the above embodiment.
In addition, the present disclosure also provides a computer program product including instructions that, when executed by an electronic device, cause the electronic device to perform the training method of the image classification model provided in the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A training method of an image classification model is characterized by comprising the following steps:
acquiring a plurality of pictures to be identified;
respectively inputting the multiple pictures to be recognized into an initial image classification model to obtain classification results corresponding to the multiple pictures to be recognized;
based on the classification result, acquiring respective classification judgment information of the multiple pictures to be recognized, wherein the classification judgment information of each picture to be recognized is used for representing whether the classification result corresponding to each picture to be recognized is correct or not;
determining target loss based on respective classification judgment information of at least two pictures to be recognized with the same classification result and respective initial characteristics of the at least two pictures to be recognized, wherein the initial characteristics of each picture to be recognized in the at least two pictures to be recognized are obtained by inputting each picture to be recognized into the initial image classification model and then performing characteristic recognition;
and iteratively updating parameters of the initial image classification model based on the target loss to obtain a target image classification model.
2. The method for training an image classification model according to claim 1, wherein the classification discrimination information of each picture to be recognized includes a classification correctness or a classification error, and the obtaining the classification discrimination information of each of the pictures to be recognized based on the classification result includes:
inputting a target picture into the initial image classification model, and performing feature recognition to obtain initial features of the target picture, wherein the target picture is a picture with the corresponding real result identical to the classification result corresponding to a first picture to be recognized, and the first picture to be recognized is one of the pictures to be recognized;
and when the similarity between the initial feature of the first to-be-identified picture and the initial feature of the target picture is greater than or equal to a similarity threshold value, acquiring first classification judgment information, wherein the first classification judgment information is used for representing that the classification judgment information of the first to-be-identified picture is correct.
3. The method of training an image classification model according to claim 2, the method further comprising:
and when the similarity between the initial feature of the first to-be-identified picture and the initial feature of the target picture is smaller than the similarity threshold, acquiring second classification judgment information, wherein the second classification judgment information is used for representing that the classification judgment information of the first to-be-identified picture is a classification error.
4. The method of training an image classification model according to claim 1, the method further comprising:
obtaining a first loss, wherein the first loss is used for representing the inconsistency degree between the real result of each identified picture in at least one identified picture and the predicted result of each identified picture in the initial image classification model;
the determining the target loss based on the respective classification judgment information of the at least two pictures to be recognized with the same classification result and the respective initial features of the at least two pictures to be recognized comprises the following steps:
determining a second loss according to the respective classification judgment information of the at least two pictures to be recognized and the respective initial characteristics of the at least two pictures to be recognized;
determining a third loss according to the respective classification judgment information of the at least two pictures to be recognized;
determining a sum of the first loss, the second loss, and the third loss as the target loss.
5. The method for training an image classification model according to claim 4, wherein the determining a second loss according to the classification discrimination information of each of the at least two pictures to be recognized and the initial features of each of the at least two pictures to be recognized comprises:
determining a distance function between an initial feature of a first to-be-identified picture and an initial feature of a second to-be-identified picture, wherein the distance function is used for representing the degree of inconsistency between the initial feature of the first to-be-identified picture and the initial feature of the second to-be-identified picture, the first to-be-identified picture is one of the at least two to-be-identified pictures, and the second to-be-identified picture is a picture other than the first to-be-identified picture in the at least two to-be-identified pictures;
and when the classification judgment information of the first picture to be recognized is the same as the classification judgment information of the second picture to be recognized, determining the distance function as the second loss.
6. The method of training an image classification model according to claim 5, characterized in that the method further comprises:
and when the classification discrimination information of the first picture to be recognized is different from the classification discrimination information of the second picture to be recognized, determining a difference value between a preset constant and the distance function as a second loss.
7. An apparatus for training an image classification model, comprising: the device comprises an acquisition module, a processing module and a determination module;
the acquisition module is configured to acquire a plurality of pictures to be identified;
the processing module is configured to input the multiple pictures to be recognized into an initial image classification model respectively so as to obtain classification results corresponding to the multiple pictures to be recognized;
the obtaining module is further configured to obtain respective classification judgment information of the multiple pictures to be recognized based on the classification result, wherein the classification judgment information of each picture to be recognized is used for representing whether the classification result corresponding to each picture to be recognized is correct or not;
the determining module is further configured to determine a target loss based on respective classification discrimination information of at least two pictures to be recognized with the same classification result and respective initial features of the at least two pictures to be recognized, wherein the initial features of each picture to be recognized in the at least two pictures to be recognized are obtained by performing feature recognition after each picture to be recognized is input into the initial image classification model;
the processing module is further configured to iteratively update parameters of the initial image classification model based on the target loss, resulting in a target image classification model.
8. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory configured to store the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of training the image classification model according to any of claims 1-6.
9. A computer-readable storage medium having instructions stored thereon, wherein the instructions in the computer-readable storage medium, when executed by an electronic device, enable the electronic device to perform the method of training an image classification model according to any of claims 1-6.
10. A computer program product, characterized in that the computer program product comprises computer instructions which, when run on an electronic device, cause the electronic device to perform the method of training an image classification model according to any of claims 1-6.
CN202111575684.4A 2021-12-21 2021-12-21 Training method and device for image classification model, electronic equipment and storage medium Pending CN114332529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111575684.4A CN114332529A (en) 2021-12-21 2021-12-21 Training method and device for image classification model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111575684.4A CN114332529A (en) 2021-12-21 2021-12-21 Training method and device for image classification model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114332529A true CN114332529A (en) 2022-04-12

Family

ID=81054012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111575684.4A Pending CN114332529A (en) 2021-12-21 2021-12-21 Training method and device for image classification model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114332529A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035463A (en) * 2022-08-09 2022-09-09 阿里巴巴(中国)有限公司 Behavior recognition method, device, equipment and storage medium


Similar Documents

Publication Publication Date Title
CN109583332B (en) Face recognition method, face recognition system, medium, and electronic device
WO2019233421A1 (en) Image processing method and device, electronic apparatus, and storage medium
US20190087490A1 (en) Text classification method and apparatus
EP3905126A2 (en) Image clustering method and apparatus
WO2020253127A1 (en) Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
CN113222942A (en) Training method of multi-label classification model and method for predicting labels
EP3620982A1 (en) Sample processing method and device
WO2020090413A1 (en) Classification device, classification method, and classification program
US10162879B2 (en) Label filters for large scale multi-label classification
WO2022105121A1 (en) Distillation method and apparatus applied to bert model, device, and storage medium
CN114882321A (en) Deep learning model training method, target object detection method and device
CN111126347A (en) Human eye state recognition method and device, terminal and readable storage medium
EP4343616A1 (en) Image classification method, model training method, device, storage medium, and computer program
CN114332529A (en) Training method and device for image classification model, electronic equipment and storage medium
CN113657249B (en) Training method, prediction method, device, electronic equipment and storage medium
WO2021174814A1 (en) Answer verification method and apparatus for crowdsourcing task, computer device, and storage medium
WO2018166499A1 (en) Text classification method and device, and storage medium
CN110826616B (en) Information processing method and device, electronic equipment and storage medium
CN113159188A (en) Model generation method, device, equipment and storage medium
CN111353867A (en) Learning rate adjusting method, device, equipment and readable storage medium
JP2023152270A (en) Data labeling method by artificial intelligence, apparatus, electronic device, storage medium, and program
CN116468479A (en) Method for determining page quality evaluation dimension, and page quality evaluation method and device
CN115470900A (en) Pruning method, device and equipment of neural network model
CN114416990B (en) Method and device for constructing object relation network and electronic equipment
CN113159318B (en) Quantification method and device of neural network, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination