CN116630694A - Target classification method and system for partial multi-label images and electronic equipment - Google Patents
- Publication number: CN116630694A (application CN202310544125.XA)
- Authority: CN (China)
- Prior art keywords: iteration, mark, label, determining, partial
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a target classification method and system for partial multi-label images, and an electronic device, and relates to the technical field of target classification of partial multi-label images. The method comprises: acquiring a partial multi-label image to be identified; inputting the partial multi-label image to be identified into a relevant-label determination model, and determining the relevant labels of all targets in the image; and determining the types of all targets in the image according to the plurality of relevant labels, the relevant labels corresponding one-to-one to the target types. According to the method, a classifier is trained using the contrastive label disambiguation principle to obtain a relevant-label determination model capable of accurately identifying the relevant labels in unseen images, thereby improving the accuracy of image classification.
Description
Technical Field
The invention relates to the technical field of target classification of partial multi-label images, and in particular to a target classification method and system for partial multi-label images and an electronic device.
Background
Multi-label image classification addresses image classification problems in which each image is associated with multiple labels, and it has recently received considerable attention. However, multi-label image classification relies on accurately annotated data, which is extremely difficult to obtain in real-world scenarios with limited resources. To relieve the annotation burden, a common practice is to assign each image a set of candidate labels annotated by non-expert annotators; this set contains both relevant labels, which are useful for image classification, and noise labels. Learning from such candidate label sets is defined as the problem of target classification for partial multi-label images.
Partial multi-label learning methods fall into two categories. The first assigns a confidence to each candidate label and iteratively updates the label confidences and the classification model parameters during training. For example, the paper Partial Multi-Label Learning proposes optimizing the label ranking confidence matrix by considering label correlations and feature prototypes, respectively, while training the classifier. The paper Feature-Induced Partial Multi-label Learning introduces a feature-induced partial multi-label approach that exploits the low-rank structure of the label and feature spaces. The paper Partial Multi-Label Learning with Meta Disambiguation proposes estimating the confidence of each candidate label adaptively to model performance on a validation set, achieving disambiguation by iteratively minimizing a confidence-weighted ranking loss. The second category comprises two-stage training methods, which first extract reliable labels from the candidate label set and then train a multi-label classifier on these reliable labels. For example, the paper Discriminative and Correlative Partial Multi-Label Learning induces high-confidence labels by using the feature manifold and then trains a multi-label classifier, and the paper Partial Multi-Label Learning via Credible Label Elicitation induces classifiers by extracting reliable labels with an iterative label-propagation strategy. As for patents, Chinese patent application No. 202010412162.1 provides a multi-label learning method based on multi-subspace representation; Chinese patent application No. 202010412161.7 provides a multi-label learning method based on noise tolerance; Chinese patent application No. 202111369388.9 provides a patient-screening labeling method based on partial multi-label learning; Chinese patent application No. 202010411579.6 provides a partial multi-label learning method based on global and local label relations; Chinese patent application No. 202110717550.5 provides a multi-label learning method based on complementary-label co-training; and Chinese patent application No. 202010411580.9 provides a partial multi-label learning method for noisy feature information, which uses low-rank and sparse decomposition to recover the correct feature information and effectively reduces the influence of noisy features. Although these conventional methods have made remarkable progress, they are generally built on hand-crafted features; when faced with target classification of partial multi-label images, their representation and label-correction abilities are weak, and they cannot achieve a good label disambiguation effect.
Disclosure of Invention
The invention aims to provide a target classification method and system for partial multi-label images and an electronic device, in which a classifier is trained using the contrastive label disambiguation principle to obtain a relevant-label determination model capable of accurately identifying the relevant labels in unseen partial multi-label images, thereby improving the accuracy of image classification.
In order to achieve the above object, the present invention provides the following solutions:
A target classification method for a partial multi-label image, comprising:
acquiring a partial multi-label image to be identified, the image comprising at least one target;
inputting the partial multi-label image to be identified into a relevant-label determination model, and determining the relevant labels of all targets in the image, wherein the relevant-label determination model is obtained by training a classifier on a plurality of partial multi-label historical images using the contrastive label disambiguation principle; and
determining the types of all targets in the image according to the plurality of relevant labels, the relevant labels corresponding one-to-one to the target types.
Optionally, before acquiring the partial multi-label image to be identified, the method further includes:
acquiring a plurality of partial multi-label historical images, each annotated with a plurality of labels, where each label is either a relevant label or a noise label;
performing random data augmentation on each partial multi-label historical image to obtain a query view and a key view of the image;
determining the label-level embeddings under the query view and the label-level embeddings under the key view, the label-level embeddings corresponding one-to-one to the labels of the partial multi-label historical image; and
training the classifier on the label-level embeddings under the query view and under the key view using the contrastive label disambiguation principle to obtain the relevant-label determination model.
Optionally, after determining the label-level embeddings under the query view and under the key view, the method further includes:
determining the polarity (positive or negative) of each label-level embedding under the query view; and
determining the polarity of each label-level embedding under the key view.
Optionally, training the classifier on the label-level embeddings under the query view and under the key view using the contrastive label disambiguation principle to obtain the relevant-label determination model includes:
taking the classifier as the classifier at the 0th iteration;
acquiring an initial positive prototype of each label in the classifier as the positive prototype at the 0th iteration;
acquiring an initial negative prototype of each label in the classifier as the negative prototype at the 0th iteration;
setting the first iteration number i=1;
setting the second iteration number j=1;
taking any label-level embedding under any query view as the current label-level embedding;
updating the positive prototype and the negative prototype at the (i-1)-th iteration according to the polarity of the current label-level embedding;
computing the similarity between the current label-level embedding and the updated positive prototype at the (i-1)-th iteration as a first similarity;
computing the similarity between the current label-level embedding and the updated negative prototype at the (i-1)-th iteration as a second similarity;
determining, from the first similarity and the second similarity, the label vector predicted for the current label-level embedding by the prototypes;
updating, according to the predicted label vector, the pseudo-label of the label corresponding to the current label-level embedding, obtaining the pseudo-label of that label at the j-th iteration;
increasing the second iteration number j by 1, replacing the current label-level embedding with another label-level embedding under the same query view, and returning to the step of updating the positive and negative prototypes at the (i-1)-th iteration according to the polarity of the current label-level embedding, until the second iteration number reaches a second iteration number threshold;
inputting the label-level embeddings under the query view into the classifier at the (i-1)-th iteration to obtain a plurality of class outputs;
determining the classification loss function at the (i-1)-th iteration according to the class outputs and the pseudo-labels of the corresponding labels obtained over the iterations;
judging whether the classification loss function is smaller than a classification loss threshold, obtaining a first judgment result;
if the first judgment result is no, updating the parameters of the classifier at the (i-1)-th iteration to obtain the classifier at the i-th iteration, increasing the first iteration number i by 1, and returning to the step of setting the second iteration number j=1;
if the first judgment result is yes, judging whether the first iteration number has reached a first iteration number threshold, obtaining a second judgment result;
if the second judgment result is no, taking the classifier at the (i-1)-th iteration as the classifier at the i-th iteration, increasing the first iteration number i by 1, and returning to the step of setting the second iteration number j=1; and
if the second judgment result is yes, taking the classifier at the (i-1)-th iteration as the relevant-label determination model.
Optionally, before taking the classifier at the (i-1)-th iteration as the relevant-label determination model, the method further includes:
taking the label-level embeddings under the query view and under the key view as an embedding pool, the embedding pool further comprising the label-level embeddings in a momentum label-level embedding queue;
taking any positive label-level embedding under the query view as the current positive label-level embedding;
taking the positive label-level embeddings in the embedding pool that share the label of the current positive label-level embedding as the positive sample set corresponding to the current positive label-level embedding;
forming a plurality of positive sample pairs from the current positive label-level embedding and the samples in its positive sample set;
determining the contrastive loss function of the corresponding partial multi-label historical image according to the plurality of positive sample pairs under the same query view;
judging whether the contrastive loss functions are smaller than a contrastive loss threshold, obtaining a third judgment result;
if the third judgment result is no, updating the parameters of the classifier at the (i-1)-th iteration to obtain the classifier at the 0th iteration, and returning to the step of setting the first iteration number i=1; and
if the third judgment result is yes, invoking the step of taking the classifier at the (i-1)-th iteration as the relevant-label determination model.
A target classification system for partial multi-label images, comprising:
a to-be-identified image acquisition module, configured to acquire a partial multi-label image to be identified, the image comprising at least one target;
a relevant-label identification module, configured to input the partial multi-label image to be identified into a relevant-label determination model and determine the relevant labels of all targets in the image, the relevant-label determination model being obtained by training a classifier on a plurality of partial multi-label historical images using the contrastive label disambiguation principle; and
a target type determination module, configured to determine the types of all targets in the image according to the plurality of relevant labels, the relevant labels corresponding one-to-one to the target types.
An electronic device, comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the above target classification method for a partial multi-label image.
Optionally, the memory is a readable storage medium.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a target classification method, a target classification system and electronic equipment for partial multiple mark images, wherein the partial multiple mark images to be identified are obtained; inputting the partial marked images to be identified into a related mark determining model, and determining related marks of all targets in the partial marked images to be identified; determining the types of all targets in the partial marked images to be identified according to the plurality of relevant marks; the relevant marks are in one-to-one correspondence with the target types. According to the invention, the classifier is trained by using the contrast label disambiguation principle to obtain the related label determining model capable of accurately identifying the related label in the invisible image, so that the accuracy of image classification is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of the target classification method for partial multi-label images according to embodiment 1 of the present invention;
FIG. 2 is a framework diagram of the CPLD model according to embodiment 2 of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a target classification method and system for partial multi-label images and an electronic device, in which a classifier is trained using the contrastive label disambiguation principle to obtain a relevant-label determination model capable of accurately identifying the relevant labels in unseen partial multi-label images, thereby improving the accuracy of image classification.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1
As shown in FIG. 1, the present embodiment provides a target classification method for partial multi-label images, comprising:
Step 101: acquire the partial multi-label image to be identified.
The partial multi-label image to be identified comprises at least one target.
Step 102: input the partial multi-label image to be identified into the relevant-label determination model, and determine the relevant labels of all targets in the image.
The relevant-label determination model is obtained by training a classifier on a plurality of partial multi-label historical images using the contrastive label disambiguation principle.
Step 103: determine the types of all targets in the image according to the plurality of relevant labels; the relevant labels correspond one-to-one to the target types.
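The inference flow of steps 101–103 can be sketched as follows; the 0.5 thresholding rule, the label names, and the helper name are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def predict_relevant_labels(scores, label_names, threshold=0.5):
    """Map per-label classifier scores to relevant labels (hypothetical helper).

    scores: array of shape (K,) of sigmoid outputs from the trained
    relevant-label determination model. Labels whose score exceeds the
    threshold are treated as relevant, and each relevant label corresponds
    one-to-one to a target type.
    """
    scores = np.asarray(scores)
    return [name for name, s in zip(label_names, scores) if s > threshold]

# Toy example: 5 candidate labels, scores from a (hypothetical) trained model.
names = ["person", "dog", "car", "tree", "boat"]
scores = [0.91, 0.12, 0.78, 0.05, 0.33]
print(predict_relevant_labels(scores, names))  # person and car exceed 0.5
```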
Before step 101, the method further comprises:
Step 104: acquire a plurality of partial multi-label historical images. Each historical image is annotated with a plurality of labels, each of which is either a relevant label or a noise label.
Step 105: perform random data augmentation on each partial multi-label historical image to obtain its query view and key view.
Step 106: determine the label-level embeddings under the query view and under the key view. The label-level embeddings correspond one-to-one to the labels of the historical image.
Step 107: train the classifier on the label-level embeddings under the query view and under the key view using the contrastive label disambiguation principle to obtain the relevant-label determination model.
After step 106, the method further comprises:
Step 108: determine the polarity (positive or negative) of each label-level embedding under the query view.
Step 109: determine the polarity of each label-level embedding under the key view.
Step 107 includes:
Step 1071: take the classifier as the classifier at the 0th iteration.
Step 1072: acquire the initial positive prototype of each label in the classifier as the positive prototype at the 0th iteration.
Step 1073: acquire the initial negative prototype of each label in the classifier as the negative prototype at the 0th iteration.
Step 1074: set the first iteration number i=1.
Step 1075: set the second iteration number j=1.
Step 1076: take any label-level embedding under any query view as the current label-level embedding.
Step 1077: update the positive prototype and the negative prototype at the (i-1)-th iteration according to the polarity of the current label-level embedding.
Step 1078: compute the similarity between the current label-level embedding and the updated positive prototype at the (i-1)-th iteration as the first similarity.
Step 1079: compute the similarity between the current label-level embedding and the updated negative prototype at the (i-1)-th iteration as the second similarity.
Step 10710: determine, from the first and second similarities, the label vector predicted for the current label-level embedding by the prototypes.
Step 10711: update the pseudo-label of the label corresponding to the current label-level embedding according to the predicted label vector, obtaining that pseudo-label at the j-th iteration.
Step 10712: increase the second iteration number j by 1, replace the current label-level embedding with another label-level embedding under the same query view, and return to step 1077 until the second iteration number reaches the second iteration number threshold.
Step 10713: input the label-level embeddings under the query view into the classifier at the (i-1)-th iteration to obtain a plurality of class outputs.
Step 10714: determine the classification loss function at the (i-1)-th iteration according to the class outputs and the pseudo-labels obtained over the iterations.
Step 10715: judge whether the classification loss function is smaller than the classification loss threshold, obtaining the first judgment result; if no, execute step 10716; if yes, execute step 10717.
Step 10716: update the parameters of the classifier at the (i-1)-th iteration to obtain the classifier at the i-th iteration, increase the first iteration number i by 1, and return to step 1075.
Step 10717: judge whether the first iteration number has reached the first iteration number threshold, obtaining the second judgment result; if no, execute step 10718; if yes, execute step 10719.
Step 10718: take the classifier at the (i-1)-th iteration as the classifier at the i-th iteration, increase the first iteration number i by 1, and return to step 1075.
Step 10719: take the classifier at the (i-1)-th iteration as the relevant-label determination model.
Before step 10719, the method further comprises:
Step 10720: take the label-level embeddings under the query view and under the key view as the embedding pool; the embedding pool also includes the label-level embeddings in the momentum label-level embedding queue.
Step 10721: take any positive label-level embedding under the query view as the current positive label-level embedding.
Step 10722: take the positive label-level embeddings in the embedding pool that share the label of the current positive label-level embedding as its positive sample set.
Step 10723: form a plurality of positive sample pairs from the current positive label-level embedding and the samples in its positive sample set.
Step 10724: determine the contrastive loss function of the corresponding partial multi-label historical image according to the plurality of positive sample pairs under the same query view.
Step 10725: judge whether the contrastive loss functions are smaller than the contrastive loss threshold, obtaining the third judgment result; if no, execute step 10726; if yes, execute step 10727.
Step 10726: update the parameters of the classifier at the (i-1)-th iteration to obtain the classifier at the 0th iteration, and return to step 1074.
Step 10727: invoke step 10719.
Example 2
As shown in FIG. 2, the target classification method for partial multi-label images provided in this embodiment consists of two parts: a contrastive learning module and a prototype-based label disambiguation module. The method builds a collaborative framework from these two modules: contrastive learning aims to obtain high-quality representations, while prototype-based label disambiguation exploits the prototypes improved by those representations and then updates the pseudo-labels to guide the model's predictions, which in turn helps contrastive learning build more accurate positive sample pairs. Meanwhile, the invention uses a two-stage training strategy so that the contrastive learning technique is applied more appropriately. The two modules depend on and cooperate with each other; as training proceeds, the model gradually updates the label confidences, extracts the relevant labels, and reduces its attention to noise labels. The specific steps are as follows:
step 1: the partial mark image training data is used as input. Definition of the definitionAnd->Feature space and tag space, respectively, where K represents the number of tags of interest. Training data set->Consists of n samples, whereinRepresents the i-th image of the observation, +.>Representing candidate tag vectors corresponding to the ith image, y i,j =1 means that the label j is a label of the i-th image, and vice versa.
Step 2: acquire the augmented views of the image.
For simplicity of expression, the index i is omitted below. For an input image x, two image augmentation schemes are used to obtain the query view Aug_q(x) and the key view Aug_k(x), respectively.
For the query network, the method uses the SimAugment data augmentation scheme from the paper Supervised Contrastive Learning; for the key network, it uses the RandAugment data augmentation scheme from the paper RandAugment: Practical Automated Data Augmentation with a Reduced Search Space.
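A minimal two-view pipeline in the spirit of this step might look like the following; the crop-and-flip operations are toy stand-ins for the SimAugment and RandAugment policies named above, not their actual implementations.

```python
import numpy as np

def random_crop_resize(img, rng, crop=24):
    """Random crop followed by nearest-neighbour resize back to full size (toy)."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    ys = np.arange(h) * crop // h      # nearest-neighbour row indices
    xs = np.arange(w) * crop // w      # nearest-neighbour column indices
    return patch[ys][:, xs]

def two_views(img, seed=0):
    """Produce a query view and a key view of one image; the extra horizontal
    flip on the key view stands in for the stronger RandAugment policy."""
    rng = np.random.default_rng(seed)
    aug_q = random_crop_resize(img, rng)            # query view, Aug_q(x)
    aug_k = random_crop_resize(img, rng)[:, ::-1]   # key view, Aug_k(x)
    return aug_q, aug_k

img = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
q_view, k_view = two_views(img)
print(q_view.shape, k_view.shape)  # both (32, 32, 3)
```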
Step 3: obtaining tag-level embedding under two views:
for the query view, enc (Aug q (x))∈R d*h*w Where d, h, w represent the dimensions of the encoder output, the height and width of the feature map, respectively. Then the feature map is reduced to K through 1*1 convolution to obtain the feature map corresponding to each category, and the category feature map is flattened to the directionThe vector is projected to the contrast space required by the back by inputting a projection head (projection head) after the vector is measured, and the output g (Aug) of the query network g (-) is obtained q (x))∈R K *D Where D is the dimension of the contrast space. For this purpose, the feature map at the image level is decoupled into tag-level embedding q for K D dimensions j ∈R 1*D Each tag level embedding can be seen as a kind of token vector of the image in the corresponding tag context, containing the feature information of the corresponding class. The key network g '(. Cndot.) is the result of the parametric momentum moving average of the query network, the resulting g' (Aug) k (x))∈R K*D In preparation for subsequent use of contrast learning, each row k therein j ∈R 1*D j.epsilon.1..K.. And q j Similarly, the representation key network embeds the tag level resulting from decoupling the image level representation.
Step 4: obtaining high quality embedded characterizations using contrast learning:
The first step: judging the positivity or negativity of each label-level embedding.
For the augmented image Aug_q(x) described above, the classifier produces the output f(Aug_q(x)) ∈ [0,1]^K, where f(Aug_q(x)) is the output of a Sigmoid activation function.
The positivity or negativity of each label-level embedding is judged from the classifier output: if the predicted probability of class j exceeds a threshold formed from the hyperparameter α together with b_j and the average reference probability of all classes, the j-th label-level embedding is judged to be a positive label-level embedding, and vice versa. Aggregating these label-level embeddings yields the positive/negative label-level embedding sets PE(x)/NE(x) of the sample image. Here α is a hyperparameter, and b_j and b̄ are the reference probability of class j and the average reference probability over all classes, respectively.
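The exact threshold formula did not survive extraction, so the following sketch assumes one plausible reading: the j-th embedding is positive when the classifier probability f_j exceeds α scaled by the ratio of the class reference probability b_j to the mean reference probability b̄. Both the function name and this rule are assumptions, not a reproduction of the patent's criterion.

```python
import numpy as np

def split_pos_neg(f, b, alpha=0.8):
    """Split label-level embeddings into positive / negative index sets.
    Assumed rule (the patent's exact formula is not reproduced here):
    embedding j is positive iff f[j] > alpha * b[j] / mean(b)."""
    thresh = alpha * b / b.mean()
    pos = np.where(f > thresh)[0]      # indices of positive label-level embeddings, PE(x)
    neg = np.where(f <= thresh)[0]     # indices of negative label-level embeddings, NE(x)
    return pos, neg

f = np.array([0.9, 0.2, 0.7, 0.1])    # classifier outputs f(Aug_q(x))
b = np.array([0.5, 0.5, 0.5, 0.5])    # per-class reference probabilities b_j
pos, neg = split_pos_neg(f, b, alpha=0.8)
```

With uniform reference probabilities the rule degenerates to a plain threshold at α, so only the first class is judged positive here.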
The second step: a positive sample set is constructed for each positive label-level embedding.
The positive label-level embeddings obtained by the query network and the key network, together with the historical embeddings in the embedding queue, are combined to construct the embedding pool A = B_q ∪ B_k ∪ queue, with A(q_j) = A \ {q_j}, where B_q and B_k represent the positive label-level embeddings of all sample images in the current batch produced by the query network and the key network, respectively. The invention uses a queue to retain the positive label-level embeddings of the most recent batches of samples obtained by the key network.
The positive sample set of a positive label-level embedding q_j is constructed as P(q_j) = {k_j | k_j ∈ A_j(q_j)}, where A_j(q_j) represents the label-level embeddings in A(q_j) corresponding to label j. The above equation shows that the positive sample set of each positive label-level embedding consists of the other positive label-level embeddings of the same class in the embedding pool.
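The construction of the embedding pool A = B_q ∪ B_k ∪ queue and of a positive set P(q_j) can be sketched as follows; pool entries are (label, embedding) pairs, and all names and contents are illustrative:

```python
import numpy as np

def build_positive_set(q_entry, pool):
    """P(q_j): all other positive embeddings in the pool carrying the same label.
    Identity comparison excludes q_j itself, i.e. A(q_j) = A \\ {q_j}."""
    j, q = q_entry
    return [e for (lbl, e) in pool if lbl == j and e is not q]

rng = np.random.default_rng(0)
emb = lambda: rng.standard_normal(8)
B_q = [(0, emb()), (2, emb())]        # batch positives from the query network
B_k = [(0, emb()), (1, emb())]        # batch positives from the key network
queue = [(0, emb()), (2, emb())]      # historical key-network positives
pool = B_q + B_k + queue              # embedding pool A

P = build_positive_set(B_q[0], pool)  # positives for the label-0 query embedding
```

Here the label-0 query embedding finds two same-label partners (one from the key network, one from the queue), which become its positive pairs.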
The third step: the contrastive loss is established.
All positive label-level embeddings are paired with the samples in their positive sample sets to form positive sample pairs, and a contrastive loss is established over these pairs to obtain high-quality label-level embedded representations.
The contrastive loss function for a single image sample is calculated as follows:
where τ represents a temperature parameter.
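The loss formula itself did not survive extraction; the sketch below implements the standard supervised-contrastive form that the surrounding definitions (P(q_j), A(q_j), and the temperature τ) suggest, for a single positive label-level embedding. This is an assumption about the precise formula, not a reproduction of it.

```python
import numpy as np

def supcon_loss(q, pos, pool, tau=0.2):
    """Supervised-contrastive loss for one positive label-level embedding q_j.
    pos  : positive embeddings P(q_j) (same label, q_j excluded)
    pool : all contrast embeddings A(q_j)
    All embeddings are assumed L2-normalized; tau is the temperature."""
    sims = np.array([q @ a / tau for a in pool])
    log_denom = np.log(np.exp(sims).sum())              # log of the partition term
    log_probs = [q @ k / tau - log_denom for k in pos]  # log-softmax over the pool
    return -np.mean(log_probs)                          # average over positives

rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)
q = unit(rng.standard_normal(8))
pos = [unit(rng.standard_normal(8)) for _ in range(2)]   # P(q_j)
neg = [unit(rng.standard_normal(8)) for _ in range(3)]
loss = supcon_loss(q, pos, pos + neg)                    # pool = positives + negatives
```

The per-image loss would sum this quantity over all embeddings in PE(x).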
Step 5: prototype-based tag disambiguation:
For each class c ∈ {1, ..., K}, the invention uses a positive prototype p_c^+ and a negative prototype p_c^- to represent the representative positive/negative label-level embedded features of class c.
The first step: the positive/negative prototype is updated.
In the current batch, the positive/negative prototypes of the corresponding classes are updated using the positive/negative label-level embeddings of the samples. The update formula is as follows:
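The update formula is not reproduced above; a common choice consistent with the γ = 0.99 reported in the experiments is a momentum moving average followed by re-normalization, sketched here as an assumption:

```python
import numpy as np

def update_prototype(proto, embeddings, gamma=0.99):
    """Momentum update of one class prototype from the batch's label-level
    embeddings. Assumed form (the patent's exact formula is not shown):
    p <- normalize(gamma * p + (1 - gamma) * q) for each embedding q."""
    for q in embeddings:
        proto = gamma * proto + (1.0 - gamma) * q
        proto /= np.linalg.norm(proto)     # keep the prototype on the unit sphere
    return proto

rng = np.random.default_rng(0)
p_pos = rng.standard_normal(8)
p_pos /= np.linalg.norm(p_pos)             # initial positive prototype for one class
batch_pos = [rng.standard_normal(8) for _ in range(4)]
p_pos = update_prototype(p_pos, batch_pos)
```

Negative prototypes would be updated the same way from the negative label-level embeddings.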
The second step: the pseudo-label is updated.
The similarity between the label-level embeddings of the sample and the updated positive/negative prototypes is calculated to obtain the label vector z predicted by the prototypes. The pseudo-label s is then updated step by step using a moving average:
Here φ ∈ (0, 1) is a constant. The above expression indicates that if the label-level embedding q_c of a label is found to be more similar to the positive prototype than to the negative prototype, that label is regarded as a relevant label of image x. As model training proceeds, the prototypes' predictions for the label-level embeddings become increasingly consistent; thus the pseudo-label of a relevant label gradually stabilizes at 1, while the pseudo-label of an irrelevant label smoothly approaches 0.
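The moving-average pseudo-label update s ← φ·s + (1−φ)·z described above can be sketched as follows; it also illustrates the stated convergence behavior, where repeated agreement of the prototype prediction drives the pseudo-label toward 1:

```python
def update_pseudo_label(s, z, phi=0.9):
    """Moving-average pseudo-label update: s <- phi*s + (1-phi)*z, where z is
    the prototype prediction (1 if the embedding is closer to the positive
    prototype than to the negative one, else 0)."""
    return phi * s + (1.0 - phi) * z

s = 0.5                        # initial pseudo-label for one class
z = 1.0                        # prototypes repeatedly say "relevant"
for _ in range(50):            # repeated agreement drives s toward 1
    s = update_pseudo_label(s, z)
```

With z fixed at 0 instead, the same recursion would smoothly drive s toward 0, matching the behavior described for irrelevant labels.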
The third step: the output of the classifier is used with the pseudo-labels to construct the classification loss.
The updated pseudo-labels are used together with the output of the classifier to construct the classification loss, so that the pseudo-labels, refined through the contrastively improved prototypes, guide the classifier's predictions. The classification loss is calculated as follows:
Step 6: the classification loss and the contrastive loss are combined to form the total loss function, which serves as the objective function for training the neural network. The total loss function is calculated as follows:
where λ is an adjustable hyper-parameter.
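A minimal sketch of the combined objective, assuming the classification loss is the binary cross-entropy against the pseudo-labels (as the text states BCE is used) and λ = 0.1 as in the experiments:

```python
import numpy as np

def bce(pred, target, eps=1e-12):
    """Binary cross-entropy against (soft) pseudo-label targets."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def total_loss(pred, pseudo, contrastive, lam=0.1):
    """Total objective L = L_cls + lambda * L_contrast (lambda = 0.1 in the experiments)."""
    return bce(pred, pseudo) + lam * contrastive

pred = np.array([0.9, 0.1, 0.8])      # classifier outputs f(x)
pseudo = np.array([1.0, 0.0, 1.0])    # disambiguated pseudo-labels s
L = total_loss(pred, pseudo, contrastive=2.3)   # contrastive term from step 4
```

The contrastive term here is just a placeholder scalar; in training it would come from the supervised-contrastive branch.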
Step 7: two-stage training strategy was used:
the two-stage training strategy consists of a pre-disambiguation stage and a contrast disambiguation stage.
Pre-disambiguation stage: the contrastive learning branch is removed, i.e. only the query network and the prototype disambiguation strategy are used, and only the classification loss L_cls is used as the objective function of the network.
Contrast disambiguation stage: the whole system is trained using the total loss function.
Step 8: multi-label prediction of unseen samples is performed on test data using a threshold δ:
For an unseen sample x, the relevant labels are predicted by comparing each classifier prediction probability with the threshold δ: a label is predicted as relevant if and only if its probability exceeds δ. The model is evaluated using multi-label evaluation metrics.
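The thresholding rule for an unseen instance can be sketched directly:

```python
import numpy as np

def predict_relevant(probs, delta=0.5):
    """Relevant-label prediction on an unseen sample: label j is predicted
    relevant iff the classifier probability exceeds the threshold delta."""
    return np.where(probs > delta)[0]

probs = np.array([0.91, 0.12, 0.57, 0.33])    # classifier outputs f(x)
labels = predict_relevant(probs, delta=0.5)
```

The value δ = 0.5 here is only an example; the patent leaves δ as a tunable threshold.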
Experiments were performed on the mainstream multi-label image classification dataset VOC2007. The VOC2007 dataset contains images from 20 target categories, with each image containing targets from 2.5 categories on average. It comprises a training set of 5011 images and a test set of 4952 images.
To construct the partial-label dataset, the invention sets the average size of the candidate label set to a ratio q of the number of labels in the overall label space. In the experiments, q was taken to be 0.1 for the VOC2007 dataset.
Following many existing studies, the invention uses mAP, OF1 and CF1 as evaluation metrics. mAP, also called mean average precision, is obtained as the weighted average of the per-class average precision (AP) over all classes. CF1 and OF1 take into account recall and precision both overall and per class. These three metrics are therefore the most important and representative evaluation metrics among all metrics.
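OF1 (micro-averaged F1) and CF1 (macro-averaged F1) can be computed from binary predictions as follows; this is the standard formulation of the two metrics, not code taken from the patent:

```python
import numpy as np

def of1_cf1(pred, true):
    """OF1: micro-averaged F1 over all (sample, label) decisions.
       CF1: macro-averaged F1, i.e. per-class F1 averaged over classes.
       pred/true: boolean arrays of shape (n_samples, n_labels)."""
    tp = (pred & true).sum(axis=0).astype(float)
    fp = (pred & ~true).sum(axis=0).astype(float)
    fn = (~pred & true).sum(axis=0).astype(float)
    # micro / overall F1
    of1 = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    # macro / per-class F1, guarding classes with no predictions or positives
    denom = 2 * tp + fp + fn
    cf1 = np.mean(np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0))
    return of1, cf1

pred = np.array([[1, 0, 1], [0, 1, 0]], dtype=bool)
true = np.array([[1, 0, 0], [0, 1, 0]], dtype=bool)
of1, cf1 = of1_cf1(pred, true)
```

In this toy example one spurious prediction on the third class lowers both scores: OF1 = 0.8 and CF1 = 2/3.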
To verify the effectiveness of the invention, a baseline method is constructed by training directly on the partial multi-label image dataset with the binary cross-entropy (BCE) loss. In addition, two advanced multi-label image classification methods, ASL (published as "Asymmetric Loss for Multi-Label Classification") and ML-GCN (published as "Multi-Label Image Recognition with Graph Convolutional Networks"), are added as two further baseline comparison methods.
For consistency of comparison, following the ASL study, the invention adopts TResNet-L (from "TResNet: High Performance GPU-Dedicated Architecture") pre-trained on ImageNet as the backbone, with 224×224 as the input image resolution. For the VOC2007 dataset, α = 0.8. The prototypes are updated with γ = 0.99, and the constant φ of the pseudo-label update is linearly decreased from 0.95 to 0.8. The temperature parameter is τ = 0.2 and the loss weight factor is λ = 0.1. The model parameters are updated with an exponential moving average (EMA) with decay parameter 0.9997.
The proposed method is named CPLD (Contrastive Prototype-based Label Disambiguation), and the comparison results are shown in Table 1. It can be seen that the method of the present invention outperforms other methods in performance over the VOC2007 data set, which demonstrates the effectiveness of the method of the present invention.
Table 1 model classification results comparison table
Method\index | mAP | CF1 | OF1 |
BCE | 88.37 | 82.17 | 84.42 |
ML-GCN | 80.31 | 68.13 | 72.81 |
ASL | 87.88 | 79.78 | 81.62 |
CPLD | 89.79 | 82.68 | 85.33 |
The invention provides a target classification method for partial multi-label images, which directly uses partial multi-label image training data to obtain a classification model capable of multi-label prediction on unseen instances, thereby greatly reducing the labeling cost.
In addition, the classification model in the invention uses the binary cross-entropy (BCE) loss function; more advanced loss formulations, such as Focal Loss ("Focal Loss for Dense Object Detection") or ASL ("Asymmetric Loss for Multi-Label Classification"), can also be used. Furthermore, for the positive/negative judgment of the label-level embeddings obtained by model decoupling, a simple fixed threshold on the classifier's predicted probabilities can be adopted, or the prediction probabilities obtained by softmax can be used as the threshold. These modifications, or other readily conceivable variations or alternatives, are intended to fall within the scope of the present invention.
Example 3
In order to perform the method corresponding to the above embodiment 1 and achieve the corresponding functions and technical effects, the following provides a target classification system for partial multi-label images, comprising:
the to-be-identified partial multi-mark image acquisition module is used for acquiring to-be-identified partial multi-mark images; the partial marked image to be identified at least comprises 1 target.
The related mark identification module is used for inputting the partial marked images to be identified into the related mark determination model and determining the related marks of all targets in the partial marked images to be identified; the relevant mark determining model is obtained by training the classifier according to a plurality of partial multi-mark historical images by utilizing a contrast label disambiguation principle.
The target type determining module is used for determining the types of all targets in the partial marked images to be identified according to the plurality of related marks; the relevant marks are in one-to-one correspondence with the target types.
Example 4
The present embodiment provides an electronic device, including a memory and a processor, where the memory is configured to store a computer program, and the processor is configured to execute the computer program to cause the electronic device to execute a method for classifying objects of partial multiple marker images described in embodiment 1. Wherein the memory is a readable storage medium.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the invention.
Claims (8)
1. A target classification method for partial multi-label images, comprising:
acquiring a partial marked image to be identified; the partial marked image to be identified at least comprises 1 target;
inputting the partial marked images to be identified into a related mark determining model, and determining related marks of all targets in the partial marked images to be identified; the relevant mark determining model is obtained by training a classifier according to a plurality of partial multi-mark historical images by utilizing a contrast label disambiguation principle;
determining the types of all targets in the partial marked image to be identified according to the plurality of related marks; the related marks are in one-to-one correspondence with the target types.
2. The method of claim 1, further comprising, prior to acquiring the image to be classified:
acquiring a plurality of partial multi-mark historical images; the multi-mark historical image is marked with a plurality of marks; the types of the marks are related marks or noise marks;
performing random data augmentation processing on the partial multi-mark historical image to obtain a query view and a key view of the partial multi-mark historical image;
determining label level embedding under a query view and label level embedding under a key view; the label level embedding corresponds to a plurality of marks on the partial multi-mark historical image one by one;
and training the classifier according to the label level embedding under the query view and the label level embedding under the key view by utilizing a contrast label disambiguation principle to obtain the relevant mark determining model.
3. The method of claim 2, further comprising, after determining the tag level embedding in the query view and the tag level embedding in the key view:
determining the positive and negative of a plurality of tag level embedding under the query view;
the positivity of the multiple tag level embedding under the key view is determined.
4. A method for classifying objects in a partial multi-label image according to claim 3, wherein the training of the classifier by using the principle of contrast label disambiguation according to the label level embedding in the query view and the label level embedding in the key view to obtain the relevant label determination model comprises:
determining the classifier as the classifier at the 0 th iteration;
acquiring an initial prototype of each mark in the classifier as a prototype in the 0 th iteration;
acquiring an initial negative prototype of each mark in the classifier as a negative prototype at the 0 th iteration;
let the first iteration number i=1;
let the second iteration number j=1;
determining any tag level embedding under any query-based view as a current tag level embedding;
updating a positive prototype in the ith-1 th iteration and a negative prototype in the ith-1 th iteration according to the positive and negative polarities of the current label level embedding;
calculating the similarity between the current label level embedding and the updated positive prototype in the i-1 th iteration as a first similarity;
calculating the similarity of the current label level embedding and the negative prototype in the i-1 th iteration after updating as second similarity;
determining a label vector which is embedded in the current label level and predicted according to the prototype according to the first similarity and the second similarity;
updating the pseudo mark of the corresponding mark embedded in the current tag level according to the mark vector to obtain the pseudo mark of the corresponding mark embedded in the current tag level in the j-th iteration;
increasing the value of the second iteration number j by 1, updating the current label level embedding outside the current label level embedding under the same query view through the current label level embedding, and returning to the step of updating the positive prototype in the ith-1 iteration and the negative prototype in the ith-1 iteration according to the positive and negative polarities of the current label level embedding until the second iteration number reaches a second iteration number threshold;
embedding a plurality of current tag levels under the corresponding query view into a classifier in the i-1 th iteration to obtain a plurality of category outputs;
according to the category outputs and the pseudo marks of the corresponding marks embedded in the current tag level in multiple iterations, determining a category loss function in the i-1 th iteration;
judging whether the classification loss function is smaller than a classification loss function threshold value or not to obtain a first judgment result;
if the first judgment result is negative, updating parameters of the classifier in the ith-1 th iteration to obtain the classifier in the ith iteration, increasing the value of the first iteration number i by 1, and returning to the step of enabling the second iteration number j=1;
if the first judgment result is yes, judging whether the first iteration number reaches a first iteration number threshold value or not, and obtaining a second judgment result;
if the second judgment result is negative, determining that the classifier in the ith-1 th iteration is the classifier in the ith iteration, increasing the value of the first iteration number i by 1, and returning to the step of enabling the second iteration number j=1;
and if the second judgment result is yes, determining the classifier in the (i-1)-th iteration as the relevant mark determining model.
5. The method for classifying targets of partial multi-label images according to claim 3, further comprising, before determining the classifier in the (i-1)-th iteration as the relevant mark determining model:
determining label level embedding under a query view and label level embedding under a key view as an embedding pool; the embedded pool further comprises a tag level embedded in the momentum tag level embedded queue;
determining any tag level embedded with positive polarity under the query view as the current positive tag level embedded;
determining positive tag level embedding of the same mark as the positive tag level embedding in the embedding pool as a positive sample set corresponding to the current positive tag level embedding;
determining that samples in the positive sample set corresponding to the current positive tag level embedding and the current positive tag level embedding form a plurality of positive sample pairs;
determining a contrast loss function of the corresponding partial marked historical images according to a plurality of positive sample pairs under the same query view;
judging whether the comparison loss functions are smaller than the comparison loss function threshold value or not to obtain a third judgment result;
if the third judgment result is negative, updating parameters of the classifier in the i-1 th iteration to obtain the classifier in the 0 th iteration, and returning to the step of enabling the first iteration number i=1;
and if the third judgment result is yes, invoking the step of determining the classifier in the (i-1)-th iteration as the relevant mark determining model.
6. A target classification system for a partial multiple label image, comprising:
the to-be-identified partial multi-mark image acquisition module is used for acquiring to-be-identified partial multi-mark images; the partial marked image to be identified at least comprises 1 target;
the related mark identification module is used for inputting the partial marked images to be identified into the related mark determination model and determining the related marks of all targets in the partial marked images to be identified; the relevant mark determining model is obtained by training a classifier according to a plurality of partial multi-mark historical images by utilizing a contrast label disambiguation principle;
the target type determining module is used for determining the types of all targets in the partial marked images to be identified according to the plurality of related marks; the related marks are in one-to-one correspondence with the target types.
7. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform a method of object classification of a partial multiple mark image according to any one of claims 1 to 5.
8. The electronic device of claim 7, wherein the memory is a readable storage medium.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310544125.XA CN116630694A (en) | 2023-05-12 | 2023-05-12 | Target classification method and system for partial multi-label images and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116630694A true CN116630694A (en) | 2023-08-22 |
Family
ID=87590462
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310544125.XA Pending CN116630694A (en) | 2023-05-12 | 2023-05-12 | Target classification method and system for partial multi-label images and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116630694A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117992835A (en) * | 2024-04-03 | 2024-05-07 | 安徽大学 | Multi-strategy label disambiguation partial multi-label classification method, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113378632B (en) | Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method | |
CN110516095B (en) | Semantic migration-based weak supervision deep hash social image retrieval method and system | |
CN112381098A (en) | Semi-supervised learning method and system based on self-learning in target segmentation field | |
CN111079847B (en) | Remote sensing image automatic labeling method based on deep learning | |
CN110909820A (en) | Image classification method and system based on self-supervision learning | |
CN111127364B (en) | Image data enhancement strategy selection method and face recognition image data enhancement method | |
CN109743642B (en) | Video abstract generation method based on hierarchical recurrent neural network | |
CN112966135B (en) | Image-text retrieval method and system based on attention mechanism and gate control mechanism | |
CN110889865B (en) | Video target tracking method based on local weighted sparse feature selection | |
CN112312541A (en) | Wireless positioning method and system | |
CN107346327A (en) | The zero sample Hash picture retrieval method based on supervision transfer | |
CN117237733A (en) | Breast cancer full-slice image classification method combining self-supervision and weak supervision learning | |
CN115439685A (en) | Small sample image data set dividing method and computer readable storage medium | |
CN113065409A (en) | Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint | |
CN114926693A (en) | SAR image small sample identification method and device based on weighted distance | |
CN111582371A (en) | Training method, device, equipment and storage medium for image classification network | |
CN108470025A (en) | Partial-Topic probability generates regularization own coding text and is embedded in representation method | |
CN116630694A (en) | Target classification method and system for partial multi-label images and electronic equipment | |
CN110442736B (en) | Semantic enhancer spatial cross-media retrieval method based on secondary discriminant analysis | |
CN114187506B (en) | Remote sensing image scene classification method of viewpoint-aware dynamic routing capsule network | |
CN113723572A (en) | Ship target identification method, computer system, program product and storage medium | |
CN112465016A (en) | Partial multi-mark learning method based on optimal distance between two adjacent marks | |
CN112613474A (en) | Pedestrian re-identification method and device | |
CN116433909A (en) | Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method | |
CN114299342B (en) | Unknown mark classification method in multi-mark picture classification based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||