CN115170813A - Network supervision fine-grained image identification method based on partial label learning - Google Patents
- Publication number
- CN115170813A (application CN202210761418.9A)
- Authority
- CN
- China
- Prior art keywords
- label
- noise
- sample
- image
- correlation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a network supervision fine-grained image identification method based on partial label learning, comprising the following steps: performing deep descriptor transformation with a pre-trained deep neural network model to evaluate the positive correlation between web images, detecting the open-set label noise in the data set from the resulting correlation matrix, and removing it; using a loss function to drive the model toward a higher recall rate, so that the label set of each sample contains the sample's true class as often as possible; and selecting each sample's true label from its label set to correct the closed-set label noise, then training a deep neural network on the clean data together with the relabeled closed-set noisy samples. The method effectively removes open-set label noise and, through partial label learning, converts closed-set noisy images into accurately labeled training images. This increases the number of usable samples in the web data set and thereby improves the learning performance of the neural network model.
Description
Technical Field
The invention belongs to the field of network supervision image recognition, and particularly relates to a network supervision fine-grained image recognition method based on partial label learning.
Background
Constructing a fine-grained data set is difficult because it requires domain experts to classify images correctly by the subtle differences between fine-grained subclasses. To reduce the reliance on manual labeling and to learn more practical models, it is becoming increasingly popular to build web data sets directly from images of the relevant classes collected from the internet and to train on them. However, such web data sets contain considerable noise, and training on them directly makes the model overfit, which lowers accuracy. A fine-grained web data set generally contains two types of noise: open-set label noise and closed-set label noise. Open-set label noise is typically caused by "cross-domain" images, i.e., noisy images that do not belong to any class of the fine-grained domain in question. Closed-set label noise refers to images from within the fine-grained domain that carry wrong labels.
Existing methods for handling general label noise include sample selection, soft labels and dedicated loss functions. Although these methods classify well, they 1) risk discarding some clean images, and 2) in terms of exploiting noisy images, still cannot convert closed-set noisy images into accurately labeled training images.
Disclosure of Invention
The invention aims to provide a network supervision fine-grained image identification method based on partial label learning.
The technical scheme for realizing the purpose of the invention is as follows: in a first aspect, the invention provides a network supervision fine-grained image identification method based on partial label learning, which comprises the following steps:
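Step 1, performing deep descriptor transformation with a pre-trained deep neural network model to evaluate the positive correlation between web images, detecting the open-set label noise in the data set according to a correlation matrix reflecting this correlation, and removing the open-set label noise;
Step 2, using a loss function to drive the model so that the label set of each sample contains the sample's true class as often as possible;
Step 3, selecting the true label of each sample from its label set using the idea of partial label learning, thereby correcting the closed-set label noise.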
In a second aspect, the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when executing the program.
In a third aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
In a fourth aspect, the invention provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
Compared with the prior art, the invention has the following notable advantages: (1) it provides an open-set label noise removal strategy and a closed-set label noise correction strategy for the practical but challenging task of network-supervised fine-grained recognition; (2) it performs deep descriptor transformation with a pre-trained deep model to estimate the positive correlation between web images, and detects and removes open-set label noise effectively according to the correlation values; (3) it constructs the candidate label set of each sample automatically and uses a loss function to drive the model toward a higher recall rate, so that each sample's true label lies in its candidate label set as often as possible, which safeguards the performance of partial label learning; (4) it uses partial label learning to convert closed-set noisy images into accurately labeled training images, so that correcting the closed-set label noise also turns it into training data, which increases the number of usable samples in the web data set and improves the learning performance of the neural network model.
Drawings
Fig. 1 is a flowchart of a network supervision fine-grained image identification method based on partial label learning according to the present invention.
Detailed Description
With reference to fig. 1, a network supervision fine-grained image identification method based on partial label learning specifically includes the following steps:
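Step 1, performing deep descriptor transformation with a pre-trained deep neural network model to evaluate the positive correlation between web images, detecting the open-set label noise in the data set according to the correlation matrix, and removing it.

Let $\mathcal{Y}$ be the label space and $\mathcal{X}$ the sample space. For each random batch of n images $\{x_1, x_2, \dots, x_n\}$, a pre-trained convolutional neural network $\Phi_{pre}$ extracts a set of n feature maps $T = \{t_1, t_2, \dots, t_n\}$, $t_i \in \mathbb{R}^{H \times W \times d}$.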
Here H, W and d denote the height, width and depth of feature map $t_i$, respectively. To find a more general high-level feature representation shared across the feature map set T, principal component analysis is used to obtain the eigenvector $p \in \mathbb{R}^{d}$ corresponding to the largest eigenvalue. For the given feature map set, each feature map is summed over its channels with weights given by p to produce a heat map, and these heat maps form the heat map set $\{H_1, H_2, \dots, H_n\}$. The i-th heat map $H_i$, corresponding to the i-th feature map $t_i$, is calculated as:
Each heat map is then upsampled to the size of the input image to obtain the correlation matrix C. The correlation matrix consists of correlation values: a positive value indicates a positive correlation with (i.e., similarity to) the general representation found above, a negative value indicates a negative correlation, and the larger the absolute value, the stronger the correlation. Whether a sample is open-set label noise can therefore be judged effectively from the number and magnitude of the positive values in its correlation matrix, so a threshold δ is set and each sample is tested as follows to decide whether the i-th sample is noise:
If a sample does not satisfy this condition, it is regarded as open-set label noise and removed from the sample space, yielding a sample space that consists of clean data and closed-set label noise.
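The open-set filtering step can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the patent's exact formulas: it follows the centred-PCA form of deep descriptor transforming (DDT), fixes the arbitrary sign of the eigenvector with a heuristic of our own, upsamples by an integer factor with nearest-neighbour repetition, and replaces the unreproduced threshold condition with a hypothetical rule that keeps a sample when the fraction of positive entries in its correlation matrix is at least δ (`delta`). All function names are illustrative.

```python
import numpy as np

def ddt_heatmaps(feature_maps):
    """DDT-style heat maps.

    feature_maps: (n, H, W, d) descriptors from a pre-trained CNN (Phi_pre).
    Returns (n, H, W) heat maps."""
    d = feature_maps.shape[-1]
    descriptors = feature_maps.reshape(-1, d)        # all n*H*W descriptors
    mean = descriptors.mean(axis=0)
    centered = descriptors - mean
    cov = centered.T @ centered / len(descriptors)   # d x d covariance matrix
    _, eigvecs = np.linalg.eigh(cov)                 # eigenvalues in ascending order
    p = eigvecs[:, -1]                               # eigenvector of the largest eigenvalue
    # Assumption: the eigenvector's sign is arbitrary; this heuristic fixes it
    # so that p points the same way as the mean descriptor.
    if mean @ p < 0:
        p = -p
    # Channel-weighted sum of every centred descriptor with p.
    return (feature_maps - mean) @ p

def upsample_nearest(heat, factor):
    """Nearest-neighbour upsampling of (n, H, W) maps by an integer factor."""
    return heat.repeat(factor, axis=1).repeat(factor, axis=2)

def open_set_mask(correlation, delta):
    """Hypothetical rule: keep sample i if the fraction of positive entries
    in its correlation matrix is at least delta."""
    positive_ratio = (correlation > 0).mean(axis=(1, 2))
    return positive_ratio >= delta

rng = np.random.default_rng(0)
fmap = np.maximum(rng.normal(size=(4, 7, 7, 16)), 0.0)   # ReLU-like toy features
heat = ddt_heatmaps(fmap)                                 # (4, 7, 7)
C = upsample_nearest(heat, factor=32)                     # (4, 224, 224)
keep = open_set_mask(C, delta=0.3)                        # True = not flagged as open-set noise
```

Because the projections are computed on mean-centred descriptors, the heat values over the whole batch sum to zero, so positive regions are by construction "above average" along the principal direction.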
Step 2, using a loss function to drive the model toward a higher recall rate, so that the label set of each sample contains the sample's true class as often as possible.

The label space is defined as $\mathcal{Y}$ and the sample space as $\mathcal{X} = \{X_{y_1}, X_{y_2}, \dots\}$, where $X_{y_i}$ denotes the set of instances whose label is class $y_i$. In the sample selection stage at the start of training, C categories are randomly selected to generate a batch, and for each selected category $y_i$, $n^*$ samples of that category are randomly selected. For a batch B of size $a = n^* \times C$, a convolutional neural network $\Phi_{CNN}$ produces the embedded features $F = \{f_1, f_2, \dots, f_a\}$; for a sample $x_i$ in B, the embedded feature is $f_i = \Phi_{CNN}(x_i) \in \mathbb{R}^{c}$.
Here c is the length of the embedded feature $f_i$. A similarity matrix $S \in \mathbb{R}^{a \times a}$ is obtained by computing the cosine similarity between the embedded features; the cosine similarity between the i-th query image and the j-th support image is calculated as follows:

$$s_{i,j} = \frac{f_i^{\top} f_j}{\lVert f_i \rVert_2 \, \lVert f_j \rVert_2}$$
The similarities $s_{q,:}$ between each query image q and the other images are ranked, and the k most similar images form the set $\mathcal{K}_q$. Images that belong to the same category as the query image but are not in $\mathcal{K}_q$ are called positive images, and images in $\mathcal{K}_q$ that do not belong to the query's category are called negative images. The positive images outside $\mathcal{K}_q$, i.e. in its complement $\overline{\mathcal{K}_q}$, form the set $P_q$, where $y_q$ is the label of the query image. The invention sets $n^* < k$ (which guarantees that $\mathcal{K}_q$ contains negative images, since the batch holds only $n^* - 1$ other images of the query's class), and obtains a set of negative images $N_q$, whose definition involves the number of images outside $\mathcal{K}_q$ that belong to the same category as the query image; $s_n$ is a matrix containing only the similarity scores of the negative images. The loss function is thus defined as:
This loss function drives the model so that the label set generated for each image contains the image's true class as often as possible.
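The batch construction, similarity matrix and top-k sets of step 2 can be sketched as follows. The sketch assumes precomputed embeddings (random vectors stand in for $\Phi_{CNN}$) and uses hypothetical helper names. It does not implement the patent's loss, whose formula is not reproduced in this text; `recall_at_k` only measures the quantity that loss is meant to raise, namely how many same-class images land inside $\mathcal{K}_q$.

```python
import numpy as np

def sample_batch(labels, num_classes, n_star, rng):
    """Indices of a batch holding `num_classes` random classes (C in the text)
    with `n_star` random samples each, so a = n_star * C."""
    labels = np.asarray(labels)
    classes = rng.choice(np.unique(labels), size=num_classes, replace=False)
    return np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_star, replace=False)
        for c in classes
    ])

def cosine_similarity_matrix(features):
    """(a, c) embeddings f_i -> (a, a) matrix with entry [i, j] = cos(f_i, f_j)."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T

def top_k_indices(S, q, k):
    """K_q: indices of the k images most similar to query q (q itself excluded)."""
    sims = np.array(S[q], dtype=float)
    sims[q] = -np.inf
    return np.argsort(-sims)[:k]

def positive_negative_sets(S, labels, q, k):
    """P_q (same class, outside K_q) and negatives (other class, inside K_q)."""
    labels = np.asarray(labels)
    in_topk = np.zeros(len(labels), dtype=bool)
    in_topk[top_k_indices(S, q, k)] = True
    same = labels == labels[q]
    same[q] = False
    positives = np.flatnonzero(same & ~in_topk)
    negatives = np.flatnonzero((labels != labels[q]) & in_topk)
    return positives, negatives

def recall_at_k(S, labels, k):
    """Mean fraction of each query's same-class images that fall inside K_q."""
    labels = np.asarray(labels)
    scores = []
    for q in range(len(labels)):
        same = np.flatnonzero(labels == labels[q])
        same = same[same != q]
        scores.append(np.isin(same, top_k_indices(S, q, k)).mean())
    return float(np.mean(scores))

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 6)                  # toy pool: 5 classes x 6 images
batch = sample_batch(labels, num_classes=3, n_star=4, rng=rng)
emb = rng.normal(size=(len(batch), 8))               # stand-in for Phi_CNN outputs
S = cosine_similarity_matrix(emb)
pos, neg = positive_negative_sets(S, labels[batch], q=0, k=5)   # k > n_star
```

With $n^* = 4$ and $k = 5$, each query has only three same-class partners in the batch, so at least two of its five nearest neighbours are negatives.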
Step 3, selecting the true label of each sample from its label set using the idea of partial label learning, thereby correcting the closed-set label noise.
After step 2 has produced label sets S that contain the true class as often as possible, the true label of each closed-set noisy sample is determined from its label set. In the encoding stage, L column codes of N bits are randomly generated to construct a coding matrix $M \in \{+1, -1\}^{N \times L}$, where N is the number of classes and L is the number of binary classifiers; the coding matrix is used to partition the samples during training. A randomly generated column code $v = [v_1, v_2, \dots, v_N]^{\top} \in \{+1, -1\}^{N}$ divides the label space into a positive label space $\mathcal{Y}_v^{+} = \{y_j \mid v_j = +1\}$ and a negative label space $\mathcal{Y}_v^{-} = \{y_j \mid v_j = -1\}$.
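The encoding-stage coding matrix and the label-space split induced by one column code might look like the sketch below. Function names are hypothetical, and classes are represented by their indices 0..N-1. Practical error-correcting output code schemes usually also discard degenerate columns (all +1 or all -1) and columns that yield too few training samples; this sketch omits that check.

```python
import numpy as np

def random_coding_matrix(num_classes, num_classifiers, rng):
    """M in {+1, -1}^(N x L): column t is a random N-bit code for classifier g_t."""
    return rng.choice(np.array([-1, 1]), size=(num_classes, num_classifiers))

def split_label_space(column):
    """Split the class indices by one column code v into (Y_v^+, Y_v^-)."""
    column = np.asarray(column)
    positive = {int(j) for j in np.flatnonzero(column == 1)}
    negative = {int(j) for j in np.flatnonzero(column == -1)}
    return positive, negative

rng = np.random.default_rng(0)
M = random_coding_matrix(num_classes=5, num_classifiers=8, rng=rng)
pos_space, neg_space = split_label_space(M[:, 0])
```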
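Positive and negative samples are selected using the positive and negative label spaces. Given a training sample $(x_i, S_i)$, where $S_i \subseteq \mathcal{Y}$ is its label set, the invention treats the label set $S_i$ as a whole when building each binary classifier. If $S_i$ falls entirely within $\mathcal{Y}_v^{+}$ or entirely within $\mathcal{Y}_v^{-}$, the sample $x_i$ is used as a positive or a negative sample, respectively. These positive and negative samples then form the binary training set

$$\mathcal{B}_v = \{(x_i, +1) \mid S_i \subseteq \mathcal{Y}_v^{+}\} \cup \{(x_i, -1) \mid S_i \subseteq \mathcal{Y}_v^{-}\}$$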
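Building the binary training set for one classifier follows directly from the rule above. A minimal sketch with a hypothetical function name, assuming each label set is a non-empty set of class indices:

```python
def binary_training_set(label_sets, pos_space, neg_space):
    """Training set for one binary classifier.

    A sample whose whole label set lies in Y_v^+ becomes a positive (+1)
    example and one whose whole set lies in Y_v^- a negative (-1) example;
    samples whose set straddles both spaces are left out for this classifier."""
    indices, targets = [], []
    for i, label_set in enumerate(label_sets):
        label_set = set(label_set)
        if label_set <= pos_space:
            indices.append(i)
            targets.append(+1)
        elif label_set <= neg_space:
            indices.append(i)
            targets.append(-1)
    return indices, targets

sets = [{0, 2}, {1}, {0, 1}, {1, 3}]
idx, y = binary_training_set(sets, pos_space={0, 2}, neg_space={1, 3})
# idx == [0, 1, 3], y == [1, -1, -1]; sample 2 straddles both spaces
```

Treating the label set as a whole is what makes this step disambiguation-free: no single label in $S_i$ has to be trusted to decide on which side of $v$ the sample falls.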
At the decoding stage, a connected set is constructed for each class, and the connected set of the j-th class can be represented as:
Based on the connected sets $\varepsilon_y$, a performance matrix $G \in \mathbb{R}^{N \times L}$ is generated to reflect the capability of the binary classifiers; the performance of class j on the t-th classifier $g_t$ is calculated as follows:
where $\mathbb{I}(\cdot)$ is the indicator function. To obtain the relative performance of the classifiers on each class, the performance matrix G is normalized row by row:
Finally, each closed-set noisy sample obtains a pseudo label. The clean samples and the closed-set noisy samples with their pseudo labels are combined and fed into a convolutional neural network for training.
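The decoding stage can be sketched as below. Because the connected-set, performance and decoding formulas are not reproduced in this text, the sketch assumes a formulation modelled on PL-ECOC (disambiguation-free partial label learning, Zhang et al.): the connected set of class j holds the samples whose label set contains j; G[j, t] is the fraction of those samples on which $g_t$ outputs M[j, t]; rows of G are normalised to sum to 1; and a pseudo label is the candidate minimising a performance-weighted exponential loss. Restricting the arg-min to each sample's own label set is a further assumption, and the binary classifier outputs are taken as given ±1 predictions. Function names are hypothetical.

```python
import numpy as np

def performance_matrix(M, preds, label_sets):
    """Row-normalised performance matrix G* (N x L).

    M: (N, L) coding matrix in {+1, -1}; preds: (num_samples, L) outputs of the
    L binary classifiers in {+1, -1}; label_sets: one set of class ids per sample."""
    N, L = M.shape
    G = np.zeros((N, L))
    for j in range(N):
        # Assumed connected set of class j: samples whose label set contains j.
        eps_j = [i for i, s in enumerate(label_sets) if j in s]
        if eps_j:
            # Fraction of eps_j on which g_t agrees with the code bit M[j, t].
            G[j] = (preds[eps_j] == M[j]).mean(axis=0)
    row_sums = G.sum(axis=1, keepdims=True)
    # Row-by-row normalisation; an all-zero row stays zero.
    return np.divide(G, row_sums, out=np.zeros_like(G), where=row_sums > 0)

def pseudo_label(G_norm, M, pred, label_set):
    """Candidate j minimising sum_t G*[j, t] * exp(-pred[t] * M[j, t])."""
    candidates = sorted(label_set)
    losses = [(G_norm[j] * np.exp(-pred * M[j])).sum() for j in candidates]
    return candidates[int(np.argmin(losses))]

# Toy check: 3 classes, 2 classifiers whose outputs reproduce each true code.
M = np.array([[1, 1], [-1, 1], [1, -1]])
label_sets = [{0, 1}, {1, 2}, {0, 2}]
preds = M[[0, 1, 2]]                  # samples whose true classes are 0, 1, 2
G_star = performance_matrix(M, preds, label_sets)
labels = [pseudo_label(G_star, M, preds[i], s) for i, s in enumerate(label_sets)]
```

In the toy check every sample's true class is one of its two candidates, and the weighted decoding picks it, so `labels` comes out as `[0, 1, 2]`.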
The above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (7)
1. A network supervision fine-grained image identification method based on partial label learning is characterized by comprising the following steps:
step 1, performing deep descriptor transformation with a pre-trained deep neural network model to evaluate the positive correlation between web images, detecting the open-set label noise in a data set according to a correlation matrix reflecting the correlation, and removing the open-set label noise;
step 2, driving the model with a loss function so that the label set of each sample contains the sample's true class as often as possible;
step 3, selecting the true label of each sample from its label set using the idea of partial label learning, thereby correcting the closed-set label noise.
2. The network supervision fine-grained image identification method based on partial label learning according to claim 1, characterized in that, in step 1, deep descriptor transformation is performed with a pre-trained deep neural network model to evaluate the positive correlation between web images, the open-set label noise in a data set is detected according to a correlation matrix reflecting the correlation, and the open-set label noise is removed;
let $\mathcal{Y}$ be the label space and $\mathcal{X}$ the sample space; for each random batch of n images $\{x_1, x_2, \dots, x_n\}$, a pre-trained convolutional neural network $\Phi_{pre}$ extracts a set of n feature maps $T = \{t_1, t_2, \dots, t_n\}$, $t_i \in \mathbb{R}^{H \times W \times d}$;
wherein H, W and d denote the height, width and depth of feature map $t_i$, respectively; the eigenvector $p \in \mathbb{R}^{d}$ corresponding to the largest eigenvalue is obtained by principal component analysis; for the given feature map set, each feature map is summed over its channels with weights given by p to produce a heat map, forming the heat map set $\{H_1, H_2, \dots, H_n\}$; the i-th heat map $H_i$, corresponding to the i-th feature map $t_i$, is calculated as:
each heat map is then upsampled to the size of the input image to obtain the correlation matrix C; the correlation matrix consists of correlation values, where a positive value indicates a positive correlation with the general representation, a negative value indicates a negative correlation, and the larger the absolute value, the stronger the correlation; whether a sample is open-set label noise is judged from the number and magnitude of the positive values in the correlation matrix, a threshold δ being set and each sample tested as follows to decide whether the i-th sample is noise:
3. The network supervision fine-grained image identification method based on partial label learning according to claim 2, characterized in that, in step 2, a loss function drives the model to achieve a higher recall rate;
the label space is defined as $\mathcal{Y}$ and the sample space as $\mathcal{X} = \{X_{y_1}, X_{y_2}, \dots\}$, where $X_{y_i}$ denotes the set of instances whose label is class $y_i$; in the sample selection stage at the start of training, C categories are randomly selected to generate a batch, and for each selected category $y_i$, $n^*$ samples of that category are randomly selected; for a batch B of size $a = n^* \times C$, a convolutional neural network $\Phi_{CNN}$ produces the embedded features $F = \{f_1, f_2, \dots, f_a\}$, the embedded feature of a sample $x_i$ in B being $f_i = \Phi_{CNN}(x_i) \in \mathbb{R}^{c}$;
where c is the length of the embedded feature $f_i$; a similarity matrix $S \in \mathbb{R}^{a \times a}$ is obtained by computing the cosine similarity between embedded features, the cosine similarity between the i-th query image and the j-th support image being calculated as

$$s_{i,j} = \frac{f_i^{\top} f_j}{\lVert f_i \rVert_2 \, \lVert f_j \rVert_2}$$

the similarities $s_{q,:}$ between each query image q and the other images are ranked, and the k most similar images form the set $\mathcal{K}_q$; images that belong to the same category as the query image but are not in $\mathcal{K}_q$ are called positive images, and images in $\mathcal{K}_q$ that do not belong to the query's category are called negative images; the positive images outside $\mathcal{K}_q$, i.e. in its complement $\overline{\mathcal{K}_q}$, form the set $P_q$, $y_q$ being the label of the query image; with $n^* < k$, a set of negative images $N_q$ is obtained, whose definition involves the number of images outside $\mathcal{K}_q$ that belong to the same category as the query image, and $s_n$ is a matrix containing only the similarity scores of the negative images; the loss function is thus defined as:
applying this loss function ensures that the label set generated for each image contains its true class as often as possible.
4. The network supervision fine-grained image identification method based on partial label learning according to claim 3, characterized in that, in step 3, partial label learning is used to determine the true label from the label set S obtained in step 2;
in the encoding stage, L column codes of N bits are randomly generated to construct a coding matrix $M \in \{+1, -1\}^{N \times L}$, where N is the number of classes and L is the number of binary classifiers, the coding matrix being used to partition the samples during training; a randomly generated column code $v = [v_1, v_2, \dots, v_N]^{\top} \in \{+1, -1\}^{N}$ divides the label space into a positive label space $\mathcal{Y}_v^{+} = \{y_j \mid v_j = +1\}$ and a negative label space $\mathcal{Y}_v^{-} = \{y_j \mid v_j = -1\}$;
positive and negative samples are selected using the positive and negative label spaces: given a training sample $(x_i, S_i)$, where $S_i \subseteq \mathcal{Y}$, the label set $S_i$ is treated as a whole to help build each binary classifier; when $S_i$ falls entirely within $\mathcal{Y}_v^{+}$ or entirely within $\mathcal{Y}_v^{-}$, the sample $x_i$ is used as a positive or a negative sample, respectively; these positive and negative samples then form a binary training set $\mathcal{B}_v$;
At the decoding stage, a connected set is constructed for each class, the connected set of the j-th class being represented as:
based on the connected sets $\varepsilon_y$, a performance matrix $G \in \mathbb{R}^{N \times L}$ is generated to reflect the capability of the classifiers, the performance of class j on the t-th classifier $g_t$ being calculated as follows:
5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1-4 are implemented when the program is executed by the processor.
6. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
7. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210761418.9A CN115170813A (en) | 2022-06-30 | 2022-06-30 | Network supervision fine-grained image identification method based on partial label learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210761418.9A CN115170813A (en) | 2022-06-30 | 2022-06-30 | Network supervision fine-grained image identification method based on partial label learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115170813A true CN115170813A (en) | 2022-10-11 |
Family
ID=83489216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210761418.9A Pending CN115170813A (en) | 2022-06-30 | 2022-06-30 | Network supervision fine-grained image identification method based on partial label learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115170813A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115564960A (en) * | 2022-11-10 | 2023-01-03 | 南京码极客科技有限公司 | Network image label denoising method combining sample selection and label correction |
- 2022-06-30: CN application CN202210761418.9A (publication CN115170813A), status: active, pending
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN112115995B (en) | Image multi-label classification method based on semi-supervised learning | |
CN109754015B (en) | Neural networks for drawing multi-label recognition and related methods, media and devices | |
CN113657425B (en) | Multi-label image classification method based on multi-scale and cross-modal attention mechanism | |
CN109948735B (en) | Multi-label classification method, system, device and storage medium | |
CN115937655A (en) | Target detection model of multi-order feature interaction, and construction method, device and application thereof | |
CN110378911B (en) | Weak supervision image semantic segmentation method based on candidate region and neighborhood classifier | |
CN112926675B (en) | Depth incomplete multi-view multi-label classification method under double visual angle and label missing | |
CN115439685A (en) | Small sample image data set dividing method and computer readable storage medium | |
CN111882000A (en) | Network structure and method applied to small sample fine-grained learning | |
CN116486109A (en) | Modal self-adaptive descriptive query pedestrian re-identification method and system | |
CN117437426B (en) | Semi-supervised semantic segmentation method for high-density representative prototype guidance | |
CN115170813A (en) | Network supervision fine-grained image identification method based on partial label learning | |
CN117315377A (en) | Image processing method and device based on machine vision and electronic equipment | |
CN115422518A (en) | Text verification code identification method based on data-free knowledge distillation | |
CN111914949B (en) | Zero sample learning model training method and device based on reinforcement learning | |
CN117671261A (en) | Passive domain noise perception domain self-adaptive segmentation method for remote sensing image | |
CN116069985A (en) | Robust online cross-modal hash retrieval method based on label semantic enhancement | |
CN113762178B (en) | Weak supervision abnormal event time positioning method for background suppression sampling | |
Escalera et al. | Traffic sign recognition system with β-correction | |
CN114168780A (en) | Multimodal data processing method, electronic device, and storage medium | |
CN113378707A (en) | Object identification method and device | |
CN112926585A (en) | Cross-domain semantic segmentation method based on regenerative kernel Hilbert space | |
CN116825210B (en) | Hash retrieval method, system, equipment and medium based on multi-source biological data | |
CN116612466B (en) | Content identification method, device, equipment and medium based on artificial intelligence | |
CN117520104B (en) | System for predicting abnormal state of hard disk |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventor after: Xu Yuyan, Wei Xiucan; inventor before: Wei Xiucan, Xu Yuyan |