CN115170813A

CN115170813A - Network supervision fine-grained image identification method based on partial label learning

Info

Publication number: CN115170813A
Application number: CN202210761418.9A
Authority: CN
Inventors: 魏秀参; 许玉燕
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2022-06-30
Filing date: 2022-06-30
Publication date: 2022-10-11

Abstract

The invention discloses a network supervision fine-grained image identification method based on partial label learning, which comprises the following steps: performing deep descriptor transformation with a pretrained deep neural network model to evaluate the positive correlation between network images, detecting open-set label noise in the data set according to the resulting correlation matrix, and removing the open-set label noise; using a loss function to drive the model toward a higher recall rate, so that the label set of each sample contains the label of the sample's true class wherever possible; and selecting the true label of each sample from its label set, thereby correcting the closed-set label noise, and feeding the clean data together with the label-corrected closed-set noisy samples into a deep neural network for training. The method effectively removes open-set label noise and, by means of partial label learning, converts closed-set label noise into training images with accurate labels, which increases the number of usable samples in the network data set and further improves the learning performance of the neural network model.

Description

Network supervision fine-grained image identification method based on partial label learning

Technical Field

The invention belongs to the field of network supervision image recognition, and particularly relates to a network supervision fine-grained image recognition method based on partial label learning.

Background

Constructing a fine-grained data set is difficult because it requires domain experts who can tell fine-grained subclasses apart by their subtle differences. To reduce the reliance on manual labeling and to learn more practical models, it is increasingly popular to build network data sets directly from images of the relevant classes collected from the internet and to train on them. However, such network data sets contain considerable noise, and training on them directly causes the model to overfit, which degrades accuracy. A fine-grained network data set generally contains two types of noise: open-set label noise and closed-set label noise. Open-set label noise is typically "cross-domain", i.e., the noisy image does not belong to any class in the fine-grained domain. Closed-set label noise refers to images that belong to the fine-grained domain but carry a wrong label.

Existing methods for handling general label noise include sample selection, soft labels and related loss functions. Although these methods classify well, they have two shortcomings: 1) they risk discarding a portion of the clean images; and 2) in terms of noisy-image utilization, they cannot convert closed-set label noise images into accurate training images.

Disclosure of Invention

The invention aims to provide a network supervision fine-grained image identification method based on partial label learning.

The technical scheme for realizing the purpose of the invention is as follows: in a first aspect, the invention provides a network supervision fine-grained image identification method based on partial label learning, which comprises the following steps:

step 1, performing deep descriptor transformation with a pretrained deep neural network model to evaluate the positive correlation between network images, detecting open-set label noise in the data set according to a correlation matrix reflecting that correlation, and removing the open-set label noise;

step 2, using a loss function to drive the model so that the label set of each sample contains the label of the sample's true class wherever possible;

step 3, selecting the true label of each sample from its label set by means of partial label learning, thereby correcting the closed-set label noise.

In a second aspect, the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when executing the program.

In a third aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.

In a fourth aspect, the invention provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the method of the first aspect.

Compared with the prior art, the invention has the following notable advantages: (1) the invention provides an open-set label noise removal strategy and a closed-set label noise correction strategy to handle the practical but challenging task of network-supervised fine-grained recognition; (2) deep descriptor transformation is performed with a pretrained deep model to estimate the positive correlation between network images, so that open-set label noise is effectively detected and removed according to the correlation values; (3) a candidate label set is constructed automatically for each sample, and a loss function drives the model toward a higher recall rate, so that the true label of each sample is present in its candidate label set wherever possible, which safeguards the performance of partial label learning; (4) partial label learning converts closed-set label noise into training images with accurate labels, so that correcting the closed-set label noise also turns it into training data, which increases the number of usable samples in the network data set and improves the learning performance of the neural network model.

Drawings

Fig. 1 is a flowchart of a network supervision fine-grained image identification method based on partial label learning according to the present invention.

Detailed Description

With reference to fig. 1, a network supervision fine-grained image identification method based on partial label learning specifically includes the following steps:

step 1, performing deep descriptor transformation with a pretrained deep neural network model to evaluate the positive correlation between network images, detecting open-set label noise in the data set according to a correlation matrix reflecting that correlation, and removing the open-set label noise;

Given the label space and the sample space, for each random batch containing n images, a pretrained convolutional neural network Φ_pre extracts a feature map set containing n feature maps, where H, W and d denote the height, width and depth of feature map t_i, respectively. To find a more general high-level feature representation within the feature map set, principal component analysis is used to generate the eigenvector p with the largest eigenvalue.

For the given feature map set, each feature map is combined with the eigenvector p by a channel-weighted sum to obtain a heat map, and the heat maps together form the heat map set. The i-th heat map H_i is computed from the i-th feature map t_i in this way.

Each heat map is then up-sampled to the input image size to obtain a correlation matrix C. The correlation matrix consists of correlation values: a positive value indicates positive correlation with the general representation present in the feature map set, a negative value indicates negative correlation, and the larger the absolute value, the stronger the correlation. The number and magnitude of positive values in the correlation matrix indicate effectively whether a sample is open-set label noise. A threshold δ is therefore set, and each sample is tested against it to decide whether the i-th sample is noise.

If a sample does not satisfy the condition, it is regarded as open-set label noise and removed from the sample space, which yields a sample space composed of clean data and closed-set label noise.
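As an illustration only, step 1 can be sketched in Python as follows. The sketch assumes that descriptors are mean-centered before projection, that power iteration stands in for principal component analysis, and that a sample is treated as open-set label noise when the share of positive correlation values falls below δ; the exact criterion of the invention is not reproduced here. Up-sampling is omitted, and all function names are hypothetical.

```python
import math

def first_principal_direction(descriptors, iters=100):
    """Return (mean, p): the descriptor mean and the leading principal direction.

    Power iteration on the covariance matrix stands in for a full PCA.
    """
    n, d = len(descriptors), len(descriptors[0])
    mean = [sum(x[c] for x in descriptors) / n for c in range(d)]
    centered = [[x[c] - mean[c] for c in range(d)] for x in descriptors]
    cov = [[sum(x[a] * x[b] for x in centered) / n for b in range(d)]
           for a in range(d)]
    p = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        q = [sum(cov[a][b] * p[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in q))
        if norm == 0.0:  # degenerate batch: keep the current direction
            break
        p = [v / norm for v in q]
    return mean, p

def heat_map(feature_map, mean, p):
    """Channel-weighted sum of one H x W x d feature map with p (mean-centered)."""
    return [[sum(p[c] * (cell[c] - mean[c]) for c in range(len(p)))
             for cell in row] for row in feature_map]

def is_open_set_noise(corr, delta):
    """Assumed rule: noise if the share of positive values is below delta."""
    values = [v for row in corr for v in row]
    positive_ratio = sum(1 for v in values if v > 0) / len(values)
    return positive_ratio < delta

def detect_open_set_noise(feature_maps, delta):
    """Flag each image of a batch; True marks an open-set noisy sample."""
    descriptors = [cell for fmap in feature_maps for row in fmap for cell in row]
    mean, p = first_principal_direction(descriptors)
    return [is_open_set_noise(heat_map(fmap, mean, p), delta)
            for fmap in feature_maps]

# Toy batch: three 2 x 2 feature maps with d = 2 channels.
batch = [
    [[[3.0, 0.0], [3.0, 0.0]], [[3.0, 0.0], [1.0, 0.0]]],
    [[[3.0, 0.0], [3.0, 0.0]], [[1.0, 0.0], [3.0, 0.0]]],
    [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]],
]
flags = detect_open_set_noise(batch, delta=0.5)
```

In this toy batch the third image responds weakly along the shared direction, so none of its correlation values is positive and it is the only one flagged. The sign of an eigenvector is arbitrary; the sketch fixes it by starting the power iteration from an all-positive vector, which is a simplification.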

Step 2, using a loss function to drive the model toward a high recall rate, so that the label set of each sample contains the label of the sample's true class wherever possible;

The invention defines a label space and a sample space, in which the subset associated with class y_i is the set of instances labeled as the i-th class. In the sample selection stage at the beginning of training, C categories are randomly selected to generate a batch: for each selected category y_i, n^* samples of that category are randomly selected, so that the batch contains a = n^* × C samples.

The convolutional neural network Φ_CNN maps each sample in the batch to an embedding feature f_i, where c is the length of f_i. A similarity matrix is obtained by calculating the cosine similarity between the embedding features.

The cosine similarity between the i-th query image and the j-th support image is the inner product of their embedding features divided by the product of their Euclidean norms: s_{i,j} = f_i^T f_j / (||f_i|| ||f_j||).
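The similarity computation can be sketched as follows. This is a minimal illustration that assumes every embedding has a non-zero norm; the function name `cosine_similarity_matrix` and the toy features are hypothetical.

```python
import math

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between embedding features (non-zero norms assumed)."""
    norms = [math.sqrt(sum(v * v for v in f)) for f in embeddings]
    n = len(embeddings)
    return [[sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
             / (norms[i] * norms[j])
             for j in range(n)]
            for i in range(n)]

# Three toy embedding features of length c = 2.
features = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
S = cosine_similarity_matrix(features)
```

Row q of the resulting matrix holds the similarities between query image q and every image in the batch, including the query itself on the diagonal.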

For each query image, the similarities s_{q,:} to the other images are ranked, and the k images with the highest similarity form a top-k set. Images that belong to the same category as the query image but are not in this set are called positive images, and images that are in the set but do not belong to the same category as the query image are called negative images. The positive images outside the top-k set constitute the positive set, which is defined using the complement of the top-k set and the label y_q of the query image. The invention sets n^* < k, so that a set of negative images can be obtained; it is defined in terms of the number of images outside the top-k set that belong to the same category as the query image and of s^n, a matrix containing only the similarity scores of the negative images. The loss function is defined on these positive and negative sets.

Applying this loss function ensures, as far as possible, that the label set generated for each image contains its true class.
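The set construction that the loss operates on can be sketched as follows. The sketch makes two assumptions: the candidate label set of a query is taken to be the labels of its top-k images, and the negative set keeps as many of the highest-scoring negatives as there are positives outside the top-k set. The loss itself is not implemented, and the name `top_k_sets` is hypothetical.

```python
def top_k_sets(sim_row, labels, query, k):
    """Split the images retrieved for one query into the sets used by the loss.

    sim_row: similarities between the query and every image in the batch.
    labels:  (possibly noisy) label of every image in the batch.
    """
    others = [j for j in range(len(labels)) if j != query]
    ranked = sorted(others, key=lambda j: sim_row[j], reverse=True)
    top_k = ranked[:k]
    # Positive images: same category as the query but outside the top-k set.
    positives = [j for j in ranked[k:] if labels[j] == labels[query]]
    # Negative images: inside the top-k set but of a different category.
    negatives = [j for j in top_k if labels[j] != labels[query]]
    # Assumption: keep as many of the hardest negatives as there are positives.
    hardest_negatives = negatives[:len(positives)]
    # Assumption: the candidate label set is the set of labels in the top-k set.
    candidate_labels = sorted({labels[j] for j in top_k})
    return top_k, positives, hardest_negatives, candidate_labels

# Toy batch with C = 3 categories and n* = 2 samples each, so n* < k = 3.
labels = ["a", "a", "b", "b", "c", "c"]
sim_row = [1.0, 0.2, 0.9, 0.8, 0.7, 0.1]
top_k, positives, hardest_negatives, candidate_labels = top_k_sets(
    sim_row, labels, query=0, k=3)
```

In this toy case the only other image of class "a" is ranked outside the top-k set, so the candidate label set misses the query's class. A recall-oriented loss is meant to reduce exactly this situation by raising the positive's score relative to the selected negative.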

Step 3, once step 2 has produced label sets that contain the true category wherever possible, a true label is determined from the label set of each closed-set noisy sample by means of partial label learning. In the encoding stage, an encoding matrix M ∈ {+1, -1}^{N×L} is constructed by randomly generating N-bit column codes, where N is the number of classes and L is the number of binary classifiers; the encoding matrix is used to partition the samples during training. A randomly generated column code v = [v_1, v_2, …, v_N]^T ∈ {+1, -1}^N divides the label space into a positive label space and a negative label space.

Positive and negative samples are selected using the positive and negative label spaces. Given a training sample together with its label set, the invention treats the label set as a whole when building a binary classifier: when the label set falls entirely into the positive label space or entirely into the negative label space, the sample is used as a positive or negative sample, respectively. These positive and negative samples then form a binary training set.
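The encoding stage can be sketched as follows. The sketch assumes that labels are class indices 0 to N-1, that a column code must contain both signs to be useful, and that samples whose label set straddles both label spaces are simply skipped for that column. The names `random_column_code` and `binary_training_set` are hypothetical.

```python
import random

def random_column_code(num_classes, rng):
    """Draw an N-bit column code in {+1, -1}^N that contains both signs."""
    while True:
        v = [rng.choice((+1, -1)) for _ in range(num_classes)]
        if +1 in v and -1 in v:
            return v

def binary_training_set(samples, column_code):
    """Build the binary training set induced by one column code.

    samples: list of (x, label_set) pairs, label_set being a non-empty set
    of class indices.
    """
    positive_space = {j for j, bit in enumerate(column_code) if bit == +1}
    negative_space = {j for j, bit in enumerate(column_code) if bit == -1}
    binary = []
    for x, label_set in samples:
        if label_set <= positive_space:    # whole label set is positive
            binary.append((x, +1))
        elif label_set <= negative_space:  # whole label set is negative
            binary.append((x, -1))
        # Otherwise the label set straddles both spaces; skip the sample.
    return binary

samples = [("x0", {0, 1}), ("x1", {2}), ("x2", {1, 3}), ("x3", {3})]
code = [+1, +1, -1, -1]
pairs = binary_training_set(samples, code)

# One random column of the encoding matrix M for N = 4 classes.
column = random_column_code(4, random.Random(0))
```

Treating the label set as a whole means that no individual candidate label has to be singled out at this stage; a sample only contributes to the columns for which its entire label set falls on one side.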

At the decoding stage, a connected set is constructed for each class. According to the connected sets ε_y, a performance matrix G of size N×L is generated to reflect the capability of the classifiers; its entries give the performance of the t-th classifier g_t on the j-th class and are computed with an indicator function. To obtain the relative performance of each classifier on each class, the performance matrix G is normalized row by row.

For a closed-set noisy sample, the class prediction is then obtained by decoding, and the sample finally receives a pseudo label.

The clean samples and the pseudo-labeled closed-set noisy samples are combined and fed into a convolutional neural network for training.
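The row-wise normalization and a possible decoding rule can be sketched as follows. The normalization follows the description above; the decoding rule, a performance-weighted exponential loss minimized over the codewords, is an assumption borrowed from common error-correcting output code practice and is not taken from the invention. The names `normalize_rows` and `decode` and all numbers are hypothetical.

```python
import math

def normalize_rows(G):
    """Normalize the performance matrix row by row so that each row sums to 1."""
    normalized = []
    for row in G:
        total = sum(row)
        if total > 0:
            normalized.append([v / total for v in row])
        else:  # no information for this class: fall back to uniform weights
            normalized.append([1.0 / len(row)] * len(row))
    return normalized

def decode(outputs, M, G_norm):
    """Assumed rule: choose the class with the smallest weighted exponential loss.

    outputs: real-valued outputs g_t(x) of the L binary classifiers.
    M:       N x L encoding matrix with entries in {+1, -1}.
    G_norm:  row-normalized N x L performance matrix.
    """
    losses = [
        sum(G_norm[j][t] * math.exp(-outputs[t] * M[j][t])
            for t in range(len(outputs)))
        for j in range(len(M))
    ]
    return min(range(len(M)), key=lambda j: losses[j])

# Toy setting with N = 3 classes and L = 2 binary classifiers.
M = [[+1, +1], [+1, -1], [-1, +1]]
G = [[0.9, 0.9], [0.8, 0.4], [0.5, 0.5]]
G_norm = normalize_rows(G)
pseudo_label = decode([0.7, -0.6], M, G_norm)
```

Here the classifier outputs agree in sign with the codeword of class index 1, so that class receives the smallest weighted loss and becomes the pseudo label.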

The above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A network supervision fine-grained image identification method based on partial label learning is characterized by comprising the following steps:

2. The network supervision fine-grained image identification method based on partial label learning according to claim 1, characterized in that in step 1, deep descriptor transformation is performed with a pretrained deep neural network model to evaluate the positive correlation between network images, open-set label noise in the data set is detected according to a correlation matrix reflecting that correlation, and the open-set label noise is removed;

given the label space and the sample space, for each random batch containing n images, a pretrained convolutional neural network Φ_pre extracts a feature map set containing n feature maps, where H, W and d denote the height, width and depth of feature map t_i, respectively; principal component analysis is used to generate the eigenvector with the largest eigenvalue, from which the i-th heat map H_i corresponding to the i-th feature map t_i is calculated;

each heat map is then up-sampled to the input image size to obtain a correlation matrix C; the correlation matrix consists of correlation values, where a positive value indicates positive correlation with the general representation present in the feature map set, a negative value indicates negative correlation, and the larger the absolute value, the stronger the correlation; whether a sample is open-set label noise is judged according to the number and magnitude of positive values in the correlation matrix, a threshold δ being set and each sample being tested against it to decide whether the i-th sample is noise;

if a sample does not satisfy the condition, it is regarded as open-set label noise and removed from the sample space, thereby obtaining a sample space composed of clean data and closed-set label noise.

3. The network supervision fine-grained image identification method based on partial label learning according to claim 2, characterized in that in step 2, a loss function is used to drive the model toward a higher recall rate;

a label space and a sample space are defined, in which the subset associated with class y_i is the set of instances labeled as the i-th class; in the sample selection stage at the beginning of training, C categories are randomly selected to generate a batch, and for each selected category y_i, n^* samples of that category are randomly selected; for the batch, embedding features are obtained through the convolutional neural network Φ_CNN, each sample in the batch being mapped to an embedding feature f_i, where c is the length of f_i; a similarity matrix is obtained by calculating the cosine similarity between the embedding features, the cosine similarity between the i-th query image and the j-th support image being s_{i,j} = f_i^T f_j / (||f_i|| ||f_j||);

for each query image, the similarities s_{q,:} to the other images are ranked, and the k images with the highest similarity form a top-k set; images that belong to the same category as the query image but are not in this set are called positive images, and images that are in the set but do not belong to the same category as the query image are called negative images; the positive images outside the top-k set constitute the positive set, which is defined using the complement of the top-k set and the label y_q of the query image; n^* < k is set, and a set of negative images is obtained, defined in terms of the number of images outside the top-k set that belong to the same category as the query image and of s^n, a matrix containing only the similarity scores of the negative images; the loss function is defined on these positive and negative sets;

applying this loss function ensures, as far as possible, that the label set generated for each image contains its true class.

4. The network supervision fine-grained image identification method based on partial label learning according to claim 3, characterized in that in step 3, a true label is determined from the label set S obtained in step 2 by means of partial label learning;

in the encoding stage, an encoding matrix M ∈ {+1, -1}^{N×L} is constructed by randomly generating N-bit column codes, where N is the number of classes and L is the number of binary classifiers, the encoding matrix being used to partition the samples during training; a randomly generated column code divides the label space into a positive label space and a negative label space;

the label set of a training sample is treated as a whole when building a binary classifier; when the label set falls entirely into the positive label space or entirely into the negative label space, the sample is used as a positive or negative sample, respectively; these positive and negative samples then form a binary training set;

at the decoding stage, a connected set is constructed for each class; a performance matrix G is computed with an indicator function and normalized row by row;

for a closed-set noisy sample, a class prediction is obtained by decoding, and the sample finally receives a pseudo label;

the clean samples and the pseudo-labeled closed-set noisy samples are combined and fed into a convolutional neural network for training.

5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1-4 are implemented when the program is executed by the processor.

6. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.

7. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any of claims 1-4.