CN116012569B - Multi-label image recognition method based on deep learning and under noisy data - Google Patents
Multi-label image recognition method based on deep learning and under noisy data
- Publication number
- CN116012569B (Application CN202310299402.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Image Analysis (AREA)
Abstract
The invention provides a multi-label image recognition method under noisy data based on deep learning, which comprises the steps of acquiring a multi-label noisy data set and preprocessing it; establishing a dual-branch multi-label correction neural network model; inputting the preprocessed multi-label noisy data set into the dual-branch multi-label correction neural network model for contrastive learning training to obtain an optimized dual-branch multi-label correction neural network model; acquiring a noise-containing picture to be corrected, correcting it by using the optimized dual-branch multi-label correction neural network model, and carrying out image recognition on it according to the correction label. The method can carry out label correction on multi-label noisy data sets, saves manpower and material costs, and realizes efficient utilization of data resources; meanwhile, the prediction result is more robust. In addition, the invention sets upper and lower bounds on the predicted values of training pictures, so that noise is weakened and overfitting to the noise is avoided.
Description
Technical Field
The invention relates to the technical field of computer vision and multi-label image classification, and in particular to a multi-label image recognition method under noisy data based on deep learning.
Background
With the continuous development of internet technology, artificial intelligence has matured, and deep learning has become one of its hottest branches. Deep learning is popular because of its excellent performance, abundant frameworks, convenient tooling and low barrier to entry. However, conventional deep learning algorithms require a large number of manually labeled samples as data sets; these data sets are typically large, often reaching tens or even hundreds of thousands of samples, and the label of each sample must be accurate. Building a quality data set suitable for training therefore requires significant human and capital costs, which is a major impediment to the further development of deep learning. On the other hand, a large amount of data containing label noise exists on the internet, that is, the labels of part of the data are erroneous, and such data can be easily collected with a crawler. Conventional deep learning algorithms can only train on data whose labels are clean and correct; they cannot use multi-label noisy data, which wastes data resources.
Taking the recognition of orange pictures as an example: after analysis, many pictures labeled "orange" on the network are found to be mislabeled. For example, a picture of a lemon, which is similar in shape and appearance to an orange, is labeled "orange"; such a mislabel is called the first type of mislabel. Or an object far removed from an orange, such as an orange-colored sunset, is labeled "orange"; such a mislabel is called the second type of mislabel. If data with erroneous labels is used directly to train a conventional deep learning network, the network learns much erroneous data, the generalization of the model suffers, and the model becomes difficult to deploy in practice. Facing this problem, there are two approaches to improvement: first, relabel the pictures manually, which consumes great manpower and material resources; second, discard this part of the data set directly, which wastes data resources.
Therefore, how to conveniently train neural networks with noisy data sets is one of the problems to be solved in the further development of deep learning, and is also a trend of the big data age.
The prior art discloses a weakly supervised multi-label image classification method based on meta learning, which provides an image multi-label classification model based on label information enhancement: a neural network with an encoding-decoding architecture sequentially judges, in a sequence labeling manner, whether the labels in a label sequence are relevant, so as to obtain the relevant labels of the image. Aiming at the model overfitting caused by insufficient supervision information in a weakly supervised environment, a teacher-student network training method based on meta learning is also provided, which further improves the accuracy of image annotation. However, this prior art method only addresses the failure of effective modeling caused by missing tags; images without tags or with erroneous tags cannot be effectively corrected, and its labeling accuracy on data sets containing a large amount of noise and erroneous tags is low.
Disclosure of Invention
The invention provides a multi-label image recognition method under noisy data based on deep learning, which aims to overcome the poor correction effect of the prior art on data sets containing many noisy labels; it can correct the labels of a multi-label noisy data set, save manpower and material costs, and realize efficient utilization of data resources.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a multi-label image recognition method based on deep learning under noisy data comprises the following steps:
s1: the method comprises the steps of obtaining a multi-label noisy data set and preprocessing, wherein the specific method comprises the following steps:
acquiring a multi-label noisy data set according to preset K multi-label classification categories;
dividing the obtained multi-label noisy data set into a training set and a verification set, wherein the training set comprises N pictures, each picture is marked with a pseudo tag ỹ_i, and the training set is marked as X; dividing the training set into a first sub-training set D1 and a second sub-training set D2 with the same number of pictures, wherein D1 ∪ D2 = X, D1 ∩ D2 = ∅, |D1| = |D2| = N/2, and (x_i, ỹ_i) represents the i-th picture x_i and its corresponding pseudo tag ỹ_i;
Determining the length and width data and the pseudo tag ỹ_i of each picture in each sub-training set, wherein the length of a picture is denoted as H and its width as W; this finishes the preprocessing of the multi-label noisy data set;
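As a sketch of this preprocessing step, the pure-Python fragment below splits a pseudo-labelled training set into two equally sized sub-training sets D1 and D2. The names, the toy data and the shuffle-then-halve strategy are illustrative assumptions, not the patent's concrete implementation.

```python
import random

# Illustrative sketch of the S1 split: the training set X is a list of
# (picture_id, pseudo_tag) pairs standing in for (x_i, y_i). Names and the
# toy data are hypothetical; the patent does not prescribe a representation.
def split_training_set(X, seed=0):
    """Shuffle X and split it into two sub-training sets of equal size,
    so that D1 and D2 are disjoint and together cover all of X."""
    rng = random.Random(seed)
    shuffled = list(X)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Toy training set of N = 8 pictures with K = 4 binary pseudo tags each.
X = [(f"img_{i}", [1 if k == i % 4 else 0 for k in range(4)]) for i in range(8)]
D1, D2 = split_training_set(X)
```

Any split that yields two disjoint halves of equal size satisfies the constraints stated above; shuffling first simply keeps the two halves statistically similar.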
S2: the method comprises the steps of establishing a double-branch multi-label correction neural network model, specifically:
the dual-branch multi-label correction neural network model comprises a first label correction sub-model M1 and a second label correction sub-model M2 arranged in parallel; the first label correction sub-model M1 and the second label correction sub-model M2 have the same structure but different model parameters;
the first label correction sub-model M1 and the second label correction sub-model M2 each comprise a feature extractor, an instance contrastive learning module, a category prototype contrastive learning module, a classifier and a label correction module which are connected in sequence;
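To make the wiring concrete, here is a minimal structural sketch of one label correction sub-model as five sequentially connected callables. It is a hypothetical scaffold (real modules would be neural networks with learned parameters), not the patent's implementation; all names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class LabelCorrectionSubModel:
    # The five sequentially connected modules named in the patent; each is an
    # arbitrary callable here so that only the data flow is illustrated.
    feature_extractor: Callable
    instance_contrast: Callable
    prototype_contrast: Callable
    classifier: Callable
    label_corrector: Callable

    def forward(self, picture, pseudo_tag):
        feat = self.feature_extractor(picture)
        feat = self.instance_contrast(feat)
        feat = self.prototype_contrast(feat)
        prob = self.classifier(feat)
        return self.label_corrector(prob, pseudo_tag)

def build_dual_branch():
    """Two sub-models with identical structure but independent parameter
    objects, mirroring the parallel M1 / M2 arrangement."""
    def make():
        identity = lambda x: x  # placeholder for a real neural module
        return LabelCorrectionSubModel(identity, identity, identity, identity,
                                       lambda prob, tag: tag)
    return make(), make()

M1, M2 = build_dual_branch()
```

Calling `build_dual_branch()` twice over the same factory is what gives the two branches the same structure with separate parameters.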
s3: inputting the preprocessed multi-label noisy data set into the dual-branch multi-label correction neural network model for contrastive learning training to obtain an optimized dual-branch multi-label correction neural network model, wherein the specific method comprises the following steps:
s3.1: inputting a picture x_i^1 in the first sub-training set D1 and a picture x_i^2 in the second sub-training set D2 together into the dual-branch multi-label correction neural network model, wherein i satisfies 1 ≤ i ≤ n and n is the number of pictures in the first sub-training set D1 or the second sub-training set D2;
s3.2: using the feature extractors of the first label correction sub-model M1 and the second label correction sub-model M2 respectively to extract features from the input pictures x_i^1 and x_i^2, obtaining a first feature v1 and a second feature v2 (the features of the pictures x_i^1 and x_i^2 extracted by M1) as well as a third feature v3 and a fourth feature v4 (the features of the pictures x_i^1 and x_i^2 extracted by M2);
s3.3: inputting the first feature v1 and the second feature v2 together into the instance contrastive learning module of the first label correction sub-model M1, and inputting the third feature v3 and the fourth feature v4 together into the instance contrastive learning module of the second label correction sub-model M2; performing the first contrastive learning on the first feature v1 and the third feature v3 of the picture x_i^1, and performing the first contrastive learning on the second feature v2 and the fourth feature v4 of the picture x_i^2; setting a first loss function L1 and updating the parameters of the instance contrastive learning modules of the first label correction sub-model M1 and the second label correction sub-model M2;
s3.4: inputting the first feature v1 into the category prototype contrastive learning module of the first label correction sub-model M1 and performing second contrastive learning with a preset first category prototype feature P1; inputting the fourth feature v4 into the category prototype contrastive learning module of the second label correction sub-model M2 and performing second contrastive learning with a preset second category prototype feature P2; setting a second loss function L2 and updating the parameters of the category prototype contrastive learning modules of the first label correction sub-model M1 and the second label correction sub-model M2;
s3.5: inputting the first feature v1 into the classifier of the first label correction sub-model M1 and calculating the classification probability of the output picture x_i^1; inputting the fourth feature v4 into the classifier of the second label correction sub-model M2 and calculating the classification probability of the output picture x_i^2;
s3.6: inputting the classification probability of the picture x_i^1 into the label correction module of the first label correction sub-model M1 and performing label correction on the pseudo tag ỹ_i^1 of the picture x_i^1 to obtain the correction label of the picture x_i^1; inputting the classification probability of the picture x_i^2 into the label correction module of the second label correction sub-model M2 and performing label correction on the pseudo tag ỹ_i^2 of the picture x_i^2 to obtain the correction label of the picture x_i^2; setting a third loss function L3, respectively calculating the cross entropy losses of the label correction modules of the first label correction sub-model M1 and the second label correction sub-model M2, and performing parameter updating;
s3.7: setting a total loss function L according to the first loss function L1, the second loss function L2 and the third loss function L3, and performing parameter updating on the dual-branch multi-label correction neural network model to obtain the optimized dual-branch multi-label correction neural network model;
S4: obtaining a noise-containing picture to be corrected, correcting the noise-containing picture to be corrected by using the optimized double-branch multi-label correction neural network model, obtaining a correction label of the noise-containing picture to be corrected, and carrying out image recognition on the noise-containing picture to be corrected according to the correction label.
Preferably, the specific method for determining the value of the pseudo tag ỹ_i of each picture in each sub-training set is as follows:
judging whether a picture in each sub-training set belongs to a preset multi-label classification category k; if so, the value of the pseudo tag of the i-th picture relative to the multi-label classification category k is ỹ_{i,k} = 1, otherwise ỹ_{i,k} = 0.
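The rule above can be stated in one line; the function below is an illustrative rendering of it (the names are assumptions).

```python
def pseudo_tag_value(picture_categories, k):
    """Value of the pseudo tag of a picture relative to multi-label
    classification category k: 1 if the picture belongs to category k,
    0 otherwise."""
    return 1 if k in picture_categories else 0

# A picture belonging to categories 0 and 3 out of K = 5 preset categories:
tag_vector = [pseudo_tag_value({0, 3}, k) for k in range(5)]
```

Applying the rule over all K categories, as in the comprehension, yields the binary pseudo-tag vector attached to each picture.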
Preferably, the specific method of step S3.3 is as follows:
inputting the first feature v1 and the second feature v2 together into the instance contrastive learning module of the first label correction sub-model M1, and inputting the third feature v3 and the fourth feature v4 together into the instance contrastive learning module of the second label correction sub-model M2;
for the picture x_i^1, calculating the corresponding first feature vector q1 and second feature vector q2 according to the first feature v1 and the third feature v3, wherein C1 is the number of pseudo tags of the picture x_i^1, and q1 and q2 each consist of C1 reduced-dimension feature vectors whose j-th entry corresponds to the j-th pseudo tag of the picture x_i^1;
constructing a first positive sample pair (q1, q2) according to the first feature vector q1 and the second feature vector q2, and constructing a first cyclic sequence S1 with sequence length R1; constructing first negative sample pairs according to the first cyclic sequence S1, and performing the first contrastive learning with the constructed first positive sample pair and first negative sample pairs;
setting a first loss function L1(x_i^1) and updating the parameters of the instance contrastive learning module of the first label correction sub-model M1, wherein L1(x_i^1) is the first loss function value of the picture x_i^1 in the instance contrastive learning module of the first label correction sub-model M1, K is the total number of categories required for multi-label classification, C1 is the number of categories corresponding to the picture x_i^1, τ is the temperature coefficient, and ỹ_{i,k}^1 is the value of the pseudo tag of the picture x_i^1 relative to the multi-label classification category k;
for the picture x_i^2, calculating the corresponding third feature vector q3 and fourth feature vector q4 according to the second feature v2 and the fourth feature v4, wherein C2 is the number of pseudo tags of the picture x_i^2, and q3 and q4 each consist of C2 reduced-dimension feature vectors whose j-th entry corresponds to the j-th pseudo tag of the picture x_i^2;
constructing a second positive sample pair (q3, q4) according to the third feature vector q3 and the fourth feature vector q4, and constructing a second cyclic sequence S2 with sequence length R2; constructing second negative sample pairs according to the second cyclic sequence S2, and performing the first contrastive learning with the constructed second positive sample pair and second negative sample pairs;
setting a first loss function L1(x_i^2) and updating the parameters of the instance contrastive learning module of the second label correction sub-model M2, wherein L1(x_i^2) is the first loss function value of the picture x_i^2 in the instance contrastive learning module of the second label correction sub-model M2, K is the total number of categories required for multi-label classification, C2 is the number of categories corresponding to the picture x_i^2, and ỹ_{i,k}^2 is the value of the pseudo tag of the picture x_i^2 relative to the multi-label classification category k.
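The exact first loss formula is defined in the patent's figures; as a reference point, losses of this kind (a positive pair pulled together, negative pairs pushed apart, scaled by a temperature coefficient τ) commonly take the InfoNCE form. The sketch below is a generic analogue under that assumption, not the patent's own formula.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss over one positive pair and a list of
    negatives: low when anchor and positive are similar relative to the
    negatives, with tau playing the role of the temperature coefficient."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

A matching positive pair yields a much smaller loss than a mismatched one, which is the behaviour the first contrastive learning relies on to align the two branches' features of the same picture.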
Preferably, the specific method of step S3.4 is as follows:
Inputting the first feature v1 into the category prototype contrastive learning module of the first label correction sub-model M1, performing the second contrastive learning between the first feature vector q1 of the picture x_i^1 and the first category prototype feature P1_k, and updating the first category prototype feature by a momentum method, wherein P1_k' is the updated first category prototype feature corresponding to the k-th category, P1_k is the first category prototype feature corresponding to the k-th category, and m is a preset momentum;
setting a second loss function L2(x_i^1) and updating the parameters of the category prototype contrastive learning module of the first label correction sub-model M1, wherein L2(x_i^1) is the second loss function value of the picture x_i^1 in the category prototype contrastive learning module of the first label correction sub-model M1;
inputting the fourth feature v4 into the category prototype contrastive learning module of the second label correction sub-model M2, performing the second contrastive learning between the fourth feature vector q4 of the picture x_i^2 and the second category prototype feature P2_k, and updating the second category prototype feature by the momentum method, wherein P2_k' is the updated second category prototype feature corresponding to the k-th category and P2_k is the second category prototype feature corresponding to the k-th category;
setting a second loss function L2(x_i^2) and updating the parameters of the category prototype contrastive learning module of the second label correction sub-model M2, wherein L2(x_i^2) is the second loss function value of the picture x_i^2 in the category prototype contrastive learning module of the second label correction sub-model M2.
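The momentum update itself can be sketched as follows, assuming the common convention p ← m·p + (1 − m)·q, which the "preset momentum m" suggests; the exact form appears in the patent's figures.

```python
def momentum_update(prototype, feature, m=0.9):
    """Momentum-style category prototype update: the prototype drifts slowly
    toward the current feature vector, with m controlling the inertia.
    The convention p <- m*p + (1 - m)*q is an assumption."""
    return [m * p + (1.0 - m) * q for p, q in zip(prototype, feature)]
```

With m close to 1 the prototypes change slowly, which stabilises the second contrastive learning against individual noisy features.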
Preferably, the specific method of step S3.5 is as follows:
Inputting the first feature v1 into the classifier of the first label correction sub-model M1 and calculating the classification probability of the output picture x_i^1, specifically p(x_i^1) = σ(s(v1)), wherein p(x_i^1) is the classification probability of the picture x_i^1, σ is the sigmoid function and s(·) is the confidence score calculation function of the classifier;
inputting the fourth feature v4 into the classifier of the second label correction sub-model M2 and calculating the classification probability of the output picture x_i^2, specifically p(x_i^2) = σ(s(v4)), wherein p(x_i^2) is the classification probability of the picture x_i^2, σ is the sigmoid function and s(·) is the confidence score calculation function of the classifier.
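Since each category is scored independently in multi-label classification, the classifier output reduces to applying the sigmoid to each confidence score. A minimal sketch, with the confidence scores assumed given:

```python
import math

def sigmoid(z):
    """Logistic sigmoid, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classification_probability(confidence_scores):
    """Per-category classification probability: each of the K confidence
    scores is squashed independently by the sigmoid, as is usual for
    multi-label (rather than softmax multi-class) classification."""
    return [sigmoid(s) for s in confidence_scores]

probs = classification_probability([0.0, 4.0, -4.0])
```

Using an independent sigmoid per category, instead of a softmax over all categories, is what lets several labels be active for the same picture.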
Preferably, the specific method of step S3.6 is as follows:
Inputting the classification probability p(x_i^1) of the picture x_i^1 into the label correction module of the first label correction sub-model M1, setting a first threshold t1, a second threshold t2, a third threshold t3 and a fourth threshold t4, and dynamically updating the four thresholds with a preset momentum m;
determining the value of the binary noise tag of the picture x_i^1 according to the updated third threshold t3, the updated fourth threshold t4 and the classification probability p(x_i^1);
obtaining the intermediate label of the picture x_i^1 according to the updated first threshold t1 and the updated second threshold t2;
when the noise tag indicates that the pseudo tag is noisy, replacing the pseudo tag ỹ_i^1 of the picture x_i^1 with its intermediate label as the correction label of the picture x_i^1;
when the noise tag indicates that the pseudo tag is clean, retaining the pseudo tag ỹ_i^1 of the picture x_i^1 as the correction label of the picture x_i^1;
inputting the classification probability p(x_i^2) of the picture x_i^2 into the label correction module of the second label correction sub-model M2;
determining the value of the binary noise tag of the picture x_i^2 according to the updated third threshold t3, the updated fourth threshold t4 and the classification probability p(x_i^2);
obtaining the intermediate label of the picture x_i^2 according to the updated first threshold t1 and the updated second threshold t2;
when the noise tag indicates that the pseudo tag is noisy, replacing the pseudo tag ỹ_i^2 of the picture x_i^2 with its intermediate label as the correction label of the picture x_i^2;
when the noise tag indicates that the pseudo tag is clean, retaining the pseudo tag ỹ_i^2 of the picture x_i^2 as the correction label of the picture x_i^2;
The third loss function L3 is the binary cross entropy loss, wherein ℓ_i is the binary cross entropy loss of the i-th picture and ỹ_{i,k} is the value of the pseudo tag of the i-th picture relative to the multi-label classification category k.
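A hedged sketch of the whole correction rule follows. The exact roles of the four thresholds are defined in the patent's figures, so the reading below — probabilities beyond the outer thresholds t3/t4 that contradict the pseudo tag flag it as noisy, and the inner thresholds t1/t2 binarise the intermediate label — is an illustrative assumption, as are the default threshold values.

```python
import math

def correct_labels(probs, pseudo_tags, t1=0.3, t2=0.7, t3=0.1, t4=0.9):
    """Assumed reading of the correction rule: a pseudo tag contradicted by a
    confident prediction (p > t4 while tagged 0, or p < t3 while tagged 1) is
    treated as noisy and replaced by the intermediate label obtained by
    thresholding p against t1/t2; otherwise the pseudo tag is retained."""
    corrected = []
    for p, y in zip(probs, pseudo_tags):
        noisy = (p > t4 and y == 0) or (p < t3 and y == 1)
        if noisy:
            intermediate = 1 if p >= t2 else (0 if p <= t1 else y)
            corrected.append(intermediate)
        else:
            corrected.append(y)
    return corrected

def bce_loss(probs, labels, eps=1e-12):
    """Binary cross entropy summed over the K categories of one picture."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
                for p, y in zip(probs, labels))
```

Bounding the correction by thresholds is what realises the stated effect: only predictions that clearly exceed or fall below the bounds may overturn a pseudo tag, which weakens noise without letting the model overfit it.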
Preferably, the total loss function L in step S3.7 is L = L3 + λ1·L1 + λ2·L2, wherein L is the total loss function value, λ1 is the balance factor of the first loss function L1, and λ2 is the balance factor of the second loss function L2.
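Reading "balance factor" as a scalar weight, the combination can be sketched as follows; the weight values are illustrative.

```python
def total_loss(l1, l2, l3, lam1=1.0, lam2=1.0):
    """Total loss L = L3 + lam1*L1 + lam2*L2: the label-correction cross
    entropy plus the two contrastive losses scaled by their balance factors
    (lam1, lam2 stand for the two balance factors; values are illustrative)."""
    return l3 + lam1 * l1 + lam2 * l2
```

Tuning the two balance factors trades off how strongly the contrastive objectives shape the features relative to the correction loss.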
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a multi-label image recognition method under noisy data based on deep learning, which comprises the steps of acquiring a multi-label noisy data set and preprocessing; establishing a double-branch multi-label correction neural network model; inputting the preprocessed multi-label noisy data set into a double-branch multi-label correction neural network model for comparison learning training to obtain an optimized double-branch multi-label correction neural network model; acquiring a noise-containing picture to be corrected, correcting the noise-containing picture to be corrected by using the optimized double-branch multi-label correction neural network model, acquiring a correction label of the noise-containing picture to be corrected, and carrying out image recognition on the noise-containing picture to be corrected according to the correction label;
according to the invention, related pictures can be collected from the internet as data sets according to the specific application of a user, the dual-branch network is trained, and a model supporting multi-label picture classification is constructed; label correction and image recognition can be carried out on multi-label noisy data sets, manpower and material costs are saved, and efficient utilization of data resources is realized. The invention also provides a contrastive learning method: while differences exist between the branch networks, they can learn some common representations from each other, and when classifying pictures the predictions of the two models are averaged, making the result more robust. In addition, the invention sets upper and lower bounds on the predicted values of training pictures and changes the label of a picture whose predicted value exceeds or falls below the threshold, thereby weakening noise and avoiding overfitting to the noise.
Drawings
Fig. 1 is a flowchart of a multi-label image recognition method under noisy data based on deep learning according to embodiment 1.
Fig. 2 is a contrastive learning training flowchart of the dual-branch multi-label correction neural network model provided in embodiment 2.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, the embodiment provides a multi-label image recognition method under noisy data based on deep learning, which includes the following steps:
s1: acquiring a multi-label noisy data set and preprocessing;
s2: establishing a double-branch multi-label correction neural network model;
s3: inputting the preprocessed multi-label noisy data set into a dual-branch multi-label correction neural network model for contrastive learning training to obtain an optimized dual-branch multi-label correction neural network model;
S4: obtaining a noise-containing picture to be corrected, correcting the noise-containing picture to be corrected by using the optimized double-branch multi-label correction neural network model, obtaining a correction label of the noise-containing picture to be corrected, and carrying out image recognition on the noise-containing picture to be corrected according to the correction label.
In the specific implementation process, a multi-label noisy data set is first obtained and preprocessed; in this embodiment, the multi-label noisy data set is obtained from the internet. A dual-branch multi-label correction neural network model is then established, and the preprocessed multi-label noisy data set is input into it for contrastive learning training to obtain an optimized dual-branch multi-label correction neural network model. Finally, a noise-containing picture to be corrected is obtained and corrected by using the optimized dual-branch multi-label correction neural network model to obtain its correction label, and image recognition is carried out on it according to the correction label;
according to the method and the device, related pictures can be collected from the Internet as data sets according to specific application of a user, the dual-branch network is trained, a model supporting classification of the multi-label pictures is constructed, label correction can be carried out on the multi-label noisy data sets, the cost of manpower and material resources is saved, and efficient utilization of data resources is achieved.
Example 2
The embodiment provides a multi-label image recognition method based on deep learning under noisy data, which comprises the following steps:
s1: the method comprises the steps of obtaining a multi-label noisy data set and preprocessing, wherein the specific method comprises the following steps:
acquiring a multi-label noisy data set according to preset K multi-label classification categories;
dividing the obtained multi-label noisy data set into a training set and a verification set, wherein the training set comprises N pictures, each picture is marked with a pseudo tag ỹ_i, and the training set is marked as X; dividing the training set into a first sub-training set D1 and a second sub-training set D2 with the same number of pictures, wherein D1 ∪ D2 = X, D1 ∩ D2 = ∅, |D1| = |D2| = N/2, and (x_i, ỹ_i) represents the i-th picture x_i and its corresponding pseudo tag ỹ_i;
Determining the length and width data and the pseudo tag ỹ_i of each picture in each sub-training set, wherein the length of a picture is denoted as H and its width as W; this finishes the preprocessing of the multi-label noisy data set;
s2: the method comprises the steps of establishing a double-branch multi-label correction neural network model, specifically:
the dual-branch multi-label correction neural network model comprises a first label correction sub-model M1 and a second label correction sub-model M2 arranged in parallel; the first label correction sub-model M1 and the second label correction sub-model M2 have the same structure but different model parameters;
the first label correction sub-model M1 and the second label correction sub-model M2 each comprise a feature extractor, an instance contrastive learning module, a category prototype contrastive learning module, a classifier and a label correction module which are connected in sequence;
s3: as shown in fig. 2, the preprocessed multi-label noisy data set is input into the dual-branch multi-label correction neural network model for contrastive learning training to obtain the optimized dual-branch multi-label correction neural network model, and the specific method is as follows:
s3.1: inputting a picture x_i^1 in the first sub-training set D1 and a picture x_i^2 in the second sub-training set D2 together into the dual-branch multi-label correction neural network model, wherein i satisfies 1 ≤ i ≤ n and n is the number of pictures in the first sub-training set D1 or the second sub-training set D2;
s3.2: using the feature extractors of the first label correction sub-model M1 and the second label correction sub-model M2 respectively to extract features from the input pictures x_i^1 and x_i^2, obtaining a first feature v1 and a second feature v2 (the features of the pictures x_i^1 and x_i^2 extracted by M1) as well as a third feature v3 and a fourth feature v4 (the features of the pictures x_i^1 and x_i^2 extracted by M2);
s3.3: inputting the first feature v1 and the second feature v2 together into the instance contrastive learning module of the first label correction sub-model M1, and inputting the third feature v3 and the fourth feature v4 together into the instance contrastive learning module of the second label correction sub-model M2; performing the first contrastive learning on the first feature v1 and the third feature v3 of the picture x_i^1, and performing the first contrastive learning on the second feature v2 and the fourth feature v4 of the picture x_i^2; setting a first loss function L1 and updating the parameters of the instance contrastive learning modules of the first label correction sub-model M1 and the second label correction sub-model M2;
s3.4: inputting the first feature v1 into the category prototype contrastive learning module of the first label correction sub-model M1 and performing second contrastive learning with a preset first category prototype feature P1; inputting the fourth feature v4 into the category prototype contrastive learning module of the second label correction sub-model M2 and performing second contrastive learning with a preset second category prototype feature P2; setting a second loss function L2 and updating the parameters of the category prototype contrastive learning modules of the first label correction sub-model M1 and the second label correction sub-model M2;
s3.5: inputting the first feature v1 into the classifier of the first label correction sub-model M1 and calculating the classification probability of the output picture x_i^1; inputting the fourth feature v4 into the classifier of the second label correction sub-model M2 and calculating the classification probability of the output picture x_i^2;
s3.6: picture is madeIs input into a first label modifier sub-model M 1 The label correction module of (2) for picturesPseudo tag of->Performing label correction to obtain picture->Is->The method comprises the steps of carrying out a first treatment on the surface of the Picture->Is input into a second label modifier sub-model M 2 The label correction module of (1) for picture->Pseudo tag of->Performing label correction to obtain a pictureIs->The method comprises the steps of carrying out a first treatment on the surface of the And sets a third loss function->Respectively calculating a first label correction sub-model M 1 And a second label correction sub-model M 2 The cross entropy loss of the label correction module of the (2) is used for carrying out parameter updating;
s3.7: according to a first loss functionSecond loss function->And a third loss function->Setting the total loss function->Parameter updating is carried out on the double-branch multi-label correction neural network model, and an optimized double-branch multi-label correction neural network model is obtained;
s4: acquiring a noise-containing picture to be corrected, correcting the noise-containing picture to be corrected by using the optimized double-branch multi-label correction neural network model, acquiring a correction label of the noise-containing picture to be corrected, and carrying out image recognition on the noise-containing picture to be corrected according to the correction label;
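The training flow of steps S3.1 to S3.7 can be sketched as follows. This is a minimal, framework-free illustration: `branch1`/`branch2` are stand-in callables playing the role of the two feature extractors, cosine distance stands in for the real contrastive loss, and the prototype and cross-entropy terms are stubbed out, so all names here are assumptions rather than the patented implementation.

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity; guards against zero vectors
    nu = math.sqrt(sum(x * x for x in u)) or 1.0
    nv = math.sqrt(sum(x * x for x in v)) or 1.0
    return 1.0 - sum(a * b for a, b in zip(u, v)) / (nu * nv)

def training_iteration(x1, x2, branch1, branch2, lam1=0.5, lam2=0.5):
    # S3.2: extract four features -- each branch sees both pictures
    f1, f2 = branch1(x1), branch1(x2)
    f3, f4 = branch2(x1), branch2(x2)
    # S3.3: first contrast learning between cross-branch views of the
    # same picture (cosine distance stands in for the real loss L1)
    l1 = cosine_distance(f1, f3) + cosine_distance(f2, f4)
    l2 = 0.0  # S3.4: class-prototype contrast loss L2, stubbed out here
    l3 = 0.0  # S3.6: cross-entropy L3 of the corrected labels, stubbed out
    # S3.7: total loss combining L3 with the weighted contrastive terms
    return l3 + lam1 * l1 + lam2 * l2
```

When the two branches produce identical features for a picture, the stand-in contrastive term vanishes; diverging branches increase it, which is the signal the real modules train on.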
The specific method for determining the value of the pseudo label ŷ_i of the pictures in each sub-training set is as follows:
judge whether a picture in the sub-training set belongs to the preset multi-label classification category k; if so, the value ŷ_i,k of the pseudo label of the i-th picture relative to the multi-label classification category k is 1, otherwise ŷ_i,k is 0;
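Under the binary reading of this rule (1 if the picture belongs to category k, else 0), the pseudo label is a multi-hot vector over the preset categories; a sketch, with the category list and keyword set purely illustrative:

```python
def pseudo_label(picture_keywords, categories):
    # y_hat[k] = 1 if the picture belongs to preset multi-label
    # category k, else 0 (binary multi-hot pseudo label)
    return [1 if c in picture_keywords else 0 for c in categories]

# hypothetical category list and retrieval keywords for one picture
categories = ["person", "truck", "bus", "dog"]
y_hat = pseudo_label({"person", "bus"}, categories)  # -> [1, 0, 1, 0]
```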
The specific method of step S3.3 is as follows:
input the first feature f_1 and the second feature f_2 jointly into the instance contrast learning module of the first label correction sub-model M_1, and input the third feature f_3 and the fourth feature f_4 jointly into the instance contrast learning module of the second label correction sub-model M_2;
for picture x_i^1, compute the corresponding first feature vector v_1 and second feature vector v_2 from the first feature f_1 and the third feature f_3, where C_1 is the number of pseudo labels of picture x_i^1 and the j-th of its C_1 feature vectors is denoted v_j; the obtained first feature vector v_1 and second feature vector v_2 each satisfy the dimensional constraint imposed by the dimension-reduction layer;
construct a first positive sample pair (v_1, v_2) from the first feature vector v_1 and the second feature vector v_2, and construct a first circular sequence Q_1 of sequence length R_1;
construct first negative sample pairs from the first circular sequence Q_1, and perform the first contrast learning with the constructed first positive sample pair and first negative sample pairs;
set the first loss function L_1 and update the parameters of the instance contrast learning module of the first label correction sub-model M_1, where L_1(x_i^1) is the first loss function value of picture x_i^1 in the instance contrast learning module of M_1, K is the total number of categories required for the multi-label classification of x_i^1, k is the corresponding category index, τ is the temperature coefficient, v_1 and v_2 are the two dimension-reduced feature vectors of x_i^1, and ŷ_i,k is the value of the pseudo label of x_i^1 relative to the multi-label classification category k;
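Contrast learning with one positive pair and queue-based negatives is in the spirit of an InfoNCE loss; the sketch below assumes L2-normalized vectors, dot-product similarity and a temperature τ, and uses a `deque` as the circular sequence of length R_1 — the exact loss of the specification may differ in detail.

```python
import math
from collections import deque

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def first_contrast_loss(anchor, positive, queue, tau=0.1):
    # InfoNCE-style loss: one positive pair (anchor, positive),
    # negatives drawn from the circular queue of past feature vectors
    a, p = l2_normalize(anchor), l2_normalize(positive)
    sim = lambda u, w: sum(x * y for x, y in zip(u, w))
    pos = math.exp(sim(a, p) / tau)
    neg = sum(math.exp(sim(a, l2_normalize(q)) / tau) for q in queue)
    return -math.log(pos / (pos + neg))

# circular sequence Q1 of length R1 (8192 in the embodiment); once the
# queue is full, the oldest entries are evicted automatically
R1 = 8192
queue = deque(maxlen=R1)
queue.append([0.0, 1.0])
loss = first_contrast_loss([1.0, 0.0], [1.0, 0.0], queue)
```

A well-aligned positive pair against an orthogonal negative yields a loss near zero; misaligned pairs drive it up.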
for picture x_i^2, compute the corresponding third feature vector v_3 and fourth feature vector v_4 from the second feature f_2 and the fourth feature f_4, where C_2 is the number of pseudo labels of picture x_i^2 and the j-th of its C_2 feature vectors is denoted v_j; the obtained third feature vector v_3 and fourth feature vector v_4 each satisfy the dimensional constraint imposed by the dimension-reduction layer;
construct a second positive sample pair (v_3, v_4) from the third feature vector v_3 and the fourth feature vector v_4, and construct a second circular sequence Q_2 of sequence length R_2;
construct second negative sample pairs from the second circular sequence Q_2, and perform the first contrast learning with the constructed second positive sample pair and second negative sample pairs;
set the first loss function L_1 and update the parameters of the instance contrast learning module of the second label correction sub-model M_2, where L_1(x_i^2) is the first loss function value of picture x_i^2 in the instance contrast learning module of M_2, K is the total number of categories required for the multi-label classification of x_i^2, k is the corresponding category index, v_3 and v_4 are the two dimension-reduced feature vectors of x_i^2, and ŷ_i,k is the value of the pseudo label of x_i^2 relative to the multi-label classification category k;
The specific method of step S3.4 is as follows:
input the first feature f_1 into the class prototype contrast learning module of the first label correction sub-model M_1, perform the second contrast learning between the first feature vector v_1 of picture x_i^1 and the first class prototype features, and update the first class prototype features by the momentum method as p′_k = m·p_k + (1 − m)·v_1, where p′_k is the first class prototype feature corresponding to the k-th category after the update, p_k is the first class prototype feature corresponding to the k-th category before the update, and m is the preset momentum;
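The momentum update described here — new prototype = m·(old prototype) + (1 − m)·(current feature vector) — can be sketched as below; the momentum value 0.9 is a placeholder, not the embodiment's preset.

```python
def update_prototype(p_k, v, m=0.9):
    # momentum update of the class-k prototype feature:
    # p_k <- m * p_k + (1 - m) * v, with preset momentum m
    return [m * pk + (1 - m) * vk for pk, vk in zip(p_k, v)]

p_new = update_prototype([1.0, 0.0], [0.0, 1.0], m=0.9)  # ~ [0.9, 0.1]
```

A large m keeps the prototype stable across batches, so a single noisy feature vector shifts it only slightly.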
set the second loss function L_2 and update the parameters of the class prototype contrast learning module of the first label correction sub-model M_1, where L_2(x_i^1) is the second loss function value of picture x_i^1 in the class prototype contrast learning module of M_1;
input the fourth feature f_4 into the class prototype contrast learning module of the second label correction sub-model M_2, perform the second contrast learning between the second feature vector of picture x_i^2 and the second class prototype features, and update the second class prototype features by the momentum method in the same way, where q′_k is the second class prototype feature corresponding to the k-th category after the update and q_k is the one before the update;
set the second loss function L_2 and update the parameters of the class prototype contrast learning module of the second label correction sub-model M_2, where L_2(x_i^2) is the second loss function value of picture x_i^2 in the class prototype contrast learning module of M_2;
The specific method of step S3.5 is as follows:
input the first feature f_1 into the classifier of the first label correction sub-model M_1 and compute the output classification probability of picture x_i^1 as P_1 = σ(g(f_1)), where P_1 is the classification probability of picture x_i^1, σ is the sigmoid function, and g is the confidence score calculation function of the classifier;
input the fourth feature f_4 into the classifier of the second label correction sub-model M_2 and compute the output classification probability of picture x_i^2 as P_2 = σ(g(f_4)), where P_2 is the classification probability of picture x_i^2, σ is the sigmoid function, and g is the confidence score calculation function of the classifier;
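Because the classification probability is the sigmoid of the classifier's confidence scores, each of the K classes gets an independent probability, which is what multi-label (as opposed to softmax multi-class) recognition needs. A minimal sketch, with the score values purely illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classification_probability(scores):
    # per-class probability sigma(g(f)): sigmoid of the classifier's
    # confidence score for each class, computed independently per class
    return [sigmoid(s) for s in scores]

probs = classification_probability([0.0, 4.0, -4.0])
```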
The specific method of step S3.6 is as follows:
input the classification probability P_1 of picture x_i^1 into the label correction module of the first label correction sub-model M_1, set a first threshold δ_1, a second threshold δ_2, a third threshold δ_3 and a fourth threshold δ_4, and dynamically update the four thresholds with the preset momentum m;
determine the value of the binary noise label b according to the updated third threshold δ_3, the fourth threshold δ_4 and the classification probability P_1 of picture x_i^1;
obtain the intermediate label of picture x_i^1 according to the updated first threshold δ_1 and second threshold δ_2;
when the noise label b indicates label noise, replace the pseudo label of picture x_i^1 with its intermediate label as the corrected label of x_i^1;
otherwise, keep the pseudo label of picture x_i^1 as the corrected label of x_i^1;
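A sketch of the per-class correction rule: the noise flag b is derived from the third and fourth thresholds, the intermediate label from the first and second, and the pseudo label is replaced only when b flags noise. The precise comparisons and the threshold values below are assumptions, since the specification defers them to the thresholds' definitions.

```python
def correct_label(prob, y_hat, t1=0.3, t2=0.7, t3=0.1, t4=0.9):
    # noise flag b: the classifier strongly disagrees with the pseudo label
    noisy = (prob >= t4 and y_hat == 0) or (prob <= t3 and y_hat == 1)
    if not noisy:
        return y_hat                 # b = 0: keep the pseudo label
    # b = 1: replace with the intermediate label derived from t1 / t2
    if prob >= t2:
        return 1
    if prob <= t1:
        return 0
    return y_hat
```

So a confidently predicted class missing from the pseudo label is added, and a confidently rejected class wrongly present is removed; ambiguous predictions leave the label untouched.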
The specific correction process in the label correction module of the second label correction sub-model M_2 is analogous:
input the classification probability P_2 of picture x_i^2 into the label correction module of the second label correction sub-model M_2;
determine the value of the binary noise label b according to the updated third threshold δ_3, the fourth threshold δ_4 and the classification probability P_2 of picture x_i^2;
obtain the intermediate label of picture x_i^2 according to the updated first threshold δ_1 and second threshold δ_2;
when the noise label b indicates label noise, replace the pseudo label of picture x_i^2 with its intermediate label as the corrected label of x_i^2;
otherwise, keep the pseudo label of picture x_i^2 as the corrected label of x_i^2;
The third loss function L_3 is the binary cross-entropy loss, where L_3(i) is the binary cross-entropy loss of the i-th picture and ŷ_i,k is the value of the pseudo label of the i-th picture relative to the multi-label classification category k;
the total loss function L_total in step S3.7 combines the three losses, where L_total is the total loss function value, λ_1 is the balance factor of the first loss function L_1, and λ_2 is the balance factor of the second loss function L_2.
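Under the reading that the total loss adds the cross-entropy term to the two weighted contrastive terms, L_total = L_3 + λ_1·L_1 + λ_2·L_2; a sketch, where the combination form and the balance-factor values are assumptions:

```python
import math

def binary_cross_entropy(p, y, eps=1e-12):
    # third loss: binary cross-entropy for one class of one picture;
    # p is the predicted probability, y the (corrected) label in {0, 1}
    return -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))

def total_loss(l1, l2, l3, lam1=0.5, lam2=0.5):
    # assumed combination L_total = L3 + lam1 * L1 + lam2 * L2
    return l3 + lam1 * l1 + lam2 * l2

l3 = binary_cross_entropy(0.9, 1)   # small: prediction agrees with label
loss = total_loss(2.0, 4.0, l3)
```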
In the specific implementation process, the multi-label noisy data set is first acquired and preprocessed: the data set is acquired according to the preset K multi-label classification categories and divided into a training set and a validation set, where the training set contains N pictures, each picture is given a pseudo label ŷ_i, and the training set is denoted X. The specific method is as follows:
Microsoft COCO and Pascal VOC are the two most widely used data sets for evaluating multi-label recognition (MLR) algorithms; the Microsoft COCO data set contains 80 categories and the Pascal VOC data set contains 20. In this embodiment, the 80 categories of the Microsoft COCO data set are used to construct the Web-COCO and Web-Pascal data sets, with one or more categories randomly selected as keywords, for example "person" or "person, truck, bus";
the corresponding pictures are retrieved from search engines, including Google, Baidu and Bing, and the more than 500,000 noisy pictures obtained are taken as the multi-label noisy data set;
incomplete and duplicate pictures are then removed; the remaining 290,000 noisy pictures are used to construct the Web-COCO data set, and the pictures containing at least one of the 20 Pascal VOC categories are further selected to construct the Web-Pascal data set;
the Web-COCO data set contains 290,000 pictures, and each picture is given a pseudo label ŷ_i according to its category keywords; 20,000 pictures are randomly selected for manual annotation, giving them a more accurate and detailed description;
the Web-COCO data set has the following drawbacks. The first is label noise: when data is retrieved from the web, label noise is inevitably introduced. In the multi-label pictures of this embodiment, label noise arises, for example, when a picture contains information of many categories but the corresponding keywords do not cover them, leading to erroneous negative labels. A better description of the noisy pictures is obtained by computing the precision and recall of each category; the results show an average recall of 46.1% and an average precision of 64.6%, indicating that severe label noise exists in the data set;
another drawback is semantic dispersion: a multi-label image contains multiple semantic objects spread across the image, so the corresponding semantic regions must be found to help recover missing labels, while examining the whole image also helps correct wrong positive labels;
a third drawback is class imbalance, which is common in the real world and even more severe when multi-label pictures are retrieved from the web; for example, the most frequent category, "person", accounts for about 15% of the pictures, while the 20 least frequent categories together account for only 5% of the total. For evaluating the WS-MLR task, Web-COCO is used as the training set and Microsoft COCO, which contains 40,504 fully manually annotated images, as the validation set;
the Web-Pascal data set contains 236,043 pictures and uses the 20 categories of the Pascal VOC data set; like the Web-COCO data set, it suffers from label noise, semantic dispersion and class imbalance. Likewise, the 4,952 manually annotated pictures in the Web-Pascal data set are used as the validation set and the remaining pictures as the training set;
the training set is divided into a first sub-training set D_1 and a second sub-training set D_2 with the same number of pictures, where D_1 ∪ D_2 = X, D_1 ∩ D_2 = ∅, and (x_i, ŷ_i) represents the i-th picture x_i and its corresponding pseudo label ŷ_i;
the length and width data and the pseudo label ŷ_i of the pictures in each sub-training set are determined, where the length of a picture is denoted H and its width W; this completes the preprocessing of the multi-label noisy data set;
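The division of the training set X into two equal, disjoint sub-training sets D_1 and D_2 can be sketched as follows (a random split is an assumption; the specification only fixes the equal sizes):

```python
import random

def split_training_set(X, seed=0):
    # divide training set X into two sub-training sets with the same
    # number of pictures: D1 ∪ D2 = X, D1 ∩ D2 = ∅ (even |X| assumed)
    items = list(X)
    random.Random(seed).shuffle(items)
    half = len(items) // 2
    return items[:half], items[half:]

d1, d2 = split_training_set(range(10))
```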
a dual-branch multi-label correction neural network model is then established;
the preprocessed multi-label noisy data set is input into the dual-branch multi-label correction neural network model for contrast learning training to obtain the optimized dual-branch multi-label correction neural network model; the specific method is as follows:
S3.1: a picture x_i^1 from the first sub-training set D_1 and a picture x_i^2 from the second sub-training set D_2 are input jointly into the dual-branch multi-label correction neural network model, where i satisfies 1 ≤ i ≤ n and n is the number of pictures in the first sub-training set D_1 or the second sub-training set D_2;
S3.2: the feature extractors of the first label correction sub-model M_1 and of the second label correction sub-model M_2 are used to extract features from the input pictures x_i^1 and x_i^2, obtaining the first feature f_1, the second feature f_2, the third feature f_3 and the fourth feature f_4;
S3.3: the first feature f_1 and the second feature f_2 are input jointly into the instance contrast learning module of the first label correction sub-model M_1, and the third feature f_3 and the fourth feature f_4 jointly into the instance contrast learning module of the second label correction sub-model M_2; the first contrast learning is performed between the first feature f_1 and the third feature f_3 of picture x_i^1 and between the second feature f_2 and the fourth feature f_4 of picture x_i^2; the first loss function L_1 is set and the parameters of the instance contrast learning modules of M_1 and M_2 are updated, specifically:
for picture x_i^1, the corresponding first feature vector v_1 and second feature vector v_2 are computed from the first feature f_1 and the third feature f_3, where C_1 is the number of pseudo labels of picture x_i^1 and the j-th of its C_1 feature vectors is denoted v_j; the obtained first feature vector v_1 and second feature vector v_2 each satisfy the dimensional constraint of the dimension-reduction layer;
a first positive sample pair (v_1, v_2) is constructed from the first feature vector v_1 and the second feature vector v_2, together with a first circular sequence Q_1 of sequence length R_1; in this embodiment, R_1 = 8192; first negative sample pairs are constructed from the first circular sequence Q_1, and the first contrast learning is performed with the constructed first positive sample pair and first negative sample pairs;
the first loss function L_1 is set and the parameters of the instance contrast learning module of the first label correction sub-model M_1 are updated, where L_1(x_i^1) is the first loss function value of picture x_i^1 in the instance contrast learning module of M_1, K is the total number of categories required for the multi-label classification of x_i^1, k is the corresponding category index, τ is the temperature coefficient, v_1 and v_2 are the two dimension-reduced feature vectors of x_i^1, and ŷ_i,k is the value of the pseudo label of x_i^1 relative to the multi-label classification category k; in this embodiment, the dimension-reduced feature vectors have dimension 128 and the extracted features have dimension 2048;
for picture x_i^2, the corresponding third feature vector v_3 and fourth feature vector v_4 are computed from the second feature f_2 and the fourth feature f_4, where C_2 is the number of pseudo labels of picture x_i^2 and the j-th of its C_2 feature vectors is denoted v_j; the obtained third feature vector v_3 and fourth feature vector v_4 each satisfy the dimensional constraint of the dimension-reduction layer;
a second positive sample pair (v_3, v_4) is constructed from the third feature vector v_3 and the fourth feature vector v_4, together with a second circular sequence Q_2 of sequence length R_2; in this embodiment, R_2 = 8192; second negative sample pairs are constructed from the second circular sequence Q_2, and the first contrast learning is performed with the constructed second positive sample pair and second negative sample pairs;
the first loss function L_1 is set and the parameters of the instance contrast learning module of the second label correction sub-model M_2 are updated, where L_1(x_i^2) is the first loss function value of picture x_i^2 in the instance contrast learning module of M_2, K is the total number of categories required for the multi-label classification of x_i^2, k is the corresponding category index, v_3 and v_4 are the two dimension-reduced feature vectors of x_i^2, and ŷ_i,k is the value of the pseudo label of x_i^2 relative to the multi-label classification category k;
S3.4: the first feature f_1 is input into the class prototype contrast learning module of the first label correction sub-model M_1 and the second contrast learning is performed against the preset first class prototype features; the fourth feature f_4 is input into the class prototype contrast learning module of the second label correction sub-model M_2 and the second contrast learning is performed against the preset second class prototype features; the second loss function L_2 is set and the parameters of the class prototype contrast learning modules of M_1 and M_2 are updated, specifically:
the first feature f_1 is input into the class prototype contrast learning module of M_1, the second contrast learning is performed between the first feature vector v_1 of picture x_i^1 and the first class prototype features, and the first class prototype features are updated by the momentum method as p′_k = m·p_k + (1 − m)·v_1, where p′_k is the first class prototype feature corresponding to the k-th category after the update, p_k is the one before the update, and m is the preset momentum;
the second loss function L_2 is set and the parameters of the class prototype contrast learning module of M_1 are updated, where L_2(x_i^1) is the second loss function value of picture x_i^1 in the class prototype contrast learning module of M_1;
the fourth feature f_4 is input into the class prototype contrast learning module of M_2, the second contrast learning is performed between the second feature vector of picture x_i^2 and the second class prototype features, and the second class prototype features are updated by the momentum method in the same way, where q′_k is the second class prototype feature corresponding to the k-th category after the update and q_k is the one before the update;
the second loss function L_2 is set and the parameters of the class prototype contrast learning module of M_2 are updated, where L_2(x_i^2) is the second loss function value of picture x_i^2 in the class prototype contrast learning module of M_2;
S3.5: the first feature f_1 is input into the classifier of the first label correction sub-model M_1 and the classification probability of picture x_i^1 is computed as P_1 = σ(g(f_1)), where P_1 is the classification probability of picture x_i^1, σ is the sigmoid function, and g is the confidence score calculation function of the classifier;
the fourth feature f_4 is input into the classifier of the second label correction sub-model M_2 and the classification probability of picture x_i^2 is computed as P_2 = σ(g(f_4)), where P_2 is the classification probability of picture x_i^2, σ is the sigmoid function, and g is the confidence score calculation function of the classifier;
S3.6: the classification probability P_1 of picture x_i^1 is input into the label correction module of the first label correction sub-model M_1, its pseudo label is corrected and the corrected label of x_i^1 is obtained; the classification probability P_2 of picture x_i^2 is input into the label correction module of the second label correction sub-model M_2, its pseudo label is corrected and the corrected label of x_i^2 is obtained; the third loss function L_3 is set and the cross-entropy losses of the label correction modules of M_1 and M_2 are computed respectively to update their parameters, specifically:
the classification probability P_1 of picture x_i^1 is input into the label correction module of M_1; a first threshold δ_1, a second threshold δ_2, a third threshold δ_3 and a fourth threshold δ_4 are set and dynamically updated with the preset momentum m, the third threshold δ_3 and the fourth threshold δ_4 being given preset initial values in this embodiment;
the value of the binary noise label b is determined according to the updated third threshold δ_3, the fourth threshold δ_4 and the classification probability P_1 of picture x_i^1;
the intermediate label of picture x_i^1 is obtained according to the updated first threshold δ_1 and second threshold δ_2;
when the noise label b indicates label noise, the pseudo label of picture x_i^1 is replaced with its intermediate label as the corrected label of x_i^1; otherwise the pseudo label of picture x_i^1 is kept as the corrected label of x_i^1;
the correction process in the label correction module of the second label correction sub-model M_2 is analogous: the classification probability P_2 of picture x_i^2 is input into the label correction module of M_2; the value of the binary noise label b is determined according to the updated third threshold δ_3, the fourth threshold δ_4 and the classification probability P_2 of picture x_i^2; the intermediate label of picture x_i^2 is obtained according to the updated first threshold δ_1 and second threshold δ_2; when the noise label b indicates label noise, the pseudo label of picture x_i^2 is replaced with its intermediate label as the corrected label of x_i^2; otherwise the pseudo label of picture x_i^2 is kept as the corrected label of x_i^2;
the third loss function L_3 is the binary cross-entropy loss, where L_3(i) is the binary cross-entropy loss of the i-th picture;
S3.7: the total loss function L_total is set according to the first loss function L_1, the second loss function L_2 and the third loss function L_3, and the parameters of the dual-branch multi-label correction neural network model are updated to obtain the optimized dual-branch multi-label correction neural network model;
the total loss function L_total combines the three losses, where L_total is the total loss function value, λ_1 is the balance factor of the first loss function L_1 and λ_2 is the balance factor of the second loss function L_2, λ_1 and λ_2 taking preset values in this embodiment;
Finally, a noisy picture to be corrected is acquired and corrected with the optimized dual-branch multi-label correction neural network model to obtain its corrected label, and image recognition is performed on the noisy picture according to the corrected label.
According to the method, related pictures can be collected from the Internet as data sets for a user's specific application, the dual-branch network can be trained, and a model supporting multi-label picture classification can be constructed; labels of multi-label noisy data sets can be corrected, saving labor and material costs and making efficient use of data resources. The invention further provides a contrast learning method in which the two branch networks, while remaining distinct, learn common representations from each other, and the predictions of the two models are averaged when classifying pictures, making the result more robust. In addition, the invention prescribes upper and lower bounds on the predicted values of training pictures and changes the labels of pictures whose predicted values exceed or fall below the thresholds, thereby weakening the noise and avoiding overfitting to it.
The same or similar reference numerals correspond to the same or similar components;
the terms describing the positional relationship in the drawings are merely illustrative, and are not to be construed as limiting the present patent;
It is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention is intended to be covered by the protection of the following claims.
Claims (7)
1. The multi-label image recognition method based on deep learning under noisy data is characterized by comprising the following steps:
s1: the method comprises the steps of obtaining a multi-label noisy data set and preprocessing, wherein the specific method comprises the following steps:
acquiring a multi-label noisy data set according to preset K multi-label classification categories;
dividing the obtained multi-label noisy data set into a training set and a verification set, wherein the training set comprises N pictures, and each picture is marked with a pseudo label The training set is marked as X; dividing the training set into two first sub-training sets D with the same number of pictures 1 And a second sub-training set D 2, wherein ,/>,,/>,/>Representing the i picture->And its corresponding pseudo tag->;
Determining length and width data and pseudo tags of pictures in each sub-training setWherein the length of the picture is denoted as H and the width of the picture is denoted as W; finishing preprocessing of the multi-label noisy data set;
s2: the method comprises the steps of establishing a double-branch multi-label correction neural network model, specifically:
the dual-branch multi-label correction neural network model comprises a first label correction sub-model M which is arranged in parallel 1 And a second label modifier model M 2 The method comprises the steps of carrying out a first treatment on the surface of the The first label modifier model M 1 And a second label modifier model M 2 The structure of the model is the same and the model parameters are different;
the first label modifier model M 1 Or a second label modifier model M 2 The system comprises a feature extractor, an example comparison learning module, a category prototype comparison learning module, a classifier and a label correction module which are connected in sequence;
s3: inputting the preprocessed multi-label noisy data set into a double-branch multi-label correction neural network model for comparison learning training to obtain an optimized double-branch multi-label correction neural network model, wherein the specific method comprises the following steps of:
S3.1: will first sub training set D 1 In a picture ofAnd a second sub-training set D 2 Picture->Common input into a two-branch, multi-tag modified neural network model, wherein +.>Satisfy->,/>For the first sub-training set D 1 Or a second sub-training set D 2 The number of pictures in (a);
s3.2: modifying the submodel M by using the first label respectively 1 And a second label modifier model M 2 The feature extractor of (1) is used for inputting picturesAnd picture->Extracting features to obtain first features ∈>And second feature->And third feature->And fourth feature->;
S3.3: will first featureAnd second feature->Common input of first tag modifier sub-model M 1 Is to add the third feature +.>And fourth feature->Common input of a second tag modifier sub-model M 2 Is to picture +.>Is>And third feature->Performing first contrast learning, and performing +.>Second feature->And fourth feature->Performing first contrast learning, and setting a first loss function +.>Correction of the first label sub-model M 1 And a second label modifier model M 2 The instance comparison learning module of (a) performs parameter updating;
s3.4: will first featureInputting a first label modifier model M 1 Is compared with a preset first category prototype feature>Performing a second contrast learning to obtain a fourth characteristic +.>Inputting a second label modifier model M 2 Category prototype comparison learning module of (2) and a preset second category prototype feature +.>Performing a second contrast learning and setting a second loss function +.>Correction of the first label sub-model M 1 And a second label modifier model M 2 The category prototype comparison learning module of (1) performs parameter updating;
s3.5: will first featureInputting a first label modifier model M 1 In the classifier of (2) calculating the output picture +.>Classification probability of (c); fourth feature->Inputting a second label modifier model M 2 In the classifier of (2) calculating the output picture +.>Classification probability of (c);
s3.6: picture is madeIs input into a first label modifier sub-model M 1 The label correction module of (1) for picture->Pseudo tag of->Performing label correction to obtain picture->Is->The method comprises the steps of carrying out a first treatment on the surface of the Picture->Is input into a second label modifier sub-model M 2 The label correction module of (1) for picture->Pseudo tag of->Performing label correction to obtain picture->Is->The method comprises the steps of carrying out a first treatment on the surface of the And sets a third loss function->Respectively calculating a first label correction sub-model M 1 And a second label correction sub-model M 2 The cross entropy loss of the label correction module of the (2) is used for carrying out parameter updating;
s3.7: according to a first loss functionSecond loss function->And a third loss function->Setting a total loss functionParameter updating is carried out on the double-branch multi-label correction neural network model, and an optimized double-branch multi-label correction neural network model is obtained;
S4: obtaining a noisy picture to be corrected, correcting it with the optimized dual-branch multi-label correction neural network model to obtain the corrected label of the noisy picture, and performing image recognition on the noisy picture according to the corrected label.
2. The deep-learning-based multi-label image recognition method under noisy data according to claim 1, wherein the values of the pseudo labels of the pictures in each sub-training set are determined as follows: judging whether a picture in each sub-training set belongs to a preset multi-label classification category k; if so, the pseudo label of the i-th picture with respect to multi-label classification category k takes the value 1, and otherwise it takes the value 0.
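The pseudo-label initialization described in claim 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name, the list-of-sets input format, and the `num_classes` parameter are assumptions.

```python
def init_pseudo_labels(picture_categories, num_classes):
    """For each picture, build a binary pseudo-label vector over the
    multi-label categories: entry k is 1 if the picture is annotated
    with category k, and 0 otherwise.

    picture_categories: list of sets of (possibly noisy) category ids,
    one set per picture.
    """
    labels = []
    for cats in picture_categories:
        labels.append([1 if k in cats else 0 for k in range(num_classes)])
    return labels
```

Under this sketch, a picture annotated with categories {0, 2} out of 4 categories receives the pseudo label [1, 0, 1, 0].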
3. The deep-learning-based multi-label image recognition method under noisy data according to claim 2, wherein the specific method of step S3.3 is as follows:
inputting the first feature and the second feature jointly into the instance contrastive learning module of the first label correction sub-model M1, and inputting the third feature and the fourth feature jointly into the instance contrastive learning module of the second label correction sub-model M2;
for each picture, computing the corresponding first feature vector and second feature vector from the first feature and the third feature, where C1 is the number of pseudo labels of the picture and j indexes the picture's C1 feature vectors; the obtained first feature vector and second feature vector each satisfy the corresponding constraint;
constructing a first positive sample pair from the first feature vector and the second feature vector, and constructing a first cyclic sequence, where R1 is the length of the first cyclic sequence;
constructing a first negative sample pair from the first cyclic sequence, and performing the first contrastive learning with the constructed first positive sample pair and first negative sample pair;
setting the first loss function and updating the parameters of the instance contrastive learning module of the first label correction sub-model M1; the first loss function value of a picture in this module is computed from the total number of categories required for multi-label classification, the corresponding category, a temperature coefficient, the 1st and 2nd dimension-reduced feature vectors of the picture for that category, and the value of the picture's pseudo label with respect to multi-label classification category k;
for each picture, computing the corresponding third feature vector and fourth feature vector from the second feature and the fourth feature, where C2 is the number of pseudo labels of the picture and j indexes the picture's C2 feature vectors; the obtained third feature vector and fourth feature vector each satisfy the corresponding constraint;
constructing a second positive sample pair from the third feature vector and the fourth feature vector, and constructing a second cyclic sequence, where R2 is the length of the second cyclic sequence;
constructing a second negative sample pair from the second cyclic sequence, and performing the first contrastive learning with the constructed second positive sample pair and second negative sample pair;
setting the first loss function and updating the parameters of the instance contrastive learning module of the second label correction sub-model M2; the first loss function value of a picture in this module is computed from the total number of categories required for multi-label classification, the corresponding category, a temperature coefficient, the 2nd and 1st dimension-reduced feature vectors of the picture for that category, and the value of the picture's pseudo label with respect to the corresponding multi-label classification category.
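The first loss described in claim 3 is a temperature-scaled contrastive loss over constructed positive and negative pairs. The exact formula is not recoverable from this text, so the following is a sketch of a generic InfoNCE-style loss under that assumption; the function name, the plain-list vector format, and the default temperature are illustrative.

```python
import math

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss for one anchor:
    -log( exp(sim(a,p)/tau) / (exp(sim(a,p)/tau) + sum_n exp(sim(a,n)/tau)) ),
    where sim is cosine similarity and tau is the temperature coefficient."""
    def l2_normalize(v):
        s = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / s for x in v]

    def sim(u, v):
        return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))

    pos = math.exp(sim(anchor, positive) / tau)
    den = pos + sum(math.exp(sim(anchor, n) / tau) for n in negatives)
    return -math.log(pos / den)
```

With a positive pair that matches the anchor and a strongly dissimilar negative, the loss approaches zero; pulling the positive away or the negatives closer increases it, which is the pressure the instance contrastive learning module applies to the two branches' features.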
4. The deep-learning-based multi-label image recognition method under noisy data according to claim 3, wherein the specific method of step S3.4 is as follows:
inputting the first feature into the category prototype contrastive learning module of the first label correction sub-model M1, performing the second contrastive learning between the first feature vector of the picture and the first category prototype feature, and updating the first category prototype feature by the momentum method, where the updated first category prototype feature of the k-th category is obtained from the current first category prototype feature of the k-th category using a preset momentum m;
setting the second loss function and updating the parameters of the category prototype contrastive learning module of the first label correction sub-model M1, where the second loss function value is computed for each picture in the category prototype contrastive learning module of the first label correction sub-model M1;
inputting the fourth feature into the category prototype contrastive learning module of the second label correction sub-model M2, performing the second contrastive learning between the second feature vector of the picture and the second category prototype feature, and updating the second category prototype feature by the momentum method, where the updated second category prototype feature of each category is obtained from the current second category prototype feature of that category;
setting the second loss function and updating the parameters of the category prototype contrastive learning module of the second label correction sub-model M2, where the second loss function value is computed for each picture in the category prototype contrastive learning module of the second label correction sub-model M2.
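Claim 4's momentum update of a category prototype can be sketched as follows. The convex-combination form `m * old + (1 - m) * feature` is an assumption (the patent's exact update formula is an image lost from this text); it is the usual shape of a momentum-method prototype update.

```python
def momentum_update(prototype, feature, m=0.9):
    """One momentum update of a class prototype toward a feature vector:
    p_new = m * p_old + (1 - m) * f  (convex-combination form assumed).
    A momentum m close to 1 makes the prototype drift slowly, smoothing
    out noise in individual features."""
    return [m * p + (1.0 - m) * f for p, f in zip(prototype, feature)]
```

For example, updating a zero prototype toward the feature [1.0, 1.0] with m = 0.9 moves it only a tenth of the way, to [0.1, 0.1].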
5. The deep-learning-based multi-label image recognition method under noisy data according to claim 4, wherein the specific method of step S3.5 is as follows:
inputting the first feature into the classifier of the first label correction sub-model M1 and computing the output classification probability of the picture, where the classification probability of the picture is obtained by applying the sigmoid function to the confidence score computed by the classifier;
inputting the fourth feature into the classifier of the second label correction sub-model M2 and computing the output classification probability of the picture, where the classification probability of the picture is likewise obtained by applying the sigmoid function to the confidence score computed by the classifier.
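Claim 5's sigmoid-over-confidence-scores step is standard for multi-label classifiers, since each category is scored independently rather than through a softmax. A minimal sketch (function name illustrative):

```python
import math

def classification_probability(scores):
    """Convert classifier confidence scores into per-category probabilities
    with the sigmoid function, one independent probability per category."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]
```

A score of 0 maps to probability 0.5, large positive scores approach 1, and large negative scores approach 0.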
6. The deep-learning-based multi-label image recognition method under noisy data according to claim 5, wherein the specific method of step S3.6 is as follows:
inputting the classification probability of the picture into the label correction module of the first label correction sub-model M1, setting a first threshold, a second threshold, a third threshold and a fourth threshold, and dynamically updating the four thresholds with a preset momentum m;
determining the value of the binary noise label according to the updated third threshold and fourth threshold together with the classification probability of the picture;
obtaining the intermediate label of the picture according to the updated first threshold and second threshold;
when the noise label indicates that the pseudo label is noisy, replacing the pseudo label of the picture with its intermediate label as the corrected label of the picture; when the noise label indicates that the pseudo label is clean, retaining the pseudo label of the picture as its corrected label;
inputting the classification probability of the picture into the label correction module of the second label correction sub-model M2;
determining the value of the binary noise label according to the updated third threshold and fourth threshold together with the classification probability of the picture; obtaining the intermediate label of the picture according to the updated first threshold and second threshold; when the noise label indicates that the pseudo label is noisy, replacing the pseudo label of the picture with its intermediate label as the corrected label of the picture; when the noise label indicates that the pseudo label is clean, retaining the pseudo label of the picture as its corrected label;
the third loss function is the binary cross-entropy loss, where the loss of the i-th picture is computed from the value of the pseudo label of the i-th picture with respect to multi-label classification category k.
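The threshold logic of claim 6 can be sketched for a single category as follows. The exact threshold semantics are not recoverable from this text, so the interpretation here is an assumption: the noise flag fires when the classifier strongly contradicts the pseudo label (probability above the third threshold while the pseudo label is 0, or below the fourth threshold while it is 1), and the intermediate label is set from the first and second thresholds. All names and default threshold values are illustrative.

```python
def correct_label(prob, pseudo, th1=0.7, th2=0.3, th3=0.9, th4=0.1):
    """One-category label correction under assumed threshold semantics.
    prob: classifier probability for this category; pseudo: 0/1 pseudo label.
    Returns the corrected 0/1 label."""
    # Binary noise flag: classifier strongly disagrees with the pseudo label.
    noisy = (prob > th3 and pseudo == 0) or (prob < th4 and pseudo == 1)
    # Intermediate label from the first/second thresholds.
    if prob > th1:
        intermediate = 1
    elif prob < th2:
        intermediate = 0
    else:
        intermediate = pseudo
    # Replace only when flagged as noisy; otherwise keep the pseudo label.
    return intermediate if noisy else pseudo
```

Under this sketch, a confident prediction that contradicts the pseudo label flips it, while an uncertain prediction leaves the pseudo label untouched.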
7. The deep-learning-based multi-label image recognition method under noisy data according to claim 6, wherein the total loss function in step S3.7 is the sum of the third loss function, the first loss function weighted by its balance factor, and the second loss function weighted by its balance factor.
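The total loss of claim 7 reduces to a one-line combination. The exact formula is an image lost from this text; the form below (unweighted third loss plus balance-factor-weighted first and second losses) follows from the balance factors the claim lists, and the parameter names are illustrative.

```python
def total_loss(l1, l2, l3, alpha=1.0, beta=1.0):
    """Assumed form of the total loss: the third (cross-entropy) loss plus
    the first and second contrastive losses scaled by their balance factors
    alpha and beta."""
    return l3 + alpha * l1 + beta * l2
```

The balance factors trade off how strongly the two contrastive objectives regularize the label-correction objective during training.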
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310299402.5A CN116012569B (en) | 2023-03-24 | 2023-03-24 | Multi-label image recognition method based on deep learning and under noisy data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116012569A CN116012569A (en) | 2023-04-25 |
CN116012569B true CN116012569B (en) | 2023-08-15 |
Family
ID=86032175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310299402.5A Active CN116012569B (en) | 2023-03-24 | 2023-03-24 | Multi-label image recognition method based on deep learning and under noisy data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116012569B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416382A (en) * | 2018-03-01 | 2018-08-17 | Nankai University | Method for training convolutional neural networks on Web images based on iterative sampling and multi-label correction |
CN113688949A (en) * | 2021-10-25 | 2021-11-23 | 南京码极客科技有限公司 | Network image data set denoising method based on dual-network joint label correction |
CN114692732A (en) * | 2022-03-11 | 2022-07-01 | 华南理工大学 | Method, system, device and storage medium for updating online label |
CN115147670A (en) * | 2021-03-15 | 2022-10-04 | 华为技术有限公司 | Object processing method and device |
CN115331088A (en) * | 2022-10-13 | 2022-11-11 | 南京航空航天大学 | Robust learning method based on class labels with noise and imbalance |
CN115496948A (en) * | 2022-09-23 | 2022-12-20 | 广东工业大学 | Network supervision fine-grained image identification method and system based on deep learning |
CN115809697A (en) * | 2022-12-26 | 2023-03-17 | 上海高德威智能交通系统有限公司 | Data correction method and device and electronic equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11748613B2 (en) * | 2019-05-10 | 2023-09-05 | Baidu Usa Llc | Systems and methods for large scale semantic indexing with deep level-wise extreme multi-label learning |
US11263476B2 (en) * | 2020-03-19 | 2022-03-01 | Salesforce.Com, Inc. | Unsupervised representation learning with contrastive prototypes |
US20220067506A1 (en) * | 2020-08-28 | 2022-03-03 | Salesforce.Com, Inc. | Systems and methods for partially supervised learning with momentum prototypes |
US20220156591A1 (en) * | 2020-11-13 | 2022-05-19 | Salesforce.Com, Inc. | Systems and methods for semi-supervised learning with contrastive graph regularization |
US20220188645A1 (en) * | 2020-12-16 | 2022-06-16 | Oracle International Corporation | Using generative adversarial networks to construct realistic counterfactual explanations for machine learning models |
Non-Patent Citations (1)
Title |
---|
A survey of label-noise robust learning algorithms; Gong Chen et al.; Aero Weaponry; Vol. 27, No. 3; pp. 20-26 *
Also Published As
Publication number | Publication date |
---|---|
CN116012569A (en) | 2023-04-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Better and faster: knowledge transfer from multiple self-supervised learning tasks via graph distillation for video classification | |
CN109993100B (en) | Method for realizing facial expression recognition based on deep feature clustering | |
CN110210468B (en) | Character recognition method based on convolutional neural network feature fusion migration | |
CN113378706B (en) | Drawing system for assisting children in observing plants and learning biological diversity | |
CN112651940B (en) | Collaborative visual saliency detection method based on dual-encoder generation type countermeasure network | |
CN113673482B (en) | Cell antinuclear antibody fluorescence recognition method and system based on dynamic label distribution | |
CN113434688B (en) | Data processing method and device for public opinion classification model training | |
CN111079847A (en) | Remote sensing image automatic labeling method based on deep learning | |
CN112712127A (en) | Image emotion polarity classification method combined with graph convolution neural network | |
CN115331284A (en) | Self-healing mechanism-based facial expression recognition method and system in real scene | |
CN113657267A (en) | Semi-supervised pedestrian re-identification model, method and device | |
CN114548256A (en) | Small sample rare bird identification method based on comparative learning | |
CN112949929A (en) | Knowledge tracking method and system based on collaborative embedded enhanced topic representation | |
CN112183464A (en) | Video pedestrian identification method based on deep neural network and graph convolution network | |
CN116152554A (en) | Knowledge-guided small sample image recognition system | |
CN113010683A (en) | Entity relationship identification method and system based on improved graph attention network | |
CN114782752A (en) | Small sample image grouping classification method and device based on self-training | |
CN110175631A (en) | A kind of multiple view clustering method based on common Learning Subspaces structure and cluster oriental matrix | |
CN116051924B (en) | Divide-and-conquer defense method for image countermeasure sample | |
CN116012569B (en) | Multi-label image recognition method based on deep learning and under noisy data | |
CN113592045B (en) | Model adaptive text recognition method and system from printed form to handwritten form | |
CN114120367A (en) | Pedestrian re-identification method and system based on circle loss measurement under meta-learning framework | |
CN115100694A (en) | Fingerprint quick retrieval method based on self-supervision neural network | |
CN111695526B (en) | Network model generation method, pedestrian re-recognition method and device | |
CN114419529A (en) | Cross-modal pedestrian re-identification method and system based on distribution space alignment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||