CN115496948A - Network supervision fine-grained image identification method and system based on deep learning
- Publication number
- CN115496948A, CN202211167812.6A
- Authority
- CN
- China
- Prior art keywords
- graph
- feature
- noise label
- image
- noise
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a network supervision fine-grained image recognition method and system based on deep learning. Feature processing is performed on an input image containing a noise label to obtain an example graph containing noise label features; a graph prototype is constructed for each category from the example graphs; a preset graph matching neural network model is trained with the obtained example graphs and graph prototypes; and the optimized graph matching neural network model is used to recognize fine-grained images. By introducing the graph prototypes and the example graphs containing noise label features for contrastive learning, the method can effectively correct noise labels and remove outlier samples, so that the efficiency and accuracy of fine-grained image recognition are significantly improved.
Description
Technical Field
The invention relates to the technical field of image recognition, in particular to a network supervision fine-grained image recognition method and system based on deep learning.
Background
Fine-grained image recognition aims to identify subclasses of a given object class, such as different types of birds, airplanes and automobiles, and has important scientific significance and application value in fields such as intelligent construction and the Internet. In recent years, fine-grained image recognition has advanced greatly with the development of deep learning.
At present, most algorithms achieve fine-grained image recognition through deep learning driven by high-quality data and rely heavily on large-scale manually labeled data. The difficulty of collecting such data sets and the high cost of labeling them have become bottlenecks that restrict the wider adoption of these algorithms.
With the rapid development of the Internet, the large amount of weakly labeled data on the web can relieve the dependence of current fine-grained image recognition algorithms on manual labeling; that is, data obtained by web retrieval is used to train the neural network model. However, data retrieved from the web contains a certain proportion of noise labels, which may adversely affect model training. In addition, the small inter-class variance and large intra-class variance inherent in fine-grained images further increase the difficulty of recognition.
The prior art discloses a distributed-label fine-grained image recognition algorithm based on inter-class similarity, which comprises the following steps: extracting a feature representation of an input image using a backbone network; calculating a center loss from the feature representation with a center loss module and updating the category centers; calculating a classification loss (e.g., cross-entropy loss) with a classification loss module from the feature representation and a final label distribution, where the final label distribution is a weighted sum of the one-hot label distribution and a distributed label distribution generated from the category centers; and obtaining the final target loss function as a weighted sum of the center loss and the classification loss so as to optimize the whole model. By reducing the confidence of model predictions, this method can alleviate overfitting, effectively learn the discriminative features of fine-grained data, and improve to a certain extent the accuracy of distinguishing different fine-grained data. However, it mainly relies on deep learning driven by high-quality data to distinguish subordinate categories and depends on large-scale manually labeled image data, whose collection and labeling are costly; fine-grained image recognition with this method is therefore often time-consuming and labor-intensive, and suffers from low efficiency and accuracy.
Disclosure of Invention
To overcome the low efficiency and accuracy of fine-grained image recognition in the prior art, the invention provides a network supervision fine-grained image recognition method and system based on deep learning, which can recognize fine-grained images efficiently and accurately.
In order to solve the technical problems, the technical scheme of the invention is as follows:
A network supervision fine-grained image recognition method based on deep learning comprises the following steps:
S1: acquiring an input image containing a noise label from the Internet;
S2: performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;
S3: acquiring an example graph containing noise label features according to the obtained region discrimination feature map and overall feature map;
S4: constructing a graph prototype for each category according to the obtained example graph containing the noise label features;
S5: inputting the obtained example graph containing the noise label features and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
S6: acquiring an image to be recognized, extracting the features of the image to be recognized, and recognizing it with the optimized graph matching neural network model to obtain the recognition result of the image to be recognized.
Preferably, in step S2, feature extraction is performed on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map as follows:
feature extraction is performed on the input image containing the noise label with a feature extractor to obtain an overall feature map; the overall feature map is passed through a convolutional layer to obtain a mean-filtered overall feature map; the mean over the channels is calculated at each position of the mean-filtered overall feature map to obtain an overall mean feature map; the maximum response value region is searched for in the overall mean feature map, its coordinates are located, and the region discrimination feature map is obtained according to the coordinates of the maximum response value region.
Preferably, the maximum response value region in the overall mean feature map is searched for and its coordinates are located according to the following formulas:
$$\bar{f}_g = \frac{1}{C}\sum_{c=1}^{C} f'_g(c), \qquad (i, j) = \arg\max_{i,j}\ \bar{f}_g(i, j)$$
where $\bar{f}_g$ denotes the overall mean feature map, $f'_g$ denotes the mean-filtered overall feature map and $f'_g(c)$ its $c$-th channel, $C$ denotes the number of channels of the mean-filtered overall feature map, $\arg\max_{i,j}$ denotes searching for the row and column corresponding to the maximum response value region, and $(i, j)$ denotes the coordinates of the maximum response value region.
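The localization step described above can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the function names, the `(C, H, W)` array layout, and the fixed square crop window (`half_size`) used to cut the region out of the feature map are all assumptions, since the text only states that the region is obtained "according to the coordinates".

```python
import numpy as np

def locate_max_response(f_g_filtered):
    """Average a (C, H, W) mean-filtered feature map over channels
    and return the mean map plus the (row, col) of its largest value."""
    mean_map = f_g_filtered.mean(axis=0)              # shape (H, W)
    i, j = np.unravel_index(np.argmax(mean_map), mean_map.shape)
    return mean_map, (int(i), int(j))

def crop_region(feature_map, center, half_size=1):
    """Cut a square window around `center` out of a (C, H, W) map.
    The window size is a hypothetical choice, clipped at the borders."""
    _, h, w = feature_map.shape
    i, j = center
    top, bottom = max(i - half_size, 0), min(i + half_size + 1, h)
    left, right = max(j - half_size, 0), min(j + half_size + 1, w)
    return feature_map[:, top:bottom, left:right]
```

A caller would pass the mean-filtered overall feature map to `locate_max_response` and then crop the overall feature map at the returned coordinates to obtain a candidate region discrimination feature map.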
Preferably, in step S3, the example graph containing noise label features is obtained from the region discrimination feature map and the overall feature map as follows:
the obtained region discrimination feature map is converted to the same dimensions by bilinear interpolation to obtain a region feature map with the same dimensions; the overall feature map and the region feature map with the same dimensions are reduced in dimension by global average pooling to obtain a dimension-reduced overall feature map and a dimension-reduced region feature map; and the example graph containing noise label features is obtained from the dimension-reduced overall feature map and the dimension-reduced region feature map:
$G_{ins} = \langle V_{ins}, E_{ins} \rangle$
where $G_{ins}$ denotes the example graph containing noise label features, $V_{ins}$ denotes the set of all feature points in the dimension-reduced overall feature map and the dimension-reduced region feature map, and $E_{ins}$ denotes the adjacency matrix representing the connections between feature points in the example graph containing the noise label features.
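As a rough sketch of this construction, the following NumPy code resizes one region map to the size of the overall map, pools both to vectors, and assembles them into a two-node graph. Several points are assumptions not stated in the text: the target size of the interpolation (taken here to be the overall map's size), the use of a single region, the align-corners style of interpolation, and the fully connected adjacency without self-loops.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """Bilinearly resize a (C, H, W) array to (C, out_h, out_w)."""
    _, h, w = x.shape
    rows = np.linspace(0, h - 1, out_h)
    cols = np.linspace(0, w - 1, out_w)
    r0 = np.floor(rows).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c0 = np.floor(cols).astype(int)
    c1 = np.minimum(c0 + 1, w - 1)
    wr = (rows - r0)[None, :, None]   # row interpolation weights
    wc = (cols - c0)[None, None, :]   # column interpolation weights
    top = x[:, r0][:, :, c0] * (1 - wc) + x[:, r0][:, :, c1] * wc
    bot = x[:, r1][:, :, c0] * (1 - wc) + x[:, r1][:, :, c1] * wc
    return top * (1 - wr) + bot * wr

def build_instance_graph(overall, region):
    """Return (V_ins, E_ins) for one overall map and one region map."""
    _, h, w = overall.shape
    region_same = bilinear_resize(region, h, w)
    v_overall = overall.mean(axis=(1, 2))       # global average pooling
    v_region = region_same.mean(axis=(1, 2))
    V = np.stack([v_overall, v_region])         # one node per pooled map
    E = np.ones((2, 2)) - np.eye(2)             # assumed: fully connected
    return V, E
```

With more than one discriminative region, the same pattern would simply stack more rows into `V` and enlarge `E` accordingly.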
Preferably, in step S4, a graph prototype is constructed from the obtained example graph containing the noise label features as follows:
according to the obtained example graph containing the noise label features, a graph prototype with the same structure as the example graph is constructed for each category, and the graph prototype is updated by a moving average:
$G_k = \langle V_k, E_k \rangle$
where $G_k$ denotes the graph prototype constructed for the $k$-th category, $V_k$ denotes the set of all feature points in the graph prototype of the $k$-th category, $E_k$ denotes the adjacency matrix representing the connections between feature points in the graph prototype of the $k$-th category, $G'_k$ is the updated graph prototype, and $m$ is a preset parameter.
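A moving-average prototype update is commonly written as an exponential moving average of the node features. The sketch below assumes that form, with $m$ weighting the old prototype and $(1 - m)$ weighting the incoming example graph of the same category; the exact update formula is not legible in this text, so the weighting order and the default value of `m` are assumptions.

```python
import numpy as np

def update_prototype(V_k, V_ins, m=0.9):
    """Assumed EMA update of prototype node features:
    V'_k = m * V_k + (1 - m) * V_ins.
    The adjacency matrix is left unchanged because the prototype
    shares the structure of the example graph."""
    return m * np.asarray(V_k, dtype=float) + (1.0 - m) * np.asarray(V_ins, dtype=float)
```

A larger `m` makes the prototype change more slowly, which limits the influence of any single noisily labeled sample.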
Preferably, in step S5, the obtained example graph containing the noise label features and the graph prototype are input into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model as follows:
the preset graph matching neural network model comprises an intra-graph propagation layer, a graph aggregation layer, an inter-graph propagation layer and a graph matching layer, and obtaining the optimized graph matching neural network model comprises the following steps:
S5.1: inputting the obtained example graph $G_{ins}$ containing the noise label features and the graph prototype $G_k$ into the intra-graph propagation layer to obtain a first feature matrix and a second feature matrix, and iteratively updating the first feature matrix and the second feature matrix respectively through graph convolution operations;
S5.2: inputting the iteratively updated first feature matrix and second feature matrix into the graph aggregation layer for feature combination to obtain an aggregated feature vector;
S5.3: inputting the aggregated feature vector into the inter-graph propagation layer for graph convolution, and iteratively updating the aggregated feature vector to obtain a first feature expression $f_{ins}$ and a second feature expression $Z_k$;
S5.4: inputting the first feature expression $f_{ins}$ and the second feature expression $Z_k$ into the graph matching layer to calculate the similarity $S_k$, and calculating the graph matching loss according to the similarity $S_k$;
S5.5: correcting the noise label in the example graph containing the noise label features and removing outlier samples;
S5.6: calculating the classification cross entropy loss and the total loss, and optimizing the graph matching neural network model according to the total loss to obtain the optimized graph matching neural network model.
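The intra-graph propagation of step S5.1 can be illustrated with a generic graph convolution. The sketch below assumes the widely used form $X^{(l+1)} = \mathrm{ReLU}(\hat{A} X^{(l)} W^{(l)})$ with a symmetrically normalized adjacency matrix that includes self-loops; the propagation rule actually used by the method is not legible in this text, and the function names and weight shapes are hypothetical.

```python
import numpy as np

def normalize_adjacency(E):
    """Add self-loops and apply symmetric degree normalization."""
    A = np.asarray(E, dtype=float) + np.eye(len(E))
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate(X, E, weights):
    """Iteratively update an (n, c) feature matrix, one graph
    convolution per weight matrix in `weights` (assumed layer form)."""
    A_hat = normalize_adjacency(E)
    for W in weights:
        X = np.maximum(A_hat @ X @ W, 0.0)   # ReLU activation
    return X
```

The same routine would be applied separately to the feature matrix of the example graph and to that of the graph prototype, each with its own layer parameters.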
Preferably, in step S5.4, the first feature expression $f_{ins}$ and the second feature expression $Z_k$ are input into the graph matching layer to calculate the similarity $S_k$, and the graph matching loss is calculated according to the similarity $S_k$ as follows:
the first feature expression $f_{ins}$ and the second feature expression $Z_k$ are input into the graph matching layer for graph matching, and the similarity $S_k$ is calculated, specifically:
the graph matching layer is provided with a graph matching loss function, and the graph matching loss is calculated according to the similarity $S_k$, the graph matching loss function being specifically:
where $y_i$ denotes the original label, $k$ denotes the category of the graph prototype, and $K$ denotes the total number of graph prototype categories.
Preferably, in step S5.5, the noise label in the example graph containing the noise label features is corrected and outlier samples are removed as follows:
the intra-graph propagation layer is provided with a classifier; the example graph containing the noise label features is input into the classifier to obtain the classifier distribution probability $p_i$; the graph matching distribution probability $d_i$ is calculated; and the total probability $q_i$ is calculated from the classifier distribution probability $p_i$ and the graph matching distribution probability $d_i$, specifically:
$q_i = \alpha p_i + (1 - \alpha) d_i$
where $\alpha$ is a preset parameter and $\tau$ is a temperature coefficient;
according to the total probability $q_i$ and a preset threshold $T$, the noise label in the example graph containing the noise label features is corrected and outlier (OOD) samples are removed, specifically:
where $T$ is the preset threshold. When the maximum value of the total probability $q_i$ is greater than $T$, the category corresponding to the maximum value of $q_i$ is used as the pseudo label; when the total probability $q_i$ is greater than the class average probability, the original label $y_i$ is used as the pseudo label; the noise label in the example graph containing the noise label features is thereby corrected. In all other cases, OOD is used as the pseudo label, where OOD denotes an outlier sample, and the outlier sample is removed.
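The decision rule described above can be sketched as follows. Three points are assumptions rather than statements of the text: the graph matching probability $d_i$ is taken to be a temperature-scaled softmax over the similarities $S_k$, the "class average probability" is read as $1/K$ compared against the probability of the original label, and the integer `OOD` marker and default parameter values are hypothetical.

```python
import numpy as np

OOD = -1  # hypothetical marker for outlier samples

def softmax(x, tau=1.0):
    """Temperature-scaled softmax, shifted for numerical stability."""
    z = (np.asarray(x, dtype=float) - np.max(x)) / tau
    e = np.exp(z)
    return e / e.sum()

def correct_label(p, S, y, alpha=0.5, tau=0.1, T=0.8):
    """Return the pseudo label for one sample.
    p: classifier probabilities (K,), S: similarities to the K
    prototypes, y: original (possibly noisy) label."""
    d = softmax(S, tau)                       # assumed form of d_i
    q = alpha * np.asarray(p, dtype=float) + (1 - alpha) * d
    K = q.shape[0]
    if q.max() > T:                           # confident: relabel
        return int(q.argmax())
    if q[y] > 1.0 / K:                        # plausible: keep label
        return int(y)
    return OOD                                # otherwise: discard
```

For example, with `p = [0.4, 0.5, 0.1]` and equal similarities, no class clears the threshold; an original label of 0 is kept because its combined probability exceeds 1/3, while an original label of 2 is marked as an outlier.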
Preferably, in step S5.6, the classification cross entropy loss and the total loss are calculated, and the graph matching neural network model is optimized according to the total loss to obtain the optimized graph matching neural network model as follows:
the intra-graph propagation layer is provided with a classification cross entropy loss function, specifically:
where $p_{ij}$ is the classifier distribution probability of the $i$-th example graph containing noise label features for the $j$-th category, and the pseudo-label term is the pseudo label of the $i$-th example graph containing noise label features with respect to the $j$-th category;
a total loss function is constructed from the classification cross entropy loss function and the graph matching loss function, specifically:
the graph matching neural network model is optimized according to the total loss to obtain the optimized graph matching neural network model.
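A minimal sketch of the classification loss over pseudo labels is given below. It assumes that samples marked as outliers are excluded from the loss, that the loss is averaged over the remaining samples, and that the total loss is a weighted sum with a hypothetical weight `lam`; none of these choices is confirmed by the text, whose loss formulas are not legible here.

```python
import numpy as np

OOD = -1  # hypothetical marker for outlier samples

def classification_loss(P, pseudo_labels, eps=1e-12):
    """Cross entropy between classifier probabilities P (N, K) and
    integer pseudo labels, skipping samples marked as OOD."""
    P = np.asarray(P, dtype=float)
    labels = np.asarray(pseudo_labels)
    rows = np.flatnonzero(labels != OOD)      # keep in-distribution samples
    if rows.size == 0:
        return 0.0
    picked = P[rows, labels[rows]]            # probability of each pseudo label
    return float(-np.mean(np.log(picked + eps)))

def total_loss(l_cls, l_gm, lam=1.0):
    """Assumed combination: classification loss plus weighted
    graph matching loss."""
    return l_cls + lam * l_gm
```

Skipping outliers in this way means that removed samples contribute no gradient to the classifier.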
The invention also provides a network supervision fine-grained image recognition system based on deep learning, which applies the above network supervision fine-grained image recognition method based on deep learning and comprises:
an image acquisition unit: used for acquiring an input image containing a noise label from the Internet;
a feature extraction unit: used for performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;
an example graph generation unit: used for acquiring an example graph containing noise label features according to the obtained region discrimination feature map and overall feature map;
a graph prototype construction unit: used for constructing a graph prototype for each category according to the obtained example graph containing the noise label features;
a graph matching unit: used for inputting the obtained example graph containing the noise label features and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
an image recognition unit: used for acquiring an image to be recognized, extracting the features of the image to be recognized, and recognizing it with the optimized graph matching neural network model to obtain the recognition result of the image to be recognized.
Compared with the prior art, the technical solution of the invention has the following beneficial effects:
The invention provides a network supervision fine-grained image recognition method and system based on deep learning. Feature processing is performed on an input image containing a noise label to obtain an example graph containing noise label features; a corresponding graph prototype is constructed for each category from the example graphs containing noise label features; a preset graph matching neural network model is trained, and noise labels are corrected, with the obtained example graphs and graph prototypes; and the optimized graph matching neural network model is used for fine-grained image recognition. By introducing the graph prototypes and the example graphs containing noise label features for contrastive learning, the method can effectively correct noise labels and significantly improves the efficiency and accuracy of fine-grained image recognition.
Drawings
Fig. 1 is a flowchart of a network supervised fine-grained image recognition method based on deep learning according to embodiment 1.
Fig. 2 is a schematic diagram of a network supervised fine-grained image recognition method based on deep learning provided in embodiment 2.
Fig. 3 is a structural diagram of a network supervised fine grained image recognition system based on deep learning provided in embodiment 3.
301-image acquisition unit, 302-feature extraction unit, 303-example graph generation unit, 304-graph prototype construction unit, 305-graph matching unit, 306-image identification unit.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, the present embodiment provides a network supervision fine-grained image recognition method based on deep learning, including the following steps:
S1: acquiring an input image containing a noise label from the Internet;
S2: performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;
S3: acquiring an example graph containing noise label features according to the obtained region discrimination feature map and overall feature map;
S4: constructing a graph prototype for each category according to the obtained example graph containing the noise label features;
S5: inputting the obtained example graph containing the noise label features and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
S6: acquiring an image to be recognized, extracting the features of the image to be recognized, and recognizing it with the optimized graph matching neural network model to obtain the recognition result of the image to be recognized.
In a specific implementation, an input image containing a noise label is first obtained through web retrieval. A convolutional neural network (CNN) then performs feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map, from which an example graph containing noise label features is obtained. A corresponding graph prototype is constructed for each category from the example graphs containing noise label features. The obtained example graphs and graph prototypes are input into a preset graph matching neural network model for training, and the graph matching loss and the classification cross entropy loss are calculated to optimize the network, yielding an optimized graph matching neural network model. Finally, the optimized graph matching neural network model is used to recognize the image to be recognized and obtain its recognition result.
By introducing the graph prototypes and the example graphs containing noise label features for contrastive learning, the method can effectively correct noise labels and significantly improves the efficiency and accuracy of fine-grained image recognition.
Example 2
As shown in fig. 2, the present embodiment provides a network supervised fine-grained image recognition method based on deep learning, including the following steps:
S1: acquiring an input image containing a noise label from the Internet;
S2: performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map, specifically:
feature extraction is performed on the input image containing the noise label with a feature extractor to obtain an overall feature map; the overall feature map is passed through a convolutional layer to obtain a mean-filtered overall feature map; the mean over the channels is calculated at each position of the mean-filtered overall feature map to obtain an overall mean feature map; the maximum response value region is searched for in the overall mean feature map, its coordinates are located, and the region discrimination feature map is obtained according to the coordinates of the maximum response value region;
the maximum response value region in the overall mean feature map is searched for and its coordinates are located according to the following formulas:
$$\bar{f}_g = \frac{1}{C}\sum_{c=1}^{C} f'_g(c), \qquad (i, j) = \arg\max_{i,j}\ \bar{f}_g(i, j)$$
where $\bar{f}_g$ denotes the overall mean feature map, $f'_g$ denotes the mean-filtered overall feature map and $f'_g(c)$ its $c$-th channel, $C$ denotes the number of channels of the mean-filtered overall feature map, $\arg\max_{i,j}$ denotes searching for the row and column corresponding to the maximum response value region, and $(i, j)$ denotes the coordinates of the maximum response value region;
S3: acquiring an example graph containing noise label features according to the obtained region discrimination feature map and overall feature map, specifically:
the obtained region discrimination feature map is converted to the same dimensions by bilinear interpolation to obtain a region feature map with the same dimensions; the overall feature map and the region feature map with the same dimensions are reduced in dimension by global average pooling to obtain a dimension-reduced overall feature map and a dimension-reduced region feature map; and the example graph containing noise label features is obtained from the dimension-reduced overall feature map and the dimension-reduced region feature map:
$G_{ins} = \langle V_{ins}, E_{ins} \rangle$
where $G_{ins}$ denotes the example graph containing noise label features, $V_{ins}$ denotes the set of all feature points in the dimension-reduced overall feature map and the dimension-reduced region feature map, and $E_{ins}$ denotes the adjacency matrix representing the connections between feature points in the example graph containing the noise label features;
S4: constructing a graph prototype for each category according to the obtained example graph containing the noise label features, specifically:
according to the obtained example graph containing the noise label features, a graph prototype with the same structure as the example graph is constructed for each category, and the graph prototype is updated by a moving average:
$G_k = \langle V_k, E_k \rangle$
where $G_k$ denotes the graph prototype constructed for the $k$-th category, $V_k$ denotes the set of all feature points in the graph prototype of the $k$-th category, $E_k$ denotes the adjacency matrix representing the connections between feature points in the graph prototype of the $k$-th category, $G'_k$ is the updated graph prototype, and $m$ is a preset parameter;
S5: inputting the obtained example graph containing noise label features and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
the preset graph matching neural network model comprises an intra-graph propagation layer, a graph aggregation layer, an inter-graph propagation layer and a graph matching layer, and obtaining the optimized graph matching neural network model comprises the following steps:
S5.1: inputting the obtained example graph $G_{ins}$ containing noise label features and the graph prototype $G_k$ into the intra-graph propagation layer to obtain a first feature matrix and a second feature matrix, and iteratively updating the first feature matrix and the second feature matrix respectively through graph convolution operations, specifically:
the obtained example graph $G_{ins}$ containing noise label features and the graph prototype $G_k$ are input into the intra-graph propagation layer, and the set $V_{ins}$ of all feature points in the dimension-reduced overall feature map and regional feature map is rearranged into a first feature matrix, wherein $n_1$ is the number of feature points in the example graph containing noise label features and $c_1$ is the dimension of each feature point in that example graph;
the set $V_k$ of all feature points in the graph prototype is rearranged into a second feature matrix, wherein $n_2$ is the number of feature points in the graph prototype and $c_2$ is the dimension of each feature point in the graph prototype;
graph convolution operations are performed on the first feature matrix and the second feature matrix respectively, and the two matrices are iteratively updated, specifically:
wherein the terms of the update formula denote, in order, the first feature matrix after the $l$-th iterative update, the second feature matrix after the $l$-th iterative update, and the two parameters of the intra-graph propagation layer;
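One iteration of intra-graph propagation can be sketched as below. The sketch assumes a standard graph convolution of the form $X^{(l+1)} = \mathrm{ReLU}(\hat{E} X^{(l)} W + b)$ with a row-normalised adjacency matrix that includes self-loops; the exact update used by the method is not reproduced in this text, so the normalisation, the activation and the parameter names `w` and `b` are all assumptions.

```python
import numpy as np

def normalize_adjacency(e: np.ndarray) -> np.ndarray:
    """Add self-loops and row-normalise so that each row sums to 1 (assumed scheme)."""
    a = e + np.eye(e.shape[0])
    return a / a.sum(axis=1, keepdims=True)

def gcn_layer(x: np.ndarray, e: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One assumed graph-convolution step: aggregate neighbours, project, apply ReLU."""
    return np.maximum(normalize_adjacency(e) @ x @ w + b, 0.0)

rng = np.random.default_rng(0)
n, c_in, c_out = 10, 8, 4                # illustrative sizes only
x = rng.normal(size=(n, c_in))           # feature matrix: one row per feature point
e = np.ones((n, n)) - np.eye(n)          # fully connected graph without self-loops
w = rng.normal(size=(c_in, c_out))       # hypothetical layer weight
b = np.zeros(c_out)                      # hypothetical layer bias
x_next = gcn_layer(x, e, w, b)
```

The same function would be applied separately to the first feature matrix (example graph) and the second feature matrix (graph prototype), each with its own adjacency matrix.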
S5.2: inputting the iteratively updated first feature matrix and second feature matrix into the graph aggregation layer for feature combination to obtain an aggregated feature vector, specifically:
wherein the terms of the aggregation formula denote, in order, the aggregated feature vector, the updated first feature matrix and the updated second feature matrix;
S5.3: inputting the aggregated feature vector into the inter-graph propagation layer for graph convolution, and iteratively updating the aggregated feature vector to obtain a first feature expression $f_{ins}$ and a second feature expression $Z_k$, specifically:
wherein the first term of the formula denotes the aggregated feature vector after the $l$-th iterative update, $E_{cross}$ is the adjacency matrix of the aggregated feature vector, and the remaining two terms are the parameters of the inter-graph propagation layer;
the first feature expression $f_{ins}$ and the second feature expression $Z_k$ are obtained from the aggregated feature vector after the $l$-th iterative update;
S5.4: inputting the first feature expression $f_{ins}$ and the second feature expression $Z_k$ into the graph matching layer to calculate a similarity $S_k$, and calculating a graph matching loss according to the similarity $S_k$, specifically:
the first feature expression $f_{ins}$ and the second feature expression $Z_k$ are input into the graph matching layer for graph matching, and the similarity $S_k$ is calculated, specifically:
the graph matching layer is provided with a graph matching loss function, and the graph matching loss is calculated according to the similarity $S_k$, wherein the graph matching loss function is specifically:
wherein the left-hand side of the formula is the graph matching loss, $y_i$ denotes the original label, $k$ denotes the category of a graph prototype, and $K$ denotes the total number of graph prototype categories;
S5.5: correcting the noise labels in the example graph containing noise label features and removing outlier samples, specifically:
the intra-graph propagation layer is provided with a classifier; the example graph containing noise label features is input into the classifier to obtain the classifier distribution probability $p_i$; the graph matching distribution probability $d_i$ is calculated; and the total probability $q_i$ is calculated from the classifier distribution probability $p_i$ and the graph matching distribution probability $d_i$, specifically:
$q_i = \alpha p_i + (1 - \alpha) d_i$
wherein $\alpha$ is a preset parameter and $\tau$ is a temperature coefficient;
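The combination $q_i = \alpha p_i + (1 - \alpha) d_i$ can be illustrated with a short sketch. The formula for $d_i$ is not reproduced in this text; the sketch assumes that $d_i$ is a softmax over the per-category similarities $S_k$ divided by the temperature $\tau$, which is a common construction but an assumption here. The function names and the sample numbers are hypothetical.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    ez = np.exp(z - z.max())
    return ez / ez.sum()

def total_probability(p: np.ndarray, s: np.ndarray,
                      alpha: float = 0.5, tau: float = 0.1) -> np.ndarray:
    """q = alpha * p + (1 - alpha) * d, with d assumed to be softmax(s / tau)."""
    d = softmax(np.asarray(s, dtype=float) / tau)
    return alpha * np.asarray(p, dtype=float) + (1.0 - alpha) * d

p = np.array([0.7, 0.2, 0.1])   # classifier distribution probability (sums to 1)
s = np.array([0.9, 0.5, 0.1])   # similarity to each category prototype
q = total_probability(p, s)
```

Because both `p` and `d` are probability distributions and the weights sum to 1, `q` is again a probability distribution over the categories.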
according to the total probability $q_i$ and a preset threshold $T$, the noise labels in the example graph containing noise label features are corrected and outlier samples (OOD) are removed, specifically:
wherein the left-hand side of the formula is the pseudo label and $T$ is the preset threshold; when the total probability $q_i$ is greater than $T$, the category corresponding to the maximum value of $q_i$ is taken as the pseudo label; when the total probability $q_i$ is greater than the class average probability, the original label $y_i$ is taken as the pseudo label, thereby correcting the noise labels in the example graph containing noise label features; in all other cases, OOD is taken as the pseudo label, where OOD denotes an outlier sample, and the outlier sample is removed;
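The three-way rule above can be sketched as a small function. Two readings are assumed and should be checked against the original formula: the first condition is taken as "the largest entry of $q_i$ exceeds $T$", and the second as "the entry of $q_i$ at the original label $y_i$ exceeds the class average probability $1/K$". The sentinel value `OOD = -1` and the function name are hypothetical.

```python
import numpy as np

OOD = -1  # hypothetical sentinel marking an outlier sample

def correct_label(q, y: int, threshold: float = 0.75) -> int:
    """Return a pseudo label for one sample under the assumed reading of the rule."""
    q = np.asarray(q, dtype=float)
    k = q.shape[0]
    if q.max() > threshold:        # confident prediction: relabel with the arg max
        return int(q.argmax())
    if q[y] > 1.0 / k:             # original label still plausible: keep it
        return y
    return OOD                     # otherwise treat the sample as an outlier

labels = [
    correct_label([0.90, 0.05, 0.05], y=1),  # relabelled to category 0
    correct_label([0.50, 0.40, 0.10], y=1),  # original label 1 is kept
    correct_label([0.60, 0.30, 0.10], y=2),  # marked as an outlier
]
```

Samples that receive `OOD` would then be dropped from the batch before the classification loss is computed.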
S5.6: calculating a classification cross-entropy loss and a total loss, and optimizing the graph matching neural network model according to the total loss to obtain the optimized graph matching neural network model, specifically:
the intra-graph propagation layer is provided with a classification cross-entropy loss function, specifically:
wherein the left-hand side of the formula is the classification cross-entropy loss, $p_{ij}$ is the classifier distribution probability of the $i$-th example graph containing noise label features for the $j$-th category, and the remaining term is the pseudo label of the $i$-th example graph containing noise label features with respect to the $j$-th category;
a total loss function is constructed from the classification cross-entropy loss function and the graph matching loss function, specifically:
the graph matching neural network model is optimized according to the total loss to obtain the optimized graph matching neural network model;
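The two losses can be combined as in the sketch below. It assumes the usual cross-entropy over one-hot pseudo labels averaged over the batch, and a total loss of the form classification loss plus `lambda_pro` times graph matching loss. The weight name mirrors the $\lambda_{pro}$ hyperparameter reported for the embodiment, but how it enters the total loss is an assumption, as are the function names and the sample values.

```python
import numpy as np

def cross_entropy(p: np.ndarray, pseudo: np.ndarray, eps: float = 1e-12) -> float:
    """Mean cross-entropy between classifier probabilities and one-hot pseudo labels.

    p      : (N, K) classifier distribution probabilities p_ij
    pseudo : (N, K) one-hot pseudo labels (outlier samples removed beforehand)
    """
    return float(-(pseudo * np.log(p + eps)).sum(axis=1).mean())

def total_loss(l_cls: float, l_match: float, lambda_pro: float = 1.0) -> float:
    """Assumed combination: classification loss + lambda_pro * graph matching loss."""
    return l_cls + lambda_pro * l_match

p = np.array([[0.5, 0.5], [0.25, 0.75]])
pseudo = np.array([[1.0, 0.0], [0.0, 1.0]])
l_cls = cross_entropy(p, pseudo)
loss = total_loss(l_cls, l_match=0.2)   # 0.2 is a placeholder matching loss
```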
S6: acquiring an image to be recognized, extracting features of the image to be recognized, and recognizing the image with the optimized graph matching neural network model to obtain the recognition result of the image to be recognized.
In a specific implementation process, an input image containing a noise label is first obtained through network retrieval; the data set used in this embodiment is WebFG-496, which consists of three sub-data sets, namely Web-Bird, Web-Aircraft and Web-Car, and the size of the input image containing the noise label is 448 × 448;
then, a convolutional neural network with ResNet50-varian as the backbone CNN is set up, and a feature extractor is used to perform feature extraction on the input image containing the noise label to obtain an overall feature map, the dimension of which is 14 × 2048; the overall feature map is passed through a convolution layer to obtain a mean-filtered overall feature map; the mean value at each position of the mean-filtered overall feature map is calculated over the channels to obtain an overall mean feature map;
the maximum response value region is searched for in the overall mean feature map according to the following formula, and its coordinates are located:
wherein the left-hand side of the first formula denotes the overall mean feature map, $f'_g$ denotes the mean-filtered overall feature map, $C$ denotes the number of channels of the mean-filtered overall feature map, the search operator in the formula denotes finding the row and column corresponding to the maximum response value region, and $(i, j)$ denotes the coordinates of the maximum response value region;
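The localisation step described above (average over the $C$ channels, then find the row and column of the largest response) can be sketched as follows. The array layout `(H, W, C)`, the function name and the toy sizes are assumptions made for illustration.

```python
import numpy as np

def locate_max_response(f_g: np.ndarray):
    """Average a (H, W, C) feature map over channels and locate its maximum.

    Returns the (H, W) mean feature map and the (row, column) of the largest value.
    """
    mean_map = f_g.mean(axis=2)
    i, j = np.unravel_index(np.argmax(mean_map), mean_map.shape)
    return mean_map, (int(i), int(j))

# Toy input: 14 x 14 positions and 8 channels, with one strongly responding position.
f_g = np.zeros((14, 14, 8))
f_g[3, 9, :] = 1.0
mean_map, (i, j) = locate_max_response(f_g)
```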
a plurality of local regions of different sizes are cropped from the overall feature map according to the obtained coordinates of the maximum response value region; in this embodiment, three different area sizes $S_1$, $S_2$, $S_3$ and three different aspect ratios $A_1$, $A_2$, $A_3$ are set, giving 9 combinations in total, and the overall feature map is cropped with each combination, wherein the three area sizes $S_1$, $S_2$, $S_3$ are respectively one half, one third and two thirds of the area of the overall feature map, and the three aspect ratios $A_1$, $A_2$, $A_3$ are respectively 1, 0.5 and 2;
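The 9 crop windows can be enumerated as in the sketch below. Several details are assumptions: the aspect ratio is read as width divided by height, each window is centred on the maximum response coordinate and shifted to stay inside the map, sides are rounded to whole cells, and the spatial size of the feature map is taken as 14 × 14. The function name `region_boxes` is hypothetical.

```python
import itertools
import math

def region_boxes(center, map_size=(14, 14),
                 area_fractions=(1 / 2, 1 / 3, 2 / 3),
                 aspect_ratios=(1.0, 0.5, 2.0)):
    """Return (top, left, bottom, right) boxes for every area/aspect combination."""
    rows, cols = map_size
    ci, cj = center
    boxes = []
    for s, a in itertools.product(area_fractions, aspect_ratios):
        area = s * rows * cols
        # Assumed convention: a = width / height, so h = sqrt(area / a), w = sqrt(area * a).
        h = min(rows, max(1, round(math.sqrt(area / a))))
        w = min(cols, max(1, round(math.sqrt(area * a))))
        # Centre on the response peak, then shift so the box stays inside the map.
        top = min(max(ci - h // 2, 0), rows - h)
        left = min(max(cj - w // 2, 0), cols - w)
        boxes.append((top, left, top + h, left + w))
    return boxes

boxes = region_boxes(center=(3, 9))
```

Rounding and clamping mean that a box near the border may deviate slightly from its nominal area and aspect ratio.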
feature extraction is performed on the cropped local regions of different sizes by using the feature extractor to obtain region discrimination feature maps;
an example graph containing noise label features and a graph prototype corresponding to each category are constructed, and both are input into the intra-graph propagation layer (GCN) for graph convolution, wherein in this embodiment the numbers of output channels are 1024 and 2048 respectively; the output features of the example graph containing noise label features and of the graph prototype are aggregated to obtain the first feature expression $f_{ins}$ and the second feature expression $Z_k$; according to the first feature expression $f_{ins}$ and the second feature expression $Z_k$, the graph matching loss and the classification cross-entropy loss are calculated respectively to optimize the graph matching neural network model;
in this embodiment, $\alpha = 0.5$, $\tau = 0.1$, $T = 0.75$ and $\lambda_{pro} = 1$;
images to be recognized are acquired from CUB200-2011, FGVC-Aircraft and Stanford Cars as verification data; after the features of the images to be recognized are extracted, the images are recognized with the optimized graph matching neural network model to obtain the recognition results;
the following table compares the fine-grained image recognition accuracy of different methods:
Table 1. Comparison of fine-grained image recognition accuracy of different methods
Compared with the basic models, the method of this embodiment performs far better on all three data sets. The backbone network used in this embodiment is ResNet-50; compared with a plain ResNet-50 model, the method of this embodiment improves substantially on all three data sets, raising the average recognition accuracy by 20.14%. For a fair comparison, ResNet-50 is used as the backbone network; as can be seen from FIG. 3, with ResNet-50 as the backbone network the method of this embodiment achieves the highest average accuracy of 83.53%, and its accuracies on Web-Bird, Web-Aircraft and Web-Car are 76.62%, 85.79% and 82.09% respectively, which are 2.23%, 4.2% and 1.94% higher than those of the currently advanced Peer-learning method. Furthermore, other models such as B-CNN are also used as backbone networks, and the comparison shows that the method of this embodiment can be adapted to different backbone networks and clearly improves performance in fine-grained image recognition;
the method recognizes network-supervised fine-grained images based on deep learning; by introducing graph prototypes and example graphs containing noise label features for contrastive learning, noise labels can be effectively corrected, and the efficiency and accuracy of fine-grained image recognition are significantly improved.
Example 3
As shown in fig. 3, the present embodiment provides a network supervised fine-grained image recognition system based on deep learning, and applies the network supervised fine-grained image recognition method based on deep learning described in embodiment 1 or 2, including:
the image acquisition unit 301: used for acquiring an input image containing a noise label from the Internet;
the feature extraction unit 302: used for performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;
the example graph generating unit 303: used for obtaining an example graph containing noise label features according to the obtained region discrimination feature map and overall feature map;
the graph prototype construction unit 304: used for constructing a graph prototype for each category according to the obtained example graph containing noise label features;
the graph matching unit 305: used for inputting the obtained example graph containing noise label features and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
the image recognition unit 306: used for acquiring an image to be recognized, extracting the features of the image to be recognized, and recognizing the image with the optimized graph matching neural network model to obtain the recognition result of the image to be recognized;
in a specific implementation process, the image acquisition unit 301 first performs network retrieval to acquire an input image containing a noise label; the feature extraction unit 302 then performs feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map; the example graph generating unit 303 obtains an example graph containing noise label features according to the obtained region discrimination feature map and overall feature map; the graph prototype construction unit 304 then constructs a graph prototype for each category according to the obtained example graph containing noise label features; the graph matching unit 305 inputs the obtained example graph containing noise label features and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model; finally, the image recognition unit 306 acquires an image to be recognized, extracts its features, and recognizes it with the optimized graph matching neural network model to obtain the recognition result of the image to be recognized;
the system recognizes fine-grained images based on deep learning; by introducing graph prototypes and example graphs containing noise label features for contrastive learning, noise labels can be effectively corrected, and the efficiency and accuracy of fine-grained image recognition are significantly improved.
The same or similar reference numerals correspond to the same or similar parts;
the terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.
Claims (10)
1. A network supervision fine-grained image recognition method based on deep learning is characterized by comprising the following steps:
S1: acquiring an input image containing a noise label from the Internet;
S2: performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;
S3: obtaining an example graph containing noise label features according to the obtained region discrimination feature map and overall feature map;
S4: constructing a graph prototype for each category according to the obtained example graph containing noise label features;
S5: inputting the obtained example graph containing noise label features and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
S6: acquiring an image to be recognized, extracting the features of the image to be recognized, and recognizing the image to be recognized with the optimized graph matching neural network model to obtain a recognition result of the image to be recognized.
2. The method according to claim 1, wherein in step S2, feature extraction is performed on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map, and the specific method is as follows:
performing feature extraction on the input image containing the noise label by using a feature extractor to obtain an overall feature map; passing the overall feature map through a convolution layer to obtain a mean-filtered overall feature map; calculating the mean value at each position of the mean-filtered overall feature map over the channels to obtain an overall mean feature map; searching for the maximum response value region in the overall mean feature map, locating the coordinates of the maximum response value region, and obtaining a region discrimination feature map according to the coordinates of the maximum response value region.
3. The method for identifying the network supervision fine-grained image based on the deep learning according to claim 2, wherein the specific method for searching for the maximum response value region in the overall mean feature map and locating the coordinates of the maximum response value region comprises the following steps:
the maximum response value region is searched for in the overall mean feature map according to the following formula, and its coordinates are located:
wherein the left-hand side of the first formula denotes the overall mean feature map, $f'_g$ denotes the mean-filtered overall feature map, $C$ denotes the number of channels of the mean-filtered overall feature map, the search operator in the formula denotes finding the row and column corresponding to the maximum response value region, and $(i, j)$ denotes the coordinates of the maximum response value region.
4. The method according to claim 3, wherein in step S3, an example graph containing noise label features is obtained according to the obtained region discrimination feature map and overall feature map, and the specific method is as follows:
converting the obtained region discrimination feature maps to the same dimension by bilinear interpolation to obtain regional feature maps with the same dimension; reducing the dimensions of the overall feature map and of the regional feature maps with the same dimension by global average pooling to obtain a dimension-reduced overall feature map and a dimension-reduced regional feature map; obtaining an example graph containing noise label features according to the dimension-reduced overall feature map and the dimension-reduced regional feature map:
$G_{ins} = \langle V_{ins}, E_{ins} \rangle$
wherein $G_{ins}$ denotes the example graph containing noise label features, $V_{ins}$ denotes the set of all feature points in the dimension-reduced overall feature map and the dimension-reduced regional feature map, and $E_{ins}$ denotes the adjacency matrix describing the connections between feature points in the example graph containing noise label features.
5. The method according to claim 4, wherein in step S4, the specific method for constructing a graph prototype for each category according to the obtained example graph containing noise label features comprises:
according to the obtained example graph containing noise label features, a graph prototype with the same structure as the example graph is constructed for each category, and the graph prototype is updated by a moving average:
$G_k = \langle V_k, E_k \rangle$
wherein $G_k$ denotes the constructed graph prototype of the $k$-th category, $V_k$ denotes the set of all feature points in the graph prototype of the $k$-th category, $E_k$ denotes the adjacency matrix describing the connections between feature points in the graph prototype of the $k$-th category, $G'_k$ denotes the updated graph prototype, and $m$ is a preset parameter.
6. The method for identifying network supervision fine-grained images based on deep learning according to claim 5, wherein in step S5, the obtained example graph containing noise label features and the graph prototype are input into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model, and the specific method is as follows:
the preset graph matching neural network model comprises an intra-graph propagation layer, a graph aggregation layer, an inter-graph propagation layer and a graph matching layer, and obtaining the optimized graph matching neural network model comprises the following steps:
S5.1: inputting the obtained example graph $G_{ins}$ containing noise label features and the graph prototype $G_k$ into the intra-graph propagation layer to obtain a first feature matrix and a second feature matrix, and iteratively updating the first feature matrix and the second feature matrix respectively through graph convolution operations;
S5.2: inputting the iteratively updated first feature matrix and second feature matrix into the graph aggregation layer for feature combination to obtain an aggregated feature vector;
S5.3: inputting the aggregated feature vector into the inter-graph propagation layer for graph convolution, and iteratively updating the aggregated feature vector to obtain a first feature expression $f_{ins}$ and a second feature expression $Z_k$;
S5.4: inputting the first feature expression $f_{ins}$ and the second feature expression $Z_k$ into the graph matching layer to calculate a similarity $S_k$, and calculating a graph matching loss according to the similarity $S_k$;
S5.5: correcting the noise labels in the example graph containing noise label features and removing outlier samples;
7. The method as claimed in claim 6, wherein in step S5.4, the first feature expression $f_{ins}$ and the second feature expression $Z_k$ are input into the graph matching layer to calculate the similarity $S_k$, and the graph matching loss is calculated according to the similarity $S_k$, specifically:
the first feature expression $f_{ins}$ and the second feature expression $Z_k$ are input into the graph matching layer for graph matching, and the similarity $S_k$ is calculated, specifically:
the graph matching layer is provided with a graph matching loss function, and the graph matching loss is calculated according to the similarity $S_k$, wherein the graph matching loss function is specifically:
8. The method for identifying network supervision fine-grained images based on deep learning according to claim 7, wherein in step S5.5, the noise labels in the example graph containing noise label features are corrected and outlier samples are removed, and the specific method is as follows:
the intra-graph propagation layer is provided with a classifier; the example graph containing noise label features is input into the classifier to obtain the classifier distribution probability $p_i$; the graph matching distribution probability $d_i$ is calculated; and the total probability $q_i$ is calculated from the classifier distribution probability $p_i$ and the graph matching distribution probability $d_i$, specifically:
$q_i = \alpha p_i + (1 - \alpha) d_i$
wherein $\alpha$ is a preset parameter and $\tau$ is a temperature coefficient;
according to the total probability $q_i$ and a preset threshold $T$, the noise labels in the example graph containing noise label features are corrected and outlier samples (OOD) are removed, specifically:
wherein the left-hand side of the formula is the pseudo label and $T$ is the preset threshold; when the total probability $q_i$ is greater than $T$, the category corresponding to the maximum value of $q_i$ is taken as the pseudo label; when the total probability $q_i$ is greater than the class average probability, the original label $y_i$ is taken as the pseudo label, thereby correcting the noise labels in the example graph containing noise label features; in all other cases, OOD is taken as the pseudo label, where OOD denotes an outlier sample, and the outlier sample is removed.
9. The method for identifying network supervision fine-grained images based on deep learning as claimed in claim 8, wherein in step S5.6, a classification cross-entropy loss and a total loss are calculated, and the graph matching neural network model is optimized according to the total loss to obtain the optimized graph matching neural network model, and the specific method is as follows:
the intra-graph propagation layer is provided with a classification cross-entropy loss function, specifically:
wherein the left-hand side of the formula is the classification cross-entropy loss, $p_{ij}$ is the classifier distribution probability of the $i$-th example graph containing noise label features for the $j$-th category, and the remaining term is the pseudo label of the $i$-th example graph containing noise label features with respect to the $j$-th category;
a total loss function is constructed from the classification cross-entropy loss function and the graph matching loss function, specifically:
10. A network supervision fine-grained image recognition system based on deep learning, which applies the network supervision fine-grained image recognition method based on deep learning in any one of claims 1 to 9, and is characterized by comprising:
an image acquisition unit: used for acquiring an input image containing a noise label from the Internet;
a feature extraction unit: used for performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;
an example graph generating unit: used for obtaining an example graph containing noise label features according to the obtained region discrimination feature map and overall feature map;
a graph prototype construction unit: used for constructing a graph prototype for each category according to the obtained example graph containing noise label features;
a graph matching unit: used for inputting the obtained example graph containing noise label features and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
an image recognition unit: used for acquiring an image to be recognized, extracting the features of the image to be recognized, and recognizing the image with the optimized graph matching neural network model to obtain the recognition result of the image to be recognized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211167812.6A CN115496948A (en) | 2022-09-23 | 2022-09-23 | Network supervision fine-grained image identification method and system based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115496948A true CN115496948A (en) | 2022-12-20 |
Family
ID=84470196
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211167812.6A Pending CN115496948A (en) | 2022-09-23 | 2022-09-23 | Network supervision fine-grained image identification method and system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115496948A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116012569A (en) * | 2023-03-24 | 2023-04-25 | 广东工业大学 | Multi-label image recognition method based on deep learning and under noisy data |
CN116012569B (en) * | 2023-03-24 | 2023-08-15 | 广东工业大学 | Multi-label image recognition method based on deep learning and under noisy data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111881714B (en) | Unsupervised cross-domain pedestrian re-identification method | |
CN109919108B (en) | Remote sensing image rapid target detection method based on deep hash auxiliary network | |
CN111310861B (en) | License plate recognition and positioning method based on deep neural network | |
CN114067160B (en) | Small sample remote sensing image scene classification method based on embedded smooth graph neural network | |
CN113378632B (en) | Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method | |
CN112418117B (en) | Small target detection method based on unmanned aerial vehicle image | |
CN111797779A (en) | Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion | |
CN112308158A (en) | Multi-source field self-adaptive model and method based on partial feature alignment | |
CN110321967B (en) | Image classification improvement method based on convolutional neural network | |
CN111696101A (en) | Light-weight solanaceae disease identification method based on SE-Inception | |
CN110097060B (en) | Open set identification method for trunk image | |
CN113705641B (en) | Hyperspectral image classification method based on rich context network | |
CN111612017A (en) | Target detection method based on information enhancement | |
CN114842264B (en) | Hyperspectral image classification method based on multi-scale spatial spectrum feature joint learning | |
CN113032613B (en) | Three-dimensional model retrieval method based on interactive attention convolution neural network | |
CN115410088B (en) | Hyperspectral image field self-adaption method based on virtual classifier | |
CN111898621A (en) | Outline shape recognition method | |
CN114329031B (en) | Fine-granularity bird image retrieval method based on graph neural network and deep hash | |
CN112347284A (en) | Combined trademark image retrieval method | |
CN113947725B (en) | Hyperspectral image classification method based on convolution width migration network | |
CN111832580B (en) | SAR target recognition method combining less sample learning and target attribute characteristics | |
CN116704490B (en) | License plate recognition method, license plate recognition device and computer equipment | |
CN112784754A (en) | Vehicle re-identification method, device, equipment and storage medium | |
CN115393631A (en) | Hyperspectral image classification method based on Bayesian layer graph convolution neural network | |
CN115457332A (en) | Image multi-label classification method based on graph convolution neural network and class activation mapping |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||