CN115496948A - Network supervision fine-grained image identification method and system based on deep learning - Google Patents

Network supervision fine-grained image identification method and system based on deep learning

Info

Publication number
CN115496948A
Authority
CN
China
Prior art keywords
graph
feature
noise label
image
noise
Prior art date
Legal status
Pending
Application number
CN202211167812.6A
Other languages
Chinese (zh)
Inventor
林坚满
陈添水
林坚涛
杨志景
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202211167812.6A
Publication of CN115496948A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/30: Noise filtering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a network supervision fine-grained image recognition method and system based on deep learning. An input image containing a noise label is subjected to feature processing to obtain an example graph containing noise label features; a graph prototype is constructed for each category from the labelled example graphs; a preset graph matching neural network model is trained with the obtained example graphs containing noise label features and the graph prototypes; and the optimized graph matching neural network model is used to recognize fine-grained images. The method recognizes fine-grained images based on deep learning; by introducing graph prototypes and example graphs containing noise label features for contrastive learning, noise labels can be effectively corrected and outlier samples removed, so that the efficiency and accuracy of fine-grained image recognition are obviously improved.

Description

Network supervision fine-grained image identification method and system based on deep learning
Technical Field
The invention relates to the technical field of image recognition, in particular to a network supervision fine-grained image recognition method and system based on deep learning.
Background
Fine-grained image recognition, which aims to identify subclasses of a given object class, such as different types of birds, airplanes and automobiles, has important scientific significance and application value in the fields of intelligent construction, internet and the like. In recent years, fine-grained image recognition has been greatly advanced with the development of deep learning.
At present, most algorithms realize fine-grained image recognition with deep learning driven by high-quality data and therefore depend heavily on large-scale manually labeled data; the difficulty of collecting such data sets and the high cost of data labeling have become bottlenecks that restrict the popularization of these methods.
With the rapid development of the Internet, a large amount of weakly labeled data on the web can be used to relieve the dependence of current fine-grained image recognition algorithms on manual labeling, i.e., data obtained by web retrieval is used to train the neural network model. However, the data retrieved from the web contains a certain proportion of noise labels, which adversely affects model training. In addition, the inherent characteristics of fine-grained images, namely small inter-class variance and large intra-class variance, further increase the recognition difficulty.
The prior art discloses a distributed-label fine-grained image recognition algorithm based on inter-class similarity, which comprises the following steps: extracting a feature representation of an input image using a backbone network; calculating a center loss from the feature representation with a center loss module and updating the category centers; calculating a classification loss (e.g., cross-entropy loss) with a classification loss module using the feature representation and a final label distribution, wherein the final label distribution is a weighted sum of the one-hot label distribution and a distributed label distribution generated from the category centers; and obtaining a final target loss function by the weighted sum of the center loss and the classification loss so as to optimize the whole model. This prior-art method can relieve overfitting by reducing the certainty of model predictions, effectively learn discriminative features of fine-grained data, and to a certain extent improve the accuracy of distinguishing different fine-grained categories. However, it mainly adopts deep learning driven by high-quality data to distinguish subordinate categories and relies on large-scale manually labeled image data; data collection and labeling are costly, so fine-grained image recognition with this method is often time-consuming and labor-intensive, and suffers from low efficiency and accuracy.
Disclosure of Invention
The invention provides a network supervision fine-grained image recognition method and system based on deep learning, which overcome the defects of low efficiency and accuracy of fine-grained image recognition in the prior art and enable fine-grained image recognition to be carried out efficiently and accurately.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a network supervision fine-grained image recognition method based on deep learning comprises the following steps:
s1: acquiring an input image containing a noise label from the Internet;
s2: performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;
s3: acquiring an example graph containing noise label features according to the obtained region distinguishing feature graph and the whole feature graph;
s4: constructing a graph prototype for each category according to the obtained example graph containing the noise label characteristics;
s5: inputting the obtained example graph containing the noise label characteristics and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
s6: acquiring an image to be recognized, extracting features of the image to be recognized, and recognizing the image to be recognized by using the optimized graph matching neural network model to obtain a recognition result of the image to be recognized.
Preferably, in step S2, feature extraction is performed on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map; the specific method is as follows:
performing feature extraction on the input image containing the noise label by using a feature extractor to obtain the overall feature map; passing the overall feature map through a convolutional layer to obtain a mean-filtered overall feature map; calculating, over the channel dimension, the average value at each position of the mean-filtered overall feature map to obtain an overall mean feature map; searching the maximum-response region in the overall mean feature map, locating the coordinates of the maximum-response region, and obtaining the region discrimination feature map according to the coordinates of the maximum-response region.
Preferably, the specific method for searching the maximum response value area in the overall mean value feature map and locating the coordinate of the maximum response value area includes:
searching the maximum-response region in the overall mean feature map according to the following formulas, and locating the coordinates of the maximum-response region:

f̄_g = (1/C) · Σ_{c=1}^{C} f'_g(c)

(i, j) = argmax_{(u,v)} f̄_g(u, v)

wherein f̄_g denotes the overall mean feature map, f'_g denotes the mean-filtered overall feature map, C denotes the number of channels of the mean-filtered overall feature map, the argmax searches the row and column corresponding to the maximum-response region, and (i, j) denotes the coordinates of the maximum-response region.
Preferably, in the step S3, an example graph containing noise label features is obtained according to the obtained region discrimination feature map and the overall feature map, and the specific method is as follows:
converting the obtained region discrimination feature maps to the same dimensions by bilinear interpolation to obtain region feature maps of equal dimensions; reducing the dimensions of the overall feature map and of the equal-dimension region feature maps by global average pooling to obtain the dimension-reduced overall feature map and the dimension-reduced region feature maps; and obtaining the example graph containing noise label features from the dimension-reduced overall feature map and the dimension-reduced region feature maps:

G_ins = <V_ins, E_ins>

wherein G_ins denotes the example graph containing noise label features, V_ins denotes the set of all feature points in the dimension-reduced overall feature map and the dimension-reduced region feature maps, and E_ins denotes the adjacency matrix describing the connections between feature points in the example graph containing noise label features.
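For illustration, the following sketch shows one way to assemble such an example graph, assuming PyTorch and treating each globally average-pooled map as one graph node; the fully connected adjacency used here is an assumption, since the text only states that E_ins describes the connections between feature points.

    # Illustrative sketch: build an example graph from the overall feature map and several
    # cropped region feature maps. Each globally average-pooled map becomes one node; the
    # fully connected adjacency E_ins is an assumption.
    import torch
    import torch.nn.functional as F

    def build_example_graph(overall_map: torch.Tensor, region_maps):
        """overall_map: (C, H, W); region_maps: list of (C, h_i, w_i) crops."""
        size = overall_map.shape[-2:]
        resized = [F.interpolate(r.unsqueeze(0), size=size, mode="bilinear",
                                 align_corners=False).squeeze(0) for r in region_maps]
        nodes = [overall_map.mean(dim=(1, 2))] + [r.mean(dim=(1, 2)) for r in resized]
        V_ins = torch.stack(nodes)                    # (n_1, c_1) node feature matrix
        E_ins = torch.ones(len(nodes), len(nodes))    # assumed fully connected adjacency
        return V_ins, E_ins

    V, E = build_example_graph(torch.randn(2048, 14, 14),
                               [torch.randn(2048, 7, 7), torch.randn(2048, 10, 5)])
    print(V.shape, E.shape)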
Preferably, in step S4, according to the obtained example graph containing the noise label feature, a specific method for constructing a graph prototype includes:
according to the obtained example graph containing the noise label characteristics, constructing a graph prototype with the same structure as that of the example graph containing the noise label characteristics for each category, wherein the graph prototype is updated in a moving average mode:
G_k = <V_k, E_k>

G'_k = m · G_k + (1 - m) · G_ins

wherein G_k denotes the graph prototype constructed for the k-th category, V_k denotes the set of all feature points in the graph prototype of the k-th category, E_k denotes the adjacency matrix describing the connections between feature points in the graph prototype of the k-th category, G'_k denotes the updated graph prototype, and m is a preset parameter.
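A minimal sketch of the moving-average prototype update is given below; it assumes that the node features V_k of the k-th prototype are blended with the node features V_ins of an example graph assigned to category k, with m acting as a momentum parameter.

    # Illustrative sketch: moving-average update of the node features of the k-th graph
    # prototype with the node features of an example graph assigned to category k.
    import torch

    def update_prototype(V_k: torch.Tensor, V_ins: torch.Tensor, m: float = 0.99):
        """V_k, V_ins: (n, c) node feature matrices with the same structure."""
        return m * V_k + (1.0 - m) * V_ins

    proto = torch.zeros(10, 2048)
    proto = update_prototype(proto, torch.randn(10, 2048), m=0.9)
    print(proto.shape)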
Preferably, in step S5, the obtained example graph containing the noise label feature and the graph prototype are input into a preset graph matching neural network model for training, so as to obtain an optimized graph matching neural network model, and the specific method is as follows:
the preset graph matching neural network model comprises an intra-graph propagation layer, a graph aggregation layer, an inter-graph propagation layer and a graph matching layer, and the step of obtaining the optimized graph matching neural network model comprises the following steps:
S5.1: inputting the obtained example graph G_ins containing the noise label features and the graph prototype G_k into the intra-graph propagation layer to obtain a first feature matrix and a second feature matrix, and iteratively updating the first feature matrix and the second feature matrix through graph convolution operations;
S5.2: inputting the iteratively updated first feature matrix and second feature matrix into the graph aggregation layer for feature combination to obtain an aggregated feature vector;
S5.3: inputting the aggregated feature vector into the inter-graph propagation layer for graph convolution operations and iteratively updating the aggregated feature vector to obtain a first feature expression f_ins and a second feature expression Z_k;
S5.4: inputting the first feature expression f_ins and the second feature expression Z_k into the graph matching layer to calculate the similarity S_k, and calculating the graph matching loss L_gm according to the similarity S_k;
S5.5: correcting the noise labels in the example graph containing the noise label features and removing outlier samples;
S5.6: calculating the classification cross-entropy loss L_cls and the total loss L_total, and optimizing the graph matching neural network model according to the total loss L_total to obtain the optimized graph matching neural network model.
Preferably, in step S5.4, the first feature expression f_ins and the second feature expression Z_k are input into the graph matching layer to calculate the similarity S_k, and the graph matching loss L_gm is calculated according to the similarity S_k; the specific method is as follows:
inputting the first feature expression f_ins and the second feature expression Z_k into the graph matching layer for graph matching, and calculating the similarity S_k between the first feature expression f_ins and the graph prototype expression Z_k of each category;
the graph matching layer is provided with a graph matching loss function, and the graph matching loss L_gm is calculated from the similarities S_k and the original label y_i, wherein y_i denotes the original label of the i-th example graph, k denotes a graph prototype category, and K denotes the total number of graph prototype categories.
Preferably, in step S5.5, the noise label in the example graph containing the noise label feature is corrected and the outlier sample is removed, and the specific method includes:
the intra-graph propagation layer is provided with a classifier; the example graph containing the noise label features is input into the classifier to obtain the classifier distribution probability p_i, and the graph matching distribution probability d_i is obtained by normalizing the graph matching similarities with a softmax using the temperature coefficient τ; the total probability q_i is then calculated from the classifier distribution probability p_i and the graph matching distribution probability d_i, specifically:

q_i = α · p_i + (1 - α) · d_i

wherein α is a preset parameter and τ is the temperature coefficient;
according to the total probability q_i and a preset threshold T, the noise label in the example graph containing the noise label features is corrected and the outlier sample (OOD) is removed, specifically:

ŷ_i = argmax_k q_i(k),  if max_k q_i(k) > T;
ŷ_i = y_i,              if q_i(y_i) is greater than the class-average probability 1/K;
ŷ_i = OOD,              otherwise;

wherein ŷ_i is the pseudo label and T is the preset threshold: when the maximum of the total probability q_i exceeds T, the category with the maximum total probability is taken as the pseudo label; when the total probability of the original label y_i exceeds the class-average probability, the original label y_i is taken as the pseudo label, thereby correcting the noise label in the example graph containing the noise label features; otherwise the pseudo label is set to OOD, which denotes an outlier sample, and the outlier sample is removed.
Preferably, in step S5.6, the classification cross-entropy loss L_cls and the total loss L_total are calculated, and the graph matching neural network model is optimized according to the total loss L_total to obtain the optimized graph matching neural network model; the specific method is as follows:
the intra-graph propagation layer is provided with a classification cross-entropy loss function, specifically:

L_cls = -Σ_i Σ_j ŷ_ij · log(p_ij)

wherein L_cls is the classification cross-entropy loss, p_ij is the classifier distribution probability of the i-th example graph containing noise label features with respect to the j-th category, and ŷ_ij is the pseudo label of the i-th example graph containing noise label features with respect to the j-th category;
a total loss function is constructed from the classification cross-entropy loss function and the graph matching loss function, specifically:

L_total = L_cls + λ_pro · L_gm

wherein L_total is the total loss and λ_pro is a proportionality coefficient;
the graph matching neural network model is optimized according to the total loss L_total to obtain the optimized graph matching neural network model.
The invention also provides a network supervision fine-grained image recognition system based on deep learning, which applies the above network supervision fine-grained image recognition method based on deep learning and comprises:
an image acquisition unit: used for acquiring an input image containing a noise label from the Internet;
a feature extraction unit: used for performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;
an example graph generation unit: used for obtaining an example graph containing noise label features according to the obtained region discrimination feature map and the overall feature map;
a graph prototype construction unit: used for constructing a graph prototype for each category according to the obtained example graph containing the noise label features;
a graph matching unit: used for inputting the obtained example graph containing the noise label features and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
an image recognition unit: used for acquiring an image to be recognized, extracting features of the image to be recognized, recognizing the image to be recognized by using the optimized graph matching neural network model, and obtaining the recognition result of the image to be recognized.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a network supervision fine-grained image recognition method and system based on deep learning, the method comprises the steps of carrying out feature processing on an input image containing a noise label to obtain an example graph containing noise label features, constructing a corresponding graph prototype for each category by using the example graph containing the noise label features, carrying out training and noise label correction on a preset image matching neural network model by using the obtained example graph containing the noise label features and the graph prototype, and carrying out fine-grained image recognition by using an optimized image matching neural network model; the method identifies the network supervision fine-grained image based on deep learning, and by introducing the image prototype and the example image containing the noise label characteristics for comparison learning, the noise label can be effectively corrected, and the efficiency and the accuracy of fine-grained image identification are obviously improved.
Drawings
Fig. 1 is a flowchart of a network supervised fine-grained image recognition method based on deep learning according to embodiment 1.
Fig. 2 is a schematic diagram of a network supervised fine-grained image recognition method based on deep learning provided in embodiment 2.
Fig. 3 is a structural diagram of a network supervised fine grained image recognition system based on deep learning provided in embodiment 3.
301-image acquisition unit, 302-feature extraction unit, 303-example graph generation unit, 304-graph prototype construction unit, 305-graph matching unit, 306-image identification unit.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, the present embodiment provides a network supervision fine-grained image recognition method based on deep learning, including the following steps:
s1: acquiring an input image containing a noise label from the Internet;
s2: performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;
s3: acquiring an example graph containing noise label features according to the obtained region distinguishing feature graph and the whole feature graph;
s4: constructing a graph prototype for each category according to the obtained example graph containing the noise label characteristics;
s5: inputting the obtained example graph containing the noise label characteristics and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
s6: and acquiring an image to be recognized, extracting the characteristics of the image to be recognized, and recognizing the image to be recognized by using the optimized graph matching neural network model to obtain a recognition result of the image to be recognized.
In the specific implementation process, an input image containing a noise label is first obtained through web retrieval; a CNN (convolutional neural network) is then used to perform feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map; an example graph containing noise label features is obtained from the region discrimination feature map and the overall feature map, and a corresponding graph prototype is constructed for each category from the example graphs containing noise label features; the obtained example graphs containing noise label features and the graph prototypes are input into a preset graph matching neural network model for training, and the graph matching loss and the classification cross-entropy loss are calculated to optimize the network, yielding the optimized graph matching neural network model; finally, the optimized graph matching neural network model is used to recognize the image to be recognized and obtain the recognition result of the image to be recognized;
the method identifies the fine-grained image based on deep learning, and by introducing the image prototype and the example image containing the noise label characteristics for comparison learning, the noise label can be effectively corrected, and the efficiency and the accuracy of fine-grained image identification are obviously improved.
Example 2
As shown in fig. 2, the present embodiment provides a network supervised fine-grained image recognition method based on deep learning, including the following steps:
s1: acquiring an input image containing a noise label from the Internet;
s2: performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map, wherein the specific method comprises the following steps of:
performing feature extraction on the input image containing the noise label by using a feature extractor to obtain the overall feature map; passing the overall feature map through a convolutional layer to obtain a mean-filtered overall feature map; calculating, over the channel dimension, the average value at each position of the mean-filtered overall feature map to obtain an overall mean feature map; searching the maximum-response region in the overall mean feature map, locating the coordinates of the maximum-response region, and obtaining the region discrimination feature map according to the coordinates of the maximum-response region;
the specific method for searching the maximum response value area in the overall mean characteristic diagram and positioning the coordinates of the maximum response value area comprises the following steps:
searching the maximum-response region in the overall mean feature map according to the following formulas, and locating the coordinates of the maximum-response region:

f̄_g = (1/C) · Σ_{c=1}^{C} f'_g(c)

(i, j) = argmax_{(u,v)} f̄_g(u, v)

wherein f̄_g denotes the overall mean feature map, f'_g denotes the mean-filtered overall feature map, C denotes the number of channels of the mean-filtered overall feature map, the argmax searches the row and column corresponding to the maximum-response region, and (i, j) denotes the coordinates of the maximum-response region;
s3: acquiring an example graph containing noise label features according to the obtained region discrimination feature graph and the overall feature graph, wherein the specific method comprises the following steps of:
converting the obtained region discrimination feature maps to the same dimensions by bilinear interpolation to obtain region feature maps of equal dimensions; reducing the dimensions of the overall feature map and of the equal-dimension region feature maps by global average pooling to obtain the dimension-reduced overall feature map and the dimension-reduced region feature maps; and obtaining the example graph containing noise label features from the dimension-reduced overall feature map and the dimension-reduced region feature maps:

G_ins = <V_ins, E_ins>

wherein G_ins denotes the example graph containing noise label features, V_ins denotes the set of all feature points in the dimension-reduced overall feature map and the dimension-reduced region feature maps, and E_ins denotes the adjacency matrix describing the connections between feature points in the example graph containing noise label features;
s4: constructing a graph prototype for each category according to the acquired example graph containing the noise label characteristics, wherein the specific method comprises the following steps:
according to the obtained example graph containing the noise label features, constructing a graph prototype with the same structure as the example graph containing the noise label features for each category, wherein the graph prototype is updated in a moving average mode:
G_k = <V_k, E_k>

G'_k = m · G_k + (1 - m) · G_ins

wherein G_k denotes the graph prototype constructed for the k-th category, V_k denotes the set of all feature points in the graph prototype of the k-th category, E_k denotes the adjacency matrix describing the connections between feature points in the graph prototype of the k-th category, G'_k denotes the updated graph prototype, and m is a preset parameter;
s5: inputting the obtained example graph containing the noise label characteristics and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
the preset graph matching neural network model comprises an intra-graph propagation layer, a graph aggregation layer, an inter-graph propagation layer and a graph matching layer, and the step of obtaining the optimized graph matching neural network model comprises the following steps;
S5.1: inputting the obtained example graph G_ins containing the noise label features and the graph prototype G_k into the intra-graph propagation layer to obtain a first feature matrix and a second feature matrix, and iteratively updating the first feature matrix and the second feature matrix through graph convolution operations, specifically:
inputting the obtained example graph G_ins containing the noise label features and the graph prototype G_k into the intra-graph propagation layer, and reconstructing the set V_ins of all feature points in the dimension-reduced overall feature map and region feature maps into a first feature matrix V_ins ∈ R^(n_1 × c_1), wherein n_1 is the number of all feature points in the example graph containing the noise label features and c_1 is the dimension of each feature point in the example graph containing the noise label features;
reconstructing the set V_k of all feature points in the graph prototype into a second feature matrix V_k ∈ R^(n_2 × c_2), wherein n_2 is the number of all feature points in the graph prototype and c_2 is the dimension of each feature point in the graph prototype;
performing graph convolution operations on the first feature matrix and the second feature matrix respectively, and iteratively updating the first feature matrix and the second feature matrix, specifically:

V_ins^(l+1) = σ(E_ins · V_ins^(l) · W_ins^(l))

V_k^(l+1) = σ(E_k · V_k^(l) · W_k^(l))

wherein V_ins^(l+1) is the first feature matrix updated at the l-th iteration, V_k^(l+1) is the second feature matrix updated at the l-th iteration, W_ins^(l) and W_k^(l) are parameters of the intra-graph propagation layer, and σ(·) is a nonlinear activation function;
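The following sketch illustrates one intra-graph propagation step in PyTorch, assuming the common dense graph-convolution form V' = σ(E·V·W); the ReLU activation and the unnormalized adjacency are assumptions not specified in the text.

    # Illustrative sketch of one intra-graph propagation (graph convolution) step.
    import torch
    import torch.nn as nn

    class IntraGraphPropagation(nn.Module):
        def __init__(self, in_dim: int, out_dim: int):
            super().__init__()
            self.W = nn.Linear(in_dim, out_dim, bias=False)  # learnable layer parameters
            self.act = nn.ReLU()

        def forward(self, V: torch.Tensor, E: torch.Tensor) -> torch.Tensor:
            """V: (n, c) node features; E: (n, n) adjacency matrix."""
            return self.act(E @ self.W(V))  # propagate over the edges, then activate

    layer = IntraGraphPropagation(2048, 1024)
    V_new = layer(torch.randn(5, 2048), torch.ones(5, 5))
    print(V_new.shape)  # (5, 1024)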
S5.2: inputting the iteratively updated first feature matrix and second feature matrix into the graph aggregation layer for feature combination to obtain an aggregated feature vector, specifically:

V_cross = [V'_ins, V'_k]

wherein V_cross is the aggregated feature vector, V'_ins is the updated first feature matrix, V'_k is the updated second feature matrix, and [·, ·] denotes their combination;
S5.3: inputting the aggregated feature vector into the inter-graph propagation layer for graph convolution operations and iteratively updating the aggregated feature vector to obtain the first feature expression f_ins and the second feature expression Z_k, specifically:

V_cross^(l+1) = σ(E_cross · V_cross^(l) · W_cross^(l))

wherein V_cross^(l+1) is the aggregated feature vector updated at the l-th iteration, E_cross is the adjacency matrix of the aggregated feature vector, W_cross^(l) are parameters of the inter-graph propagation layer, and σ(·) is a nonlinear activation function;
obtaining the first feature expression f_ins and the second feature expression Z_k from the iteratively updated aggregated feature vector;
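The following sketch illustrates graph aggregation followed by inter-graph propagation, assuming that the nodes of the two graphs are concatenated into one cross-graph and that the final feature expressions f_ins and Z_k are obtained by mean-pooling the instance-side and prototype-side nodes; the fully connected E_cross and the pooling are assumptions.

    # Illustrative sketch of graph aggregation followed by inter-graph propagation.
    import torch
    import torch.nn as nn

    class InterGraphPropagation(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.W = nn.Linear(dim, dim, bias=False)
            self.act = nn.ReLU()

        def forward(self, V_ins: torch.Tensor, V_k: torch.Tensor):
            V_cross = torch.cat([V_ins, V_k], dim=0)            # aggregated feature matrix
            E_cross = torch.ones(V_cross.size(0), V_cross.size(0))
            V_cross = self.act(E_cross @ self.W(V_cross))        # propagate across graphs
            n = V_ins.size(0)
            f_ins = V_cross[:n].mean(dim=0)   # first feature expression (instance side)
            Z_k = V_cross[n:].mean(dim=0)     # second feature expression (prototype side)
            return f_ins, Z_k

    f_ins, Z_k = InterGraphPropagation(1024)(torch.randn(5, 1024), torch.randn(5, 1024))
    print(f_ins.shape, Z_k.shape)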
S5.4: inputting the first feature expression f_ins and the second feature expression Z_k into the graph matching layer to calculate the similarity S_k, and calculating the graph matching loss L_gm according to the similarity S_k, specifically:
inputting the first feature expression f_ins and the second feature expression Z_k into the graph matching layer for graph matching, and calculating the similarity S_k between the first feature expression f_ins and the graph prototype expression Z_k of each category;
the graph matching layer is provided with a graph matching loss function, and the graph matching loss L_gm is calculated from the similarities S_k and the original label y_i, wherein y_i denotes the original label of the i-th example graph, k denotes a graph prototype category, and K denotes the total number of graph prototype categories;
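Since the exact similarity and loss formulas are given only as images in the published text, the following sketch uses cosine similarity and a softmax cross-entropy over the graph matching similarities as one plausible concrete form; these choices are assumptions.

    # Illustrative sketch of the graph matching layer (cosine similarity and softmax
    # cross-entropy against the original label are assumptions, not the filed formulas).
    import torch
    import torch.nn.functional as F

    def graph_matching_loss(f_ins: torch.Tensor, Z: torch.Tensor, y: int, tau: float = 0.1):
        """f_ins: (d,); Z: (K, d) stacked prototype expressions; y: original label."""
        S = F.cosine_similarity(f_ins.unsqueeze(0), Z, dim=1)   # (K,) similarities S_k
        d = F.softmax(S / tau, dim=0)                            # matching distribution d_i
        loss = -torch.log(d[y] + 1e-12)                          # penalise w.r.t. label y_i
        return loss, S, d

    loss, S, d = graph_matching_loss(torch.randn(1024), torch.randn(200, 1024), y=3)
    print(loss.item(), S.shape, d.shape)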
S5.5: correcting the noise labels in the example graph containing the noise label features and removing outlier samples, specifically:
the intra-graph propagation layer is provided with a classifier; the example graph containing the noise label features is input into the classifier to obtain the classifier distribution probability p_i, and the graph matching distribution probability d_i is obtained by normalizing the graph matching similarities with a softmax using the temperature coefficient τ; the total probability q_i is then calculated from the classifier distribution probability p_i and the graph matching distribution probability d_i, specifically:

q_i = α · p_i + (1 - α) · d_i

wherein α is a preset parameter and τ is the temperature coefficient;
according to the total probability q_i and a preset threshold T, the noise label in the example graph containing the noise label features is corrected and the outlier sample (OOD) is removed, specifically:

ŷ_i = argmax_k q_i(k),  if max_k q_i(k) > T;
ŷ_i = y_i,              if q_i(y_i) is greater than the class-average probability 1/K;
ŷ_i = OOD,              otherwise;

wherein ŷ_i is the pseudo label and T is the preset threshold: when the maximum of the total probability q_i exceeds T, the category with the maximum total probability is taken as the pseudo label; when the total probability of the original label y_i exceeds the class-average probability, the original label y_i is taken as the pseudo label, thereby correcting the noise label in the example graph containing the noise label features; otherwise the pseudo label is set to OOD, which denotes an outlier sample, and the outlier sample is removed;
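A minimal sketch of this label-correction rule is given below; the OOD marker value and the function name are illustrative.

    # Illustrative sketch of the label-correction rule: mix the classifier distribution p
    # with the graph matching distribution d, then correct the label, keep the original
    # label, or mark the sample as an outlier (OOD) to be removed.
    import torch

    OOD = -1  # illustrative marker for outlier samples

    def assign_pseudo_label(p: torch.Tensor, d: torch.Tensor, y: int,
                            alpha: float = 0.5, T: float = 0.75) -> int:
        """p, d: (K,) probability vectors; y: original (possibly noisy) label."""
        q = alpha * p + (1.0 - alpha) * d        # total probability q_i
        K = q.numel()
        if q.max().item() > T:                   # confident prediction: correct the label
            return int(q.argmax().item())
        if q[y].item() > 1.0 / K:                # above class-average: keep original label
            return y
        return OOD                               # otherwise treat as an outlier sample

    label = assign_pseudo_label(torch.softmax(torch.randn(200), 0),
                                torch.softmax(torch.randn(200), 0), y=7)
    print(label)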
S5.6: calculating the classification cross-entropy loss L_cls and the total loss L_total, and optimizing the graph matching neural network model according to the total loss L_total to obtain the optimized graph matching neural network model, specifically:
the intra-graph propagation layer is provided with a classification cross-entropy loss function, specifically:

L_cls = -Σ_i Σ_j ŷ_ij · log(p_ij)

wherein L_cls is the classification cross-entropy loss, p_ij is the classifier distribution probability of the i-th example graph containing noise label features with respect to the j-th category, and ŷ_ij is the pseudo label of the i-th example graph containing noise label features with respect to the j-th category;
a total loss function is constructed from the classification cross-entropy loss function and the graph matching loss function, specifically:

L_total = L_cls + λ_pro · L_gm

wherein L_total is the total loss and λ_pro is a proportionality coefficient;
the graph matching neural network model is optimized according to the total loss L_total to obtain the optimized graph matching neural network model;
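A minimal sketch of one optimization step with the total loss is given below, assuming PyTorch; the optimizer choice and function names are illustrative.

    # Illustrative sketch of one optimisation step with L_total = L_cls + lambda_pro * L_gm.
    import torch
    import torch.nn.functional as F

    def training_step(model, optimizer, logits, pseudo_labels, gm_loss, lambda_pro=1.0):
        """logits: (N, K) classifier outputs; pseudo_labels: (N,) corrected labels;
        gm_loss: scalar graph matching loss."""
        cls_loss = F.cross_entropy(logits, pseudo_labels)  # classification cross-entropy
        total = cls_loss + lambda_pro * gm_loss             # total loss
        optimizer.zero_grad()
        total.backward()
        optimizer.step()
        return total.item()

    # Toy usage with a linear "model"
    model = torch.nn.Linear(16, 200)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(8, 16)
    val = training_step(model, opt, model(x), torch.randint(0, 200, (8,)),
                        gm_loss=torch.tensor(0.5))
    print(val)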
s6: acquiring an image to be recognized, extracting features of the image to be recognized, and recognizing the image to be recognized by using the optimized graph matching neural network model to obtain the recognition result of the image to be recognized.
In the specific implementation process, an input image containing a noise label is first obtained through web retrieval; the data set used in this embodiment is WebFG-496, which consists of three sub-data sets, namely Web-Bird, Web-Aircraft and Web-Car, and the size of the input image containing the noise label is 448 × 448;
then, a convolutional neural network with a ResNet50 variant as the backbone CNN is set up, and feature extraction is performed on the input image containing the noise label by using the feature extractor to obtain the overall feature map, whose dimension is 14 × 2048; the overall feature map is passed through a convolutional layer to obtain the mean-filtered overall feature map; the average value at each position of the mean-filtered overall feature map is calculated over the channel dimension to obtain the overall mean feature map;
the maximum-response region in the overall mean feature map is searched according to the following formulas, and its coordinates are located:

f̄_g = (1/C) · Σ_{c=1}^{C} f'_g(c)

(i, j) = argmax_{(u,v)} f̄_g(u, v)

wherein f̄_g denotes the overall mean feature map, f'_g denotes the mean-filtered overall feature map, C denotes the number of channels of the mean-filtered overall feature map, the argmax searches the row and column corresponding to the maximum-response region, and (i, j) denotes the coordinates of the maximum-response region;
a plurality of local regions of different sizes are then cropped from the overall feature map according to the obtained coordinates of the maximum-response region; in this embodiment, three different region areas S_1, S_2, S_3 and three different aspect ratios A_1, A_2, A_3 are set, giving 9 combinations in total for cropping the overall feature map, wherein the three region areas S_1, S_2, S_3 are one half, one third and two thirds of the area of the overall feature map respectively, and the three aspect ratios A_1, A_2, A_3 are 1, 0.5 and 2 respectively;
feature extraction is performed on the cropped local regions of different sizes with the feature extractor to obtain the region discrimination feature maps;
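A small sketch of generating the nine candidate crop boxes around the maximum-response coordinate is given below; the rounding and border clamping are assumptions about details the text does not specify.

    # Illustrative sketch: generate the 9 candidate crops (3 areas x 3 aspect ratios)
    # centred on the maximum-response coordinate (i, j) of an H x W feature map.
    def candidate_boxes(i, j, H, W):
        areas = [0.5 * H * W, H * W / 3.0, 2.0 * H * W / 3.0]  # 1/2, 1/3, 2/3 of the map
        ratios = [1.0, 0.5, 2.0]                                # aspect ratios A_1, A_2, A_3
        boxes = []
        for a in areas:
            for r in ratios:
                h = min(H, max(1, round((a / r) ** 0.5)))  # a = h*w, r = w/h => h = sqrt(a/r)
                w = min(W, max(1, round(r * h)))
                top = max(0, min(H - h, i - h // 2))       # clamp to the map borders
                left = max(0, min(W - w, j - w // 2))
                boxes.append((top, left, h, w))
        return boxes

    print(candidate_boxes(7, 7, 14, 14))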
the example graph containing noise label features and the graph prototype corresponding to each category are constructed, and the obtained example graph containing noise label features and the graph prototype are input into the intra-graph propagation layer (a GCN) for graph convolution operations, where in this embodiment the numbers of output channels are 1024 and 2048 respectively; the output example graph features containing noise label features and the graph prototype features are aggregated to obtain the first feature expression f_ins and the second feature expression Z_k; the graph matching loss and the classification cross-entropy loss are then calculated from the first feature expression f_ins and the second feature expression Z_k respectively to optimize the graph matching neural network model;
in this embodiment, α = 0.5, τ = 0.1, T = 0.75 and λ_pro = 1;
Acquiring an image to be recognized from CUB200-2011, FGVC-Aircraft and Stanford Cars as verification data, extracting the characteristics of the image to be recognized, recognizing the image to be recognized by utilizing the optimized image matching neural network model, and obtaining the recognition result of the image to be recognized;
the following table shows a comparison graph of the recognition accuracy of fine-grained images in different methods:
Figure BDA0003862349850000131
TABLE 1 comparison of recognition accuracy for fine grain images of different methods
Compared with the baseline models, the performance of the method of this embodiment is far superior on all three data sets. The backbone network used in this embodiment is ResNet-50; compared with a single ResNet-50 model, the method of this embodiment is greatly improved on the three data sets, with the average recognition accuracy increased by 20.14%. For a fair comparison, ResNet-50 is used as the backbone network; as can be seen from Table 1, when ResNet-50 is used as the backbone network, the method of this embodiment achieves the highest average accuracy of 83.53%, and the accuracies on Web-Bird, Web-Aircraft and Web-Car are 76.62%, 85.79% and 82.09% respectively, which are 2.23%, 4.2% and 1.94% higher than the currently advanced Peer-learning method. Furthermore, when other models such as B-CNN are used as the backbone network, the comparison shows that the method of this embodiment can adapt to different backbone networks and obviously improves performance in fine-grained image recognition;
the method identifies the network supervision fine-grained image based on deep learning, and by introducing the image prototype and the example image containing the noise label characteristics for comparison learning, the noise label can be effectively corrected, and the efficiency and the accuracy of fine-grained image identification are obviously improved.
Example 3
As shown in fig. 3, the present embodiment provides a network supervised fine-grained image recognition system based on deep learning, and applies the network supervised fine-grained image recognition method based on deep learning described in embodiment 1 or 2, including:
the image acquisition unit 301: the system comprises a data processing unit, a data processing unit and a data processing unit, wherein the data processing unit is used for acquiring an input image containing a noise label from the Internet;
the feature extraction unit 302: the system is used for extracting the characteristics of the input image containing the noise label to obtain an area discrimination characteristic diagram and an integral characteristic diagram;
example graph generating unit 303: the method is used for obtaining an example graph containing the noise label characteristics according to the obtained region distinguishing characteristic graph and the whole characteristic graph;
graph primitive type configuration unit 304: the prototype of the graph is constructed for each category according to the acquired example graph containing the noise label characteristic;
the map matching unit 305: the graph prototype model is used for inputting the obtained example graph containing the noise label characteristics and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
the image recognition unit 306: the image recognition method comprises the steps of obtaining an image to be recognized, recognizing the image to be recognized by utilizing the optimized image matching neural network model after extracting the characteristics of the image to be recognized, and obtaining the recognition result of the image to be recognized;
in the specific implementation process, firstly, the image acquisition unit 301 is used for network retrieval to acquire an input image containing a noise label; then, a feature extraction unit 302 is used for extracting features of the input image containing the noise label to obtain a region discrimination feature map and an overall feature map; acquiring an example graph containing noise label features according to the obtained region discrimination feature graph and the overall feature graph by using the example graph generating unit 303; then, according to the obtained example graph containing the noise label characteristics, a graph prototype is constructed for each category by using a graph prototype construction unit 304; then, the graph matching unit 305 is used for inputting the obtained example graph containing the noise label characteristics and the graph prototype into a preset graph matching neural network model for training, and an optimized graph matching neural network model is obtained; finally, the image recognition unit 306 acquires an image to be recognized, and after the characteristics of the image to be recognized are extracted, the optimized image matching neural network model is used for recognizing the image to be recognized, so that the recognition result of the image to be recognized is obtained;
the system identifies the fine-grained image based on deep learning, and by introducing the image prototype and the example image containing the noise label characteristics for comparison learning, the noise label can be effectively corrected, and the efficiency and the accuracy of identifying the fine-grained image are obviously improved.
The same or similar reference numerals correspond to the same or similar parts;
the terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. This need not be, nor should it be exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A network supervision fine-grained image recognition method based on deep learning is characterized by comprising the following steps:
s1: acquiring an input image containing a noise label from the Internet;
s2: performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;
s3: acquiring an example graph containing noise label features according to the obtained region discrimination feature graph and the overall feature graph;
s4: constructing a graph prototype for each category according to the obtained example graph containing the noise label characteristics;
s5: inputting the obtained example graph containing the noise label characteristics and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
s6: and acquiring an image to be recognized, extracting the characteristics of the image to be recognized, and recognizing the image to be recognized by using the optimized graph matching neural network model to obtain a recognition result of the image to be recognized.
2. The method according to claim 1, wherein in step S2, feature extraction is performed on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map, and the specific method is as follows:
performing feature extraction on the input image containing the noise label by using a feature extractor to obtain the overall feature map; passing the overall feature map through a convolutional layer to obtain a mean-filtered overall feature map; calculating, over the channel dimension, the average value at each position of the mean-filtered overall feature map to obtain an overall mean feature map; searching the maximum-response region in the overall mean feature map, locating the coordinates of the maximum-response region, and obtaining the region discrimination feature map according to the coordinates of the maximum-response region.
3. The method for identifying the network supervision fine-grained image based on the deep learning according to claim 2, wherein the specific method for searching the maximum response value area in the overall mean value feature map and locating the coordinate of the maximum response value area comprises the following steps:
searching the maximum-response region in the overall mean feature map according to the following formulas, and locating the coordinates of the maximum-response region:

f̄_g = (1/C) · Σ_{c=1}^{C} f'_g(c)

(i, j) = argmax_{(u,v)} f̄_g(u, v)

wherein f̄_g denotes the overall mean feature map, f'_g denotes the mean-filtered overall feature map, C denotes the number of channels of the mean-filtered overall feature map, the argmax searches the row and column corresponding to the maximum-response region, and (i, j) denotes the coordinates of the maximum-response region.
4. The method according to claim 3, wherein in step S3, an instance graph containing noise label features is obtained according to the obtained region discrimination feature map and the global feature map, and the specific method is as follows:
converting the obtained region discrimination feature maps to the same dimensions by bilinear interpolation to obtain region feature maps of equal dimensions; reducing the dimensions of the overall feature map and of the equal-dimension region feature maps by global average pooling to obtain the dimension-reduced overall feature map and the dimension-reduced region feature maps; and obtaining the example graph containing noise label features from the dimension-reduced overall feature map and the dimension-reduced region feature maps:

G_ins = <V_ins, E_ins>

wherein G_ins denotes the example graph containing noise label features, V_ins denotes the set of all feature points in the dimension-reduced overall feature map and the dimension-reduced region feature maps, and E_ins denotes the adjacency matrix describing the connections between feature points in the example graph containing noise label features.
5. The method according to claim 4, wherein in step S4, according to the obtained example graph containing the noise label feature, a concrete method for constructing a graph prototype comprises:
according to the obtained example graph containing the noise label features, constructing a graph prototype with the same structure as the example graph containing the noise label features for each category, wherein the graph prototype is updated in a moving average mode:
G_k = <V_k, E_k>

G'_k = m · G_k + (1 - m) · G_ins

wherein G_k denotes the graph prototype constructed for the k-th category, V_k denotes the set of all feature points in the graph prototype of the k-th category, E_k denotes the adjacency matrix describing the connections between feature points in the graph prototype of the k-th category, G'_k denotes the updated graph prototype, and m is a preset parameter.
6. The method for identifying network supervision fine-grained images based on deep learning according to claim 5, wherein in step S5, the obtained example graph containing the noise label features and the graph prototype are input into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model, the specific method being as follows:
the preset graph matching neural network model comprises an intra-graph propagation layer, a graph aggregation layer, an inter-graph propagation layer and a graph matching layer, and the step of obtaining the optimized graph matching neural network model comprises the following steps:
S5.1: inputting the obtained example graph G_ins containing the noise label features and the graph prototype G_k into the intra-graph propagation layer to obtain a first feature matrix and a second feature matrix, and iteratively updating the first feature matrix and the second feature matrix through graph convolution operations;
S5.2: inputting the iteratively updated first feature matrix and second feature matrix into the graph aggregation layer for feature combination to obtain an aggregated feature vector;
S5.3: inputting the aggregated feature vector into the inter-graph propagation layer for graph convolution operations and iteratively updating the aggregated feature vector to obtain a first feature expression f_ins and a second feature expression Z_k;
S5.4: inputting the first feature expression f_ins and the second feature expression Z_k into the graph matching layer to calculate the similarity S_k, and calculating the graph matching loss L_gm according to the similarity S_k;
S5.5: correcting the noise labels in the example graph containing the noise label features and removing outlier samples;
S5.6: calculating the classification cross-entropy loss L_cls and the total loss L_total, and optimizing the graph matching neural network model according to the total loss L_total to obtain the optimized graph matching neural network model.
7. The method according to claim 6, wherein in step S5.4, the first feature expression f_ins and the second feature expression Z_k are input into the graph matching layer to calculate the similarity S_k, and the graph matching loss L_match is calculated according to the similarity S_k; the specific method is as follows:
inputting the first feature expression f_ins and the second feature expression Z_k into the graph matching layer for graph matching and calculating the similarity S_k, specifically:
S_k = sim(f_ins, Z_k)
the graph matching layer is provided with a graph matching loss function and calculates the graph matching loss according to the similarity S_k, wherein the graph matching loss function is specifically:
L_match = -log( exp(S_{y_i}) / Σ_{k=1}^{K} exp(S_k) )
wherein L_match is the graph matching loss, y_i represents the original label, k represents the category index of the graph prototype, and K represents the total number of graph prototype categories.
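Because the similarity and loss formulas appear only as images in the published text, the following sketch of step S5.4 rests on two assumptions: S_k is taken as the cosine similarity between f_ins and each prototype expression Z_k, and L_match as a softmax cross-entropy over the K similarities.

# Hedged sketch of step S5.4: similarity to each graph prototype and a softmax
# cross-entropy matching loss. Cosine similarity is an assumption; the claim
# only states that a similarity S_k is computed and a matching loss derived from it.
import torch
import torch.nn.functional as F

def graph_matching_loss(f_ins, z, y):
    # f_ins: (d,) instance expression; z: (K, d) prototype expressions; y: integer label y_i
    s = F.cosine_similarity(f_ins.unsqueeze(0), z, dim=1)  # similarities S_k, shape (K,)
    log_probs = F.log_softmax(s, dim=0)  # softmax over the K prototypes
    return -log_probs[y]  # cross-entropy against the original label

loss = graph_matching_loss(torch.randn(128), torch.randn(200, 128), y=3)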
8. The method for identifying network supervision fine-grained images based on deep learning according to claim 7, wherein in step S5.5, the noise labels in the example graph containing the noise label features are corrected and outlier samples are eliminated, and the specific method is as follows:
the intra-graph propagation layer is provided with a classifier; the example graph containing the noise label features is input into the classifier to obtain the classifier distribution probability p_i, the graph matching distribution probability d_i is calculated, and the total probability q_i is calculated according to the classifier distribution probability p_i and the graph matching distribution probability d_i, specifically:
q_i = α·p_i + (1 - α)·d_i
d_{i,k} = exp(S_k/τ) / Σ_{k'=1}^{K} exp(S_{k'}/τ)
wherein α is a preset parameter and τ is a temperature coefficient;
according to the total probability q_i and a preset threshold T, the noise label in the example graph containing the noise label features is corrected and the outlier samples (OOD) are removed, specifically:
ŷ_i = argmax_k q_{i,k}, if max_k q_{i,k} > T
ŷ_i = y_i, if q_{i,y_i} > the class-average probability
ŷ_i = OOD, otherwise
wherein ŷ_i is the pseudo label and T is a preset threshold; when the maximum value of the total probability q_i is greater than T, the category corresponding to that maximum value is used as the pseudo label; when the total probability of the original label y_i is greater than the class-average probability, the original label y_i is used as the pseudo label, thereby correcting the noise label in the example graph containing the noise label features; in other cases, OOD is used as the pseudo label, where OOD represents an outlier sample, and the outlier samples are removed.
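A minimal sketch of the correction rule in step S5.5: the classifier and graph-matching distributions are blended with α into q_i, then each sample receives the arg-max class, its original label, or an OOD mark according to the threshold T and the class-average probability; estimating the class-average probability as the per-class mean of q over the current batch is an assumption.

# Hedged sketch of step S5.5: pseudo-label assignment from q_i = alpha*p_i + (1-alpha)*d_i.
# The batch-wise estimate of the class-average probability and the -1 marker
# for OOD samples are illustrative assumptions.
import torch

def correct_labels(p, d, y, alpha=0.5, T=0.8):
    # p, d: (N, K) classifier / graph-matching distributions; y: (N,) original labels
    q = alpha * p + (1 - alpha) * d  # total probability q_i
    class_avg = q.mean(dim=0)  # per-class average probability (assumed batch estimate)
    pseudo = torch.full_like(y, -1)  # -1 marks OOD (outlier) samples to be removed
    keep_orig = q.gather(1, y.unsqueeze(1)).squeeze(1) > class_avg[y]
    pseudo[keep_orig] = y[keep_orig]  # original label exceeds the class average: keep y_i
    max_q, argmax_q = q.max(dim=1)
    confident = max_q > T
    pseudo[confident] = argmax_q[confident]  # confident prediction: take the arg-max class
    return pseudo

pseudo = correct_labels(torch.rand(8, 5), torch.rand(8, 5), torch.randint(0, 5, (8,)))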
9. The method for identifying network supervision fine-grained images based on deep learning according to claim 8, wherein in step S5.6, the classification cross-entropy loss L_ce and the total loss L_total are calculated, and the graph matching neural network model is optimized according to the total loss L_total to obtain the optimized graph matching neural network model; the specific method is as follows:
the intra-graph propagation layer is provided with a classification cross-entropy loss function, specifically:
L_ce = -Σ_i Σ_{j=1}^{K} ŷ_{i,j} · log(p_{i,j})
wherein L_ce is the classification cross-entropy loss, p_{i,j} is the classifier distribution probability of the i-th example graph containing the noise label features for the j-th category, and ŷ_{i,j} is the pseudo label of the i-th example graph containing the noise label features for the j-th category;
a total loss function is constructed according to the classification cross-entropy loss function and the graph matching loss function, wherein the total loss function is specifically:
L_total = L_ce + λ_pro · L_match
wherein L_total is the total loss and λ_pro is a proportionality coefficient;
the graph matching neural network model is optimized according to the total loss L_total to obtain the optimized graph matching neural network model.
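Under the reconstruction above, the total loss of step S5.6 can be sketched as the classification cross-entropy over the corrected pseudo labels plus λ_pro times the graph matching loss; dropping the samples marked OOD before the cross-entropy term is an assumption consistent with claim 8.

# Hedged sketch of step S5.6: L_total = L_ce + lambda_pro * L_match, with samples
# marked as OOD (pseudo label -1) assumed to be filtered out of the cross-entropy.
import torch
import torch.nn.functional as F

def total_loss(logits, pseudo, match_loss, lambda_pro=1.0):
    # logits: (N, K) classifier outputs; pseudo: (N,) corrected pseudo labels, -1 = OOD
    keep = pseudo >= 0  # discard outlier (OOD) samples
    ce = F.cross_entropy(logits[keep], pseudo[keep])  # classification cross-entropy L_ce
    return ce + lambda_pro * match_loss

loss = total_loss(torch.randn(8, 5), torch.tensor([0, 2, -1, 1, 3, -1, 4, 0]), torch.tensor(0.7))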
10. A network supervision fine-grained image recognition system based on deep learning, applying the network supervision fine-grained image recognition method based on deep learning according to any one of claims 1 to 9, characterized by comprising:
an image acquisition unit: used for obtaining an input image containing a noise label from the Internet;
a feature extraction unit: used for extracting features of the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;
an example graph generation unit: used for obtaining an example graph containing the noise label features according to the obtained region discrimination feature map and the overall feature map;
a graph prototype construction unit: used for constructing a graph prototype for each category according to the obtained example graph containing the noise label features;
a graph matching unit: used for inputting the obtained example graph containing the noise label features and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;
an image recognition unit: used for obtaining an image to be recognized, extracting features of the image to be recognized, and recognizing the image to be recognized by using the optimized graph matching neural network model to obtain the recognition result of the image to be recognized.
CN202211167812.6A 2022-09-23 2022-09-23 Network supervision fine-grained image identification method and system based on deep learning Pending CN115496948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211167812.6A CN115496948A (en) 2022-09-23 2022-09-23 Network supervision fine-grained image identification method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211167812.6A CN115496948A (en) 2022-09-23 2022-09-23 Network supervision fine-grained image identification method and system based on deep learning

Publications (1)

Publication Number Publication Date
CN115496948A true CN115496948A (en) 2022-12-20

Family

ID=84470196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211167812.6A Pending CN115496948A (en) 2022-09-23 2022-09-23 Network supervision fine-grained image identification method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN115496948A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116012569A (en) * 2023-03-24 2023-04-25 广东工业大学 Multi-label image recognition method based on deep learning and under noisy data
CN116012569B (en) * 2023-03-24 2023-08-15 广东工业大学 Multi-label image recognition method based on deep learning and under noisy data

Similar Documents

Publication Publication Date Title
CN111881714B (en) Unsupervised cross-domain pedestrian re-identification method
CN109919108B (en) Remote sensing image rapid target detection method based on deep hash auxiliary network
CN111310861B (en) License plate recognition and positioning method based on deep neural network
CN114067160B (en) Small sample remote sensing image scene classification method based on embedded smooth graph neural network
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN112418117B (en) Small target detection method based on unmanned aerial vehicle image
CN111797779A (en) Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion
CN112308158A (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN110321967B (en) Image classification improvement method based on convolutional neural network
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN110097060B (en) Open set identification method for trunk image
CN113705641B (en) Hyperspectral image classification method based on rich context network
CN111612017A (en) Target detection method based on information enhancement
CN114842264B (en) Hyperspectral image classification method based on multi-scale spatial spectrum feature joint learning
CN113032613B (en) Three-dimensional model retrieval method based on interactive attention convolution neural network
CN115410088B (en) Hyperspectral image field self-adaption method based on virtual classifier
CN111898621A (en) Outline shape recognition method
CN114329031B (en) Fine-granularity bird image retrieval method based on graph neural network and deep hash
CN112347284A (en) Combined trademark image retrieval method
CN113947725B (en) Hyperspectral image classification method based on convolution width migration network
CN111832580B (en) SAR target recognition method combining less sample learning and target attribute characteristics
CN116704490B (en) License plate recognition method, license plate recognition device and computer equipment
CN112784754A (en) Vehicle re-identification method, device, equipment and storage medium
CN115393631A (en) Hyperspectral image classification method based on Bayesian layer graph convolution neural network
CN115457332A (en) Image multi-label classification method based on graph convolution neural network and class activation mapping

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination