CN115496948A - A network-supervised fine-grained image recognition method and system based on deep learning - Google Patents
- Publication number
- CN115496948A (application CN202211167812.6A)
- Authority
- CN
- China
- Prior art keywords
- graph
- feature
- noise label
- feature map
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
Description
Technical Field
The present invention relates to the technical field of image recognition, and more specifically to a network-supervised fine-grained image recognition method and system based on deep learning.
Background
Fine-grained image recognition aims to distinguish the subcategories of a given object category, such as different species of birds or different models of aircraft and cars, and has significant scientific and practical value in fields such as smart construction and the Internet. In recent years, with the continued development of deep learning, fine-grained image recognition has made great progress.
At present, most algorithms achieve fine-grained image recognition through deep learning driven by high-quality data and therefore depend heavily on large-scale manually annotated datasets. The difficulty of collecting such datasets and the high cost of annotating them have become a bottleneck restricting wider adoption.
With the rapid development of the Internet, the large amount of weakly labeled data available online can reduce the dependence of current fine-grained image recognition algorithms on manual annotation: data retrieved from the web is used to train the neural network model. However, web-retrieved data contains a certain proportion of noisy labels, which adversely affects model training. In addition, the small inter-class variance and large intra-class variance inherent in fine-grained images further increase the difficulty of recognition.
The prior art discloses a fine-grained image recognition algorithm based on distributed labels derived from inter-class similarity, comprising the following steps: a backbone network extracts a feature representation of the input image; a center loss module computes the center loss from the feature representation and updates the class centers; a classification loss module computes the classification loss (for example, cross-entropy loss) from the feature representation and the final label distribution, where the final label distribution is the weighted sum of the one-hot label distribution and the distributed label distribution generated from the class centers; and the center loss and the classification loss are combined by weighted summation into the final objective loss function used to optimize the whole model. By lowering the confidence of the model's predictions, this method alleviates overfitting, learns discriminative features of fine-grained data effectively, and improves to some extent the accuracy of distinguishing fine-grained categories. However, it still relies on deep learning driven by high-quality data to distinguish subordinate categories, and therefore on large-scale manually annotated image data. Data collection and annotation are costly, fine-grained recognition is time-consuming and labor-intensive, and both efficiency and accuracy are low.
Summary of the Invention
To overcome the low efficiency and accuracy of the above prior art in fine-grained image recognition, the present invention provides a network-supervised fine-grained image recognition method and system based on deep learning that can perform fine-grained recognition of images efficiently and accurately.
To solve the above technical problem, the technical solution of the present invention is as follows:
A network-supervised fine-grained image recognition method based on deep learning, comprising the following steps:
S1: obtaining input images with noisy labels from the Internet;
S2: performing feature extraction on the input images with noisy labels to obtain region-discriminative feature maps and an overall feature map;
S3: obtaining, from the region-discriminative feature maps and the overall feature map, instance graphs containing noisy-label features;
S4: constructing a graph prototype for each class from the obtained instance graphs containing noisy-label features;
S5: inputting the instance graphs containing noisy-label features and the graph prototypes into a preset graph matching neural network model for training, to obtain an optimized graph matching neural network model;
S6: obtaining an image to be recognized, extracting its features, and recognizing it with the optimized graph matching neural network model to obtain the recognition result.
Preferably, in step S2, feature extraction is performed on the input image with noisy labels to obtain the region-discriminative feature map and the overall feature map as follows:
A feature extractor extracts features from the input image with noisy labels to obtain the overall feature map; the overall feature map is passed through a convolutional layer to obtain a mean-filtered overall feature map; the mean at each position of the mean-filtered overall feature map is computed across its channels to obtain the overall mean feature map; the maximum-response region of the overall mean feature map is searched for and its coordinates are located, and the region-discriminative feature map is obtained from those coordinates.
Preferably, the maximum-response region of the overall mean feature map is searched for and its coordinates are located as follows:
The overall mean feature map is obtained by averaging the mean-filtered overall feature map f'_g over its C channels at each position, and the row and column at which the overall mean feature map takes its maximum value give the coordinates (i, j) of the maximum-response region. Here f'_g denotes the mean-filtered overall feature map, C denotes its number of channels, and (i, j) denotes the coordinates of the maximum-response region.
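The localization step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the feature map is a small nested list in C x H x W order, and the function names `channel_mean` and `locate_max_response` are hypothetical, not taken from the patent.

```python
def channel_mean(feature_map):
    """Average a C x H x W feature map (nested lists) over its C channels."""
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [sum(feature_map[c][i][j] for c in range(channels)) / channels
         for j in range(width)]
        for i in range(height)
    ]


def locate_max_response(mean_map):
    """Return the (row, column) of the largest value in an H x W map."""
    best = (0, 0)
    for i, row in enumerate(mean_map):
        for j, value in enumerate(row):
            if value > mean_map[best[0]][best[1]]:
                best = (i, j)
    return best


# Toy mean-filtered feature map f'_g with C = 2 channels and size 2 x 3.
f_g = [
    [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0]],
    [[0.0, 3.0, 0.0], [0.0, 0.0, 6.0]],
]
mean_map = channel_mean(f_g)          # overall mean feature map
peak = locate_max_response(mean_map)  # coordinates (i, j) of the peak
```

In practice the same two operations would normally be a channel-wise mean followed by an arg-max on a tensor; how large a region is cropped around (i, j) is not specified in this passage and is left out of the sketch.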
Preferably, in step S3, the instance graph containing noisy-label features is obtained from the region-discriminative feature maps and the overall feature map as follows:
The obtained region-discriminative feature maps are resized to the same dimensions by bilinear interpolation to obtain regional feature maps of equal dimensions; global average pooling is applied to the overall feature map and to the equal-dimension regional feature maps to reduce their dimensionality, giving the reduced overall feature map and the reduced regional feature maps; and the instance graph containing noisy-label features is obtained from these reduced feature maps:
G_ins = <V_ins, E_ins>
where G_ins denotes the instance graph containing noisy-label features, V_ins denotes the set of all feature points in the reduced overall feature map and the reduced regional feature maps, and E_ins denotes the adjacency matrix of the connections between feature points in the instance graph.
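A minimal sketch of how an instance graph G_ins = <V_ins, E_ins> might be assembled is given below. It makes several assumptions that the text does not confirm: each pooled feature map contributes one node, every node is connected to every other node, and the regional maps have already been resized to a common size (the bilinear interpolation step is omitted). The helper names are hypothetical.

```python
def global_average_pool(feature_map):
    """Reduce a C x H x W map (nested lists) to a C-dimensional vector."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def build_instance_graph(global_map, region_maps):
    """Return (V_ins, E_ins): node feature vectors and an adjacency matrix."""
    nodes = [global_average_pool(global_map)]
    nodes += [global_average_pool(region) for region in region_maps]
    n = len(nodes)
    # Assumed connectivity: fully connected graph without self-loops.
    adjacency = [[0 if a == b else 1 for b in range(n)] for a in range(n)]
    return nodes, adjacency


# Toy overall feature map and one regional feature map (C = 2, size 2 x 2).
global_map = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]
region_map = [[[2.0, 2.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]]
v_ins, e_ins = build_instance_graph(global_map, [region_map])
```

The adjacency matrix is the part most open to interpretation; a learned or similarity-based connectivity would fit the description equally well.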
Preferably, in step S4, the graph prototypes are constructed from the obtained instance graphs containing noisy-label features as follows:
For each class, a graph prototype with the same structure as the instance graph containing noisy-label features is constructed, and the graph prototype is updated by a moving average:
G_k = <V_k, E_k>
where G_k denotes the constructed graph prototype of the k-th class, V_k denotes the set of all feature points in the graph prototype of the k-th class, E_k denotes the adjacency matrix of the connections between feature points in the graph prototype of the k-th class, G'_k denotes the updated graph prototype, and m is a preset parameter.
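The moving-average update can be illustrated as follows. The exact update formula is not legible in this text, so the sketch assumes the common form G'_k = m * G_k + (1 - m) * G_ins applied to the node features, with m weighting the old prototype; both the form and the function name `update_prototype` are assumptions.

```python
def update_prototype(prototype_nodes, instance_nodes, m=0.9):
    """Blend each prototype node with the matching instance node.

    Assumed rule: new = m * old_prototype + (1 - m) * instance.
    """
    return [
        [m * p + (1.0 - m) * x for p, x in zip(proto_vec, inst_vec)]
        for proto_vec, inst_vec in zip(prototype_nodes, instance_nodes)
    ]


# One-node toy graphs: the prototype moves halfway toward the instance.
old_prototype = [[0.0, 2.0]]
instance = [[4.0, 6.0]]
new_prototype = update_prototype(old_prototype, instance, m=0.5)
```

With m close to 1 the prototype changes slowly, which is what makes it robust to individual mislabeled instances.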
Preferably, in step S5, the instance graphs containing noisy-label features and the graph prototypes are input into the preset graph matching neural network model for training to obtain the optimized graph matching neural network model as follows:
The preset graph matching neural network model comprises an intra-graph propagation layer, a graph aggregation layer, an inter-graph propagation layer and a graph matching layer, and obtaining the optimized graph matching neural network model comprises the following steps:
S5.1: inputting the instance graph G_ins containing noisy-label features and the graph prototype G_k into the intra-graph propagation layer to obtain a first feature matrix and a second feature matrix, and iteratively updating each of them by graph convolution;
S5.2: inputting the iteratively updated first and second feature matrices into the graph aggregation layer, where the features are combined into aggregated feature vectors;
S5.3: inputting the aggregated feature vectors into the inter-graph propagation layer, performing graph convolution and iteratively updating the aggregated feature vectors to obtain a first feature representation f_ins and a second feature representation Z_k;
S5.4: inputting the first feature representation f_ins and the second feature representation Z_k into the graph matching layer to compute the similarity S_k, and computing the graph matching loss from the similarity S_k;
S5.5: correcting the noisy labels of the instance graphs containing noisy-label features and eliminating outlier samples;
S5.6: computing the classification cross-entropy loss and the total loss, and optimizing the graph matching neural network model according to the total loss to obtain the optimized graph matching neural network model.
Preferably, in step S5.4, the first feature representation f_ins and the second feature representation Z_k are input into the graph matching layer to compute the similarity S_k, and the graph matching loss is computed from the similarity S_k, as follows:
The first feature representation f_ins and the second feature representation Z_k are input into the graph matching layer for graph matching, and the similarity S_k between them is computed.
The graph matching layer is provided with a graph matching loss function, which computes the graph matching loss from the similarities S_k,
where y_i denotes the original label, k denotes the class of a graph prototype, and K denotes the total number of graph prototype classes.
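The following sketch shows one plausible way to turn the similarities S_k into a matching loss. The similarity measure and the loss formula are not legible in this text, so two assumptions are made and should be read as such: S_k is the cosine similarity between f_ins and Z_k, and the loss is the negative log of the softmax probability assigned to the original label y_i over the K prototypes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_matching_loss(f_ins, prototypes, label):
    """Return (loss, similarities) for one instance against K prototypes."""
    similarities = [cosine_similarity(f_ins, z_k) for z_k in prototypes]
    probabilities = softmax(similarities)
    return -math.log(probabilities[label]), similarities


# Toy example with K = 2 prototypes; the instance matches prototype 0.
f_ins = [1.0, 0.0]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
loss_match, sims = graph_matching_loss(f_ins, prototypes, label=0)
loss_mismatch, _ = graph_matching_loss(f_ins, prototypes, label=1)
```

The loss is small when the instance is most similar to the prototype of its own label and grows when the label points to a dissimilar prototype, which is the behavior a contrastive instance-to-prototype objective is meant to have.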
Preferably, in step S5.5, the noisy labels of the instance graphs containing noisy-label features are corrected and outlier samples are eliminated as follows:
The intra-graph propagation layer is provided with a classifier. The instance graph containing noisy-label features is input into the classifier to obtain the classifier distribution probability p_i, the graph matching distribution probability d_i is computed, and the total probability q_i is computed from p_i and d_i:
q_i = α p_i + (1 - α) d_i
where α is a preset parameter and τ is a temperature coefficient.
According to the total probability q_i and a preset threshold T, the noisy labels of the instance graphs containing noisy-label features are corrected and outlier (OOD) samples are eliminated:
when the maximum value of the total probability q_i is greater than T, the class corresponding to that maximum is taken as the pseudo-label; when the total probability q_i is greater than the class-average probability, the original label y_i is taken as the pseudo-label, which corrects the noisy labels of the instance graphs; in all other cases OOD is taken as the pseudo-label, where OOD denotes an outlier sample, which eliminates the outlier samples.
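The correction rule can be sketched as below. Three points are assumptions rather than statements from the source: d_i is taken to be a softmax of the similarities S_k divided by the temperature τ, "the total probability q_i is greater than the class-average probability" is read as q_i at the original label exceeding 1/K, and the parameter defaults are arbitrary. The function name `correct_label` is hypothetical.

```python
import math

OOD = "OOD"  # marker for an outlier (out-of-distribution) sample


def softmax(scores, tau):
    """Temperature-scaled, numerically stable softmax."""
    top = max(scores)
    exps = [math.exp((s - top) / tau) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def correct_label(p, similarities, original_label,
                  alpha=0.5, tau=0.1, threshold=0.8):
    """Return a pseudo-label (class index) or OOD for one sample."""
    d = softmax(similarities, tau)                       # assumed form of d_i
    q = [alpha * pi + (1.0 - alpha) * di for pi, di in zip(p, d)]
    best = max(range(len(q)), key=q.__getitem__)
    if q[best] > threshold:                # confident: relabel to arg-max
        return best
    if q[original_label] > 1.0 / len(q):   # plausible: keep original label
        return original_label
    return OOD                             # otherwise: treat as outlier


# Confident prediction overrides a wrong original label.
relabelled = correct_label([0.9, 0.05, 0.05], [1.0, 0.0, 0.0], 1)
# Ambiguous prediction: the original label is kept if q exceeds 1/K.
kept = correct_label([0.5, 0.4, 0.1], [0.5, 0.5, 0.0], 1, tau=1.0)
# Ambiguous prediction and implausible original label: marked as OOD.
dropped = correct_label([0.5, 0.4, 0.1], [0.5, 0.5, 0.0], 2, tau=1.0)
```

Samples that come back as `OOD` would simply be excluded from the subsequent loss computation.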
Preferably, in step S5.6, the classification cross-entropy loss and the total loss are computed, and the graph matching neural network model is optimized according to the total loss to obtain the optimized graph matching neural network model, as follows:
The intra-graph propagation layer is provided with a classification cross-entropy loss function,
where p_ij is the classifier distribution probability of the i-th instance graph containing noisy-label features for the j-th class, and the loss is computed against the pseudo-label of the i-th instance graph for the j-th class.
A total loss function is constructed from the classification cross-entropy loss function and the graph matching loss function,
where λ_pro is a proportional coefficient.
The graph matching neural network model is optimized according to the total loss to obtain the optimized graph matching neural network model.
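As an illustration of how the two losses might be combined, the sketch below computes a cross-entropy against the pseudo-labels and adds the graph matching loss scaled by λ_pro. The formulas themselves are not legible in this text, so the averaging over samples, the skipping of outlier samples (represented here by `None`), and the placement of λ_pro on the matching term are all assumptions.

```python
import math


def cross_entropy(probabilities, pseudo_labels):
    """Mean of -log p_ij at each pseudo-label; None marks a removed outlier."""
    losses = [
        -math.log(p[y])
        for p, y in zip(probabilities, pseudo_labels)
        if y is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0


def total_loss(ce_loss, matching_loss, lambda_pro=1.0):
    """Assumed combination: L = L_ce + lambda_pro * L_matching."""
    return ce_loss + lambda_pro * matching_loss


# Three samples over two classes; the third was flagged as an outlier.
probs = [[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]]
labels = [0, 1, None]
ce = cross_entropy(probs, labels)
combined = total_loss(ce, matching_loss=2.0, lambda_pro=0.5)
```

In a real training loop both terms would be differentiable tensors and the combined value would be passed to an optimizer; the arithmetic is the same.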
The present invention also provides a network-supervised fine-grained image recognition system based on deep learning, which applies the above network-supervised fine-grained image recognition method based on deep learning and comprises:
an image acquisition unit, for obtaining input images with noisy labels from the Internet;
a feature extraction unit, for performing feature extraction on the input images with noisy labels to obtain region-discriminative feature maps and an overall feature map;
an instance graph generation unit, for obtaining instance graphs containing noisy-label features from the region-discriminative feature maps and the overall feature map;
a graph prototype construction unit, for constructing a graph prototype for each class from the obtained instance graphs containing noisy-label features;
a graph matching unit, for inputting the instance graphs containing noisy-label features and the graph prototypes into the preset graph matching neural network model for training, to obtain the optimized graph matching neural network model;
an image recognition unit, for obtaining an image to be recognized, extracting its features, and recognizing it with the optimized graph matching neural network model to obtain the recognition result.
Compared with the prior art, the beneficial effects of the technical solution of the present invention are as follows:
The present invention provides a network-supervised fine-grained image recognition method and system based on deep learning. The method performs feature processing on input images with noisy labels to obtain instance graphs containing noisy-label features, uses these instance graphs to construct a corresponding graph prototype for each class, uses the instance graphs and graph prototypes to train the preset graph matching neural network model and to correct the noisy labels, and uses the optimized graph matching neural network model to recognize fine-grained images. The method performs network-supervised fine-grained image recognition based on deep learning; by introducing graph prototypes and learning contrastively between them and the instance graphs containing noisy-label features, it corrects noisy labels effectively and significantly improves the efficiency and accuracy of fine-grained image recognition.
Brief Description of the Drawings
FIG. 1 is a flowchart of the network-supervised fine-grained image recognition method based on deep learning provided in Embodiment 1.
FIG. 2 is a schematic diagram of the network-supervised fine-grained image recognition method based on deep learning provided in Embodiment 2.
FIG. 3 is a structural diagram of the network-supervised fine-grained image recognition system based on deep learning provided in Embodiment 3.
Reference numerals: 301-image acquisition unit, 302-feature extraction unit, 303-instance graph generation unit, 304-graph prototype construction unit, 305-graph matching unit, 306-image recognition unit.
Detailed Description
The accompanying drawings are for illustration only and shall not be construed as limiting this patent.
To better illustrate the embodiments, some parts in the drawings may be omitted, enlarged or reduced, and do not represent the dimensions of an actual product.
Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.
The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
Embodiment 1
As shown in FIG. 1, this embodiment provides a network-supervised fine-grained image recognition method based on deep learning, comprising the following steps:
S1: obtaining input images with noisy labels from the Internet;
S2: performing feature extraction on the input images with noisy labels to obtain region-discriminative feature maps and an overall feature map;
S3: obtaining, from the region-discriminative feature maps and the overall feature map, instance graphs containing noisy-label features;
S4: constructing a graph prototype for each class from the obtained instance graphs containing noisy-label features;
S5: inputting the instance graphs containing noisy-label features and the graph prototypes into a preset graph matching neural network model for training, to obtain an optimized graph matching neural network model;
S6: obtaining an image to be recognized, extracting its features, and recognizing it with the optimized graph matching neural network model to obtain the recognition result.
In a specific implementation, input images with noisy labels are first obtained by web retrieval. A convolutional neural network (CNN) then extracts features from these images to obtain the region-discriminative feature maps and the overall feature map, from which the instance graphs containing noisy-label features are obtained. A corresponding graph prototype is constructed for each class from the instance graphs. The instance graphs and graph prototypes are then input into the preset graph matching neural network model for training, and the graph matching loss and the classification cross-entropy loss are computed to optimize the network, giving the optimized graph matching neural network model. Finally, the optimized graph matching neural network model recognizes the image to be recognized and outputs the recognition result.
The method performs fine-grained image recognition based on deep learning; by introducing graph prototypes and learning contrastively between them and the instance graphs containing noisy-label features, it corrects noisy labels effectively and significantly improves the efficiency and accuracy of fine-grained image recognition.
Embodiment 2
As shown in FIG. 2, this embodiment provides a network-supervised fine-grained image recognition method based on deep learning, comprising the following steps:
S1: obtaining input images with noisy labels from the Internet;
S2: performing feature extraction on the input images with noisy labels to obtain region-discriminative feature maps and an overall feature map, as follows:
A feature extractor extracts features from the input image with noisy labels to obtain the overall feature map; the overall feature map is passed through a convolutional layer to obtain a mean-filtered overall feature map; the mean at each position of the mean-filtered overall feature map is computed across its channels to obtain the overall mean feature map; the maximum-response region of the overall mean feature map is searched for and its coordinates are located, and the region-discriminative feature map is obtained from those coordinates;
The maximum-response region of the overall mean feature map is searched for and its coordinates are located as follows:
The overall mean feature map is obtained by averaging the mean-filtered overall feature map f'_g over its C channels at each position, and the row and column at which the overall mean feature map takes its maximum value give the coordinates (i, j) of the maximum-response region. Here f'_g denotes the mean-filtered overall feature map, C denotes its number of channels, and (i, j) denotes the coordinates of the maximum-response region;
S3: obtaining, from the region-discriminative feature maps and the overall feature map, instance graphs containing noisy-label features, as follows:
The obtained region-discriminative feature maps are resized to the same dimensions by bilinear interpolation to obtain regional feature maps of equal dimensions; global average pooling is applied to the overall feature map and to the equal-dimension regional feature maps to reduce their dimensionality, giving the reduced overall feature map and the reduced regional feature maps; and the instance graph containing noisy-label features is obtained from these reduced feature maps:
G_ins = <V_ins, E_ins>
where G_ins denotes the instance graph containing noisy-label features, V_ins denotes the set of all feature points in the reduced overall feature map and the reduced regional feature maps, and E_ins denotes the adjacency matrix of the connections between feature points in the instance graph;
S4: Construct a graph prototype for each class from the obtained instance graphs containing noisy-label features, specifically:
For each class, construct a graph prototype with the same structure as the instance graph containing noisy-label features; the graph prototype is updated by moving average:
$G_k = \langle V_k, E_k \rangle$
where $G_k$ denotes the graph prototype constructed for the $k$-th class, $V_k$ denotes the set of all feature points in the graph prototype of the $k$-th class, $E_k$ denotes the adjacency matrix of the connections between feature points in the graph prototype of the $k$-th class, $G'_k$ is the updated graph prototype, and $m$ is a preset parameter;
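The moving-average update can be sketched as below. The exact update formula is not reproduced in this text, so the sketch assumes the common exponential form $V'_k = m V_k + (1-m) V_{ins}$ applied to the node features; which term $m$ weights, and the value `m=0.9`, are assumptions for illustration only.

```python
import numpy as np

def update_prototype(V_k: np.ndarray, V_ins: np.ndarray, m: float = 0.9) -> np.ndarray:
    """Moving-average update of a class prototype's node features.

    Assumption: m weights the old prototype and (1 - m) the new instance.
    """
    return m * V_k + (1.0 - m) * V_ins

V_k = np.zeros((10, 4))     # prototype of class k (10 nodes, 4-dim features)
V_new = np.ones((10, 4))    # instance graph assigned to class k
V_k_updated = update_prototype(V_k, V_new, m=0.9)
```

A value of $m$ close to 1 makes the prototype change slowly, so a single mislabeled instance has limited influence on it.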
S5: Input the obtained instance graph containing noisy-label features and the graph prototypes into a preset graph matching neural network model for training, to obtain an optimized graph matching neural network model;
The preset graph matching neural network model comprises an intra-graph propagation layer, a graph aggregation layer, an inter-graph propagation layer and a graph matching layer; obtaining the optimized graph matching neural network model comprises the following steps:
S5.1: Input the obtained instance graph $G_{ins}$ and the graph prototype $G_k$ into the intra-graph propagation layer to obtain a first feature matrix and a second feature matrix, and iteratively update each of them by graph convolution, specifically:
Input $G_{ins}$ and $G_k$ into the intra-graph propagation layer, and reshape the set $V_{ins}$ of all feature points in the reduced overall feature map and the reduced region feature maps into the first feature matrix, where $n_1$ is the number of feature points in the instance graph containing noisy-label features and $c_1$ is the dimension of each of those feature points;
Reshape the set $V_k$ of all feature points in the graph prototype into the second feature matrix, where $n_2$ is the number of feature points in the graph prototype and $c_2$ is the dimension of each of those feature points;
Perform graph convolution on the first feature matrix and the second feature matrix separately, and iteratively update both matrices, specifically:
where the two updated quantities are the first feature matrix and the second feature matrix after the $l$-th iterative update, and the two remaining symbols are the parameters of the intra-graph propagation layer;
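One intra-graph propagation step can be sketched as follows. The patent's update formula is not reproduced in this text, so the sketch substitutes the widely used graph convolution form $X^{(l+1)} = \mathrm{ReLU}(\hat{A} X^{(l)} W)$ with a symmetrically normalized adjacency matrix; that form, and the names `normalize_adjacency` and `gcn_layer`, are assumptions and may differ from the patented layer.

```python
import numpy as np

def normalize_adjacency(E: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (E + I) D^-1/2 (assumed form)."""
    A = E + np.eye(E.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(X: np.ndarray, E: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(A_hat @ X @ W)."""
    return np.maximum(normalize_adjacency(E) @ X @ W, 0.0)

rng = np.random.default_rng(0)
n1, c1, c_out = 10, 16, 8
X_ins = rng.standard_normal((n1, c1))          # first feature matrix (n1 x c1)
E_ins = np.ones((n1, n1)) - np.eye(n1)         # assumed fully connected graph
W_ins = rng.standard_normal((c1, c_out)) * 0.1 # layer parameters
X_next = gcn_layer(X_ins, E_ins, W_ins)
```

The same function would be applied to the second feature matrix with the prototype's adjacency matrix $E_k$ and its own parameters.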
S5.2: Input the iteratively updated first feature matrix and second feature matrix into the graph aggregation layer for feature combination to obtain an aggregated feature vector, specifically:
Input the iteratively updated first feature matrix and second feature matrix into the graph aggregation layer for feature combination to obtain the aggregated feature vector, specifically:
where the result is the aggregated feature vector, and its two inputs are the updated first feature matrix and the updated second feature matrix;
S5.3: Input the aggregated feature vector into the inter-graph propagation layer for graph convolution, and iteratively update the aggregated feature vector to obtain a first feature representation $f_{ins}$ and a second feature representation $Z_k$, specifically:
Input the aggregated feature vector into the inter-graph propagation layer for graph convolution, and iteratively update the aggregated feature vector, specifically:
where the updated quantity is the aggregated feature vector after the $l$-th iterative update, $E_{cross}$ is the adjacency matrix of the aggregated feature vector, and the two remaining symbols are the parameters of the inter-graph propagation layer;
Obtain the first feature representation $f_{ins}$ and the second feature representation $Z_k$ from the aggregated feature vector after the $l$-th iterative update;
S5.4: Input the first feature representation $f_{ins}$ and the second feature representation $Z_k$ into the graph matching layer to compute the similarity $S_k$, and compute the graph matching loss from the similarity $S_k$, specifically:
Input the first feature representation $f_{ins}$ and the second feature representation $Z_k$ into the graph matching layer for graph matching, and compute the similarity $S_k$, specifically:
The graph matching layer is provided with a graph matching loss function, which computes the graph matching loss from the similarity $S_k$; the graph matching loss function is specifically:
where the left-hand side is the graph matching loss, $y_i$ denotes the original label, $k$ denotes the class of a graph prototype, and $K$ denotes the total number of graph prototype classes;
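The matching step can be sketched as below. Neither the similarity formula nor the loss formula is reproduced in this text, so the sketch assumes cosine similarity between $f_{ins}$ and each $Z_k$ and a temperature-scaled softmax cross-entropy against the original label $y_i$; both are common choices for prototype matching, not confirmed details of the patent. The temperature value 0.1 follows the $\tau$ given in the embodiment.

```python
import numpy as np

def cosine_similarities(f_ins: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Similarity S_k between one instance (c,) and K prototypes (K, c)."""
    f = f_ins / np.linalg.norm(f_ins)
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return Zn @ f

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())        # subtract max for numerical stability
    return e / e.sum()

def graph_matching_loss(S: np.ndarray, y: int, tau: float = 0.1) -> float:
    """Assumed loss: cross-entropy of softmax(S / tau) against label y."""
    return float(-np.log(softmax(S / tau)[y]))

Z = np.eye(3)                          # three toy prototypes
f_ins = np.array([1.0, 0.0, 0.0])      # instance aligned with prototype 0
S = cosine_similarities(f_ins, Z)
loss_match = graph_matching_loss(S, y=0)
loss_mismatch = graph_matching_loss(S, y=1)
```

An instance that aligns with the prototype of its labeled class yields a small loss, while a label pointing at a dissimilar prototype yields a large one.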
S5.5: Correct the noisy labels in the instance graphs containing noisy-label features and remove outlier samples, specifically:
The intra-graph propagation layer is provided with a classifier; input the instance graph containing noisy-label features into the classifier to obtain the classifier probability distribution $p_i$, compute the graph matching probability distribution $d_i$, and compute the total probability $q_i$ from $p_i$ and $d_i$, specifically:
$q_i = \alpha p_i + (1-\alpha) d_i$
where $\alpha$ is a preset parameter and $\tau$ is the temperature coefficient;
Based on the total probability $q_i$ and a preset threshold $T$, correct the noisy labels in the instance graphs containing noisy-label features and remove out-of-distribution (OOD) outlier samples, specifically:
where the result is the pseudo-label and $T$ is the preset threshold: when the maximum of the total probability $q_i$ exceeds $T$, the class corresponding to that maximum is taken as the pseudo-label; when the total probability $q_i$ exceeds the class-average probability, the original label $y_i$ is kept as the pseudo-label, which corrects the noisy labels in the instance graphs; in all other cases OOD is taken as the pseudo-label, where OOD denotes an outlier sample, which removes the outlier samples;
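The correction rule can be sketched as follows. The mixing formula $q_i = \alpha p_i + (1-\alpha) d_i$ and the three cases come from the text; reading "the total probability exceeds the class-average probability" as "the entry of $q_i$ at the original label exceeds $1/K$" is an interpretation, and the sentinel `OOD = -1` and the function name are illustrative.

```python
import numpy as np

OOD = -1  # hypothetical sentinel for samples treated as outliers

def correct_label(p: np.ndarray, d: np.ndarray, y: int,
                  alpha: float = 0.5, T: float = 0.75) -> int:
    """Return the pseudo-label for one sample.

    p: classifier distribution, d: graph matching distribution, y: original label.
    """
    q = alpha * p + (1.0 - alpha) * d
    K = q.shape[0]
    if q.max() > T:            # confident prediction: relabel
        return int(q.argmax())
    if q[y] > 1.0 / K:         # assumed reading: original label is plausible
        return y
    return OOD                 # otherwise drop as an outlier

relabeled = correct_label(np.array([0.9, 0.05, 0.05]), np.array([0.8, 0.1, 0.1]), y=1)
kept = correct_label(np.array([0.5, 0.4, 0.1]), np.array([0.3, 0.6, 0.1]), y=1)
dropped = correct_label(np.array([0.5, 0.4, 0.1]), np.array([0.3, 0.6, 0.1]), y=2)
```

The defaults $\alpha = 0.5$ and $T = 0.75$ are the values stated for the embodiment.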
S5.6: Compute the classification cross-entropy loss and the total loss, and optimize the graph matching neural network model according to the total loss to obtain the optimized graph matching neural network model, specifically:
The intra-graph propagation layer is provided with a classification cross-entropy loss function, specifically:
where the left-hand side is the classification cross-entropy loss, $p_{ij}$ is the classifier probability of the $i$-th instance graph containing noisy-label features with respect to the $j$-th class, and the remaining symbol is the pseudo-label of the $i$-th instance graph with respect to the $j$-th class;
Construct a total loss function from the classification cross-entropy loss function and the graph matching loss function; the total loss function is specifically:
where the left-hand side is the total loss and $\lambda_{pro}$ is a weighting coefficient;
Optimize the graph matching neural network model according to the total loss to obtain the optimized graph matching neural network model;
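A minimal sketch of combining the two losses is given below. The total loss formula is not reproduced in this text; the sketch assumes the classification loss plus $\lambda_{pro}$ times the graph matching loss, and assumes that samples whose pseudo-label is OOD are simply excluded from the cross-entropy. Function names and the `-1` sentinel are illustrative.

```python
import numpy as np

def cross_entropy(P: np.ndarray, pseudo_labels: np.ndarray, ood: int = -1) -> float:
    """Mean cross-entropy over samples that were not flagged as OOD.

    P: (N, K) classifier probabilities, pseudo_labels: (N,) integer labels.
    """
    keep = pseudo_labels != ood
    if not keep.any():
        return 0.0
    picked = P[keep, pseudo_labels[keep]]      # probability of each pseudo-label
    return float(-np.log(picked + 1e-12).mean())

def total_loss(l_cls: float, l_pro: float, lam_pro: float = 1.0) -> float:
    """Assumed combination: L = L_cls + lambda_pro * L_pro."""
    return l_cls + lam_pro * l_pro

P = np.array([[0.5, 0.5],
              [0.9, 0.1]])
labels = np.array([0, -1])          # second sample was flagged as OOD
l_cls = cross_entropy(P, labels)
l_total = total_loss(l_cls, l_pro=0.2, lam_pro=1.0)
```

The default `lam_pro=1.0` matches the value $\lambda_{pro} = 1$ stated for the embodiment.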
S6: Acquire an image to be recognized, extract its features, and recognize the image using the optimized graph matching neural network model to obtain the recognition result for the image.
In a specific implementation, input images containing noisy labels are first obtained by web retrieval. The dataset used in this embodiment is WebFG-496, which consists of three sub-datasets: Web-Bird, Web-Aircraft and Web-Car. The input images containing noisy labels have a size of 448×448;
A convolutional neural network with a ResNet50 variant as the backbone CNN is then set up, and a feature extractor extracts features from the input images containing noisy labels to obtain the overall feature map, whose dimensions are 14×14×2048; the overall feature map is passed through a convolutional layer to obtain the mean-filtered overall feature map; the mean at each position of the mean-filtered overall feature map is computed across the channels to obtain the overall mean feature map;
Search for the maximum-response region in the overall mean feature map according to the following formula, and locate the coordinates of that region:
where the left-hand side of the first expression denotes the overall mean feature map, $f'_g$ denotes the mean-filtered overall feature map, $C$ denotes the number of channels of the mean-filtered overall feature map, the search operator returns the row and column of the maximum-response region, and $(i,j)$ denotes the coordinates of the maximum-response region;
Based on the coordinates of the maximum-response region, several local regions of different sizes are cropped from the overall feature map. This embodiment uses three area sizes $S_1$, $S_2$, $S_3$ and three aspect ratios $A_1$, $A_2$, $A_3$, giving 9 combinations for cropping the overall feature map; the three area sizes $S_1$, $S_2$, $S_3$ are one-half, one-third and two-thirds of the area of the overall feature map respectively, and the three aspect ratios $A_1$, $A_2$, $A_3$ are 1, 0.5 and 2 respectively;
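The nine crops can be enumerated with a short sketch. The area fractions and aspect ratios are those stated above; centering each box on $(i,j)$, defining the aspect ratio as width divided by height, rounding to whole cells and shifting or clipping boxes to stay inside the 14×14 map are all assumptions, since the text does not specify them.

```python
import math

def region_boxes(i, j, H=14, W=14,
                 area_fracs=(1 / 2, 1 / 3, 2 / 3),
                 ratios=(1.0, 0.5, 2.0)):
    """Return 9 boxes (top, left, bottom, right) around position (i, j)."""
    boxes = []
    for s in area_fracs:
        for a in ratios:                       # a = width / height (assumed)
            area = s * H * W
            w = min(W, max(1, round(math.sqrt(area * a))))
            h = min(H, max(1, round(math.sqrt(area / a))))
            # Center on (i, j), then shift the box back inside the map.
            top = min(max(i - h // 2, 0), H - h)
            left = min(max(j - w // 2, 0), W - w)
            boxes.append((top, left, top + h, left + w))
    return boxes

boxes = region_boxes(5, 9)
```

Each box can then be used to slice the overall feature map, for example `feature_map[top:bottom, left:right, :]`, before the region features are extracted.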
The feature extractor extracts features from the cropped local regions of different sizes to obtain the region-discriminative feature maps;
An instance graph containing noisy-label features and a graph prototype for each class are constructed, and the instance graph and the graph prototypes are each input into the intra-graph propagation layer (a GCN) for graph convolution; in this embodiment the numbers of output channels are 1024 and 2048 respectively. The output features of the instance graph and of the graph prototypes are aggregated to obtain the first feature representation $f_{ins}$ and the second feature representation $Z_k$; the graph matching loss and the classification cross-entropy loss are then computed from $f_{ins}$ and $Z_k$ to optimize the graph matching neural network model;
In this embodiment, $\alpha = 0.5$, $\tau = 0.1$, $T = 0.75$ and $\lambda_{pro} = 1$;
Images to be recognized are taken from CUB200-2011, FGVC-Aircraft and Stanford Cars as validation data; after their features are extracted, the optimized graph matching neural network model recognizes the images to obtain their recognition results;
The table below compares the fine-grained image recognition accuracy of different methods:
Table 1 - Comparison of fine-grained image recognition accuracy of different methods
Compared with the baseline models, the method of this embodiment performs far better on all three datasets. The backbone network used in this embodiment is ResNet-50; compared with the ResNet-50 model alone, the method improves substantially on all three datasets, raising the average recognition accuracy by 20.14%. For a fair comparison, ResNet-50 is used uniformly as the backbone network. As shown in FIG. 3, with ResNet-50 as the backbone network the method of this embodiment achieves the highest average accuracy, 83.53%, with accuracies of 76.62%, 85.79% and 82.09% on Web-Bird, Web-Aircraft and Web-Car respectively, which are 2.23%, 4.2% and 1.94% higher than the relatively advanced Peer-learning method. Other models such as B-CNN were further used as the backbone network; the comparison results show that the method of this embodiment can be adapted to different backbone networks and yields a clear performance improvement in fine-grained image recognition;
The method performs web-supervised fine-grained image recognition based on deep learning. By introducing graph prototypes and learning contrastively against instance graphs containing noisy-label features, it can effectively correct noisy labels and significantly improves the efficiency and accuracy of fine-grained image recognition.
Embodiment 3
As shown in FIG. 3, this embodiment provides a deep-learning-based web-supervised fine-grained image recognition system that applies the deep-learning-based web-supervised fine-grained image recognition method of Embodiment 1 or 2, comprising:
Image acquisition unit 301: acquires input images containing noisy labels from the Internet;
Feature extraction unit 302: extracts features from the input images containing noisy labels to obtain the region-discriminative feature maps and the overall feature map;
Instance graph generation unit 303: obtains an instance graph containing noisy-label features from the obtained region-discriminative feature maps and the overall feature map;
Graph prototype construction unit 304: constructs a graph prototype for each class from the obtained instance graphs containing noisy-label features;
Graph matching unit 305: inputs the obtained instance graph containing noisy-label features and the graph prototypes into the preset graph matching neural network model for training, to obtain the optimized graph matching neural network model;
Image recognition unit 306: acquires an image to be recognized, extracts its features, and recognizes the image using the optimized graph matching neural network model to obtain the recognition result for the image;
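In a specific implementation, the image acquisition unit 301 first performs web retrieval to obtain input images containing noisy labels; the feature extraction unit 302 then extracts features from those input images to obtain the region-discriminative feature maps and the overall feature map; the instance graph generation unit 303 obtains an instance graph containing noisy-label features from the region-discriminative feature maps and the overall feature map; the graph prototype construction unit 304 then constructs a graph prototype for each class from the obtained instance graphs; the graph matching unit 305 inputs the instance graph and the graph prototypes into the preset graph matching neural network model for training, to obtain the optimized graph matching neural network model; finally, the image recognition unit 306 acquires an image to be recognized, extracts its features, and recognizes the image using the optimized graph matching neural network model to obtain the recognition result for the image;
The system performs fine-grained image recognition based on deep learning. By introducing graph prototypes and learning contrastively against instance graphs containing noisy-label features, it can effectively correct noisy labels and significantly improves the efficiency and accuracy of fine-grained image recognition.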
The same or similar reference numerals correspond to the same or similar components;
Terms describing positional relationships in the drawings are for illustration only and shall not be construed as limiting this patent;
Evidently, the above embodiments of the present invention are merely examples given to illustrate the invention clearly and do not limit its implementations. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211167812.6A CN115496948B (en) | 2022-09-23 | 2022-09-23 | A network-supervised fine-grained image recognition method and system based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211167812.6A CN115496948B (en) | 2022-09-23 | 2022-09-23 | A network-supervised fine-grained image recognition method and system based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115496948A true CN115496948A (en) | 2022-12-20 |
CN115496948B CN115496948B (en) | 2025-06-27 |
Family
ID=84470196
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211167812.6A Active CN115496948B (en) | 2022-09-23 | 2022-09-23 | A network-supervised fine-grained image recognition method and system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115496948B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116012569A (en) * | 2023-03-24 | 2023-04-25 | 广东工业大学 | Multi-label image recognition method based on deep learning and under noisy data |
CN119579992A (en) * | 2024-11-26 | 2025-03-07 | 济南大学 | Semi-supervised image classification method and system based on pseudo-label and embedded cluster matching |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109800811A (en) * | 2019-01-24 | 2019-05-24 | 吉林大学 | A kind of small sample image-recognizing method based on deep learning |
CN113392875A (en) * | 2021-05-20 | 2021-09-14 | 广东工业大学 | Method, system and equipment for classifying fine granularity of image |
CN113592023A (en) * | 2021-08-11 | 2021-11-02 | 杭州电子科技大学 | High-efficiency fine-grained image classification model based on depth model framework |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116012569A (en) * | 2023-03-24 | 2023-04-25 | 广东工业大学 | Multi-label image recognition method based on deep learning and under noisy data |
CN116012569B (en) * | 2023-03-24 | 2023-08-15 | 广东工业大学 | Multi-label image recognition method based on deep learning and under noisy data |
CN119579992A (en) * | 2024-11-26 | 2025-03-07 | 济南大学 | Semi-supervised image classification method and system based on pseudo-label and embedded cluster matching |
CN119579992B (en) * | 2024-11-26 | 2025-05-30 | 济南大学 | Semi-supervised image classification method and system based on pseudo tag and embedded cluster matching |
Also Published As
Publication number | Publication date |
---|---|
CN115496948B (en) | 2025-06-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |