CN115496948A

CN115496948A - A network-supervised fine-grained image recognition method and system based on deep learning

Info

Publication number: CN115496948A
Application number: CN202211167812.6A
Authority: CN
Inventors: 林坚满; 陈添水; 林坚涛; 杨志景
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2022-09-23
Filing date: 2022-09-23
Publication date: 2022-12-20
Anticipated expiration: 2042-09-23
Also published as: CN115496948B

Abstract

The invention provides a network-supervised fine-grained image recognition method and system based on deep learning. Feature processing is performed on input images with noisy labels to obtain instance graphs containing noisy-label features; a graph prototype is constructed for each category from these instance graphs; a preset graph matching neural network model is trained with the instance graphs and the graph prototypes; and the optimized graph matching neural network model is used to recognize fine-grained images. By introducing graph prototypes for contrastive learning against the instance graphs containing noisy-label features, the method can effectively correct noisy labels and eliminate outlier samples, thereby significantly improving the efficiency and accuracy of fine-grained image recognition.

Description

A network-supervised fine-grained image recognition method and system based on deep learning

Technical Field

The present invention relates to the technical field of image recognition, and more specifically to a network-supervised fine-grained image recognition method and system based on deep learning.

Background

Fine-grained image recognition aims to identify subcategories of a given object category, such as different kinds of birds, airplanes, and cars, and has important scientific significance and application value in fields such as smart construction and the Internet. In recent years, with the continuous development of deep learning, fine-grained image recognition has made great progress.

At present, most algorithms achieve fine-grained image recognition through deep learning driven by high-quality data and rely heavily on large-scale manually annotated datasets. The difficulty of collecting such datasets and the high cost of annotation have become a bottleneck restricting their wider adoption.

With the rapid development of the Internet, the large amount of weakly labeled data available online can be used to reduce the dependence of current fine-grained image recognition algorithms on manual annotation; that is, data retrieved from the web is used to train the neural network model. However, web-retrieved data contains a certain proportion of noisy labels, which adversely affects model training. In addition, the small inter-class variance and large intra-class variance inherent in fine-grained images further increase the difficulty of recognition.

The prior art discloses a fine-grained image recognition algorithm based on distributed labels derived from inter-class similarity, comprising the following steps: a backbone network extracts the feature representation of the input image; a center loss module computes the center loss from the feature representation and updates the class centers; a classification loss module computes the classification loss (for example, a cross-entropy loss) from the feature representation and the final label distribution, where the final label distribution is the weighted sum of the one-hot label distribution and the distributed label distribution generated from the class centers; and the weighted sum of the center loss and the classification loss gives the final objective function used to optimize the whole model. This prior-art method alleviates overfitting by reducing the confidence of model predictions, can effectively learn discriminative features of fine-grained data, and to some extent improves the accuracy of distinguishing fine-grained categories. However, it mainly relies on deep learning driven by high-quality data to distinguish subordinate categories and depends on large-scale manually annotated image data, so data collection and annotation are costly; fine-grained image recognition with it is often time-consuming and labor-intensive, and both efficiency and accuracy are low.

Summary of the Invention

To overcome the low efficiency and accuracy of the above prior art in fine-grained image recognition, the present invention provides a network-supervised fine-grained image recognition method and system based on deep learning that can perform fine-grained recognition of images efficiently and accurately.

To solve the above technical problem, the technical solution of the present invention is as follows:

A network-supervised fine-grained image recognition method based on deep learning, comprising the following steps:

S1: Obtain input images with noisy labels from the Internet;

S2: Perform feature extraction on the input images with noisy labels to obtain a region discriminative feature map and an overall feature map;

S3: Obtain an instance graph containing noisy-label features from the region discriminative feature map and the overall feature map;

S4: Construct a graph prototype for each category from the obtained instance graphs containing noisy-label features;

S5: Input the instance graphs containing noisy-label features and the graph prototypes into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;

S6: Acquire an image to be recognized, extract its features, and recognize it with the optimized graph matching neural network model to obtain the recognition result.

Preferably, in step S2, feature extraction is performed on the input image with noisy labels to obtain the region discriminative feature map and the overall feature map as follows:

A feature extractor extracts features from the input image with noisy labels to obtain the overall feature map; the overall feature map is passed through a convolutional layer to obtain the mean-filtered overall feature map; the mean over channels is computed at each position of the mean-filtered overall feature map to obtain the overall mean feature map; the maximum response region in the overall mean feature map is searched for and its coordinates are located; and the region discriminative feature map is obtained from the coordinates of the maximum response region.

Preferably, the maximum response region in the overall mean feature map is searched for, and its coordinates are located, according to the following formula:

[Formula not reproduced in the source text.]

In this formula, f'_g denotes the mean-filtered overall feature map, C denotes the number of channels of the mean-filtered overall feature map, and (i, j) denotes the coordinates of the maximum response region. The two remaining symbols, which are missing from this text, denote the overall mean feature map and the search for the row and column of the maximum response region.
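The localization step can be sketched as follows. This is a minimal illustration in plain Python, assuming the feature map is a nested list of shape C x H x W that has already been mean-filtered by the convolutional layer. The helper names and the crop window size (`half_size`) are hypothetical and do not come from the patent, which only states that the region is obtained from the coordinates of the maximum response.

```python
def channel_mean(feature_map):
    """Average a C x H x W nested-list feature map over its C channels."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [sum(feature_map[c][r][col] for c in range(channels)) / channels
         for col in range(width)]
        for r in range(height)
    ]


def locate_max_response(mean_map):
    """Return the (row, column) of the largest value in an H x W map."""
    best_row, best_col = 0, 0
    for r, row in enumerate(mean_map):
        for col, value in enumerate(row):
            if value > mean_map[best_row][best_col]:
                best_row, best_col = r, col
    return best_row, best_col


def crop_region(feature_map, center, half_size=1):
    """Crop a window around `center` from every channel, clipped to the map.

    The window size is an assumption made for this sketch.
    """
    i, j = center
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    r0, r1 = max(0, i - half_size), min(height, i + half_size + 1)
    c0, c1 = max(0, j - half_size), min(width, j + half_size + 1)
    return [[row[c0:c1] for row in channel[r0:r1]] for channel in feature_map]
```

Calling `locate_max_response(channel_mean(f))` gives the coordinates (i, j), and `crop_region(f, (i, j))` returns one possible region discriminative feature map around them.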

Preferably, in step S3, the instance graph containing noisy-label features is obtained from the region discriminative feature map and the overall feature map as follows:

The region discriminative feature maps are resized to the same dimensions by bilinear interpolation to obtain region feature maps of the same dimensions; global average pooling reduces the dimensionality of the overall feature map and of the region feature maps; and the instance graph containing noisy-label features is obtained from the reduced overall feature map and the reduced region feature maps:

G_ins = <V_ins, E_ins>

where G_ins denotes the instance graph containing noisy-label features, V_ins denotes the set of all feature points in the reduced overall feature map and the reduced region feature maps, and E_ins denotes the adjacency matrix of the connections between feature points in the instance graph.
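A minimal sketch of assembling G_ins from pooled features is given below, in plain Python on nested lists of shape C x H x W. It assumes a fully connected adjacency matrix without self-loops, because the text does not state how E_ins is populated; the function names are hypothetical. Bilinear resizing is left out of the sketch, since global average pooling already yields a length-C vector for any spatial size.

```python
def global_average_pool(feature_map):
    """Reduce a C x H x W nested-list feature map to a length-C vector."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def build_instance_graph(overall_map, region_maps):
    """Return (V_ins, E_ins): node feature vectors and an adjacency matrix.

    Assumption: every pair of distinct nodes is connected.
    """
    nodes = [global_average_pool(overall_map)]
    nodes += [global_average_pool(region) for region in region_maps]
    n = len(nodes)
    adjacency = [[0 if a == b else 1 for b in range(n)] for a in range(n)]
    return nodes, adjacency
```

With one overall map and one region map, the graph has two nodes joined by a single edge; each additional region adds one node.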

Preferably, in step S4, the graph prototype is constructed from the obtained instance graphs containing noisy-label features as follows:

For each category, a graph prototype with the same structure as the instance graph containing noisy-label features is constructed, and the graph prototype is updated by a moving average:

G_k = <V_k, E_k>

[Moving-average update formula not reproduced in the source text.]

where G_k denotes the constructed graph prototype of the k-th category, V_k denotes the set of all feature points in the graph prototype of the k-th category, E_k denotes the adjacency matrix of the connections between feature points in that graph prototype, G'_k is the updated graph prototype, and m is a preset parameter.
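The moving-average update can be illustrated with the sketch below. The exact update formula is not available in this text, so the code assumes the common exponential form V'_k = m * V_k + (1 - m) * V_ins applied to node features; the function name and the default value of `m` are hypothetical.

```python
def update_prototype(prototype_nodes, instance_nodes, m=0.9):
    """Exponential moving average of node features, node by node.

    Assumed form: V'_k = m * V_k + (1 - m) * V_ins.
    """
    return [
        [m * p + (1.0 - m) * x for p, x in zip(proto_vec, inst_vec)]
        for proto_vec, inst_vec in zip(prototype_nodes, instance_nodes)
    ]
```

Under this assumption, a value of `m` close to 1 makes the prototype change slowly, which limits the influence of any single noisy instance.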

Preferably, in step S5, the instance graphs containing noisy-label features and the graph prototypes are input into the preset graph matching neural network model for training to obtain the optimized graph matching neural network model as follows:

The preset graph matching neural network model comprises an intra-graph propagation layer, a graph aggregation layer, an inter-graph propagation layer, and a graph matching layer, and obtaining the optimized graph matching neural network model comprises the following steps:

S5.1: Input the instance graph G_ins containing noisy-label features and the graph prototype G_k into the intra-graph propagation layer to obtain a first feature matrix and a second feature matrix, and iteratively update each of them by graph convolution;

S5.2: Input the iteratively updated first and second feature matrices into the graph aggregation layer for feature combination to obtain an aggregated feature vector;

S5.3: Input the aggregated feature vector into the inter-graph propagation layer for graph convolution and iteratively update it to obtain a first feature expression f_ins and a second feature expression Z_k;

S5.4: Input the first feature expression f_ins and the second feature expression Z_k into the graph matching layer to compute the similarity S_k, and compute the graph matching loss from S_k;

S5.5: Correct the noisy labels in the instance graphs containing noisy-label features and eliminate outlier samples;

S5.6: Compute the classification cross-entropy loss and the total loss, and optimize the graph matching neural network model according to the total loss to obtain the optimized graph matching neural network model.

Preferably, in step S5.4, the first feature expression f_ins and the second feature expression Z_k are input into the graph matching layer to compute the similarity S_k, and the graph matching loss is computed from S_k, as follows:

The first feature expression f_ins and the second feature expression Z_k are input into the graph matching layer for graph matching, and the similarity S_k is computed as:

[Formula not reproduced in the source text.]

The graph matching layer is provided with a graph matching loss function that computes the graph matching loss from the similarity S_k:

[Formula not reproduced in the source text.]

where the result of the formula is the graph matching loss, y_i denotes the original label, k denotes the category of a graph prototype, and K denotes the total number of graph prototype categories.
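Because the similarity and loss formulas are not available in this text, the sketch below is only one plausible instantiation, not the patented formula. It assumes S_k is the cosine similarity between f_ins and Z_k, and that the loss is a softmax cross-entropy over the K similarities against the original label y_i. The temperature `tau` and both function names are hypothetical, and the feature vectors are assumed to be nonzero.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two nonzero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def graph_matching_loss(f_ins, prototypes, y, tau=0.1):
    """Softmax cross-entropy over similarities S_k for k = 0..K-1.

    `prototypes` holds one feature expression Z_k per category and `y`
    is the index of the original label.
    """
    sims = [cosine_similarity(f_ins, z_k) / tau for z_k in prototypes]
    peak = max(sims)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in sims))
    return log_norm - sims[y]
```

Under these assumptions the loss is small when the instance is most similar to the prototype of its labeled category and grows when another prototype matches better.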

Preferably, in step S5.5, the noisy labels in the instance graphs containing noisy-label features are corrected and outlier samples are eliminated as follows:

The intra-graph propagation layer is provided with a classifier. The instance graph containing noisy-label features is input into the classifier to obtain the classifier distribution probability p_i; the graph matching distribution probability d_i is computed; and the total probability q_i is computed from p_i and d_i:

q_i = αp_i + (1 - α)d_i

where α is a preset parameter and τ is a temperature coefficient;

According to the total probability q_i and a preset threshold T, the noisy labels in the instance graphs containing noisy-label features are corrected and outlier (out-of-distribution, OOD) samples are eliminated:

[Formula not reproduced in the source text.]

where the result of the formula is the pseudo-label and T is the preset threshold. When the maximum of the total probability q_i is greater than T, the category corresponding to that maximum is used as the pseudo-label; when the total probability q_i is greater than the class average probability, the original label y_i is used as the pseudo-label, thereby correcting the noisy labels in the instance graphs containing noisy-label features; in all other cases OOD is used as the pseudo-label, where OOD denotes an outlier sample, thereby eliminating the outlier samples.
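The correction rule can be sketched as below. The combination q_i = αp_i + (1 - α)d_i follows the text. Two points are assumptions of this sketch: the second condition is read as "the total probability of the original label exceeds the uniform average 1/K", and outliers are marked with a sentinel value `OOD = -1`. The function names are hypothetical.

```python
OOD = -1  # sentinel for an out-of-distribution (outlier) sample


def total_probability(p, d, alpha):
    """q_i = alpha * p_i + (1 - alpha) * d_i, element-wise over categories."""
    return [alpha * pj + (1.0 - alpha) * dj for pj, dj in zip(p, d)]


def pseudo_label(q, y, threshold):
    """Pick a pseudo-label from total probabilities q and original label y."""
    num_classes = len(q)
    best = max(range(num_classes), key=lambda k: q[k])
    if q[best] > threshold:
        return best          # confident prediction replaces the label
    if q[y] > 1.0 / num_classes:
        return y             # original label is still plausible, keep it
    return OOD               # neither holds: treat the sample as an outlier
```

For example, with T = 0.8, a sample with q = [0.9, 0.05, 0.05] labeled as category 1 is relabeled to category 0, whereas q = [0.45, 0.45, 0.10] labeled as category 2 is discarded as OOD.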

Preferably, in step S5.6, the classification cross-entropy loss and the total loss are computed, and the graph matching neural network model is optimized according to the total loss to obtain the optimized graph matching neural network model, as follows:

The intra-graph propagation layer is provided with a classification cross-entropy loss function:

[Formula not reproduced in the source text.]

where the result of the formula is the classification cross-entropy loss, p_ij is the classifier distribution probability of the i-th instance graph containing noisy-label features with respect to the j-th category, and the remaining term, whose symbol is missing from this text, is the pseudo-label of the i-th instance graph containing noisy-label features with respect to the j-th category;

The total loss function is constructed from the classification cross-entropy loss function and the graph matching loss function:

[Formula not reproduced in the source text.]

where the result of the formula is the total loss and λ_pro is a proportional coefficient;

The graph matching neural network model is optimized according to the total loss to obtain the optimized graph matching neural network model.
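The loss combination can be sketched as follows. Since the formulas are not available in this text, the sketch assumes that the classification loss is the mean cross-entropy over samples whose pseudo-label is not OOD, and that the total loss is the classification loss plus λ_pro times the graph matching loss. The function names and the `ood` sentinel are hypothetical.

```python
import math


def classification_loss(probabilities, pseudo_labels, ood=-1):
    """Mean cross-entropy over samples kept after outlier removal.

    `probabilities[i][j]` plays the role of p_ij and `pseudo_labels[i]`
    is the pseudo-label index of sample i (or `ood` for an outlier).
    """
    kept = [(p, y) for p, y in zip(probabilities, pseudo_labels) if y != ood]
    if not kept:
        return 0.0
    return -sum(math.log(p[y]) for p, y in kept) / len(kept)


def total_loss(cls_loss, match_loss, lambda_pro):
    """Assumed form: L = L_cls + lambda_pro * L_match."""
    return cls_loss + lambda_pro * match_loss
```

Skipping OOD samples in the classification term is one way to realize the elimination of outliers; whether the patent averages or sums over samples is not stated here.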

The present invention also provides a network-supervised fine-grained image recognition system based on deep learning, which applies the above network-supervised fine-grained image recognition method based on deep learning and comprises:

Image acquisition unit: obtains input images with noisy labels from the Internet;

Feature extraction unit: performs feature extraction on the input images with noisy labels to obtain a region discriminative feature map and an overall feature map;

Instance graph generation unit: obtains an instance graph containing noisy-label features from the region discriminative feature map and the overall feature map;

Graph prototype construction unit: constructs a graph prototype for each category from the obtained instance graphs containing noisy-label features;

Graph matching unit: inputs the instance graphs containing noisy-label features and the graph prototypes into the preset graph matching neural network model for training to obtain the optimized graph matching neural network model;

Image recognition unit: acquires an image to be recognized, extracts its features, and recognizes it with the optimized graph matching neural network model to obtain the recognition result.

Compared with the prior art, the beneficial effects of the technical solution of the present invention are as follows:

The present invention provides a network-supervised fine-grained image recognition method and system based on deep learning. The method performs feature processing on input images with noisy labels to obtain instance graphs containing noisy-label features, constructs a corresponding graph prototype for each category from these instance graphs, uses the instance graphs and the graph prototypes to train the preset graph matching neural network model and correct the noisy labels, and uses the optimized graph matching neural network model to recognize fine-grained images. The method performs network-supervised fine-grained image recognition based on deep learning; by introducing graph prototypes for contrastive learning against the instance graphs containing noisy-label features, it can effectively correct noisy labels and significantly improves the efficiency and accuracy of fine-grained image recognition.

Brief Description of the Drawings

FIG. 1 is a flowchart of the network-supervised fine-grained image recognition method based on deep learning provided in Embodiment 1.

FIG. 2 is a schematic diagram of the network-supervised fine-grained image recognition method based on deep learning provided in Embodiment 2.

FIG. 3 is a structural diagram of the network-supervised fine-grained image recognition system based on deep learning provided in Embodiment 3.

301: image acquisition unit; 302: feature extraction unit; 303: instance graph generation unit; 304: graph prototype construction unit; 305: graph matching unit; 306: image recognition unit.

Detailed Description

The accompanying drawings are for illustrative purposes only and shall not be construed as limiting this patent;

To better illustrate the embodiments, some components in the drawings may be omitted, enlarged, or reduced, and do not represent the dimensions of the actual product;

Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.

The technical solutions of the present invention are further described below with reference to the accompanying drawings and embodiments.

Embodiment 1

As shown in FIG. 1, this embodiment provides a network-supervised fine-grained image recognition method based on deep learning, comprising the following steps:

In a specific implementation, input images with noisy labels are first obtained through web retrieval. A convolutional neural network (CNN) then performs feature extraction on the input images with noisy labels to obtain the region discriminative feature map and the overall feature map, from which the instance graph containing noisy-label features is obtained. A corresponding graph prototype is then constructed for each category from the instance graphs containing noisy-label features. The instance graphs and the graph prototypes are input into the preset graph matching neural network model for training, and the graph matching loss and the classification cross-entropy loss are computed to optimize the network, yielding the optimized graph matching neural network model. Finally, the optimized graph matching neural network model recognizes the image to be recognized and produces its recognition result;

The method recognizes fine-grained images based on deep learning. By introducing graph prototypes for contrastive learning against the instance graphs containing noisy-label features, it can effectively correct noisy labels and significantly improves the efficiency and accuracy of fine-grained image recognition.

Embodiment 2

As shown in FIG. 2, this embodiment provides a network-supervised fine-grained image recognition method based on deep learning, comprising the following steps:

S2: Perform feature extraction on the input image with noisy labels to obtain the region discriminative feature map and the overall feature map, as follows:

A feature extractor extracts features from the input image with noisy labels to obtain the overall feature map; the overall feature map is passed through a convolutional layer to obtain the mean-filtered overall feature map; the mean over channels is computed at each position of the mean-filtered overall feature map to obtain the overall mean feature map; the maximum response region in the overall mean feature map is searched for and its coordinates are located; and the region discriminative feature map is obtained from the coordinates of the maximum response region;

The maximum response region in the overall mean feature map is searched for, and its coordinates are located, as follows:

[Formula not reproduced in the source text.]

where the search operator, whose symbol is missing from this text, returns the row and column of the maximum response region, and (i, j) denotes the coordinates of the maximum response region;

S3: Obtain the instance graph containing noisy-label features from the region discriminative feature map and the overall feature map, as follows:

G_ins = <V_ins, E_ins>

where G_ins denotes the instance graph containing noisy-label features, V_ins denotes the set of all feature points in the reduced overall feature map and the reduced region feature maps, and E_ins denotes the adjacency matrix of the connections between feature points in the instance graph;

S4: Construct a graph prototype for each category from the obtained instance graphs containing noisy-label features, as follows:

G_k = <V_k, E_k>

where G_k denotes the constructed graph prototype of the k-th category, V_k denotes the set of all feature points in the graph prototype of the k-th category, E_k denotes the adjacency matrix of the connections between feature points in that graph prototype, G'_k is the updated graph prototype, and m is a preset parameter;

The preset graph matching neural network model comprises an intra-graph propagation layer, a graph aggregation layer, an inter-graph propagation layer, and a graph matching layer, and obtaining the optimized graph matching neural network model comprises the following steps:

S5.1: Input the instance graph G_ins containing noisy-label features and the graph prototype G_k into the intra-graph propagation layer to obtain a first feature matrix and a second feature matrix, and iteratively update each of them by graph convolution, as follows:

The instance graph G_ins containing noisy-label features and the graph prototype G_k are input into the intra-graph propagation layer, and the set V_ins of all feature points in the reduced overall feature map and the reduced region feature maps is reshaped into the first feature matrix, where n₁ is the number of feature points in the instance graph containing noisy-label features and c₁ is the dimension of each feature point in that instance graph;

The set V_k of all feature points in the graph prototype is reshaped into the second feature matrix, where n₂ is the number of feature points in the graph prototype and c₂ is the dimension of each feature point in the graph prototype;

Graph convolution is applied to the first feature matrix and to the second feature matrix separately, and both matrices are updated iteratively:

[Formula not reproduced in the source text.]

where the symbols, which are missing from this text, denote in turn the first feature matrix after the l-th iterative update, the second feature matrix after the l-th iterative update, and the two parameters of the intra-graph propagation layer;
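The iterative graph convolution update can be illustrated with the sketch below. The update formula is not available in this text, so the code assumes the common form H^(l+1) = ReLU(E · H^(l) · W^(l)), where E is the adjacency matrix, H is the feature matrix, and W^(l) is a learnable weight matrix. All function names are hypothetical, and matrices are plain nested lists.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b)))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def graph_conv(features, adjacency, weights):
    """One propagation step under the assumed form ReLU(E @ H @ W)."""
    propagated = matmul(matmul(adjacency, features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


def propagate(features, adjacency, weights_per_layer):
    """Apply one graph convolution per weight matrix, in order."""
    for weights in weights_per_layer:
        features = graph_conv(features, adjacency, weights)
    return features
```

In this sketch the same routine would be run once on the instance graph (V_ins, E_ins) and once on the graph prototype (V_k, E_k), each with its own weights, to produce the two updated feature matrices.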

S5.2: Input the iteratively updated first and second feature matrices into the graph aggregation layer for feature combination to obtain the aggregated feature vector, as follows:

[Formula not reproduced in the source text.]

where the symbols, which are missing from this text, denote in turn the aggregated feature vector, the updated first feature matrix, and the updated second feature matrix;

S5.3: Input the aggregated feature vector into the inter-graph propagation layer for a graph convolution operation, and iteratively update the aggregated feature vector to obtain the first feature expression f_ins and the second feature expression Z_k. Specifically, the update at the l-th iteration uses E_cross, the adjacency matrix of the aggregated feature vector, together with the two parameters of the inter-graph propagation layer;

The first feature expression f_ins and the second feature expression Z_k are then obtained from the aggregated feature vector after the l-th iterative update;
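A minimal sketch of the aggregation and inter-graph propagation steps, under several stated assumptions: the aggregated features are formed by stacking the nodes of both graphs, E_cross links every instance node to every prototype node (plus self-loops), the update is a generic ReLU(E·H·W) graph convolution, and f_ins and Z_k are read out by mean pooling over each graph's nodes. None of these choices is confirmed by the text above; the function `inter_graph_step` and all sizes are hypothetical.

```python
import numpy as np

def inter_graph_step(h_ins: np.ndarray, h_k: np.ndarray, w_cross: np.ndarray):
    """Aggregate two graphs, propagate once across them, and read out f_ins, Z_k."""
    n1, n2 = h_ins.shape[0], h_k.shape[0]
    # Assumed aggregation: stack instance nodes on top of prototype nodes.
    h_cross = np.concatenate([h_ins, h_k], axis=0)
    # Assumed E_cross: self-loops plus links between the two graphs only.
    e_cross = np.eye(n1 + n2)
    e_cross[:n1, n1:] = 1.0
    e_cross[n1:, :n1] = 1.0
    e_cross = e_cross / e_cross.sum(axis=1, keepdims=True)
    # Generic graph convolution with ReLU activation.
    h_cross = np.maximum(e_cross @ h_cross @ w_cross, 0.0)
    # Assumed readout: mean over the nodes belonging to each graph.
    f_ins = h_cross[:n1].mean(axis=0)
    z_k = h_cross[n1:].mean(axis=0)
    return f_ins, z_k

rng = np.random.default_rng(0)
h_ins = rng.standard_normal((10, 64))   # updated first feature matrix
h_k = rng.standard_normal((10, 64))     # updated second feature matrix
w_cross = rng.standard_normal((64, 32)) * 0.1
f_ins, z_k = inter_graph_step(h_ins, h_k, w_cross)
```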

S5.4: Input the first feature expression f_ins and the second feature expression Z_k into the graph matching layer to compute the similarity S_k, and compute the graph matching loss from the similarity S_k. In the graph matching loss, y_i denotes the original label, k denotes the category of a graph prototype, and K denotes the total number of graph prototype categories;
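One plausible form of the similarity and the graph matching loss is sketched below. It assumes that S_k is the cosine similarity between f_ins and Z_k and that the loss is a temperature-scaled softmax cross-entropy over the K prototypes evaluated at the original label y_i. Both are assumptions, since the formulas themselves are not reproduced in this text. The names `cosine_similarities` and `graph_matching_loss` are hypothetical; τ = 0.1 is the value given for this embodiment.

```python
import numpy as np

def cosine_similarities(f_ins: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Assumed S_k: cosine similarity between f_ins and each row Z_k."""
    f = f_ins / np.linalg.norm(f_ins)
    z = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return z @ f

def graph_matching_loss(similarities: np.ndarray, y_i: int, tau: float = 0.1) -> float:
    """Assumed loss: -log softmax(S / tau) at the original label y_i."""
    logits = similarities / tau
    logits = logits - logits.max()                    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[y_i])

prototypes = np.eye(3)                   # three toy prototype expressions Z_k
f_ins = np.array([1.0, 0.0, 0.0])        # instance expression aligned with class 0
s = cosine_similarities(f_ins, prototypes)
loss_match = graph_matching_loss(s, y_i=0)     # label agrees with nearest prototype
loss_mismatch = graph_matching_loss(s, y_i=1)  # label disagrees
```

With this form, an instance whose label matches its nearest prototype incurs a much smaller loss than one whose label does not, which is the behaviour the matching step relies on.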

S5.5: Correct the noise labels in the instance graphs containing noise label features and eliminate outlier samples, specifically:

q_i = αp_i + (1-α)d_i

where q_i is the total probability, p_i is the classifier distribution probability, d_i is the graph matching distribution probability, and T is a preset threshold used to assign the pseudo-label. When the maximum value of the total probability q_i is greater than T, the category corresponding to that maximum is used as the pseudo-label; when the total probability q_i is greater than the class-average probability, the original label y_i is used as the pseudo-label, which corrects the noise labels in the instance graphs containing noise label features; in all other cases OOD, which denotes an outlier sample, is used as the pseudo-label, which eliminates the outlier samples;
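The label-correction rule can be sketched as follows. Two points are assumptions: d_i is taken to be a softmax over the similarities S_k with temperature τ, and "q_i greater than the class-average probability" is read as the entry of q_i at the original label exceeding 1/K. The function names and the `OOD = -1` marker are hypothetical; the default values α = 0.5, τ = 0.1, T = 0.75 are those of this embodiment.

```python
import numpy as np

OOD = -1  # hypothetical marker for an outlier (out-of-distribution) sample

def softmax(x, tau: float = 1.0) -> np.ndarray:
    z = np.asarray(x, dtype=float) / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def correct_label(p_i, similarities, y_i: int,
                  alpha: float = 0.5, tau: float = 0.1, threshold: float = 0.75) -> int:
    """Return the pseudo-label for one sample, or OOD if it should be discarded."""
    d_i = softmax(similarities, tau)                   # assumed graph matching probability
    q_i = alpha * np.asarray(p_i) + (1 - alpha) * d_i  # total probability
    num_classes = q_i.shape[0]
    if q_i.max() > threshold:             # confident prediction overrides the label
        return int(q_i.argmax())
    if q_i[y_i] > 1.0 / num_classes:      # original label is still plausible: keep it
        return y_i
    return OOD                            # neither condition holds: treat as outlier

relabelled = correct_label([0.9, 0.05, 0.05], [1.0, 0.0, 0.0], y_i=1)
kept = correct_label([0.4, 0.4, 0.2], [0.1, 0.1, 0.0], y_i=1)
dropped = correct_label([0.4, 0.4, 0.2], [0.1, 0.1, 0.0], y_i=2)
```

In the three calls, the first sample is relabelled to class 0, the second keeps its original label 1, and the third is marked as an outlier.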

S5.6: Compute the classification cross-entropy loss and the total loss, and optimize the graph matching neural network model according to the total loss to obtain the optimized graph matching neural network model. Specifically, the total loss combines the classification cross-entropy loss and the graph matching loss, with λ_pro as the proportional coefficient;
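A minimal sketch of how the two losses might be combined. It assumes the total loss is the classification cross-entropy plus λ_pro times the graph matching loss, and that samples whose pseudo-label is OOD are simply left out of the cross-entropy average. Both are assumptions, as the loss formulas are not reproduced in this text. `classification_loss`, `total_loss`, and the `OOD = -1` marker are hypothetical names.

```python
import numpy as np

OOD = -1  # hypothetical marker for samples removed as outliers

def classification_loss(probs, pseudo_labels) -> float:
    """Mean cross-entropy against pseudo-labels, skipping OOD samples (assumed)."""
    probs = np.asarray(probs, dtype=float)
    pseudo_labels = np.asarray(pseudo_labels)
    rows = np.flatnonzero(pseudo_labels != OOD)   # indices of retained samples
    if rows.size == 0:
        return 0.0
    picked = probs[rows, pseudo_labels[rows]]     # probability of each pseudo-label
    return float(-np.log(picked + 1e-12).mean())

def total_loss(cls_loss: float, match_loss: float, lambda_pro: float = 1.0) -> float:
    """Assumed combination: L_total = L_cls + lambda_pro * L_match."""
    return cls_loss + lambda_pro * match_loss

probs = np.array([[0.5, 0.5], [0.9, 0.1]])  # classifier probabilities p_ij
pseudo = np.array([0, OOD])                 # second sample was marked as an outlier
loss = total_loss(classification_loss(probs, pseudo), match_loss=0.2)
```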

S6: Acquire the image to be recognized, extract its features, and use the optimized graph matching neural network model to recognize the image, obtaining the recognition result of the image to be recognized.

In a specific implementation, input images containing noise labels are first obtained through web retrieval. The dataset used in this embodiment is WebFG-496, which consists of three sub-datasets, namely Web-Bird, Web-Aircraft, and Web-Car; the size of the input images containing noise labels is 448×448;

Next, a convolutional neural network with ResNet50-varian as the backbone CNN is set up, and the feature extractor is used to extract features from the input image containing a noise label to obtain the overall feature map, whose dimension is 14×14×2048; the overall feature map is passed through a convolutional layer to obtain the mean-filtered overall feature map; the mean over the channels is computed at each position of the mean-filtered overall feature map to obtain the overall mean feature map;
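The channel-wise averaging and the search for the strongest response can be sketched as follows. The input `feature_map` is assumed to be the already mean-filtered overall feature map in (height, width, channels) layout; the mean-filtering convolution itself is not shown. `locate_max_response` is a hypothetical helper name, and the toy map uses 8 channels instead of 2048 to stay small.

```python
import numpy as np

def locate_max_response(feature_map: np.ndarray):
    """Average over channels, then return (row, column) of the largest response."""
    mean_map = feature_map.mean(axis=2)          # overall mean feature map, shape (H, W)
    i, j = np.unravel_index(mean_map.argmax(), mean_map.shape)
    return int(i), int(j)

# Toy 14x14 map whose only non-zero response sits at row 3, column 5.
fm = np.zeros((14, 14, 8))
fm[3, 5, :] = 1.0
peak = locate_max_response(fm)
```

The coordinates found this way serve as the anchor around which local regions are cropped.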

According to the coordinates of the maximum response region, several local regions of different sizes are cropped from the overall feature map. This embodiment uses three area sizes S₁, S₂, S₃ and three aspect ratios A₁, A₂, A₃, giving 9 combinations in total, to crop the overall feature map; the areas S₁, S₂, S₃ are one half, one third, and two thirds of the area of the overall feature map, and the aspect ratios A₁, A₂, A₃ are 1, 0.5, and 2, respectively;
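The 9 crop windows can be enumerated as in the sketch below. It assumes that the aspect ratio means width divided by height, that each window is centred on the maximum response coordinate, and that windows are clamped to stay inside the 14×14 feature map; the text does not specify these details. `candidate_boxes` is a hypothetical helper name.

```python
import itertools
import math

def candidate_boxes(center, map_size: int = 14,
                    area_fracs=(1 / 2, 1 / 3, 2 / 3),
                    ratios=(1.0, 0.5, 2.0)):
    """Return (top, left, height, width) for every area/aspect-ratio combination."""
    ci, cj = center
    boxes = []
    for frac, ratio in itertools.product(area_fracs, ratios):
        area = frac * map_size * map_size
        # Solve h * w = area with w / h = ratio, then clamp to the map size.
        h = min(map_size, max(1, round(math.sqrt(area / ratio))))
        w = min(map_size, max(1, round(math.sqrt(area * ratio))))
        # Centre on the peak, shifting the window back inside the map if needed.
        top = min(max(ci - h // 2, 0), map_size - h)
        left = min(max(cj - w // 2, 0), map_size - w)
        boxes.append((top, left, h, w))
    return boxes

boxes = candidate_boxes((3, 5))
```

Each box can then be used to slice the overall feature map, for example `feature_map[top:top + h, left:left + w]`.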

The feature extractor is used to extract features from the cropped local regions of different sizes to obtain the region discrimination feature maps;

The instance graph containing noise label features and the graph prototype for each category are constructed, and both are input into the intra-graph propagation layer (a GCN) for graph convolution; in this embodiment, the numbers of output channels are 1024 and 2048, respectively. The output features of the instance graph and of the graph prototype are aggregated to obtain the first feature expression f_ins and the second feature expression Z_k; the graph matching loss and the classification cross-entropy loss are then computed from f_ins and Z_k to optimize the graph matching neural network model;

In this embodiment, α = 0.5, τ = 0.1, T = 0.75, and λ_pro = 1;

Images to be recognized are taken from CUB200-2011, FGVC-Aircraft, and Stanford Cars as validation data; after their features are extracted, the optimized graph matching neural network model is used to recognize them and obtain the recognition results;

The table below compares the fine-grained image recognition accuracy of different methods:

Table 1: Comparison of fine-grained image recognition accuracy of different methods

Compared with the baseline models, the method of this embodiment performs far better on all three datasets. The backbone network used in this embodiment is ResNet-50; compared with the ResNet-50 model alone, the method improves substantially on all three datasets, raising the average recognition accuracy by 20.14%. For a fair comparison, ResNet-50 is used uniformly as the backbone network. As can be seen from FIG. 3, with ResNet-50 as the backbone the method of this embodiment achieves the highest average accuracy of 83.53%, with accuracies of 76.62%, 85.79%, and 82.09% on Web-Bird, Web-Aircraft, and Web-Car, respectively, which are 2.23%, 4.2%, and 1.94% higher than those of Peer-learning, a relatively advanced existing method. Other models such as B-CNN were further used as the backbone network; the comparison shows that the method of this embodiment can be adapted to different backbone networks and yields a clear performance improvement in fine-grained image recognition;

The method performs network-supervised fine-grained image recognition based on deep learning. By introducing graph prototypes for contrastive learning against instance graphs containing noise label features, it can effectively correct noise labels and significantly improves the efficiency and accuracy of fine-grained image recognition.

Embodiment 3

As shown in FIG. 3, this embodiment provides a network-supervised fine-grained image recognition system based on deep learning, which applies the network-supervised fine-grained image recognition method based on deep learning described in Embodiment 1 or 2, and comprises:

Image acquisition unit 301: used to acquire input images containing noise labels from the Internet;

Feature extraction unit 302: used to perform feature extraction on the input image containing a noise label to obtain a region discrimination feature map and an overall feature map;

Instance graph generation unit 303: used to obtain an instance graph containing noise label features according to the obtained region discrimination feature map and overall feature map;

Graph prototype construction unit 304: used to construct a graph prototype for each category according to the obtained instance graph containing noise label features;

Graph matching unit 305: used to input the obtained instance graph containing noise label features and the graph prototype into the preset graph matching neural network model for training to obtain the optimized graph matching neural network model;

Image recognition unit 306: used to acquire the image to be recognized, extract its features, and recognize it with the optimized graph matching neural network model to obtain the recognition result of the image to be recognized;

In a specific implementation, the image acquisition unit 301 first performs web retrieval to obtain input images containing noise labels; the feature extraction unit 302 then extracts features from these input images to obtain the region discrimination feature map and the overall feature map; the instance graph generation unit 303 obtains the instance graph containing noise label features from the region discrimination feature map and the overall feature map; the graph prototype construction unit 304 then constructs a graph prototype for each category from the instance graphs; the graph matching unit 305 inputs the instance graphs and the graph prototypes into the preset graph matching neural network model for training to obtain the optimized graph matching neural network model; finally, the image recognition unit 306 acquires the image to be recognized, extracts its features, and recognizes it with the optimized graph matching neural network model to obtain the recognition result;
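The data flow between the six units can be sketched as a simple composition of callables. This is only an illustration of the wiring, not an implementation of the units themselves; the class `RecognitionSystem`, its field names, and the stub functions in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class RecognitionSystem:
    acquire_images: Callable[[], Any]              # image acquisition unit 301
    extract_features: Callable[[Any], Any]         # feature extraction unit 302
    build_instance_graphs: Callable[[Any], Any]    # instance graph generation unit 303
    build_prototypes: Callable[[Any], Any]         # graph prototype construction unit 304
    train_matcher: Callable[[Any, Any], Any]       # graph matching unit 305
    recognize: Callable[[Any, Any], Any]           # image recognition unit 306

    def fit(self) -> Any:
        """Run units 301 to 305 in order and return the trained model."""
        images = self.acquire_images()
        features = self.extract_features(images)
        graphs = self.build_instance_graphs(features)
        prototypes = self.build_prototypes(graphs)
        return self.train_matcher(graphs, prototypes)

    def predict(self, model: Any, image: Any) -> Any:
        """Run unit 306 on one image with a trained model."""
        return self.recognize(model, image)

# Usage with trivial stand-in functions that only demonstrate the data flow.
system = RecognitionSystem(
    acquire_images=lambda: [1, 2],
    extract_features=lambda imgs: [v * 2 for v in imgs],
    build_instance_graphs=lambda feats: feats,
    build_prototypes=lambda graphs: sum(graphs),
    train_matcher=lambda graphs, protos: (graphs, protos),
    recognize=lambda model, image: ("label", image),
)
model = system.fit()
result = system.predict(model, "query")
```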

The system recognizes fine-grained images based on deep learning. By introducing graph prototypes for contrastive learning against instance graphs containing noise label features, it can effectively correct noise labels and significantly improves the efficiency and accuracy of fine-grained image recognition.

The same or similar reference numerals correspond to the same or similar components;

The terms describing positional relationships in the drawings are for illustration only and shall not be construed as limiting this patent;

Obviously, the above embodiments of the present invention are merely examples given to illustrate the invention clearly and do not limit its implementation. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims

1. A network-supervised fine-grained image recognition method based on deep learning, characterized by comprising the following steps:

S1: acquiring an input image containing a noise label from the Internet;

S2: performing feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;

S3: obtaining an instance graph containing noise label features according to the obtained region discrimination feature map and overall feature map;

S4: constructing a graph prototype for each category according to the obtained instance graph containing noise label features;

S5: inputting the obtained instance graph containing noise label features and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;

S6: acquiring an image to be recognized, extracting its features, and recognizing the image with the optimized graph matching neural network model to obtain a recognition result of the image to be recognized.

2. The network-supervised fine-grained image recognition method based on deep learning according to claim 1, wherein in step S2, feature extraction is performed on the input image containing the noise label to obtain the region discrimination feature map and the overall feature map, specifically:

performing feature extraction on the input image containing the noise label with a feature extractor to obtain the overall feature map; passing the overall feature map through a convolutional layer to obtain a mean-filtered overall feature map; computing the mean over the channels at each position of the mean-filtered overall feature map to obtain an overall mean feature map; and searching for the maximum response value region in the overall mean feature map, locating the coordinates of the maximum response value region, and obtaining the region discrimination feature map according to those coordinates.

3. The network-supervised fine-grained image recognition method based on deep learning according to claim 2, wherein searching for the maximum response value region in the overall mean feature map and locating its coordinates is specifically:

searching for the maximum response value region in the overall mean feature map and locating its coordinates, wherein the overall mean feature map is obtained by averaging the mean-filtered overall feature map f_g' over its C channels, C denotes the number of channels of the mean-filtered overall feature map, the row and column corresponding to the maximum response value are searched for, and (i, j) denotes the coordinates of the maximum response value region.

4. The network-supervised fine-grained image recognition method based on deep learning according to claim 3, wherein in step S3, the instance graph containing noise label features is obtained according to the obtained region discrimination feature map and overall feature map, specifically:

converting the obtained region discrimination feature maps to the same dimensions by bilinear interpolation to obtain regional feature maps of identical dimensions; reducing the dimensions of the overall feature map and of the regional feature maps by global average pooling to obtain the dimension-reduced overall feature map and the dimension-reduced regional feature maps; and obtaining the instance graph containing noise label features from the dimension-reduced overall feature map and regional feature maps:

G_ins = <V_ins, E_ins>

wherein G_ins denotes the instance graph containing noise label features, V_ins denotes the set of all feature points in the dimension-reduced overall feature map and regional feature maps, and E_ins denotes the adjacency matrix of the connections between feature points in the instance graph containing noise label features.

5. The network-supervised fine-grained image recognition method based on deep learning according to claim 4, wherein in step S4, the graph prototype is constructed according to the obtained instance graph containing noise label features, specifically:

according to the obtained instance graph containing noise label features, constructing for each category a graph prototype with the same structure as the instance graph, the graph prototype being updated by a moving average:

G_k = <V_k, E_k>

wherein G_k denotes the constructed graph prototype of the k-th category, V_k denotes the set of all feature points in the graph prototype of the k-th category, E_k denotes the adjacency matrix of the graph prototype, G'_k is the updated graph prototype, and m is a preset parameter.

6. The network-supervised fine-grained image recognition method based on deep learning according to claim 5, wherein in step S5, the obtained instance graph containing noise label features and the graph prototype are input into the preset graph matching neural network model for training to obtain the optimized graph matching neural network model, specifically:

the preset graph matching neural network model comprises an intra-graph propagation layer, a graph aggregation layer, an inter-graph propagation layer, and a graph matching layer, and obtaining the optimized graph matching neural network model comprises the following steps:

S5.1: inputting the obtained instance graph G_ins containing noise label features and the graph prototype G_k into the intra-graph propagation layer to obtain a first feature matrix and a second feature matrix, and iteratively updating the first feature matrix and the second feature matrix respectively by graph convolution;

S5.2: inputting the iteratively updated first feature matrix and second feature matrix into the graph aggregation layer for feature combination to obtain an aggregated feature vector;

S5.3: inputting the aggregated feature vector into the inter-graph propagation layer for graph convolution, and iteratively updating the aggregated feature vector to obtain a first feature expression f_ins and a second feature expression Z_k;

S5.4: inputting the first feature expression f_ins and the second feature expression Z_k into the graph matching layer to compute the similarity S_k, and computing the graph matching loss according to the similarity S_k;

S5.5: correcting the noise labels in the instance graph containing noise label features and eliminating outlier samples;

S5.6: computing the classification cross-entropy loss and the total loss, and optimizing the graph matching neural network model according to the total loss to obtain the optimized graph matching neural network model.

7. The network-supervised fine-grained image recognition method based on deep learning according to claim 6, wherein in step S5.4, the first feature expression f_ins and the second feature expression Z_k are input into the graph matching layer to compute the similarity S_k, and the graph matching loss is computed according to the similarity S_k, specifically:

inputting the first feature expression f_ins and the second feature expression Z_k into the graph matching layer for graph matching and computing the similarity S_k;

the graph matching layer is provided with a graph matching loss function, and the graph matching loss is computed according to the similarity S_k, wherein in the graph matching loss function y_i denotes the original label, k denotes the category of a graph prototype, and K denotes the total number of graph prototype categories.

8. The network-supervised fine-grained image recognition method based on deep learning according to claim 7, wherein in step S5.5, the noise labels in the instance graph containing noise label features are corrected and outlier samples are eliminated, specifically:

the intra-graph propagation layer is provided with a classifier; the instance graph containing noise label features is input into the classifier to obtain the classifier distribution probability p_i; the graph matching distribution probability d_i is computed; and the total probability q_i is computed from the classifier distribution probability p_i and the graph matching distribution probability d_i, specifically:

q_i = αp_i + (1-α)d_i

wherein α is a preset parameter and τ is a temperature coefficient;

according to the total probability q_i and a preset threshold T, correcting the noise labels in the instance graph containing noise label features and eliminating the outlier samples OOD, specifically:

when the maximum value of the total probability q_i is greater than the preset threshold T, the category corresponding to the maximum value of q_i is used as the pseudo-label; when the total probability q_i is greater than the class-average probability, the original label y_i is used as the pseudo-label, thereby correcting the noise labels in the instance graph containing noise label features; in all other cases OOD, which denotes an outlier sample, is used as the pseudo-label, thereby eliminating the outlier samples.

9. The network-supervised fine-grained image recognition method based on deep learning according to claim 8, wherein in step S5.6, the classification cross-entropy loss and the total loss are computed, and the graph matching neural network model is optimized according to the total loss to obtain the optimized graph matching neural network model, specifically:

the intra-graph propagation layer is provided with a classification cross-entropy loss function, in which p_ij is the classifier distribution probability of the i-th instance graph containing noise label features for the j-th category, and the pseudo-label of the i-th instance graph containing noise label features is taken with respect to the j-th category;

a total loss function is constructed from the classification cross-entropy loss function and the graph matching loss function, in which λ_pro is the proportional coefficient;

the graph matching neural network model is optimized according to the total loss to obtain the optimized graph matching neural network model.

10. A network-supervised fine-grained image recognition system based on deep learning, applying the network-supervised fine-grained image recognition method based on deep learning according to any one of claims 1 to 9, characterized by comprising:

an image acquisition unit, used to acquire an input image containing a noise label from the Internet;

a feature extraction unit, used to perform feature extraction on the input image containing the noise label to obtain a region discrimination feature map and an overall feature map;

an instance graph generation unit, used to obtain an instance graph containing noise label features according to the obtained region discrimination feature map and overall feature map;

a graph prototype construction unit, used to construct a graph prototype for each category according to the obtained instance graph containing noise label features;

a graph matching unit, used to input the obtained instance graph containing noise label features and the graph prototype into a preset graph matching neural network model for training to obtain an optimized graph matching neural network model;

an image recognition unit, used to acquire an image to be recognized, extract its features, and recognize the image with the optimized graph matching neural network model to obtain the recognition result of the image to be recognized.