CN110796183A - Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning - Google Patents

Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning

Info

Publication number
CN110796183A
Authority
CN
China
Prior art keywords
feature
discriminant
correlation
layer
discriminative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910986800.8A
Other languages
Chinese (zh)
Inventor
王智慧
王世杰
李豪杰
唐涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201910986800.8A priority Critical patent/CN110796183A/en
Publication of CN110796183A publication Critical patent/CN110796183A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques

Abstract

The invention belongs to the technical field of computer vision and provides a weakly supervised fine-grained image classification algorithm based on correlation-guided discriminative learning. An end-to-end correlation-guided discriminative learning model is proposed to fully mine and exploit the correlations in weakly supervised fine-grained image classification and thereby improve discriminability. First, a discriminative region grouping (DRG) sub-network is proposed that establishes correlations between regions and then enhances each region by weighting in all correlations from the other regions, guiding the network to find more discriminative region groups. Second, a discriminative feature enhancement (DFS) sub-network is proposed to mine and learn the internal spatial correlations between the elements of each patch's feature vector, improving its local discriminative power by jointly enhancing informative elements while suppressing useless ones. Extensive experiments demonstrate that DRG and DFS are effective and achieve state-of-the-art performance.

Description

Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning
Technical Field
The invention belongs to the technical field of computer vision and provides a weakly supervised fine-grained image classification algorithm based on correlation-guided discriminative learning, with the aim of improving the accuracy and efficiency of fine-grained image classification.
Background
Unlike general image classification, weakly supervised fine-grained image classification (WFGIC) uses only image-level labels to identify objects at a finer subcategory granularity. WFGIC has attracted a great deal of attention in academia and industry due to its many potential applications in image understanding and computer vision systems. WFGIC remains an open problem in computer vision for two reasons. First, images belonging to the same subcategory vary greatly in size, pose, color, and background, while images of different subcategories may be very similar in these respects. Second, WFGIC provides only image-level labels, without object or part annotations, which makes it much harder to extract valid discriminative features that distinguish the subtle differences between subcategories.
Because the key differences between fine-grained subcategory images are subtle and often localized to some specific part of the object, the best-performing recent WFGIC systems work on finding local discriminative patches using heuristic schemes or learning methods. In the heuristic approaches, objects are first located using saliency extraction and co-segmentation, and two defined spatial constraints are then applied to select distinguishable parts from a large number of candidate patches. The limitation of heuristic approaches is that they do not guarantee that the selected patches are sufficiently discriminative. Therefore, recent work has focused on designing end-to-end deep learning pipelines that guide the automatic discovery of discriminative patches through appropriate loss functions. However, all previous work attempts to find the discriminative regions/patches independently, using only region features and ignoring the correlations between regions. We believe that using these correlations is very helpful for distinguishing fine-grained images, since region combinations are more descriptive and discriminative than single regions. This prompted us to incorporate the correlations between regions into discriminative patch selection. To this end, we propose a discriminative region grouping (DRG) sub-network to model the correlations between regions and, by learning these correlations, implicitly find discriminative region groups that are more powerful for WFGIC. Figure 1 illustrates our motivation: from (b) we can see that the head and chest are more prominent when each region is considered independently. After taking the correlations into account (c), the discriminative scores of the head and tail become large, as head-tail combinations may be more effective in distinguishing this type of bird from other subcategories.
The feature representation is another key point of WFGIC. Recently, some work has encoded CNN feature vectors into higher-order information through end-to-end mechanisms to improve the discriminative power of features. These methods are effective because of their invariance to the translation and pose of the object: since the feature vectors aggregate local image features in an orderless way, they are translation-invariant by design. However, these methods ignore the internal spatial correlations. In addition, the discriminative patches contain some context with little discriminative value or with noise, such as the background regions in Fig. 1(d)(e). Such background or weakly discriminative information may be detrimental to fine-grained classification, because all subcategories share similar background information (e.g., all birds typically live in trees or fly in the sky). Based on the above intuitive but important observations and analyses, we propose a discriminative feature enhancement sub-network to explore the internal spatial correlations between discriminative elements of the feature vector and obtain better discriminative power. We achieve this goal by jointly learning the interdependencies between feature vector elements and emphasizing informative elements while suppressing less discriminative ones.
Disclosure of Invention
The invention provides a weakly supervised fine-grained image classification algorithm based on correlation-guided discriminative learning, as shown in Fig. 2.
The technical scheme of the invention is as follows:
a weakly supervised fine grained image classification algorithm based on relevance guided discriminant learning comprises two sub-networks:
(1) Discriminative region grouping (DRG) sub-network
In this sub-network we propose a new method to establish the correlations between regions. Given an input feature representation M_I ∈ R^{C×H×W}, we propose to feed it into the discriminative region grouping module F:

M_R = F(M_I),  (1)

where F is composed of three region generation layers, a relation layer and a fusion layer; M_R ∈ R^{C×H×W}, where W and H denote the width and height of the feature representation, and C denotes the number of channels.
The region generation layer is computed by a simple convolution operation and matrix transformation as follows:

M_T = f(W_T · M_I + b_T),  (2)

where W_T ∈ R^{C×1×1×C} and b_T are the learned weight parameters and the bias vector of the convolutional layer, respectively, and 1×1 is the size of the convolution kernel. M_T ∈ R^{C×H×W} denotes the new feature map. Specifically, we regard a 1×1 convolution filter as a small region detector: each vector V_T ∈ R^{C×1×1} taken across channels at a fixed spatial position of M_T represents a small region at the corresponding position in the original image.
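A minimal PyTorch sketch of one region generation layer (Eq. 2) is given below; the choice of ReLU for the nonlinearity f and the channel count are assumptions, since the text does not fix them.

```python
import torch
import torch.nn as nn

class RegionGenerationLayer(nn.Module):
    def __init__(self, channels: int = 2048):
        super().__init__()
        # 1x1 convolution acting as a small region detector (W_T, b_T in Eq. 2)
        self.conv = nn.Conv2d(channels, channels, kernel_size=1, bias=True)
        self.act = nn.ReLU(inplace=True)  # assumed choice of f

    def forward(self, m_i: torch.Tensor) -> torch.Tensor:
        # m_i: (B, C, H, W) input feature representation M_I
        return self.act(self.conv(m_i))  # M_T, same shape as M_I

m_t = RegionGenerationLayer()(torch.randn(2, 2048, 14, 14))
print(m_t.shape)  # torch.Size([2, 2048, 14, 14])
```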
To obtain the correlation weight coefficients between regions, a relation layer is introduced to compare the multiplicative patch fields of the two feature maps M_T^1 and M_T^2 computed by the region generation layers.
Let us take a single correlation between two positions as an example. The correlation between position p_1 in the first feature map and position p_2 in the second feature map is defined as

c(p_1, p_2) = ⟨V_1(p_1), V_2(p_2)⟩,  (3)
where V_1 and V_2 denote the region feature vectors in the respective feature maps. In practice, for each position p_1 in the first feature map we compute its correlations with all positions in the second feature map.
For each combination of two positions we obtain one correlation value. Specifically, we organize the relative displacements along the channel dimension and obtain an output correlation feature map M_C ∈ R^{K×H×W}, where K = W×H is the number of regions in the input feature map. M_C is then passed through a softmax layer to generate the discriminative correlation weight map R ∈ R^{K×H×W}:

R = softmax(M_C).  (4)
During forward propagation, the more discriminative the regions are, the greater the correlation between them. During back-propagation, we compute gradients with respect to each input blob accordingly. When the classification probability is low, the penalty is propagated backwards to reduce the correlation weight of the two regions, and at the same time the feature representations computed by the region generation layers are updated.
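The following sketch illustrates the relation layer (Eqs. 3-4), assuming the correlation is the inner product of region feature vectors and that the softmax normalizes over the K correlation channels; both readings follow the surrounding text rather than a disclosed reference implementation.

```python
import torch
import torch.nn.functional as F

def correlation_weight_map(m1: torch.Tensor, m2: torch.Tensor) -> torch.Tensor:
    # m1, m2: (B, C, H, W) feature maps from two region generation layers
    b, c, h, w = m1.shape
    k = h * w
    v1 = m1.flatten(2).transpose(1, 2)             # (B, K, C): one vector per position p1
    v2 = m2.flatten(2)                             # (B, C, K): one vector per position p2
    m_c = torch.bmm(v1, v2)                        # (B, K, K): c(p1, p2) for all pairs (Eq. 3)
    m_c = m_c.transpose(1, 2).reshape(b, k, h, w)  # (B, K, H, W): channel k indexes p2
    return F.softmax(m_c, dim=1)                   # softmax over the K channels (Eq. 4)

r = correlation_weight_map(torch.randn(2, 256, 7, 7), torch.randn(2, 256, 7, 7))
print(r.shape)  # torch.Size([2, 49, 7, 7])
```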
Next, the feature vectors V generated by the third region generation layer and the correlation weight map R are input into the fusion layer f:

M_F^{(i,j)} = Σ_{k=1}^{K} R_{ijk} · V^{(w,h)},  (5)
k = (w−1)×W + h,  (6)

where V^{(w,h)} is the vector at the w-th row and h-th column of V, and R_{ijk} is the weight coefficient at the i-th row, j-th column and k-th channel of R. The vector M_F^{(i,j)} at the i-th row and j-th column of M_F is thus computed by combining all position vectors with their corresponding correlation coefficients, with Eq. (6) giving the index mapping between the feature map V and the correlation weight map R. In this way, the discriminative power of region groups is taken into account.
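A sketch of the fusion layer (Eqs. 5-6): each output vector is the correlation-weighted sum over all position vectors, with the flattening order standing in for the index mapping of Eq. (6).

```python
import torch

def fuse(v: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    # v: (B, C, H, W) feature map from the third region generation layer
    # r: (B, K, H, W) correlation weight map with K = H*W
    v_flat = v.flatten(2)                            # (B, C, K): V(k), flattened positions
    # M_F(i, j) = sum_k R(i, j, k) * V(k)            (Eq. 5)
    return torch.einsum('bck,bkij->bcij', v_flat, r)  # (B, C, H, W)

m_f = fuse(torch.randn(2, 256, 7, 7), torch.randn(2, 49, 7, 7))
print(m_f.shape)  # torch.Size([2, 256, 7, 7])
```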
Inspired by ResNet, we propose residual learning:
M_R = α·M_F + M_I,  (7)
where α is an adaptive weight parameter, gradually trained to assign more weight to the discriminative correlation features; its range is [0, 1] and it is initialized to approximately 0. M_R includes both the adaptive discriminative correlation features and the raw input features, allowing more discriminative patches to be picked out. Integrating global semantic information with local detail information leads to more stable performance.
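The residual step of Eq. (7) can be sketched with a single learnable scalar; clamping α to [0, 1] and initializing it at 0 follow the text, while the clamping mechanism itself is an implementation assumption.

```python
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(1))  # starts at 0, trained end-to-end

    def forward(self, m_f: torch.Tensor, m_i: torch.Tensor) -> torch.Tensor:
        a = self.alpha.clamp(0.0, 1.0)  # keep alpha in [0, 1]
        return a * m_f + m_i            # M_R = alpha * M_F + M_I (Eq. 7)
```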
(2) Picking discriminative patches
In this work, following heuristics from object detection, we generate default patches from feature maps at three different scales. The feature maps of different layers have different receptive fields (RF). We design the scale sizes, scale steps and aspect ratios of the patches according to the RF of each feature map, so that different feature maps account for discriminative regions of different sizes.
Let us take the feature map M_R as an example. We feed the residual feature M_R into a scoring layer. Specifically, we add a 1×1×N convolutional layer and a sigmoid function σ to learn the discriminative probability map S ∈ R^{N×H×W}, which indicates the effect of each discriminative region on the final classification result:
S = σ(W_S · M_R + b_S),  (8)
where W_S ∈ R^{C×1×1×N} are the parameters of the convolution kernel, N is the number of default patches at a given position of the feature map M_R, and b_S denotes the bias.
Meanwhile, we assign each default patch a discriminative probability value and denote the patch as p_{i,j,k}. Each patch has its default coordinates (t_x, t_y, t_w, t_h) and a discriminative probability value s_{i,j,k}, where s_{i,j,k} denotes the value at the i-th row, j-th column and k-th channel:

p_{i,j,k} = [t_x, t_y, t_w, t_h, s_{i,j,k}],  (9)
finally, the network selects the first M patches with discriminative probability values, where M is a hyperparameter.
(3) Discriminative feature enhancement (DFS) sub-network
The selected patches typically contain noise, so the extracted features tend to contain non-discriminative information. At the same time, most current work forms the feature representation of a region directly from the output of the CNN, rarely considering the spatial correlations within the feature vector. To solve these problems, we propose a discriminative feature enhancement sub-network to mine and exploit the correlations between feature vector elements; it consists of a feature-aware filter layer and an enhancement layer. The feature-aware filter layer generates a global filter that removes useless information through a non-linear operation that zeroes out negative values in the feature vector. The enhancement layer adaptively learns the interdependencies via a weighted sum over the discriminative elements of the feature vector to improve its discriminative power.
We feed the feature vector V'_P ∈ R^{C×1} into the feature-aware filter to filter out useless information as follows:

V_P = ReLU(BN(W_P · V'_P + b_P)),  (10)

where W_P and b_P are the weight matrix and bias of the linear layer, and BN and ReLU denote the batch normalization and rectified linear unit (ReLU) functions. V_P ∈ R^{C×1} denotes the filtered discriminative feature vector.
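A minimal sketch of the feature-aware filter layer (Eq. 10), assuming the linear map is a fully connected layer over C-dimensional patch vectors.

```python
import torch
import torch.nn as nn

class FeatureAwareFilter(nn.Module):
    def __init__(self, dim: int = 2048):
        super().__init__()
        self.linear = nn.Linear(dim, dim)   # W_P, b_P
        self.bn = nn.BatchNorm1d(dim)       # BN
        self.relu = nn.ReLU(inplace=True)   # zeroes out negative (useless) values

    def forward(self, v_p_raw: torch.Tensor) -> torch.Tensor:
        # v_p_raw: (B, C) raw patch feature vectors V'_P
        return self.relu(self.bn(self.linear(v_p_raw)))  # V_P

v_p = FeatureAwareFilter()(torch.randn(8, 2048))
print(v_p.shape)  # torch.Size([8, 2048])
```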
Then, we feed V_P into the enhancement layer. Specifically, the interdependency score map S_E ∈ R^{C×C} of the discriminative elements is generated by matrix multiplication between V_P and its transpose as follows:

S_E = σ(V_P · V_P^T),  (11)

where σ is the softmax function used for normalization; (V_P · V_P^T)_{ij} is the interdependency between the i-th and j-th discriminative elements before normalization, and (S_E)_{ij} represents it after normalization. The larger the discriminative values of any two elements, the stronger their interdependency.
Next, we improve the discriminative power of the feature vector through matrix multiplication between the patch feature vector V_P and the interdependency score map S_E:

V″ = V_P ⊙ S_E,  (12)

where ⊙ denotes matrix multiplication.
taking into account the inter-spatial dependencies between discriminant elements of the feature vector, information elements can be enhanced while suppressing less powerful elements. We also introduced a residual learning mechanism to ensure the robustness of the network:
V=β·V+VP, (13)
wherein β is a weight that learns gradually from 0 and adjusts to an accurate value by back-propagation。VContaining enhanced feature vectors V and original input feature vectors V for final classificationP
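A sketch of the enhancement layer (Eqs. 11-13): the interdependency map is the softmax-normalized outer product of V_P with itself, and the enhanced vector is blended back through a learnable β initialized at 0. The multiplication order S_E · V_P is chosen here so that the shapes work out for column vectors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureEnhancer(nn.Module):
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(1))  # learned from 0 by back-propagation

    def forward(self, v_p: torch.Tensor) -> torch.Tensor:
        # v_p: (B, C) filtered patch feature vectors V_P
        outer = torch.bmm(v_p.unsqueeze(2), v_p.unsqueeze(1))  # (B, C, C) = V_P V_P^T
        s_e = F.softmax(outer, dim=-1)                          # interdependency map (Eq. 11)
        v_dd = torch.bmm(s_e, v_p.unsqueeze(2)).squeeze(2)      # V'' via matrix product (Eq. 12)
        return self.beta * v_dd + v_p                           # V* = beta * V'' + V_P (Eq. 13)

v_star = FeatureEnhancer()(torch.randn(8, 2048))
print(v_star.shape)  # torch.Size([8, 2048])
```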
(4) Loss function
The complete multi-task loss function L can be expressed as:

L = L_cls + λ_1·L_gui + λ_2·L_cor + λ_3·L_rank,  (14)

where L_cls denotes the fine-grained classification loss, and L_gui, L_cor and L_rank denote the guiding loss, the correlation loss and the rank loss, respectively. The balance between these losses is controlled by the hyperparameters λ_1, λ_2, λ_3. After repeated experimental validation, we set λ_1 = λ_2 = λ_3 = 1.
We denote the selected discriminative patches as P = {P_1, P_2, ..., P_M} and the corresponding discriminative probability scores as S = {s_1, s_2, ..., s_M}. The guiding loss, the correlation loss and the rank loss are then defined as follows:

L_gui = Σ_{i=1}^{M} max(0, C(X) − C(P_i)),  (15)
L_cor = Σ_{i=1}^{M} max(0, C(P_i) − C(P_c)),  (16)
L_rank = Σ_{(i,j): C(P_i) < C(P_j)} max(0, s_i − s_j),  (17)

where X is the original image, the function C is a confidence function reflecting the probability of classification into the correct class, and P_c is the concatenation of all selected patch features.
The purpose of the guiding loss is to guide the network to select more discriminative regions: when the prediction probability of a selected region is lower than that of the whole image, the network is penalized and its weights are adjusted by back-propagation. The correlation loss ensures that the prediction probability of the combined feature is greater than that of any single patch feature. The rank loss encourages the discriminative scores and the final classification probabilities of the selected patches to be in the same order, keeping the two consistent.
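The hinge-style formulations below are a sketch reconstructed from the behavior described above; C(·) is approximated by the correct-class probability, and the batch averaging and equal λ weights (Eq. 14) are assumptions.

```python
import torch

def guiding_loss(c_image: torch.Tensor, c_patches: torch.Tensor) -> torch.Tensor:
    # c_image: (B,) confidence of whole image; c_patches: (B, M) per-patch confidence
    # penalize patches whose confidence falls below the whole image's (Eq. 15)
    return torch.clamp(c_image.unsqueeze(1) - c_patches, min=0).sum(dim=1).mean()

def correlation_loss(c_patches: torch.Tensor, c_concat: torch.Tensor) -> torch.Tensor:
    # the combined (concatenated) feature should beat every single patch (Eq. 16)
    return torch.clamp(c_patches - c_concat.unsqueeze(1), min=0).sum(dim=1).mean()

def rank_loss(scores: torch.Tensor, c_patches: torch.Tensor) -> torch.Tensor:
    # keep discriminative scores in the same order as classification confidence (Eq. 17)
    diff_s = scores.unsqueeze(2) - scores.unsqueeze(1)         # s_i - s_j for all pairs
    wrong = (c_patches.unsqueeze(2) < c_patches.unsqueeze(1))  # pairs where C(P_i) < C(P_j)
    return torch.clamp(diff_s, min=0).mul(wrong.float()).sum(dim=(1, 2)).mean()

# total = l_cls + l_gui + l_cor + l_rank   # Eq. 14 with lambda_1 = lambda_2 = lambda_3 = 1
```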
The invention has the beneficial effects that:
(1) To the best of our knowledge, this is the first approach to explore and use the correlations between discriminative regions and between feature vector elements to improve the discriminative power of regions and their representations for WFGIC.
(2) We propose an end-to-end correlation-guided discriminative learning (CDL) model that incorporates discriminative region grouping and discriminative feature enhancement into a unified framework, so that the two levels of correlation can be learned efficiently and jointly.
(3) We evaluated the proposed method on the challenging Caltech-UCSD Birds-200-2011 (CUB-200-2011) and Stanford Cars-196 datasets. Experimental results show that the method achieves the best performance in both classification accuracy and efficiency. In particular, our method achieves an accuracy improvement of about 1.4% and runs about 12 FPS faster than the best previous techniques.
Drawings
Fig. 1 is an illustration of the motivation of the correlation-guided discriminative learning method proposed by the present invention.
FIG. 2 is a network framework diagram of the correlation-guided discriminative learning (CDL) model proposed by the present invention.
Fig. 3 is an explanatory diagram of the discriminative region grouping proposed by the present invention.
Fig. 4 is an explanatory diagram of the discriminative feature enhancement proposed by the present invention.
Fig. 5 shows visualization results of the region correlations of the present invention: (a) is the original image; (b)(c)(d)(e) show the correlations between the region at a particular location and all other regions.
FIG. 6 shows visualized intermediate results of the discriminative region grouping of the present invention: (a) is the original image; (b) shows the correlation aggregation feature map; (c) shows the residual feature map; (d) is the localization result.
Fig. 7 shows visualized localization results with and without region correlation for comparison: (a) is the original image; (b) and (c) are the discriminative score maps from the scoring stage without and with correlation, respectively; (d) and (e) are the localization results without and with correlation, respectively.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following detailed description of the embodiments of the present invention is provided.
Data set: experimental evaluation was performed on two reference datasets: Caltech-UCSD Birds-200 and Stanford Cars, which are widely used sets of contest data for fine-grained image classification. The CUB-200-2011 data set covers 200 birds and contains 11788 bird images, which are divided into a training set of 5994 images and a test set of 5794 images. The stanford car dataset contained 16,185 images in 196 categories, with approximately 50 groupings per category.
Implementation details: In all our experiments, all images are resized to 448×448. We use the fully convolutional network ResNet-50 as the feature extractor and apply batch normalization as a regularizer. The optimizer is momentum SGD with an initial learning rate of 0.001, multiplied by 0.1 every 60 epochs, and the weight decay is set to 1e-4. To reduce patch redundancy, we apply non-maximum suppression (NMS) to the default patches based on their discriminative scores, with the NMS threshold set to 0.25.
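A sketch of this training configuration in PyTorch; the momentum value of 0.9 and the placeholder model are assumptions, and the NMS call uses torchvision's standard operator.

```python
import torch
import torchvision

model = torch.nn.Linear(10, 10)  # placeholder for the full CDL network
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=1e-4)
# multiply the learning rate by 0.1 every 60 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)

# prune redundant default patches by their discriminative scores (threshold 0.25)
boxes = torch.tensor([[0., 0., 100., 100.], [10., 10., 110., 110.]])
scores = torch.tensor([0.9, 0.8])
keep = torchvision.ops.nms(boxes, scores, iou_threshold=0.25)
print(keep)  # indices of retained patches
```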
Ablation experiments: The main advantage of our method is that it selects more discriminative patches based on the correlations between regions and enhances the feature vectors by mining the interdependencies of their discriminative elements. We performed several ablation experiments to illustrate the effectiveness of the proposed modules, including the impact of discriminative region grouping and discriminative feature enhancement.
First, we extract features from the entire image with ResNet-50, perform fine-grained classification without any object or part annotations, and take this as the baseline. Then, we select default patches as local features to improve classification accuracy. However, a large number of redundant default patches results in low classification speed. When we introduce a scoring mechanism to retain only highly discriminative patches and reduce the number of patches to a single-digit count, the top-1 classification accuracy on the CUB-200-2011 dataset improves by 0.6%, and real-time classification is achieved at 50 fps. In addition, after the discriminative power of region groups is taken into account, the classification accuracy improves by a further 1.3%. Finally, with the feature-aware filter introduced and the interdependencies of the feature vector elements mined, the classification accuracy reaches a state-of-the-art 88.4%. We also analyzed the feature-aware filter in DFS and demonstrated its effectiveness without additional computational cost. The results are reported in Table 2. The ablation experiments show that the proposed network indeed learns the discriminative regions, filters out useless information and enhances the discriminative feature values, effectively improving accuracy.
Table 2. Ablation experiments on different variants of the method of the invention
Quantitative comparison. Accuracy comparison: Our comparison focuses on weakly supervised approaches, since the proposed model uses only image-level annotations and no object or region annotations. In Tables 3 and 4, we show the performance of different methods on the CUB-200-2011 and Stanford Cars-196 datasets, respectively. From top to bottom of each table, the methods are divided into six groups: (1) supervised multi-stage methods, (2) weakly supervised multi-stage frameworks, (3) weakly supervised end-to-end feature encoding, (4) end-to-end localization-classification sub-networks, (5) other methods (e.g., reinforcement learning, knowledge representation) and (6) our CDL.
Table 3. Comparison of different methods on CUB-200-2011
Table 4. Comparison of different methods on Stanford Cars-196
Early multi-stage methods generally rely on object and even part annotations and can therefore achieve better results. However, using object or part annotations limits performance, because the annotations only give coordinates rather than actual discriminative-region information. Weakly supervised multi-stage frameworks have gradually surpassed the strongly supervised approaches by picking out the discriminative regions. End-to-end feature encoding methods perform well by encoding CNN feature vectors into higher-order information, but at a higher computational cost. Although localization-classification sub-networks work well on a variety of datasets, they still lack correlations between discriminative regions. Other methods also achieve good performance by using additional information (e.g., semantic embeddings). Our end-to-end CDL approach achieves the best results without any additional annotations and performs consistently across datasets.
Our approach outperforms the strongly supervised approaches in the first group, which suggests that the proposed approach can find the discriminative patches without any supervised annotation. Compared with other weakly supervised methods, our method achieves the best performance. The proposed CDL outperforms KERL by 1.4% on CUB because we build region representations from both the global image level and the local region level, encoding richer information. DT-RAM selects accurate discriminative regions using reinforcement learning; by learning the correlations among regions and mining the interdependencies of feature vector elements to emphasize informative elements and suppress useless ones, our method selects more discriminative patches and outperforms DT-RAM, improving accuracy by 2.4% on CUB and 1.1% on Stanford Cars.
Speed comparison: We measured speed on a Titan X GPU with a batch size of 8. Table 5 shows the comparison with other methods. WSDL also applies multi-scale features to generate patches and selects them by detection score. With only 2 discriminative patches selected from the discriminative score map, we outperform the other methods in both speed and accuracy. When we increase the number of discriminative patches from 2 to 4, the proposed model achieves state-of-the-art classification accuracy while maintaining real-time performance at 40 fps.
Table 5. Comparison of efficiency and effectiveness of different methods on CUB-200-2011
Qualitative analysis: To better illustrate the impact of the correlations between regions, we visualize the correlation weight coefficient maps in Fig. 5. A correlation coefficient map indicates the correlations between one fixed region and all regions. We can observe that the feature maps learned through correlation tend to attend to a few fixed regions (the highlighted regions). The more discriminative the regions, the greater their correlations, and the most discriminative regions occupy a higher proportion in the grouping process.
As shown in Fig. 6, we visualize the correlation aggregation feature map, obtained by the weighted-sum operation combining all regions, and the residual feature map, obtained by fusing the original feature map with the correlation aggregation feature map. The original feature map responds to discriminative regions of a particular size and focuses on many local details. The correlation aggregation feature map has a global view and attends to the most discriminative regions. The residual feature map contains both local detail information and global discriminative information, yielding stable performance.
To illustrate the effectiveness of the discriminative region grouping module, we visualize the discriminative score maps with and without the discriminative region grouping sub-network in Fig. 7. We can see that the discriminative score map without the correlation stage focuses on only one discriminative region, and the selected patches concentrate around its neighborhood. In contrast, our discriminative region grouping sub-network attends to several valid regions, as shown in Fig. 7(c). To present this more intuitively, we show the localization results in the original images: the selected patches are spread over several different regions, resulting in region-group features that are more discriminative.
In this work, a CDL method is proposed for weakly supervised fine-grained image classification, integrating a discriminative region grouping sub-network and a discriminative feature enhancement sub-network into a unified framework. The discriminative region grouping sub-network learns the correlation weight coefficients between regions to guide the discovery of discriminative patches, while the discriminative feature enhancement sub-network mines the interdependencies between the internal discriminative elements of the feature vector to enhance informative elements and suppress useless ones. Experiments show that our method yields consistent improvements on both fine-grained image datasets, achieving state-of-the-art accuracy at a real-time speed of 42 fps.
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (1)

1. A weakly supervised fine-grained image classification algorithm based on correlation-guided discriminative learning, characterized by comprising two sub-networks:
(1) discriminative region grouping (DRG) sub-network
a new method is proposed in this sub-network to establish the correlations between regions; given an input feature representation M_I ∈ R^{C×H×W}, the input feature representation is fed into the discriminative region grouping module F:

M_R = F(M_I),  (1)

wherein F is composed of three region generation layers, a relation layer and a fusion layer; M_R ∈ R^{C×H×W}, wherein W and H denote the width and height of the feature representation, and C denotes the number of channels;
the area generation layer is calculated by convolution operation and matrix transformation as follows:
MT=f(WI·MI+bT), (2)
wherein, WT∈RC×1×1×CAnd bTLearning weight parameters and deviation vectors of the convolutional layer respectively; 1 × 1 is the size of the convolution kernel; mT∈RC×H×WRepresenting a new feature map; specifically, a 1 × 1 convolution filter is considered to be a small area detector; mTEach V on a channel at a fixed spatial positionT∈RC×1×1The vector represents a small area of the corresponding position in the original image;
in order to obtain the correlation weight coefficient between the regions, a relation layer is introduced to carry out two feature maps calculated by a region generation layer
Figure FDA0002236947810000011
And
Figure FDA0002236947810000012
comparison of the multiplicative fields of;
single correlation of two positions: p in the first feature map1And p of the second feature map2The correlation between two positions is defined as
Figure FDA0002236947810000013
wherein V_1 and V_2 denote the region feature vectors in the respective feature maps; in practice, for each position p_1 in the first feature map, its correlations with all positions in the second feature map are computed;
for each combination of two positions, a correlation value is obtained; specifically, relative displacement of tissue in the channel is obtained, and an output correlation characteristic map M is obtainedC∈RK×H×WWhere K ═ W × H is the region of the input feature map; then, MCPassing through softmax layer to generate a discriminant correlation weight map R E RK×H×W
Figure FDA0002236947810000021
during forward propagation, the more discriminative the regions are, the greater the correlation between them; during back-propagation, gradients are computed with respect to each input blob accordingly; when the classification probability is low, the penalty is propagated backwards to reduce the correlation weight of the two regions, and the feature representations computed by the region generation layers are updated at the same time;
next, the feature vectors generated by the third region generator layerAnd the relevance weight graph R is input into the fusion layer f:
Figure FDA0002236947810000023
Figure FDA0002236947810000024
wherein the content of the first and second substances,is that
Figure FDA0002236947810000026
W ofthRows and whthVector of columns, RijkIs the iththLine j (th)thColumn kthA weight coefficient of the channel; at MFMiddle (i)thLine j (th)thVector of columns
Figure FDA0002236947810000027
By combining all position vectors with corresponding correlation coefficients, wherein the feature mapThe index mapping relation with the correlation weight coefficient graph R is k ═ W-1 × W + h;
residual learning is proposed:
M_R = α·M_F + M_I,  (7)
wherein α is an adaptive weight parameter, gradually trained to assign more weight to the discriminative correlation features; the range of α is [0, 1] and its initialization value is 0; M_R comprises the adaptive discriminative correlation features and the original input features, so as to select more discriminative patches;
(2) picking discriminative patches
default patches are generated from feature maps at three different scales, following heuristics from object detection; the feature maps of different layers have different receptive fields (RF); the scale sizes, scale steps and aspect ratios of the patches are designed according to the RF of each feature map, so that different feature maps are responsible for discriminative regions of different sizes;
for the feature map MRThe residual error characteristics MRInputting a rating layer; specifically, a 1 × 1 × N convolutional layer and a sigmoid function σ are added to learn the discriminative probability map Sedi RK×H×WThe influence of the discriminant region on the final classification result is shown;
S = σ(W_S · M_R + b_S),  (8)
wherein, WS∈RC×1×1×NIs a parameter of the convolution kernel, N is a feature map MRThe default number of taps in a given position, bSIndicating a deviation;
meanwhile, a discriminant probability value is assigned to each default patch as pi,j,k(ii) a Each patch has its default coordinates (t)x,ty,tw,th) And a discriminative probability value si,j,kWherein s isi,j,kDenotes the iththLine j (th)thColumn kthThe value of the channel:
pi,j,k=[tx,ty,tw,th,si,j,k], (9)
finally, the network selects the first M patches with discriminant probability values, wherein M is a hyper-parameter;
(3) discriminative feature enhancement (DFS) sub-network
a discriminative feature enhancement sub-network mines and exploits the correlations between feature vector elements and consists of a feature-aware filter layer and an enhancement layer; the feature-aware filter layer generates a global filter that filters the features by zeroing out negative values in the feature vector; the enhancement layer adaptively learns the interdependencies using a weighted sum over the discriminative elements of the feature vector;
feature vector V'P∈RC×1Input to the feature perception filter to filter out unwanted information as follows:
VP=ReLU(BN(W*V'P+bP))(10)
wherein W and bPIs the weight matrix and deviation of the linear layer, BN and ReLU represent batch normalization and linear correction unit functions; v P∈RC×1Representing the filtered discriminant feature vector;
then, V is put PInput to the enhancement layer; specifically, the interdependence score graph S of the discriminant elementsE∈RC×CIs through V PAnd V PBy a matrix between transposes ofThe method operation is generated as follows:
Figure FDA0002236947810000041
where σ is the softmax function used for normalization;
Figure FDA0002236947810000042
is the ith before normalizationthIndividual discriminant element and jthThe interdependence between the discriminant elements,
Figure FDA0002236947810000043
represents the ith after normalizationthIndividual discriminant element and jthInterdependencies between discriminant elements; the larger the discrimination value between any two elements, the stronger their interdependencies;
next, pass the patch feature vector V PAnd interdependence score plot S EThe matrix multiplication between the two improves the discrimination capability of the feature vector:
V=V P⊙S E(12)
the internal space interdependence relation between discriminant elements of the feature vector is considered, information elements are enhanced, and elements with small inhibition effect are obtained; a residual learning mechanism is also introduced to ensure the robustness of the network:
V=β·V+VP,(13)
where β is a weight that learns gradually from 0 and adjusts to an accurate value by back-propagation, VContaining enhanced feature vectors V and original input feature vectors V for final classificationP
(4) Loss function
the complete multi-task loss function L is expressed as:

L = L_cls + λ_1·L_gui + λ_2·L_cor + λ_3·L_rank,  (14)

wherein L_cls denotes the fine-grained classification loss; L_gui, L_cor and L_rank denote the guiding loss, the correlation loss and the rank loss, respectively; the balance between these losses is controlled by the hyperparameters λ_1, λ_2, λ_3; after repeated experimental validation, the parameters are set to λ_1 = λ_2 = λ_3 = 1;
the selected discriminative patches are denoted as P = {P_1, P_2, ..., P_M} and the corresponding discriminative probability scores as S = {s_1, s_2, ..., s_M}; the guiding loss, the correlation loss and the rank loss are then defined as follows:

L_gui = Σ_{i=1}^{M} max(0, C(X) − C(P_i)),  (15)
L_cor = Σ_{i=1}^{M} max(0, C(P_i) − C(P_c)),  (16)
L_rank = Σ_{(i,j): C(P_i) < C(P_j)} max(0, s_i − s_j),  (17)

wherein X is the original image, the function C is a confidence function reflecting the probability of classification into the correct class, and P_c is the concatenation of all selected patch features;
the purpose of the steering loss function is to steer the network to select a more discriminative area; when the prediction probability value of the selected area is lower than that of the whole image, the network is punished and carries out weight adjustment through back propagation; the correlation loss function ensures that the prediction probability of the combined feature is greater than that of a single patch feature; the rank penalty stimulates the discriminant score and the final classification probability value in equal order, trying to keep both of the selected patches consistent.
CN201910986800.8A 2019-10-17 2019-10-17 Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning Withdrawn CN110796183A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910986800.8A CN110796183A (en) 2019-10-17 2019-10-17 Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910986800.8A CN110796183A (en) 2019-10-17 2019-10-17 Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning

Publications (1)

Publication Number Publication Date
CN110796183A true CN110796183A (en) 2020-02-14

Family

ID=69439314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910986800.8A Withdrawn CN110796183A (en) 2019-10-17 2019-10-17 Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning

Country Status (1)

Country Link
CN (1) CN110796183A (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894275A (en) * 2010-06-29 2010-11-24 武汉大学 Weakly supervised method for classifying SAR images
US20160132750A1 (en) * 2014-11-07 2016-05-12 Adobe Systems Incorporated Local feature representation for image recognition
US20160140424A1 (en) * 2014-11-13 2016-05-19 Nec Laboratories America, Inc. Object-centric Fine-grained Image Classification
US20160140438A1 (en) * 2014-11-13 2016-05-19 Nec Laboratories America, Inc. Hyper-class Augmented and Regularized Deep Learning for Fine-grained Image Classification
US20160210533A1 (en) * 2015-01-19 2016-07-21 Ebay Inc Fine-grained categorization
US20160307072A1 (en) * 2015-04-17 2016-10-20 Nec Laboratories America, Inc. Fine-grained Image Classification by Exploring Bipartite-Graph Labels
CN107766890A (en) * 2017-10-31 2018-03-06 天津大学 The improved method that identification segment learns in a kind of fine granularity identification
WO2019136946A1 (en) * 2018-01-15 2019-07-18 中山大学 Deep learning-based weakly supervised salient object detection method and system
WO2019140767A1 (en) * 2018-01-18 2019-07-25 苏州大学张家港工业技术研究院 Recognition system for security check and control method thereof
CN109002834A (en) * 2018-06-15 2018-12-14 东南大学 Fine granularity image classification method based on multi-modal characterization
CN110097067A (en) * 2018-12-25 2019-08-06 西北工业大学 It is a kind of based on layer into the Weakly supervised fine granularity image classification method of formula eigentransformation
CN110097090A (en) * 2019-04-10 2019-08-06 东南大学 A kind of image fine granularity recognition methods based on multi-scale feature fusion
CN110147834A (en) * 2019-05-10 2019-08-20 上海理工大学 Fine granularity image classification method based on rarefaction bilinearity convolutional neural networks
CN110135502A (en) * 2019-05-17 2019-08-16 东南大学 A kind of image fine granularity recognition methods based on intensified learning strategy
CN110309858A (en) * 2019-06-05 2019-10-08 大连理工大学 Based on the fine granularity image classification algorithms for differentiating study

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
Jiali Xi et al.: "Fine-Grained Fusion With Distractor Suppression for Video-Based Person Re-Identification", IEEE Access *
Lin Wu et al.: "Deep Attention-Based Spatially Recursive Networks for Fine-Grained Visual Recognition", IEEE Transactions on Cybernetics *
Peng Zhang et al.: "REAPS: Towards Better Recognition of Fine-grained Images by Region Attending and Part Sequencing", arXiv *
Zhihui Wang et al.: "Accurate and Fast Fine-grained Image Classification via Discriminative Learning", 2019 IEEE International Conference on Multimedia and Expo (ICME) *
Zhihui Wang et al.: "Weakly Supervised Fine-grained Image Classification via Correlation-guided Discriminative Learning", Proceedings of the 27th ACM International Conference on Multimedia *
Yang Juan et al.: "Fine-grained vehicle recognition with region proposal networks" (in Chinese), Journal of Image and Graphics *
Wang Hong: "Fine-grained image retrieval algorithm with multi-feature fusion" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology *
Zheng Guangjian: "Fine-grained image recognition based on click-data deep models" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology *
Jin Ke: "Research on image classification methods based on bilinear convolutional neural networks" (in Chinese), Wanfang *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507403A (en) * 2020-04-17 2020-08-07 腾讯科技(深圳)有限公司 Image classification method and device, computer equipment and storage medium
CN112541463A (en) * 2020-12-21 2021-03-23 上海眼控科技股份有限公司 Model training method, appearance segmentation method, device and storage medium
CN117173422A (en) * 2023-08-07 2023-12-05 广东第二师范学院 Fine granularity image recognition method based on graph fusion multi-scale feature learning
CN117173422B (en) * 2023-08-07 2024-02-13 广东第二师范学院 Fine granularity image recognition method based on graph fusion multi-scale feature learning

Similar Documents

Publication Publication Date Title
CN110837836B (en) Semi-supervised semantic segmentation method based on maximized confidence
CN110796183A (en) Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning
Wang et al. Salient object detection based on multi-scale contrast
CN103679132B (en) A kind of nude picture detection method and system
CN111062438B (en) Image propagation weak supervision fine granularity image classification algorithm based on correlation learning
CN108920643B (en) Weighted multi-feature fusion fine-grained image retrieval method
CN110309858B (en) Fine-grained image classification method based on discriminant learning
CN108921047B (en) Multi-model voting mean value action identification method based on cross-layer fusion
Lachaize et al. Evidential framework for error correcting output code classification
CN105138672A (en) Multi-feature fusion image retrieval method
US20220277192A1 (en) Visual Analytics System to Assess, Understand, and Improve Deep Neural Networks
Liu et al. Cross-part learning for fine-grained image classification
Ge et al. Semantic-guided reinforced region embedding for generalized zero-shot learning
CN115412324A (en) Air-space-ground network intrusion detection method based on multi-mode conditional countermeasure field adaptation
CN105096293A (en) Method and device used for processing to-be-processed block of urine sediment image
Lin et al. MCCH: A novel convex hull prior based solution for saliency detection
Pang et al. Over-sampling strategy-based class-imbalanced salient object detection and its application in underwater scene
CN114359742B (en) Weighted loss function calculation method for optimizing small target detection
Xiang et al. Double-branch fusion network with a parallel attention selection mechanism for camouflaged object detection
CN111242102B (en) Fine-grained image recognition algorithm of Gaussian mixture model based on discriminant feature guide
CN116109649A (en) 3D point cloud instance segmentation method based on semantic error correction
Carlson et al. Application of a weighted projection measure for robust hidden Markov model based speech recognition
CN112836511A (en) Knowledge graph context embedding method based on cooperative relationship
CN114168780A (en) Multimodal data processing method, electronic device, and storage medium
Faria et al. Classifier selection based on the correlation of diversity measures: When fewer is more

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200214