CN110796183A - Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning - Google Patents
Legal status: Withdrawn
Classifications: G06F18/24 (classification techniques); G06F18/217 (validation; performance evaluation; active pattern learning techniques)
Abstract
The invention belongs to the technical field of computer vision, and provides a weakly supervised fine-grained image classification algorithm based on correlation-guided discriminative learning. An end-to-end correlation-guided discriminative learning model is proposed to fully mine and exploit the correlations in weakly supervised fine-grained image classification and thereby improve discriminability. First, a discriminative region grouping sub-network is proposed that establishes correlations between regions and then strengthens each region by a weighted combination of all correlations from the other regions, guiding the network to find more discriminative region groups. Second, a discriminative feature enhancement sub-network is proposed to mine and learn the internal spatial correlations between the elements of each patch's feature vector, improving its local discriminative power by jointly enhancing informative elements while suppressing useless ones. Extensive experiments demonstrate that the DRG and DFS sub-networks are effective and achieve state-of-the-art performance.
Description
Technical Field
The invention belongs to the technical field of computer vision, and provides a weakly supervised fine-grained image classification algorithm based on correlation-guided discriminative learning, with the aim of improving the accuracy and efficiency of fine-grained image classification.
Background
Unlike general image classification, weakly supervised fine-grained image classification (WFGIC) uses only image-level labels to identify objects at a more detailed class granularity. WFGIC has attracted a great deal of attention in academia and industry due to its many potential applications in image understanding and computer vision systems. WFGIC remains an open problem in computer vision for two reasons. First, images belonging to the same sub-category vary greatly in size, pose, color, and background, while images of different sub-categories may be very similar in these respects. Second, WFGIC provides only image-level labels, without object or part annotations, which makes it more difficult to extract valid discriminative features for distinguishing the subtle differences between sub-categories.
Because the key differences between fine-grained sub-category images are subtle and often localized to some specific part of the object, the best-performing WFGIC systems work on finding local discriminative patches using heuristic schemes or learning-based methods. In heuristic schemes, objects are first located using saliency extraction and co-segmentation, and two predefined spatial constraints are then applied to select distinguishable parts from a large number of candidate patches. The limitation of heuristic approaches is that they do not guarantee that the selected patches are sufficiently discriminative. Therefore, recent work has focused on designing end-to-end deep learning processes that guide the automatic discovery of discriminative patches through appropriate loss functions. However, all previous work attempts to find the discriminative regions/patches independently, using only regional features and ignoring the correlations between regions. We believe that exploiting this correlation is very helpful for distinguishing fine-grained images, since combinations of regions are more descriptive and discriminative than single regions. This prompted us to incorporate the correlation between regions into discriminative patch selection. To this end, we propose a discriminative region grouping (DRG) sub-network to model the correlation between regions and, by learning the correlation, implicitly find discriminative region groups that are more powerful for WFGIC. Figure 1 illustrates our motivation: from (b) we can see that the head and chest are more prominent when each region is considered independently. After taking the correlation into account (c), the discrimination scores for head and tail become large, since head-tail combinations may be more effective in distinguishing this species of bird from other sub-categories.
Feature representation is another key point of WFGIC. Recently, some work has encoded CNN feature vectors into higher-order information through end-to-end mechanisms to improve the discriminability of features. These methods are effective because of their invariance to the translation and pose of the object: since the feature vectors aggregate local image features in an orderless way, they are translation-invariant by design. However, these methods ignore the internal spatial correlations. In addition, there is some less discriminative or noisy context within a discriminative patch, such as the background regions in Fig. 1(d)(e). Such background or less discriminative information may be detrimental to fine-grained classification, because all sub-categories share similar background information (e.g., all birds typically live in trees or fly in the sky). Based on the above intuitive but important observations and analyses, we propose a discriminative feature enhancement sub-network to explore the internal spatial correlations between discriminative elements of the feature vector and obtain better discriminative power. We achieve this goal by jointly learning the interdependencies between feature vector elements, emphasizing informative elements while suppressing less discriminative ones.
Disclosure of Invention
The invention provides a weakly supervised fine-grained image classification algorithm based on correlation-guided discriminative learning, as shown in Fig. 2.
The technical scheme of the invention is as follows:
A weakly supervised fine-grained image classification algorithm based on correlation-guided discriminative learning comprises two sub-networks:
(1) Discriminative region grouping (DRG) sub-network
In this sub-network we propose a new method to establish the associations between regions. Given an input feature representation M_I ∈ R^{C×H×W}, we input it into the discriminative region grouping module F:
MR=f(MI), (1)
wherein F is composed of three region generation layers, a relation layer, and a fusion layer. M_R ∈ R^{C×H×W}, where W and H represent the width and height of the feature representation, and C represents the number of channels.
The region generation layer is calculated by a simple convolution operation and matrix transformation as follows:
M_T = f(W_T · M_I + b_T), (2)
wherein W_T ∈ R^{C×1×1×C} and b_T are the learned weight parameters and the bias vector of the convolutional layer, respectively, and 1×1 is the size of the convolution kernel. M_T ∈ R^{C×H×W} represents a new feature map. Specifically, we regard a 1×1 convolution filter as a small region detector: each vector V_T ∈ R^{C×1×1} across the channels of M_T at a fixed spatial position represents a small region at the corresponding position in the original image.
In order to obtain the correlation weight coefficients between regions, a relation layer is introduced to compare, via multiplicative interactions, the two feature maps M_T1 and M_T2 computed by the region generation layers.
Let us take the single correlation of two positions as an example. The correlation between position p_1 in the first feature map and position p_2 in the second feature map is defined as

c(p_1, p_2) = ⟨V_1(p_1), V_2(p_2)⟩, (3)

where V_1 and V_2 respectively denote the region feature vectors in the two feature maps. In practice, for each position p_1 in the first feature map, we compute its correlation with all positions in the second map.
For each combination of two positions we obtain a correlation value. Specifically, we organize the relative displacements in the channels and obtain an output correlation feature map M_C ∈ R^{K×H×W}, where K = W × H is the number of regions in the input feature map. Then M_C is passed through a softmax layer to generate a discriminative correlation weight map R ∈ R^{K×H×W}:

R_k(i, j) = exp(M_C^k(i, j)) / Σ_{k'=1}^{K} exp(M_C^{k'}(i, j)), (4)
In the forward propagation process, the more discriminative the regions, the greater the correlation between them. For back-propagation, we compute the derivatives with respect to each bottom blob accordingly. When the classification probability value is low, the penalty is propagated backwards to reduce the correlation weight of the two regions and, at the same time, update the feature representations computed by the region generation layers.
Next, we input the feature vectors V_k generated by the third region generation layer and the correlation weight map R into the fusion layer f:

M_F(i, j) = Σ_{k=1}^{K} R_{ijk} · V_k, (5)

where V_k is the vector at row w and column h of the third feature map M_T3, and R_{ijk} is the weight coefficient of the k-th channel at row i, column j. The vector M_F(i, j) at row i, column j of M_F is computed by combining all position vectors with their corresponding correlation coefficients, where the index mapping between the feature map M_T3 and the correlation weight coefficient map R is k = (w − 1) × W + h. In this way, the discriminative power of region aggregation is taken into account.
Inspired by ResNet, we propose residual learning:
M_R = α · M_F + M_I, (7)
where α is an adaptive weight parameter, gradually trained to assign more weight to discriminative correlation features; its range is [0, 1] and it is initialized to approximately 0. M_R includes both the adaptive discriminative correlation features and the raw input features, so as to pick out more discriminative patches. Integrating global semantic information and local detail information leads to more stable performance.
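The DRG forward pass above can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions, not the patent's exact implementation: the activation f is taken as ReLU, the 1×1-convolution region generation layers are written as plain matrix products with biases omitted, the pairwise correlation is taken as a dot product between region vectors, and α is treated as a fixed scalar rather than a learned parameter.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def drg_forward(M_I, W1, W2, W3, alpha=0.1):
    """Sketch of the DRG sub-network: three region-generation layers,
    a relation (correlation) layer, a fusion layer, and the residual
    combination of Eq. (7). M_I has shape (C, H, W); W1..W3 are the
    (C, C) weight matrices of the 1x1-conv region generators."""
    C, H, W = M_I.shape
    K = H * W                                   # number of regions
    V_I = M_I.reshape(C, K)
    # Region generation layers (ReLU assumed as the activation f)
    V1, V2, V3 = (np.maximum(0.0, Wi @ V_I) for Wi in (W1, W2, W3))
    # Relation layer: correlation of every position pair -> (K, K)
    M_C = V1.T @ V2
    # Softmax over the K correlation channels -> weight map R
    R = softmax(M_C, axis=1)
    # Fusion layer: each position becomes a correlation-weighted sum
    # of all region vectors from the third feature map
    M_F = (V3 @ R.T).reshape(C, H, W)
    # Eq. (7): residual combination with the raw input features
    return alpha * M_F + M_I
```

With alpha set to 0 the module reduces to the identity on M_I, which mirrors the initialization described above (α starts near 0 so the raw features dominate early in training).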
(2) Picking discriminative patches
In this work, inspired by heuristics from object detection, we generate default patches from feature maps at three different scales. The feature maps of different layers have different receptive fields (RF). We design the scale sizes, scale steps, and aspect ratios of the patches according to the respective RF of each feature map, so that different feature maps can account for discriminative regions of different sizes.
Let us take the feature map M_R as an example. We input the residual feature M_R into the scoring layer. Specifically, we add a 1×1×N convolutional layer and a sigmoid function σ to learn the discriminative probability map S ∈ R^{N×H×W}, which indicates the influence of each discriminative region on the final classification result.
S = σ(W_S · M_R + b_S), (8)
where W_S ∈ R^{C×1×1×N} is the parameter of the convolution kernel, N is the default number of patches at a given position of the feature map M_R, and b_S denotes the bias.
At the same time, we assign a discriminative probability value to each default patch as p_{i,j,k}. Each patch has its default coordinates (t_x, t_y, t_w, t_h) and a discriminative probability value s_{i,j,k}, where s_{i,j,k} denotes the value at row i, column j, channel k:
p_{i,j,k} = [t_x, t_y, t_w, t_h, s_{i,j,k}], (9)
Finally, the network selects the top M patches by discriminative probability value, where M is a hyperparameter.
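The scoring and top-M selection of Eqs. (8)-(9) can be sketched as follows. This is an illustrative NumPy stand-in: the 1×1×N convolution is written as a matrix product, the default boxes are passed in as a precomputed (N, H, W, 4) array, and all names are assumptions rather than the patent's actual identifiers.

```python
import numpy as np

def select_top_patches(M_R, W_s, b_s, default_boxes, M=2):
    """Eq. (8): a 1x1xN convolution plus sigmoid yields the
    discriminative probability map S of shape (N, H, W).
    Eq. (9): each default box is paired with its score, and the
    top-M patches by score are returned."""
    C, H, W = M_R.shape
    N = W_s.shape[0]                       # default patches per position
    # Eq. (8): 1x1 convolution as a matrix product, then sigmoid
    S = 1.0 / (1.0 + np.exp(-(W_s @ M_R.reshape(C, H * W) + b_s)))
    S = S.reshape(N, H, W)
    # Eq. (9): [t_x, t_y, t_w, t_h, s_ijk] for every default patch
    scored = [list(default_boxes[k, i, j]) + [S[k, i, j]]
              for k in range(N) for i in range(H) for j in range(W)]
    # Keep the M most discriminative patches
    scored.sort(key=lambda p: p[-1], reverse=True)
    return scored[:M]
```

In the full pipeline the selected patches would then be cropped from the image and fed to the feature enhancement sub-network; here only the scoring/selection step is shown.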
(3) Discriminative feature enhancement (DFS) sub-network
The selected patches typically contain noise, so the extracted features tend to contain non-discriminative information. At the same time, most current work forms the feature representation of a region directly from the output of a CNN, rarely considering the spatial correlations within the feature vector. To solve these problems, we propose a discriminative feature enhancement sub-network to mine and exploit the correlations between feature vector elements; it consists of a feature-aware filter layer and an enhancement layer. The feature-aware filter layer generates a global filter that removes useless information through a non-linear operation that zeroes out negative values in the feature vectors. The enhancement layer adaptively learns the interdependencies by using a weighted sum over the discriminative elements of the feature vector, improving its discriminative power.
We input the feature vector V'_P ∈ R^{C×1} into the feature-aware filter to filter out useless information as follows:

Ṽ_P = ReLU(BN(W_P · V'_P + b_P)), (10)

where W_P and b_P are the weight matrix and bias of the linear layer, and BN and ReLU denote batch normalization and the rectified linear unit function, respectively. Ṽ_P ∈ R^{C×1} represents the filtered discriminative feature vector.
Then we input Ṽ_P into the enhancement layer. Specifically, the interdependency score map S_E ∈ R^{C×C} of the discriminative elements is generated through matrix multiplication between Ṽ_P and its transpose, as follows:

S̃_E = σ(Ṽ_P · Ṽ_P^T), (11)

where σ is the softmax function used for normalization, S_E^{ij} is the interdependency between the i-th and j-th discriminative elements before normalization, and S̃_E^{ij} represents the same interdependency after normalization. The more discriminative any two elements are, the stronger their interdependency.
Next, we improve the discriminative power of the feature vector through matrix multiplication between the patch feature vector Ṽ_P and the interdependency score map S̃_E:

V = Ṽ_P ⊙ S̃_E, (12)
Taking into account the internal spatial dependencies between the discriminative elements of the feature vector, informative elements can be enhanced while less powerful elements are suppressed. We also introduce a residual learning mechanism to ensure the robustness of the network:

Ṽ = β · V + Ṽ_P, (13)

where β is a weight that is learned gradually from 0 and adjusted to an accurate value by back-propagation. Ṽ combines the enhanced feature vector V and the input feature vector Ṽ_P for the final classification.
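The DFS sub-network described by Eqs. (10)-(13) can be sketched as follows. This is a hedged NumPy illustration under stated assumptions: batch normalization is omitted for brevity, the product in Eq. (12) is interpreted as the dimension-consistent matrix product between the C×C score map and the C×1 filtered vector, and β is treated as a fixed scalar rather than a learned parameter.

```python
import numpy as np

def dfs_forward(V_raw, W_p, b_p, beta=0.1):
    """Sketch of the DFS sub-network: feature-aware filter (linear
    layer + ReLU, BN omitted), self-interdependency score map, and
    the residual combination of Eq. (13). V_raw has shape (C, 1)."""
    # Eq. (10): filter out useless information (ReLU zeroes negatives)
    V_p = np.maximum(0.0, W_p @ V_raw + b_p)        # (C, 1)
    # Interdependency between every pair of elements -> (C, C)
    E = V_p @ V_p.T
    # Softmax normalization per row (the sigma in the text)
    e = np.exp(E - E.max(axis=1, keepdims=True))
    S_E = e / e.sum(axis=1, keepdims=True)
    # Eq. (12): enhance informative elements via the score map
    V_enh = S_E @ V_p                               # (C, 1)
    # Eq. (13): residual learning with weight beta (starts near 0)
    return beta * V_enh + V_p
```

As with the DRG residual, setting beta to 0 returns the filtered vector unchanged, which matches the description of β being learned gradually from 0.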
(4) Loss function
The complete multi-task loss function L can be expressed as:

L = L_cls + λ_1 L_gui + λ_2 L_cor + λ_3 L_rank, (14)

where L_cls represents the fine-grained classification loss, and L_gui, L_cor, and L_rank denote the guiding loss, the correlation loss, and the rank loss, respectively. The balance between these losses is controlled by the hyperparameters λ_1, λ_2, λ_3. Based on repeated experimental validation, we set λ_1 = λ_2 = λ_3 = 1.
We denote the selected discriminative patches as P = {P_1, P_2, ..., P_N} and the corresponding discriminative probability scores as S = {S_1, S_2, ..., S_N}. Then the guiding loss, the correlation loss, and the rank loss are defined as follows:

L_gui = Σ_{i=1}^{N} max(0, C(X) − C(P_i)), (15)
L_cor = Σ_{i=1}^{N} max(0, C(P_i) − C(P_c)), (16)
L_rank = Σ_{C(P_i) < C(P_j)} max(0, S_i − S_j), (17)

where X is the original image, the function C is a confidence function reflecting the probability of classification into the correct class, and P_c is the concatenation of all selected patch features.
The purpose of the guiding loss is to steer the network to select more discriminative regions: when the prediction probability of a selected region is lower than that of the whole image, the network is penalized and its weights are adjusted by back-propagation. The correlation loss ensures that the prediction probability of the combined feature is greater than that of any single patch feature. The rank loss encourages the discriminative scores and the final classification probability values to be in the same order, trying to keep the two consistent for the selected patches.
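The behavior of the three auxiliary losses described above can be sketched with hinge-style stand-ins. The exact formulas are not reproduced in this text, so the expressions below are plausible illustrations only: the guiding term penalizes patches less confident than the full image, the correlation term keeps the concatenated feature above every single patch, and the rank term penalizes score orderings that disagree with the classification confidences.

```python
def cdl_loss(c_full, c_patches, c_concat, scores, lams=(1.0, 1.0, 1.0)):
    """Hinge-style sketch of the guiding / correlation / rank losses.
    c_full: confidence of the whole image; c_patches: confidences of
    the selected patches; c_concat: confidence of the concatenated
    patch feature; scores: discriminative scores of the patches."""
    l1, l2, l3 = lams
    # Guiding: each patch should be at least as confident as the image
    guide = sum(max(0.0, c_full - c) for c in c_patches)
    # Correlation: the combined feature should beat every single patch
    corr = sum(max(0.0, c - c_concat) for c in c_patches)
    # Rank: scores should follow the same order as the confidences
    n = len(scores)
    rank = sum(max(0.0, scores[i] - scores[j])
               for i in range(n) for j in range(n)
               if c_patches[i] < c_patches[j])
    return l1 * guide + l2 * corr + l3 * rank
```

With λ_1 = λ_2 = λ_3 = 1 as in the text, the total auxiliary loss is simply the sum of the three terms; the fine-grained classification loss would be added on top.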
The invention has the beneficial effects that:
(1) To our knowledge, ours is the first approach to explore and use the correlations between discriminative regions and between feature vector elements to improve the discriminative ability of regions and their representations for WFGIC.
(2) We propose an end-to-end correlation-guided discriminant learning (CDL) model that incorporates discriminant region grouping and discriminant feature enhancement into a unified framework, so that two levels of correlation can be efficiently and jointly learned.
(3) We evaluated the proposed method on the challenging Caltech-UCSD Birds-200-2011 and Stanford Cars-196 datasets. Experimental results show that the method achieves the best performance in both classification accuracy and efficiency. In particular, our method achieves an accuracy improvement of about 1.4% and runs 12 FPS faster than the previous best techniques.
Drawings
Fig. 1 illustrates the motivation of the correlation-guided discriminative learning method proposed by the present invention.
FIG. 2 is the network framework diagram of the correlation-guided discriminative learning (CDL) model proposed by the present invention.
Fig. 3 is an explanatory diagram of the discriminative region grouping proposed by the present invention.
Fig. 4 is an explanatory diagram of the discriminative feature enhancement proposed by the present invention.
Fig. 5 shows the visualization results of the region correlations of the present invention; (a) is the original image, and (b)(c)(d)(e) show the correlation between the region at a particular location and all other regions.
FIG. 6 shows visualized intermediate results of the discriminative region grouping of the present invention, where (a) is the original image, (b) is the correlation aggregation feature map, (c) is the residual feature map, and (d) is the localization result.
Fig. 7 shows the visualized localization results with and without region correlation, where (a) is the original image, (b) and (c) are the discriminative score maps from the scoring stage without and with correlation, respectively, and (d) and (e) are the localization results without and with correlation, respectively.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following detailed description of the embodiments of the present invention is provided.
Data set: experimental evaluation was performed on two benchmark datasets, Caltech-UCSD Birds-200-2011 and Stanford Cars, which are widely used competition datasets for fine-grained image classification. The CUB-200-2011 dataset covers 200 bird species and contains 11,788 bird images, split into a training set of 5,994 images and a test set of 5,794 images. The Stanford Cars dataset contains 16,185 images of 196 categories, split roughly 50/50 into training and testing for each category.
Implementation details: in all our experiments, all images were resized to 448 × 448. We used the fully convolutional network ResNet-50 as the feature extractor and applied batch normalization as a regularizer. Our optimizer is SGD with momentum, with an initial learning rate of 0.001 multiplied by 0.1 after every 60 epochs. We set the weight decay to 1e-4. To reduce patch redundancy, we apply non-maximum suppression (NMS) to the default patches based on their discriminative scores, with an NMS threshold of 0.25.
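The NMS step mentioned in the implementation details can be sketched as the standard greedy procedure; the text specifies only the 0.25 threshold, so the exact variant used is an assumption here.

```python
def nms(boxes, scores, iou_thresh=0.25):
    """Greedy non-maximum suppression over default patches: keep the
    highest-scoring box, drop any remaining box whose IoU with a kept
    box exceeds the threshold (0.25 in the implementation details).
    Boxes are [x1, y1, x2, y2]; returns indices of kept boxes."""
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / float(area_a + area_b - inter)

    # Visit boxes in descending score order
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```

Applying this to the scored default patches before the top-M selection removes heavily overlapping candidates, which is what keeps the final patch count in the single digits.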
Ablation experiment: the main advantage of our method is to select a more discriminative patch based on the correlation between regions and enhance the feature vector by mining the interdependencies of discriminative elements in the feature vector. We performed a few ablation experiments to illustrate the effectiveness of our proposed module, including the impact of discrimination region grouping and discrimination feature enhancement.
First, we extract features from the entire image through ResNet-50, performing fine-grained classification without any object or part annotations, and set this as the baseline. Then, we select default patches as local features to improve classification accuracy. However, a large number of redundant default patches results in a low classification speed. When we introduce a scoring mechanism to retain only highly discriminative patches and reduce the number of patches to single digits, the top-1 classification accuracy on the CUB-200-2011 dataset improves by 0.6%, and real-time classification is achieved at a speed of 50 fps. In addition, after the discriminative power of region aggregation is taken into account, the classification accuracy improves by a further 1.3%. Finally, when the feature-aware filter is introduced and the interdependencies of the feature vector values are mined, the classification accuracy reaches 88.4%, the latest state-of-the-art result. We also analyzed the feature-aware filter in DFS and demonstrated its effectiveness without additional computational cost. The results are reported in Table 2. The ablation experiments show that the proposed network really learns the discriminative regions, filters out useless information, and enhances the discriminative feature values, effectively improving accuracy.
Table 2: ablation results for different variants of the proposed method
Quantitative comparison. Accuracy comparison: our comparison focuses on weakly supervised approaches, since the proposed model uses only image-level annotations and does not use any object or part annotations. In Tables 3 and 4 we show the performance of the different methods on the CUB-200-2011 dataset and the Stanford Cars-196 dataset, respectively. From top to bottom of each table, the methods are divided into six groups: (1) supervised multi-stage methods, (2) weakly supervised multi-stage frameworks, (3) weakly supervised end-to-end feature encoding, (4) end-to-end localization-classification sub-networks, (5) other methods (e.g. reinforcement learning, knowledge representation), and (6) our CDL.
Table 3: comparison of different methods on CUB-200-2011
Table 4: comparison of different methods on Stanford Cars-196
Early multi-stage methods generally rely on object and even part annotations and can therefore achieve better results. However, using object or part annotations also limits performance, because the annotations give only coordinates rather than actual discriminative region information. Weakly supervised multi-stage frameworks gradually surpass strongly supervised approaches by picking out the discriminative regions. End-to-end feature encoding methods perform well by encoding CNN feature vectors into higher-order information, but at a higher computational cost. Although localization-classification sub-networks work well on a variety of datasets, they still ignore the correlations between discriminative regions. Other methods also achieve good performance thanks to additional information (e.g., semantic embeddings). Our end-to-end CDL approach achieves the best results without any additional annotations and performs consistently across datasets.
Our approach outperforms the strongly supervised approaches in the first group, which suggests that the proposed method can find discriminative patches without any supervised annotation. Compared with other weakly supervised methods, our method achieves the best performance. The proposed CDL outperforms KERL by 1.4% on CUB, because we build region representations at both the global image level and the local region level and thus encode richer information. DT-RAM selects accurate discriminative regions using reinforcement learning; by learning the correlations among regions and mining the interdependencies of the elements within the feature vector to emphasize informative elements and suppress useless ones, our method selects more discriminative patches and outperforms DT-RAM, improving accuracy by 2.4% on CUB and 1.1% on Cars.
Speed comparison: we measured speed on a Titan X GPU with a batch size of 8. Table 5 shows the comparison with other methods. WSDL also applies multi-scale features to generate patches and selects them by detection score. Even when we select only 2 discriminative patches based on the discriminative score map, we outperform the other methods in both speed and accuracy. When we increase the number of discriminative patches from 2 to 4, the proposed model achieves state-of-the-art classification accuracy while maintaining real-time performance at 40 fps.
Table 5: comparison of efficiency and effectiveness of different methods on CUB-200-2011
Qualitative analysis: to better illustrate the impact of the correlations between regions, we visualize the correlation weight coefficient maps in Fig. 5. A correlation coefficient map indicates the correlation between one fixed region and all regions. We can observe that the feature maps learned through correlation tend to attend to certain fixed (highlighted) regions. The more discriminative the regions, the greater their correlation, and the most discriminative regions occupy a higher proportion in the grouping process.
As shown in Fig. 6, we visualize the correlation aggregation feature map, obtained by the weighted-sum operation combining all regions, and the residual feature map. The residual feature map is obtained by fusing the original feature map with the correlation aggregation feature map. The original feature map responds to discriminative regions of a particular size and focuses on many local details, while the correlation aggregation feature map has a global view and attends to the most discriminative regions. The residual feature map contains both local detail information and global discriminative information, achieving stable performance.
To illustrate the effectiveness of the discriminative region grouping module, we visualize the discriminative score maps with and without the discriminative region grouping sub-network in Fig. 7. We can see that without the correlation stage the discriminative score map focuses on only one discriminative region, and the selected patches concentrate on its neighboring regions. In contrast, our discriminative region grouping sub-network notices multiple valid regions, as shown in Fig. 7(c). To present this more intuitively, we show the localization results on the original images: the selected patches are spread over several different regions, resulting in more discriminative region aggregation features.
In summary, we propose the CDL method for weakly supervised fine-grained image classification, which integrates a discriminative region grouping sub-network and a discriminative feature enhancement sub-network into a unified framework. The discriminative region grouping sub-network learns the correlation weight coefficients between regions to guide the discovery of discriminative patches, while the discriminative feature enhancement sub-network mines the interdependencies between the internal discriminative elements of the feature vector to enhance informative elements and suppress useless ones. Experiments show that our method yields consistent improvements on both fine-grained image datasets, achieving state-of-the-art accuracy at a real-time speed of 42 fps.
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (1)
1. A weakly supervised fine-grained image classification algorithm based on correlation-guided discriminative learning, characterized by comprising two sub-networks:
(1) Discriminative region grouping (DRG) sub-network
A new method is proposed in this sub-network to establish the connections between regions; given an input feature representation M_I ∈ R^{C×H×W}, the input feature representation is fed into the discriminative region grouping module F:
MR=f(MI), (1)
wherein F is composed of three region generation layers, a relation layer, and a fusion layer; M_R ∈ R^{C×H×W}, where W and H represent the width and height of the feature representation, and C represents the number of channels;
the area generation layer is calculated by convolution operation and matrix transformation as follows:
M_T = f(W_T · M_I + b_T), (2)
wherein W_T ∈ R^{C×1×1×C} and b_T are respectively the learned weight parameters and the bias vector of the convolutional layer; 1×1 is the size of the convolution kernel; M_T ∈ R^{C×H×W} represents a new feature map; specifically, a 1×1 convolution filter is regarded as a small region detector: each vector V_T ∈ R^{C×1×1} across the channels of M_T at a fixed spatial position represents a small region at the corresponding position in the original image;
in order to obtain the correlation weight coefficients between regions, a relation layer is introduced to compare, via multiplicative interactions, the two feature maps M_T1 and M_T2 computed by the region generation layers;
single correlation of two positions: the correlation between position p_1 in the first feature map and position p_2 in the second feature map is defined as

c(p_1, p_2) = ⟨V_1(p_1), V_2(p_2)⟩, (3)
wherein V_1 and V_2 respectively represent the region feature vectors in the two feature maps; in actual operation, for each position p_1 in the first feature map, its correlation with all positions in the second map is calculated;
for each combination of two positions, a correlation value is obtained; specifically, the relative displacements are organized in the channels to obtain an output correlation feature map M_C ∈ R^{K×H×W}, where K = W × H is the number of regions in the input feature map; then M_C is passed through a softmax layer to generate a discriminative correlation weight map R ∈ R^{K×H×W}:

R_k(i, j) = exp(M_C^k(i, j)) / Σ_{k'=1}^{K} exp(M_C^{k'}(i, j)), (4)
in the forward propagation process, the more discriminative the regions, the greater the correlation between them; for back-propagation, the derivative with respect to each bottom blob is computed accordingly; when the classification probability value is low, the penalty is propagated backwards to reduce the correlation weight of the two regions, and the feature representations computed by the region generation layers are updated at the same time;
next, the feature vectors V_k generated by the third region generation layer and the correlation weight map R are input into the fusion layer f:

M_F(i, j) = Σ_{k=1}^{K} R_{ijk} · V_k, (5)

wherein V_k is the vector at row w and column h of the third feature map M_T3, and R_{ijk} is the weight coefficient of the k-th channel at row i, column j; the vector M_F(i, j) at row i, column j of M_F is obtained by combining all position vectors with their corresponding correlation coefficients, where the index mapping between the feature map M_T3 and the correlation weight coefficient map R is k = (w − 1) × W + h;
residual learning is proposed:
M_R = α · M_F + M_I, (7)
wherein α is an adaptive weight parameter, gradually trained to assign more weight to discriminative correlation features; the range of α is [0, 1] and its initialization value is 0; M_R comprises the adaptive discriminative correlation features and the raw input features, so as to select more discriminative patches;
(2) pick discriminant patch
Generating a default patch from three feature maps with different scales according to target detection; the profiles of the different layers have different reception fields RF; scaling the step size and the aspect ratio according to the scale size of the corresponding RF design patch of each feature map so as to enable different feature maps to be responsible for different sized discrimination areas;
The residual features M_R are input into a scoring layer; specifically, a 1 × 1 × N convolutional layer and a sigmoid function σ are added to learn a discriminative probability map S ∈ R^(N×H×W), which reflects the influence of each discriminative region on the final classification result:

S = σ(W_S * M_R + b_S),  (8)

where W_S ∈ R^(C×1×1×N) are the parameters of the convolution kernel, N is the number of default patches at each position of the feature map M_R, and b_S denotes the bias;
Meanwhile, a discriminative probability value is assigned to each default patch p_{i,j,k}; each patch has its default coordinates (t_x, t_y, t_w, t_h) and a discriminative probability value s_{i,j,k}, where s_{i,j,k} denotes the value at row i, column j, channel k:

p_{i,j,k} = [t_x, t_y, t_w, t_h, s_{i,j,k}],  (9)
Finally, the network keeps the M patches with the highest discriminative probability values, where M is a hyper-parameter;
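A toy sketch of the scoring-and-selection step: sigmoid scores are flattened and the M highest-scoring default patches are kept, each reported as [t_x, t_y, t_w, t_h, s] per eq. (9). The anchor coordinates below are random placeholders, not the default-box design described above:

```python
import numpy as np

def select_top_patches(score_map, anchors, M=4):
    """Rank default patches by sigmoid discriminative probability and keep
    the top M (a sketch; `anchors` holds placeholder default coordinates
    (t_x, t_y, t_w, t_h) for each of the N x H x W positions)."""
    s = 1.0 / (1.0 + np.exp(-score_map))       # sigmoid -> probabilities
    flat = s.ravel()
    top = np.argsort(flat)[::-1][:M]           # indices of the M best patches
    coords = anchors.reshape(-1, 4)            # aligned with flat by C-order
    # p = [t_x, t_y, t_w, t_h, s] for each selected patch (eq. (9))
    return [list(coords[i]) + [flat[i]] for i in top]

rng = np.random.default_rng(0)
scores = rng.normal(size=(3, 4, 4))            # N=3 default patches per cell
anchors = rng.integers(0, 64, size=(3, 4, 4, 4))
patches = select_top_patches(scores, anchors, M=4)
print(len(patches), len(patches[0]))           # M patches, 5 values each
```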
(3) Discriminative feature strengthening (DFS) sub-network
The discriminative feature strengthening sub-network mines and exploits the correlations between the elements of each patch feature vector; it consists of a feature-aware filtering layer and an enhancement layer. The feature-aware filtering layer generates a global filter that removes the negative, uninformative values of the feature vector; the enhancement layer adaptively learns the interdependencies through a weighted sum of the discriminative elements of the feature vector;
The feature vector V'_P ∈ R^(C×1) is input to the feature-aware filter to filter out useless information as follows:

Ṽ_P = ReLU(BN(W · V'_P + b_P)),  (10)

where W and b_P are the weight matrix and bias of the linear layer, BN and ReLU denote batch normalization and the rectified linear unit, and Ṽ_P ∈ R^(C×1) is the filtered discriminative feature vector;
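Eq. (10) can be illustrated as below; batch normalization is simplified to a per-vector standardisation, an assumption made only to keep the sketch self-contained:

```python
import numpy as np

def feature_filter(v_in, W, b):
    """Feature-aware filtering of eq. (10): linear layer, then a stand-in
    for batch normalisation (per-vector standardisation), then ReLU, which
    zeroes the negative, uninformative elements (a sketch)."""
    z = W @ v_in + b
    z = (z - z.mean()) / (z.std() + 1e-5)      # simplified BN
    return np.maximum(z, 0.0)                  # ReLU keeps informative elements

rng = np.random.default_rng(1)
C = 8
v = rng.normal(size=(C,))
v_f = feature_filter(v, rng.normal(size=(C, C)), np.zeros(C))
print(v_f.min())                               # no negative values survive
```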
Then Ṽ_P is input to the enhancement layer; specifically, the interdependency score map S_E ∈ R^(C×C) of the discriminative elements is generated by a matrix operation between Ṽ_P and its transpose:

S̃_E = σ(Ṽ_P · Ṽ_P^T),  (11)
where σ is the softmax function used for normalization; (S_E)_{ij} is the interdependency between the i-th and the j-th discriminative elements before normalization, and (S̃_E)_{ij} is the interdependency after normalization; the larger the value between any two elements, the stronger their interdependency;
Next, the discriminative power of the feature vector is improved through matrix multiplication between the patch feature vector Ṽ_P and the interdependency score map S̃_E:

V = Ṽ_P ⊙ S̃_E,  (12)
This takes the internal spatial interdependencies between the discriminative elements of the feature vector into account, strengthening informative elements while suppressing less useful ones; a residual learning mechanism is also introduced to ensure the robustness of the network:
Ṽ = β · V + Ṽ_P,  (13)

where β is a weight that is gradually learned from 0 and adjusted to an accurate value by back-propagation; Ṽ contains both the enhanced feature vector V and the filtered input feature vector Ṽ_P for final classification;
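The enhancement layer (eqs. (11)–(13)) can be sketched as follows, with a row-wise softmax assumed for the normalisation σ:

```python
import numpy as np

def enhance(v_f, beta=0.0):
    """Enhancement sketch: interdependency map S_E = v v^T, softmax
    normalisation row-wise (eq. (11)), recombination with the filtered
    vector (eq. (12)), and a residual connection (eq. (13))."""
    s_e = np.outer(v_f, v_f)                   # C x C interdependency scores
    e = np.exp(s_e - s_e.max(axis=1, keepdims=True))
    s_tilde = e / e.sum(axis=1, keepdims=True) # row-wise softmax
    v_enh = s_tilde @ v_f                      # eq. (12): weighted recombination
    return beta * v_enh + v_f                  # eq. (13): residual connection

v_f = np.maximum(np.random.default_rng(2).normal(size=(8,)), 0.0)
out = enhance(v_f, beta=0.0)                   # beta starts at 0, as in the text
```

With beta = 0 the output equals the filtered vector, mirroring how the adaptive weight starts neutral and is tuned by back-propagation.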
(4) Loss function
The complete multi-task loss function L is expressed as:

L = L_cls + λ_1 · L_G + λ_2 · L_C + λ_3 · L_R,  (14)

where L_cls denotes the fine-grained classification loss, and L_G, L_C and L_R denote the guidance loss, the correlation loss and the rank loss respectively; the balance between these losses is controlled by the hyper-parameters λ_1, λ_2, λ_3; after repeated experimental verification, the parameters are set to λ_1 = λ_2 = λ_3 = 1;
The selected discriminative patches are denoted P = {P_1, P_2, ..., P_M} and the corresponding discriminative probability scores S = {s_1, s_2, ..., s_M}; the guidance loss, the correlation loss and the rank loss are then defined as follows:

L_G = Σ_i max(0, C(X) − C(P_i)),
L_C = Σ_i max(0, C(P_i) − C(P_c)),
L_R = Σ_{(i,j): C(P_i) < C(P_j)} max(0, s_i − s_j),

where X is the original image, the function C is a confidence function reflecting the probability of classification into the correct class, and P_c is the concatenation of all selected patch features;
The purpose of the guidance loss is to guide the network to select more discriminative regions: when the prediction probability of a selected region is lower than that of the whole image, the network is penalized and its weights are adjusted through back-propagation. The correlation loss ensures that the prediction probability of the combined feature is greater than that of any single patch feature. The rank loss encourages the discriminative scores and the final classification probabilities of the selected patches to follow the same order, keeping the two consistent.
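One consistent hinge-style reading of the three losses described above, sketched in Python (the exact functional forms are not given in this text, so these are assumptions that match the stated behaviour):

```python
def guidance_loss(c_full, c_patches):
    """Penalise a selected patch whose confidence is below that of the
    whole image (a sketch, not the exact patented form)."""
    return sum(max(0.0, c_full - c) for c in c_patches)

def correlation_loss(c_concat, c_patches):
    """Penalise patches whose individual confidence exceeds that of the
    concatenated feature P_c."""
    return sum(max(0.0, c - c_concat) for c in c_patches)

def rank_loss(scores, c_patches):
    """Penalise pairs whose discriminative-score order disagrees with
    the order of their classification confidences."""
    loss = 0.0
    for i, ci in enumerate(c_patches):
        for j, cj in enumerate(c_patches):
            if ci < cj:                         # patch j classifies better,
                loss += max(0.0, scores[i] - scores[j])  # so s_j should be higher
    return loss

c_patches = [0.6, 0.8]
print(guidance_loss(0.7, c_patches))            # only the weaker patch is penalised
print(rank_loss([0.9, 0.2], c_patches))         # score order disagrees -> penalty
```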
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910986800.8A CN110796183A (en) | 2019-10-17 | 2019-10-17 | Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110796183A true CN110796183A (en) | 2020-02-14 |
Family
ID=69439314
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910986800.8A Withdrawn CN110796183A (en) | 2019-10-17 | 2019-10-17 | Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110796183A (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101894275A (en) * | 2010-06-29 | 2010-11-24 | 武汉大学 | Weakly supervised method for classifying SAR images |
US20160132750A1 (en) * | 2014-11-07 | 2016-05-12 | Adobe Systems Incorporated | Local feature representation for image recognition |
US20160140424A1 (en) * | 2014-11-13 | 2016-05-19 | Nec Laboratories America, Inc. | Object-centric Fine-grained Image Classification |
US20160140438A1 (en) * | 2014-11-13 | 2016-05-19 | Nec Laboratories America, Inc. | Hyper-class Augmented and Regularized Deep Learning for Fine-grained Image Classification |
US20160210533A1 (en) * | 2015-01-19 | 2016-07-21 | Ebay Inc | Fine-grained categorization |
US20160307072A1 (en) * | 2015-04-17 | 2016-10-20 | Nec Laboratories America, Inc. | Fine-grained Image Classification by Exploring Bipartite-Graph Labels |
CN107766890A (en) * | 2017-10-31 | 2018-03-06 | 天津大学 | The improved method that identification segment learns in a kind of fine granularity identification |
CN109002834A (en) * | 2018-06-15 | 2018-12-14 | 东南大学 | Fine granularity image classification method based on multi-modal characterization |
WO2019136946A1 (en) * | 2018-01-15 | 2019-07-18 | 中山大学 | Deep learning-based weakly supervised salient object detection method and system |
WO2019140767A1 (en) * | 2018-01-18 | 2019-07-25 | 苏州大学张家港工业技术研究院 | Recognition system for security check and control method thereof |
CN110097067A (en) * | 2018-12-25 | 2019-08-06 | 西北工业大学 | It is a kind of based on layer into the Weakly supervised fine granularity image classification method of formula eigentransformation |
CN110097090A (en) * | 2019-04-10 | 2019-08-06 | 东南大学 | A kind of image fine granularity recognition methods based on multi-scale feature fusion |
CN110135502A (en) * | 2019-05-17 | 2019-08-16 | 东南大学 | A kind of image fine granularity recognition methods based on intensified learning strategy |
CN110147834A (en) * | 2019-05-10 | 2019-08-20 | 上海理工大学 | Fine granularity image classification method based on rarefaction bilinearity convolutional neural networks |
CN110309858A (en) * | 2019-06-05 | 2019-10-08 | 大连理工大学 | Based on the fine granularity image classification algorithms for differentiating study |
Non-Patent Citations (9)
Title |
---|
JIALI XI ET AL.: "Fine-Grained Fusion With Distractor Suppression for Video-Based Person Re-Identification", 《IEEE ACCESS》 * |
LIN WU ET AL.: "Deep Attention-Based Spatially Recursive Networks for Fine-Grained Visual Recognition", 《IEEE TRANSACTIONS ON CYBERNETICS》 * |
PENG ZHANG ET AL.: "REAPS: Towards Better Recognition of Fine-grained Images by Region Attending and Part Sequencing", 《ARXIV》 * |
ZHIHUI WANG ET AL.: "accurate and fast fine-grained image classification via discriminative learning", 《2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME)》 * |
ZHIHUI WANG ET AL.: "Weakly Supervised Fine-grained Image Classification via Correlation-guided Discriminative Learning", 《PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA》 * |
YANG JUAN ET AL.: "Fine-grained vehicle type recognition with region proposal networks", 《JOURNAL OF IMAGE AND GRAPHICS》 * |
WANG HONG: "Fine-grained image retrieval algorithms with multi-feature fusion", 《CHINA MASTER'S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY》 * |
ZHENG GUANGJIAN: "Fine-grained image recognition based on click-based deep models", 《CHINA MASTER'S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY》 * |
JIN KE: "Research on image classification methods based on bilinear convolutional neural networks", 《WANFANG》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111507403A (en) * | 2020-04-17 | 2020-08-07 | 腾讯科技(深圳)有限公司 | Image classification method and device, computer equipment and storage medium |
CN112541463A (en) * | 2020-12-21 | 2021-03-23 | 上海眼控科技股份有限公司 | Model training method, appearance segmentation method, device and storage medium |
CN117173422A (en) * | 2023-08-07 | 2023-12-05 | 广东第二师范学院 | Fine granularity image recognition method based on graph fusion multi-scale feature learning |
CN117173422B (en) * | 2023-08-07 | 2024-02-13 | 广东第二师范学院 | Fine granularity image recognition method based on graph fusion multi-scale feature learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110837836B (en) | Semi-supervised semantic segmentation method based on maximized confidence | |
CN110796183A (en) | Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning | |
Wang et al. | Salient object detection based on multi-scale contrast | |
CN103679132B (en) | A kind of nude picture detection method and system | |
CN111062438B (en) | Image propagation weak supervision fine granularity image classification algorithm based on correlation learning | |
CN108920643B (en) | Weighted multi-feature fusion fine-grained image retrieval method | |
CN110309858B (en) | Fine-grained image classification method based on discriminant learning | |
CN108921047B (en) | Multi-model voting mean value action identification method based on cross-layer fusion | |
Lachaize et al. | Evidential framework for error correcting output code classification | |
CN105138672A (en) | Multi-feature fusion image retrieval method | |
US20220277192A1 (en) | Visual Analytics System to Assess, Understand, and Improve Deep Neural Networks | |
Liu et al. | Cross-part learning for fine-grained image classification | |
Ge et al. | Semantic-guided reinforced region embedding for generalized zero-shot learning | |
CN115412324A (en) | Air-space-ground network intrusion detection method based on multi-mode conditional countermeasure field adaptation | |
CN105096293A (en) | Method and device used for processing to-be-processed block of urine sediment image | |
Lin et al. | MCCH: A novel convex hull prior based solution for saliency detection | |
Pang et al. | Over-sampling strategy-based class-imbalanced salient object detection and its application in underwater scene | |
CN114359742B (en) | Weighted loss function calculation method for optimizing small target detection | |
Xiang et al. | Double-branch fusion network with a parallel attention selection mechanism for camouflaged object detection | |
CN111242102B (en) | Fine-grained image recognition algorithm of Gaussian mixture model based on discriminant feature guide | |
CN116109649A (en) | 3D point cloud instance segmentation method based on semantic error correction | |
Carlson et al. | Application of a weighted projection measure for robust hidden Markov model based speech recognition | |
CN112836511A (en) | Knowledge graph context embedding method based on cooperative relationship | |
CN114168780A (en) | Multimodal data processing method, electronic device, and storage medium | |
Faria et al. | Classifier selection based on the correlation of diversity measures: When fewer is more |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20200214 |