CN111860068A - Fine-grained bird identification method based on cross-layer simplified bilinear network
- Publication number
- CN111860068A
- Authority
- CN
- China
- Prior art keywords
- bilinear
- cross
- feature
- network
- grained
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a fine-grained bird identification method based on a cross-layer simplified bilinear network. The method comprises the following steps: the 5994 training pictures and 5794 test pictures of the CUB-200-2011 data set are preprocessed, and the processed images are input into a VGG-16 convolutional neural network to extract feature maps of the bird images. To account for inter-layer feature interaction, three groups of simplified bilinear feature representations are extracted from the feature maps of different high-level convolutional layers, normalized, concatenated and sent to a softmax classifier. Finally, the whole network is optimized with cross-entropy loss assisted by pairwise confusion loss. The described identification method has the advantages of low feature dimension, low computational cost, high recognition rate and strong robustness, has practical value for fine-grained image classification, and can be applied in practice.
Description
Technical Field
The invention relates to a fine-grained bird identification method based on a cross-layer simplified bilinear network, and belongs to the fields of deep learning and fine-grained image classification.
Background
Fine-grained classification aims to distinguish the many sub-categories within one basic category, such as different species of birds or flowers. Compared with coarse-grained images, fine-grained images show slight inter-class differences and pronounced intra-class differences, so fine-grained features are often harder to obtain; the many parameters of the model must be determined from the image annotations, and overfitting caused by a small amount of data must be avoided as far as possible. Early fine-grained identification methods relied on manually annotated local information to train the classification model under strong supervision. Local annotation usually has to be done by experts in the corresponding field, so these methods require a high degree of manual involvement. In recent years, weakly supervised learning methods that need only image-level class labels have become a research focus.
There are two main types of mainstream fine-grained classification methods based on weak supervision. The first type uses a "localization" sub-network to assist a "classification" main network, enhancing the learning capability of the classification network with local information (e.g., part locations or segmentation masks) provided by the localization network. Such approaches require a trade-off between localization and recognition capability, which may degrade the performance of a single network. The trade-off also shows in practice: training usually involves either alternating optimization of the two networks, or training the two networks separately and then fine-tuning them jointly. The second type is end-to-end feature encoding, which enhances the learning capability of convolutional neural networks by encoding higher-order statistics of the convolutional feature maps. Such methods seek a robust representation of the image; conventional representations include VLAD and Fisher vectors built on SIFT features. Such models capture local feature interactions in a translation-invariant manner, which is particularly useful for texture and fine-grained recognition tasks.
Building on the end-to-end simplified bilinear network, the invention provides a fine-grained bird identification method based on a cross-layer simplified bilinear network. The method makes full use of the inter-layer feature correlation and interaction of feature maps from different convolutional layers, and regularizes the cross-entropy loss function with pairwise confusion. It compensates for the insufficiency of bilinear features obtained from a single convolutional layer, has lower dimensionality and lower computational cost than the features of the bilinear convolutional neural network (BCNN), and achieves a recognition rate of 86.6% on the CUB-200-2011 data set.
Disclosure of Invention
The invention achieves this purpose through the following technical scheme, which comprises the following steps:
(1) Bird image feature extraction. The 5994 training pictures and 5794 test pictures of the CUB-200-2011 data set are preprocessed, and the processed images are input into a convolutional neural network to extract deep feature representations of the images.
(2) Cross-layer simplified bilinear feature fusion. To account for inter-layer feature interaction, the feature maps of different high-level convolutional layers obtained in step (1) are subjected to the simplified bilinear operation to obtain three groups of bilinear feature representations, which are normalized, concatenated and sent to a softmax classifier.
(3) Network optimization with cross-entropy loss assisted by pairwise confusion loss. The samples in a training batch are randomly divided into two groups to form picture pairs; if a picture pair has the same label, the cross-entropy loss is computed directly; if a picture pair has different labels, a pairwise Euclidean loss is added to the cross-entropy loss as a regularization term.
Drawings
FIG. 1 is the bird image feature extraction network
FIG. 2 is a schematic diagram of the activation responses of different high-level convolutional layers
FIG. 3 is a schematic diagram of the simplified bilinear operation
FIG. 4 is a block diagram of the cross-layer simplified bilinear network
FIG. 5 is the network training method with pairwise confusion loss
Detailed Description
The invention is further described below with reference to the accompanying drawings:
FIG. 1 is the VGG-16 based bird image feature extraction network. VGG-16 is selected as the image feature extractor, with the fifth pooling layer pool5 and the three fully connected layers fc6, fc7 and fc8 removed. First, the data set pictures are preprocessed and scaled to 512 × S according to their length and width. In the training stage, the pictures are shuffled, horizontally flipped and randomly cropped, with an input size of 448 × 448; in the testing stage, only center cropping is applied to the pictures.
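The cropping and flipping described above can be illustrated with a minimal NumPy sketch. It assumes the picture has already been rescaled so that both sides are at least 448 pixels; the function names and the flip probability of 0.5 are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

CROP = 448  # network input size stated in the text


def random_crop_flip(img, rng, crop=CROP):
    """Training-time augmentation: random crop plus random horizontal flip."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:      # flip probability is an assumption
        patch = patch[:, ::-1]  # reverse the width axis
    return patch


def center_crop(img, crop=CROP):
    """Test-time preprocessing: deterministic center crop."""
    h, w = img.shape[:2]
    top = (h - crop) // 2
    left = (w - crop) // 2
    return img[top:top + crop, left:left + crop]


rng = np.random.default_rng(0)
image = rng.random((512, 600, 3))  # stand-in for an already rescaled picture
train_patch = random_crop_flip(image, rng)
test_patch = center_crop(image)
```

Both patches have shape 448 × 448 × 3; only the training patch depends on the random generator.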
FIG. 2 is a schematic diagram of the activation responses of different high-level convolutional layers in the feature extraction network. As shown in FIG. 2, different convolutional layers discriminate the parts of the input image differently. For example, in the first row of pictures in FIG. 2, conv5_1 responds strongly to the tail, head and wings of the black-legged geoduck, while conv5_3 retains only the activation response to the head. Inspired by this observation, and in order to better capture the feature relations between layers, the invention provides a cross-layer simplified bilinear pooling method. The method considers inter-layer feature interaction, integrates several cross-layer bilinear features and fuses them before the final classification to enhance the representation capability of the features, while avoiding additional training parameters. In contrast to BCNN, which uses features from only a single convolutional layer, the method treats each convolutional layer of the convolutional neural network as a part attribute extractor and exploits part feature interactions across multiple layers.
FIG. 3 is a simplified bilinear operation diagram. The concrete implementation steps are as follows:
(1) First, the Count Sketch function $\Psi$ is used to map each feature vector $f_k \in \mathbb{R}^c$, $k = 1, 2$, to a $d$-dimensional feature space. Two vectors $s_k \in \{-1, 1\}^c$ and $h_k \in \{1, \ldots, d\}^c$ are defined; both are initialized from a uniform distribution and their values remain fixed in subsequent operations. $h_k$ is used to find the index $j = h_k(i)$ in the feature space that corresponds to the $i$-th element $f_k(i)$ of $f_k$. Then
$$\Psi(f_k, h_k, s_k) = \{Q_1, Q_2, \ldots, Q_d\}, \qquad Q_j = \sum_{i:\, h_k(i) = j} s_k(i)\, f_k(i)$$
where $i \in \{1, \ldots, c\}$ and $j \in \{1, \ldots, d\}$.
(2) The Tensor Sketch algorithm shows that the Count Sketch of the outer product of two vectors can be obtained by convolving the Count Sketches of the two feature vectors, which can be expressed as
$$\Psi(f_1 \otimes f_2, h, s) = \Psi(f_1, h_1, s_1) * \Psi(f_2, h_2, s_2)$$
where $*$ denotes the convolution operation. The convolution theorem states that convolution in the time domain is equivalent to a product in the frequency domain. Thus, the above formula can be written as
$$\Psi(f_1 \otimes f_2, h, s) = F^{-1}\big(F(\Psi(f_1, h_1, s_1)) \circ F(\Psi(f_2, h_2, s_2))\big)$$
where $F$ denotes the fast Fourier transform, $F^{-1}$ the inverse Fourier transform, and $\circ$ element-wise multiplication.
(3) The three groups of bilinear feature vectors obtained in the above steps are normalized. First, the signed square root of the bilinear feature $x$ is taken ($y \leftarrow \operatorname{sign}(x)\sqrt{|x|}$), and then l2 normalization is applied ($z \leftarrow y / \|y\|_2$).
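The Count Sketch, FFT and normalization steps above can be sketched in NumPy as follows. This is an illustrative sketch rather than the patented implementation: the function names are hypothetical, indices are 0-based, and the channel count of 512 is simply the width of the VGG-16 conv5 layers; the output dimension 8192 is the value given in claim 3.

```python
import numpy as np


def make_sketch_params(c, d, rng):
    """Draw the fixed hash h in {0..d-1}^c and signs s in {-1,+1}^c."""
    h = rng.integers(0, d, size=c)
    s = rng.choice(np.array([-1.0, 1.0]), size=c)
    return h, s


def count_sketch(f, h, s, d):
    """Psi(f, h, s): Q_j is the sum of s(i) * f(i) over all i with h(i) == j."""
    q = np.zeros(d)
    np.add.at(q, h, s * f)  # unbuffered add, so repeated indices accumulate
    return q


def compact_bilinear(f1, f2, params1, params2, d):
    """Sketch of the outer product of f1 and f2 via circular convolution (FFT)."""
    q1 = count_sketch(f1, *params1, d)
    q2 = count_sketch(f2, *params2, d)
    return np.real(np.fft.ifft(np.fft.fft(q1) * np.fft.fft(q2)))


def normalize(x, eps=1e-12):
    """Signed square root followed by l2 normalization."""
    y = np.sign(x) * np.sqrt(np.abs(x))
    return y / (np.linalg.norm(y) + eps)


rng = np.random.default_rng(0)
c, d = 512, 8192
p1 = make_sketch_params(c, d, rng)
p2 = make_sketch_params(c, d, rng)
f1, f2 = rng.random(c), rng.random(c)
z = normalize(compact_bilinear(f1, f2, p1, p2, d))
```

The FFT route costs O(d log d) per pair of vectors, whereas forming the full outer product would need c × c entries, which is the source of the lower dimensionality compared with a full bilinear feature.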
Fig. 4 is a block diagram of a cross-layer reduced bilinear network. The concrete implementation steps are as follows:
(1) VGG-16 is selected as the feature extractor, and the output feature maps of different high-level convolutional layers are obtained from the bird image feature extraction network and denoted $f_1(x, y)$, $f_2(x, y)$ and $f_3(x, y)$, where $f_1$, $f_2$ and $f_3$ are the output feature functions of the layers conv5_1, conv5_2 and conv5_3 in the fifth convolution block of VGG-16, respectively.
(2) Following the method of FIG. 3, the output feature map $f_A$ of one layer is combined with the feature map $f_B$ of another layer by the simplified bilinear operation, yielding three groups of bilinear feature vectors.
(3) The normalized features are concatenated into a single vector and sent to a softmax classifier for classification.
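The cross-layer combination in the steps above might look like the following NumPy sketch. Several details are assumptions not stated in the text: the three groups are taken to be the three distinct layer pairs, the sketched features are sum-pooled over spatial locations, and the demo uses small stand-in sizes (64 channels, 7 × 7 locations, d = 256) in place of the real conv5 maps and the 8192-dimensional output. The function names are hypothetical.

```python
import numpy as np
from itertools import combinations


def sketch_map(fmap, h, s, d):
    """Count Sketch of every spatial location; fmap has shape (c, n)."""
    out = np.zeros((d, fmap.shape[1]))
    np.add.at(out, h, s[:, None] * fmap)  # accumulate channels into hashed rows
    return out


def cross_layer_descriptor(fmaps, d, rng):
    """Concatenate the normalized compact bilinear feature of each layer pair."""
    c = fmaps[0].shape[0]
    # One fixed (hash, sign) pair per layer, drawn independently.
    params = [(rng.integers(0, d, size=c),
               rng.choice(np.array([-1.0, 1.0]), size=c)) for _ in fmaps]
    spectra = [np.fft.fft(sketch_map(f, h, s, d), axis=0)
               for f, (h, s) in zip(fmaps, params)]
    parts = []
    for a, b in combinations(range(len(fmaps)), 2):
        # Product in the frequency domain, then sum pooling over locations.
        x = np.real(np.fft.ifft(spectra[a] * spectra[b], axis=0)).sum(axis=1)
        y = np.sign(x) * np.sqrt(np.abs(x))            # signed square root
        parts.append(y / (np.linalg.norm(y) + 1e-12))  # l2 normalization
    return np.concatenate(parts)


rng = np.random.default_rng(0)
maps = [rng.random((64, 7 * 7)) for _ in range(3)]  # stand-ins for conv5_1..conv5_3
desc = cross_layer_descriptor(maps, d=256, rng=rng)
```

With three layers there are three pairs, so the concatenated descriptor has length 3d; no trainable parameters are introduced because the hashes and signs stay fixed.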
FIG. 5 is the network training method with pairwise confusion loss. The core idea of pairwise confusion is as follows: the samples in a training batch are randomly divided into two groups to form picture pairs; if a picture pair has the same label, the cross-entropy loss is computed directly; if a picture pair has different labels, a pairwise Euclidean loss is added to the cross-entropy loss as a regularization term. The main steps are as follows:
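(1) The samples in a training batch are randomly divided into two groups $(x_1, y_1)$ and $(x_2, y_2)$.
(2) The class label vectors $\mathrm{label}(x_1)$ and $\mathrm{label}(x_2)$ of the two groups of samples are obtained.
(3) If the two groups of samples have the same label, the cross-entropy loss is computed directly:
$$L = \sum_{i=1}^{2} L_{CE}\big(p_\theta(y \mid x_i),\, y_i\big)$$
If the two groups of samples have different labels, the Euclidean pairwise confusion loss is added to the cross-entropy loss as a regularization term, namely
$$L = \sum_{i=1}^{2} L_{CE}\big(p_\theta(y \mid x_i),\, y_i\big) + \lambda\, D_{EC}\big(p_\theta(y \mid x_1),\, p_\theta(y \mid x_2)\big)$$
where $D_{EC}$ denotes the Euclidean distance, $L_{CE}$ denotes the cross-entropy loss, $p_\theta(y \mid x_i)$ is the probability vector output by the softmax classifier, and $\lambda$ is the weight of the regularization term.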
(4) Back-propagating the losses and updating the network parameters.
(5) Enter the next batch and jump to step (1).
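The loss computation in the steps above can be sketched in NumPy as follows. The weight of 20 for the Euclidean term (with cross-entropy weight 1) follows claim 5; everything else is an assumption made for illustration: the function names, the use of the plain (not squared) Euclidean distance, averaging over samples and pairs, and dropping one sample when the batch size is odd.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def pairwise_confusion_loss(logits, labels, lam=20.0, rng=None):
    """Cross entropy plus a Euclidean penalty on pairs with different labels."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(labels) - len(labels) % 2           # even number of samples
    order = rng.permutation(len(labels))[:n]    # random split into two groups
    first, second = order[: n // 2], order[n // 2:]
    probs = softmax(logits)
    ce = -np.log(probs[order, labels[order]] + 1e-12).mean()
    differ = labels[first] != labels[second]    # penalty applies only to these pairs
    dist = np.linalg.norm(probs[first] - probs[second], axis=1)
    return ce + lam * (dist * differ).sum() / max(n // 2, 1)


rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 200))   # 200 classes, as in CUB-200-2011
labels = rng.integers(0, 200, size=8)
loss = pairwise_confusion_loss(logits, labels, rng=rng)
```

When every pair shares a label the penalty vanishes and the value reduces to the ordinary cross-entropy; in a real training loop the same expression would be written in an automatic differentiation framework so that it can be back-propagated.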
Claims (5)
1. A fine-grained bird identification method based on a cross-layer simplified bilinear network is characterized by comprising the following steps:
(1) first, the 5994 training pictures and 5794 test pictures of the CUB-200-2011 data set are preprocessed, and the processed images are input into the convolutional neural network VGG-16 to extract feature maps of the bird images;
(2) to account for inter-layer feature interaction, three groups of simplified bilinear feature vectors are extracted in a cross-layer manner from the feature maps of different high-level convolutional layers obtained in step (1), normalized, concatenated and sent to a softmax classifier;
(3) the network is optimized using cross-entropy loss assisted by pairwise confusion loss.
3. The three groups of simplified bilinear feature vectors according to claim 1, wherein the dimension of the output feature vector is 8192.
4. The network optimization using cross-entropy loss according to claim 1, wherein the samples in a training batch are randomly divided into two groups to form picture pairs; if a picture pair has the same label, the cross-entropy loss is computed directly; if a picture pair has different labels, a pairwise Euclidean loss is added to the cross-entropy loss as a regularization term.
5. The Euclidean loss according to claim 4, wherein the weight of the Euclidean loss is 20 and the weight of the cross-entropy loss is 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910360985.1A CN111860068A (en) | 2019-04-30 | 2019-04-30 | Fine-grained bird identification method based on cross-layer simplified bilinear network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910360985.1A CN111860068A (en) | 2019-04-30 | 2019-04-30 | Fine-grained bird identification method based on cross-layer simplified bilinear network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111860068A true CN111860068A (en) | 2020-10-30 |
Family
ID=72966490
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910360985.1A Pending CN111860068A (en) | 2019-04-30 | 2019-04-30 | Fine-grained bird identification method based on cross-layer simplified bilinear network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111860068A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019018063A1 (en) * | 2017-07-19 | 2019-01-24 | Microsoft Technology Licensing, Llc | Fine-grained image recognition |
CN107704877A (en) * | 2017-10-09 | 2018-02-16 | 哈尔滨工业大学深圳研究生院 | A kind of image privacy cognitive method based on deep learning |
CN108549926A (en) * | 2018-03-09 | 2018-09-18 | 中山大学 | A kind of deep neural network and training method for refining identification vehicle attribute |
CN108875827A (en) * | 2018-06-15 | 2018-11-23 | 广州深域信息科技有限公司 | A kind of method and system of fine granularity image classification |
CN109086792A (en) * | 2018-06-26 | 2018-12-25 | 上海理工大学 | Based on the fine granularity image classification method for detecting and identifying the network architecture |
CN109002845A (en) * | 2018-06-29 | 2018-12-14 | 西安交通大学 | Fine granularity image classification method based on depth convolutional neural networks |
CN109685115A (en) * | 2018-11-30 | 2019-04-26 | 西北大学 | A kind of the fine granularity conceptual model and learning method of bilinearity Fusion Features |
Non-Patent Citations (6)
Title |
---|
ABHIMANYU DUBEY等: "Pairwise Confusion for Fine-Grained Visual Classification", 《PROCEEDINGS OF THE EUROPEAN CONFERENCE ON COMPUTER VISION》 * |
AKIRA FUKUI等: "Multimodal compact bilinear pooling for visual question answering and visual grounding", 《COMPUTER VISION AND PATTERN RECOGNITION》 * |
CHAOJIAN YU等: "Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition", 《PROCEEDINGS OF THE EUROPEAN CONFERENCE ON COMPUTER VISION》 * |
YANG GAO等: "Compact Bilinear Pooling", 《2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 * |
单倩文等: "基于改进多尺度特征图的目标快速检测与识别算法", 《激光与光电子学进展》 * |
张阳: "细粒度图像分类算法研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114638993A (en) * | 2022-03-21 | 2022-06-17 | 华南师范大学 | Image fine-grained classification method and device based on deep learning |
CN114648667A (en) * | 2022-03-31 | 2022-06-21 | 北京工业大学 | Bird image fine-granularity identification method based on lightweight bilinear CNN model |
CN114648667B (en) * | 2022-03-31 | 2024-06-07 | 北京工业大学 | Bird image fine-granularity recognition method based on lightweight bilinear CNN model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20201030 |