CN110188816B - Image fine granularity identification method based on multi-stream multi-scale cross bilinear features - Google Patents

Image fine granularity identification method based on multi-stream multi-scale cross bilinear features

Info

Publication number
CN110188816B
CN110188816B (application CN201910450570.3A)
Authority
CN
China
Prior art keywords
stream
features
bilinear
image
cross
Prior art date
Legal status
Active
Application number
CN201910450570.3A
Other languages
Chinese (zh)
Other versions
CN110188816A (en)
Inventor
李春国
邓亭强
杨绿溪
徐琴珍
俞菲
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201910450570.3A priority Critical patent/CN110188816B/en
Publication of CN110188816A publication Critical patent/CN110188816A/en
Application granted granted Critical
Publication of CN110188816B publication Critical patent/CN110188816B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an image fine-grained recognition method based on multi-stream multi-scale cross bilinear features. To address insufficient extraction and insufficient utilization of fine-grained image features, the method uses a multi-stream network to extract cross bilinear features, which characterize finer local details of the image and remedy the insufficient feature extraction; random image blending enhancement and fusion of multi-scale bottom-layer bilinear features remedy the insufficient feature utilization. Experiments show that the recognition accuracy of the proposed multi-stream multi-scale cross bilinear method on the public CUB-200-2011 dataset is markedly higher than that of existing methods, reaching state-of-the-art fine-grained recognition accuracy.

Description

Image fine granularity identification method based on multi-stream multi-scale cross bilinear features
Technical Field
The invention relates to the fields of computer vision, artificial intelligence and multimedia signal processing, in particular to an image fine granularity recognition method based on multi-stream multi-scale cross bilinear features.
Background
With the continuous development of deep convolutional neural networks, deep learning has steadily improved the accuracy and inference efficiency of computer vision tasks such as object detection, semantic segmentation, object tracking and image classification. This progress is mainly attributable to the powerful nonlinear modeling capability of convolutional neural networks, the availability of massive data, and the computing power of modern hardware, and it has also driven rapid development in fine-grained image recognition. Methods for generic image classification are now relatively mature, as reflected by the high recognition accuracy achieved on the ImageNet dataset, whereas fine-grained image recognition, which must distinguish subclasses that are inherently harder to tell apart, still offers broader room for development and more valuable applications.
Fine-grained image recognition is defined relative to coarse-grained recognition. Coarse-grained recognition distinguishes broad categories with large differences, such as people, chairs, cars and cats, whereas fine-grained recognition identifies subclasses within one broad category, for example the 200 bird species in the Caltech-UCSD Birds dataset (CUB-200-2011) or the 196 car classes in the Stanford Cars dataset proposed by Stanford University. The fine-grained task is therefore characterized by small inter-class variance and large intra-class variance: compared with coarse-grained recognition, fine-grained subclasses are easily confused, the discriminative regions are few, and the subclasses share many similar features, all of which increase the difficulty of fine-grained image recognition.
Disclosure of Invention
For the task of fine-grained recognition of image target subclasses, the invention provides an image fine-grained recognition method based on multi-stream multi-scale cross bilinear features. The method uses a multi-stream network to extract fine-grained image features, computes cross bilinear features, and predicts the fine-grained category from the fused cross features. The method comprises the following steps:
(1) Data augmentation is performed on the input image;
(2) Extracting image features by using a multi-stream basic network, and calculating cross bilinear features and bottom bilinear features;
(3) And predicting the fine granularity category by using the fused characteristics.
As a further improvement of the invention, the image is augmented in step (1); the specific steps are as follows:
step 2.1: the data are enhanced by offline and online rotation, wherein offline rotation rotates the dataset in 10-degree steps over [0°, 359°] and online rotation rotates each picture fed into the network by a random angle; in addition, brightness enhancement and random cropping are used for data enhancement;
step 2.2: data augmentation by random image blending enhancement: let U(ε) be the uniform distribution on [0,1]; at each step sample ε ~ U(ε), and for two training samples x1 and x2 combine them randomly according to this distribution to obtain the blended sample εx1 + (1−ε)x2 with corresponding label εh1 + (1−ε)h2; this completes the random image blending enhancement.
As a further improvement of the present invention, in step (2) image features are extracted with the multi-stream base network and the cross bilinear features are calculated:
step 3.1: the features of the augmented image are extracted with a multi-stream network; the augmented pictures are fed into a K-way convolutional neural network in which Stream 1, Stream 2 and Stream 3 adopt a ResNet-34, a ResNet-50 and a VGG-16 network respectively, and these serve as base feature extraction networks, yielding the features of the fine-grained image;
step 3.2: the cross bilinear features of the multi-stream network are calculated by extracting the bilinear features of Stream 1 and Stream 2, of Stream 1 and Stream 3, and of Stream 2 and Stream 3, thereby obtaining the cross bilinear features of the K-way convolutional neural network; a bilinear feature is calculated as follows: the inputs are the feature maps A and B of two streams, A is transposed and multiplied by B, and the result is normalized and L2-regularized;
step 3.3: the bottom-layer bilinear features are calculated by second-order bilinear pooling of each bottom layer with itself, where the bottom layers are the ResNet-5a layer of Stream 1, namely the first layer of the fifth bottleneck block, the ResNet-5a layer of Stream 2, namely the first layer of the fifth bottleneck block, and the Conv5_1 layer of Stream 3, namely the first layer of the fifth convolution block; the bottom-layer bilinear features are fused with the high-level cross bilinear features.
As a further improvement of the present invention, in step (3) the fine-grained category is predicted from the fused features:
step 4.1: the cross bilinear features and the bottom-layer bilinear features are fused using one of two fusion modes, concatenation or element-wise addition; finally the fused features are fed to a fully connected layer for classification, and a softmax vector is computed to obtain the predicted result;
wherein the loss function is the cross entropy loss, which guides the training and learning process:

L = -\sum_{i=1}^{C} y_i \log \hat{y}_i

where y_i denotes the true category label, \hat{y}_i denotes the category label predicted by the network, and C is the total number of categories on the training dataset.
So far, the image fine granularity identification method based on the multi-stream multi-scale cross bilinear features is completed.
The invention provides an image fine-grained recognition method based on multi-stream multi-scale cross bilinear features. To address insufficient extraction and insufficient utilization of fine-grained image features, the method uses a multi-stream network to extract cross bilinear features, which characterize finer local details of the image and remedy the insufficient feature extraction; random image blending enhancement and fusion of multi-scale bottom-layer bilinear features remedy the insufficient feature utilization. Experiments show that the recognition accuracy of the proposed multi-stream multi-scale cross bilinear method on the public CUB-200-2011 dataset is markedly higher than that of existing methods, reaching state-of-the-art fine-grained recognition accuracy.
Drawings
FIG. 1 is a fine particle data augmentation schematic of the present invention.
FIG. 2 is a diagram of an image fine granularity recognition method based on multi-stream multi-scale cross bilinear features of the present invention.
FIG. 3 is a graph showing how accuracy changes with the number of training epochs on the CUB-200-2011 test dataset.
FIG. 4 shows partial test samples from the CUB-200-2011 dataset (the upper left corner of each image gives the category predicted by the invention).
Detailed Description
The invention is described in further detail below with reference to the attached drawings and detailed description:
the invention provides an image fine granularity recognition method based on multi-stream multi-scale cross bilinear features, which uses a multi-stream network to extract fine granularity image features, calculates cross bilinear features, and predicts fine granularity categories by utilizing the fused cross features.
A specific embodiment of the image fine-grained recognition method based on multi-stream multi-scale cross bilinear features is described in detail below, taking a public fine-grained dataset as an example and referring to the accompanying drawings. The invention uses a multi-stream network to extract fine-grained image features, computes cross bilinear features, and predicts the fine-grained category from the fused cross features. The method comprises the following steps:
(1) The input image is first data augmented.
Step 1.1: the data are enhanced by offline and online rotation. Offline rotation rotates the dataset in 10-degree steps over [0°, 359°]; online rotation rotates each picture fed into the network by a random angle. In addition, brightness enhancement and random cropping are used for data enhancement.
Step 1.2: data augmentation by random image blending enhancement, as shown in FIG. 1. Let U(ε) be the uniform distribution on [0,1]. At each step sample ε ~ U(ε); for two training samples x1 and x2, combine them randomly according to this distribution to obtain the blended sample εx1 + (1−ε)x2 with corresponding label εh1 + (1−ε)h2. This completes the random image blending enhancement.
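The random image blending of Step 1.2 can be sketched as follows. This is a minimal numpy sketch under the assumption that images and labels are arrays of matching shapes; the function name is illustrative, not from the patent:

```python
import numpy as np

def random_image_blending(x1, h1, x2, h2, rng=None):
    """Blend two training samples, following Step 1.2.

    x1, x2 : image arrays of identical shape.
    h1, h2 : label vectors (e.g. one-hot) of identical shape.
    eps is drawn from the uniform distribution U on [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = float(rng.uniform(0.0, 1.0))        # eps ~ U(0, 1)
    x = eps * x1 + (1.0 - eps) * x2           # blended image
    h = eps * h1 + (1.0 - eps) * h2           # blended soft label
    return x, h, eps
```

Because the label is the same convex combination as the image, a standard cross entropy loss applied to the blended label accommodates the augmentation without any change to the training loop.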
(2) Image features are extracted with the multi-stream base network, and the cross bilinear features and bottom-layer bilinear features are calculated. The specific steps are as follows:
Step 2.1: the features of the augmented image are extracted with a multi-stream network. The augmented pictures are fed into a K-way convolutional neural network in which Stream 1, Stream 2 and Stream 3 adopt a ResNet-34, a ResNet-50 and a VGG-16 network respectively; these serve as base feature extraction networks. As shown in fig. 2, the features of the fine-grained image are thus obtained. Here K takes the value 3.
Step 2.2: the cross bilinear features of the multi-stream network are calculated. The bilinear features of Stream 1 and Stream 2, of Stream 1 and Stream 3, and of Stream 2 and Stream 3 are extracted, yielding the cross bilinear features of the K-way convolutional neural network. A bilinear feature is calculated as follows: the inputs are the feature maps A and B of two streams; A is transposed and multiplied by B, and the result is normalized and L2-regularized:
Z = A^\top B, \qquad z \leftarrow \operatorname{sign}(z)\sqrt{|z|}, \qquad z \leftarrow z / \|z\|_2
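The bilinear-feature computation of Step 2.2 can be sketched as follows. The signed square root is an assumed choice for the unspecified "normalization operation" (it is the common choice for bilinear pooling), and all function names are illustrative:

```python
import itertools
import numpy as np

def bilinear_feature(feat_a, feat_b):
    """Bilinear feature of two feature maps, following Step 2.2.

    feat_a : (h*w, c1) spatially flattened feature map A.
    feat_b : (h*w, c2) spatially flattened feature map B.
    Computes A^T B, then a signed-square-root normalization (an
    assumed choice) and L2 normalization.
    """
    z = (feat_a.T @ feat_b).reshape(-1)       # transpose A, multiply by B
    z = np.sign(z) * np.sqrt(np.abs(z))       # assumed normalization step
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z        # L2 normalization

def cross_bilinear_features(stream_feats):
    """Bilinear feature for every unordered pair of streams
    (Stream 1 & 2, 1 & 3, 2 & 3), given name -> feature-map dict."""
    return {
        (na, nb): bilinear_feature(fa, fb)
        for (na, fa), (nb, fb) in itertools.combinations(stream_feats.items(), 2)
    }
```

With three streams this produces exactly the three cross bilinear features enumerated in the text.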
Step 2.3: the bottom-layer bilinear features are calculated by second-order bilinear pooling of each bottom layer with itself. The bottom layers are the ResNet-5a layer of Stream 1 (the first layer of the fifth bottleneck block), the ResNet-5a layer of Stream 2 (the first layer of the fifth bottleneck block) and the Conv5_1 layer of Stream 3 (the first layer of the fifth convolution block). These bottom-layer bilinear features are fused with the high-level cross bilinear features.
(3) The fine-grained category is predicted from the fused features. The specific steps are as follows:
Step 3.1: the cross bilinear features and the bottom-layer bilinear features are fused using one of two fusion modes, concatenation or element-wise addition. The fused features are then fed to a fully connected layer for classification, and a softmax vector is computed to obtain the predicted result. The overall flow is shown in Algorithm 2.
(Algorithm 2: overall flow of the method; figure not reproduced in the text.)
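The two fusion modes and the softmax prediction of Step 3.1 can be sketched as follows; the fully connected layer is omitted for brevity, and the function names are illustrative:

```python
import numpy as np

def fuse_features(features, mode="concat"):
    """Fuse a list of bilinear feature vectors, following Step 3.1.

    mode='concat' splices the vectors end to end; mode='add' sums
    them element-wise (they must then share a common length).
    """
    if mode == "concat":
        return np.concatenate(features)
    if mode == "add":
        return np.sum(features, axis=0)
    raise ValueError("mode must be 'concat' or 'add'")

def softmax(logits):
    """Numerically stable softmax over the class logits produced
    by the fully connected layer."""
    shifted = logits - np.max(logits)   # subtract max for stability
    e = np.exp(shifted)
    return e / e.sum()
```

The predicted category is then the argmax of the softmax vector.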
The loss function of the present invention is a cross entropy loss function to guide the training and learning process.
L = -\sum_{i=1}^{C} y_i \log \hat{y}_i
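The cross entropy loss can be written directly from its formula; this minimal sketch assumes a one-hot (or blended, per Step 1.2) label vector and a softmax probability vector:

```python
import numpy as np

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """Cross entropy L = -sum_i y_i * log(yhat_i) over the C classes.

    y_true : true label vector (one-hot, or soft labels from the
             random image blending augmentation).
    y_pred : softmax probability vector; eps guards against log(0).
    """
    return float(-np.sum(y_true * np.log(y_pred + eps)))
```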
The experimental platform for the model is as follows: a CentOS 7 system with an Intel Xeon E5 processor and an NVIDIA Tesla P100 graphics card. Training uses a joint cross entropy loss and a ranking-consistency loss; the optimizer is stochastic gradient descent (SGD) with initial learning rate lr = 0.01 and batch_size = 16, iterated for 100 epochs to obtain the trained model, which is tested on the CUB-200-2011 dataset proposed by the California Institute of Technology. The hyper-parameters of model training are not limited to the following:
(Hyper-parameter table: optimizer SGD, initial learning rate 0.01, batch size 16, 100 training epochs; the full table is not reproduced in the text.)
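A single update of the stated SGD optimizer (lr = 0.01) can be sketched as follows; momentum and weight decay, if used in the original setup, are omitted, and the function name is illustrative:

```python
import numpy as np

def sgd_step(params, grads, lr=0.01):
    """One vanilla SGD update with the stated initial learning
    rate lr = 0.01: p <- p - lr * grad for each parameter array."""
    return [p - lr * g for p, g in zip(params, grads)]
```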
The test curve of the invention on the dataset is shown in fig. 3. (The table of test results in the original specification is not reproduced in the text.)
FIG. 4 shows the prediction results for some test samples from the CUB-200-2011 dataset; it can be seen that the invention predicts the fine-grained category of the images well.
The above description is only of the preferred embodiment of the present invention, and is not intended to limit the present invention in any other way, but is intended to cover any modifications or equivalent variations according to the technical spirit of the present invention, which fall within the scope of the present invention as defined by the appended claims.

Claims (1)

1. The image fine-granularity recognition method based on the multi-stream multi-scale cross bilinear features is characterized by extracting fine-granularity image features by using a multi-stream network, calculating the cross bilinear features, and predicting fine-granularity categories by utilizing the fused cross features, and comprises the following steps:
(1) Data augmentation is performed on the input image;
in step (1) the image is augmented; the specific steps are as follows:
step 2.1: the data are enhanced by offline and online rotation, wherein offline rotation rotates the dataset in 10-degree steps over [0°, 359°] and online rotation rotates each picture fed into the network by a random angle; in addition, brightness enhancement and random cropping are used for data enhancement;
step 2.2: data augmentation by random image blending enhancement: let U(ε) be the uniform distribution on [0,1]; at each step sample ε ~ U(ε), and for two training samples x1 and x2 combine them randomly according to this distribution to obtain the blended sample εx1 + (1−ε)x2 with corresponding label εh1 + (1−ε)h2; this completes the random image blending enhancement;
(2) Extracting image features by using a multi-stream basic network, and calculating cross bilinear features and bottom bilinear features;
in step (2), image features are extracted with the multi-stream base network and the cross bilinear features are calculated:
step 3.1: the features of the augmented image are extracted with a multi-stream network; the augmented pictures are fed into a K-way convolutional neural network in which Stream 1, Stream 2 and Stream 3 adopt a ResNet-34, a ResNet-50 and a VGG-16 network respectively, and these serve as base feature extraction networks, yielding the features of the fine-grained image;
step 3.2: the cross bilinear features of the multi-stream network are calculated by extracting the bilinear features of Stream 1 and Stream 2, of Stream 1 and Stream 3, and of Stream 2 and Stream 3, thereby obtaining the cross bilinear features of the K-way convolutional neural network; a bilinear feature is calculated as follows: the inputs are the feature maps A and B of two streams, A is transposed and multiplied by B, and the result is normalized and L2-regularized;
step 3.3: the bottom-layer bilinear features are calculated by second-order bilinear pooling of each bottom layer with itself, where the bottom layers are the ResNet-5a layer of Stream 1, namely the first layer of the fifth bottleneck block, the ResNet-5a layer of Stream 2, namely the first layer of the fifth bottleneck block, and the Conv5_1 layer of Stream 3, namely the first layer of the fifth convolution block; the bottom-layer bilinear features are fused with the high-level cross bilinear features;
(3) Predicting the fine granularity category by utilizing the fused characteristics;
in step (3), the fused features are used to predict the fine-grained category:
step 4.1: the cross bilinear features and the bottom-layer bilinear features are fused using one of two fusion modes, concatenation or element-wise addition; finally the fused features are fed to a fully connected layer for classification, and a softmax vector is computed to obtain the predicted result;
wherein the loss function is the cross entropy loss, which guides the training and learning process:

L = -\sum_{i=1}^{C} y_i \log \hat{y}_i

where y_i denotes the true category label, \hat{y}_i denotes the category label predicted by the network, and C is the total number of categories on the training dataset;
so far, the image fine granularity identification method based on the multi-stream multi-scale cross bilinear features is completed.
CN201910450570.3A 2019-05-28 2019-05-28 Image fine granularity identification method based on multi-stream multi-scale cross bilinear features Active CN110188816B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910450570.3A CN110188816B (en) 2019-05-28 2019-05-28 Image fine granularity identification method based on multi-stream multi-scale cross bilinear features


Publications (2)

Publication Number Publication Date
CN110188816A CN110188816A (en) 2019-08-30
CN110188816B true CN110188816B (en) 2023-05-02

Family

ID=67718218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910450570.3A Active CN110188816B (en) 2019-05-28 2019-05-28 Image fine granularity identification method based on multi-stream multi-scale cross bilinear features

Country Status (1)

Country Link
CN (1) CN110188816B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110519485B (en) * 2019-09-09 2021-08-31 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN111401122B (en) * 2019-12-27 2023-09-26 航天信息股份有限公司 Knowledge classification-based complex target asymptotic identification method and device
CN111325221B (en) * 2020-02-25 2023-06-23 青岛海洋科技中心 Image feature extraction method based on image depth information
CN111091585B (en) * 2020-03-19 2020-07-17 腾讯科技(深圳)有限公司 Target tracking method, device and storage medium
CN111476144B (en) * 2020-04-02 2023-06-09 深圳力维智联技术有限公司 Pedestrian attribute identification model determining method and device and computer readable storage medium
CN112418358A (en) * 2021-01-14 2021-02-26 苏州博宇鑫交通科技有限公司 Vehicle multi-attribute classification method for strengthening deep fusion network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875674B (en) * 2018-06-29 2021-11-16 东南大学 Driver behavior identification method based on multi-column fusion convolutional neural network
CN109685115B (en) * 2018-11-30 2022-10-14 西北大学 Fine-grained conceptual model with bilinear feature fusion and learning method

Also Published As

Publication number Publication date
CN110188816A (en) 2019-08-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant