CN112861970A - Fine-grained image classification method based on feature fusion - Google Patents

Fine-grained image classification method based on feature fusion

Info

Publication number
CN112861970A
CN112861970A
Authority
CN
China
Prior art keywords
image
feature map
feature
network
resnet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110179265.2A
Other languages
Chinese (zh)
Other versions
CN112861970B (en)
Inventor
初妍
王丽娜
莫世奇
李思纯
李松
时洁
胡博
苗晓晨
赵佳昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN202110179265.2A priority Critical patent/CN112861970B/en
Publication of CN112861970A publication Critical patent/CN112861970A/en
Application granted granted Critical
Publication of CN112861970B publication Critical patent/CN112861970B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image recognition in computer vision, and particularly relates to a fine-grained image classification method based on feature fusion. The invention extracts local detail features of fine-grained images for the classification task and accurately locates the target region of interest, addressing the difficulty that sub-categories in fine-grained images differ only slightly from one another. An improved non-maximum suppression (soft-NMS) is used to optimize the region proposal network (RPN) so that the target object is acquired without interference from background information. The bilinear convolutional neural networks (B-CNNs) are improved through the attention module SCA and applied to the fine-grained classification task so as to obtain attention features of different dimensions. Compared with existing classification methods, the method localizes the discriminative key parts and achieves higher accuracy.

Description

Fine-grained image classification method based on feature fusion
Technical Field
The invention belongs to the technical field of image recognition in computer vision, and particularly relates to a fine-grained image classification method based on feature fusion.
Background
The traditional classification task mostly refers to coarse-grained classification, for example distinguishing cats from dogs. Because such categories differ in many obvious features, the task is relatively easier than fine-grained image classification. Fine-grained image classification is a subtask of image classification that mainly identifies hundreds of sub-categories under the same basic category, such as hundreds of sub-categories of birds, cars, pets, flowers, or airplanes. Unlike the general classification task, fine-grained image classification is characterized by very small differences between sub-categories, and these subtle, local differences are the key to fine-grained image classification.
Because the differences between sub-classes are slight, different sub-classes can often be distinguished only by subtle local differences. Fine-grained classification methods fall mainly into two categories. The first is strongly supervised classification models, which, in order to obtain better classification accuracy, require additional information beyond the class labels of the images, such as manually annotated object bounding boxes and part annotation points. For example, the Part R-CNN algorithm uses a region-based convolutional neural network to detect objects and local regions in an image. Because such annotation information is very expensive to obtain, the practicability of these algorithms is greatly limited. The second is weakly supervised classification models, which rely only on class labels and need no additional part annotation information to achieve good classification. For example, the two-level attention algorithm completes fine-grained image classification using only the class label, without relying on additional annotation information. Although the extracted features have a certain expressive ability, how to effectively extract features of the discriminative parts of the key attention regions, given only category labels, remains challenging.
Disclosure of Invention
The invention aims to extract local detail features of fine-grained images for the classification task and to accurately locate the target region of interest, and provides a fine-grained image classification method based on feature fusion.
The purpose of the invention is realized by the following technical scheme: the method comprises the following steps:
step 1: acquiring an image data set to be classified, taking partial image data to construct a training set, and forming a test set by the rest data; labeling the images in the training set to obtain class labels corresponding to the images;
step 2: extracting a feature map of each image in the training set by using a VGG-19 convolutional neural network, and obtaining a feature vector of each image in the training set through sliding window operation on the final conv5-3 feature map;
step 3: inputting the feature vector of each image in the training set into a regression layer and a classification layer to obtain a region candidate detection frame set of each image in the training set; calculating a confidence score f_i for each detection frame in the region candidate detection frame set; and selecting the detection frame with the highest confidence to cut the image, obtaining a cut image training set;
step 4: inputting the cut image training set into an SC-B-CNNs model for training;
the SC-B-CNNs model comprises a first ResNet-50 network, a second ResNet-50 network and a softmax classifier; the first ResNet-50 network is a ResNet-50 network pre-trained on ImageNet with the last fully connected layer removed, and an attention module SCA is added between the conv2 and conv3 convolution blocks of this ResNet-50 network; the second ResNet-50 network is not pre-trained, and an attention module SCA is added between its conv4 and conv5 convolution blocks;
step 4.1: respectively inputting the cut image training set into the first ResNet-50 network and the second ResNet-50 network, wherein the first ResNet-50 network outputs a first weighted feature map f_A of each image and the second ResNet-50 network outputs a second weighted feature map f_B of each image;
step 4.2: subjecting the first weighted feature map f_A and the second weighted feature map f_B of each image in the cut image training set to a bilinear pooling operation to obtain a bilinear feature vector of each image in the cut image training set;
step 4.3: inputting the bilinear feature vector of each image in the cut image training set into a softmax classifier to obtain the category of the image;
step 5: inputting the test set into the trained SC-B-CNNs model to obtain the classification result of the image data set to be classified.
The present invention may further comprise:
the attention module SCA is used for extracting a feature map F with weight distribution of an input feature map GscThe method comprises the following specific steps:
step 4.1.1: generating a feature map F by a 1 × 1 convolution from the feature map G input to the attention module SCA;
step 4.1.2: reducing the dimensionality of the feature map F using global average pooling and assigning weights through a fully connected layer with parameter W_fc, then compressing the feature map along the channel direction into a single channel (w × h × 1) through a convolution operation, and generating the spatial attention map A_s using a sigmoid activation function:
A_s = σ(f^{7×7}(W_fc(GAP(F))))
where G ∈ R^{w×h×c}, w is the length of the feature map G, h is the width of the feature map G, and w × h represents the two-dimensional spatial size of the feature map G; c represents the number of channels; f^{7×7} represents the size of the convolution kernel; σ() represents the sigmoid activation function;
step 4.1.3: fusing the spatial attention map A_s with the feature map F by element-wise dot multiplication to obtain the spatial attention feature F_s:
F_s = A_s ⊙ F
Step 4.1.4: feature spatial attention FsCompressing according to the spatial dimension w multiplied by h to generate a global compressed feature vector z of the current feature mapc
z_c = f_sq(u_c) = (1/(w × h)) Σ_{i=1}^{w} Σ_{j=1}^{h} u_c(i, j)
where f_sq() represents the compression operation; u_c represents the feature map of the c-th channel;
step 4.1.5: obtaining the weight value of each channel of the feature map through two fully connected layers, and obtaining the feature map F_sc with weight distribution using sigmoid activation:
F_sc = A ⊙ u_c
A = σ(W_s2 × tanh(W_s1 × z_c))
where σ() represents the sigmoid activation function and tanh() represents the tanh activation function; A is the feature vector of the weight distribution; W_s1 is the weight of the first fully connected layer; W_s2 is the weight of the second fully connected layer; u_c represents the feature map of the c-th channel; ⊙ represents element-wise dot multiplication.
The invention has the beneficial effects that:
the invention realizes the extraction of local detail characteristics of the fine-grained images on the classification task, accurately positions the fine-grained images in the concerned target area, solves the difficulty of small intra-class difference of the fine-grained images on the classification task, utilizes the improved non-maximum value to inhibit the soft-NMS optimization area to suggest the RPN to acquire the target object, and avoids the interference of background information. According to the invention, the bilinear convolutional neural network B-CNNs are improved through the attention module SCA and used for a fine-grained classification task so as to obtain attention characteristics with different dimensions. Compared with the existing classification method, the method is positioned in the key part of the distinction, and has higher accuracy.
Drawings
Fig. 1 is a frame diagram of the fine-grained image classification method based on feature fusion according to the present invention.
Fig. 2 is a specific flowchart of the RPN network according to the present invention.
FIG. 3 is a schematic diagram of the framework of the B-CNNs based on SCA in the invention.
FIG. 4 is a schematic diagram of the attention module SCA of the present invention.
Fig. 5 is a specific algorithm code diagram of the SCA-based bilinear CNNs in the present invention.
FIG. 6 is a table of the results of comparative experiments performed on three datasets CUB-200, Stanford cars and Oxford flowers.
FIG. 7 is a table of the results of comparative experiments performed on the CUB-200 dataset.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention aims to extract local detail features of fine-grained images for the classification task and to accurately locate the target region of interest, and provides a weakly supervised fine-grained image classification method based on feature fusion. An attention module SCA (Spatial-Channel Attention) is designed to improve the bilinear convolutional neural networks (B-CNNs) for the fine-grained classification task, so as to obtain attention features of different dimensions. Compared with existing classification methods, the method localizes the discriminative key parts and achieves higher accuracy.
Step 1, inputting images in a data set and corresponding class labels, and extracting a feature map of each image by using a VGG-19 convolutional neural network;
step 2, obtaining a 256-dimensional feature vector through 3 × 3 sliding window operation on the final conv5-3 feature map;
step 3, inputting the 256-dimensional feature vectors into two fully connected layers, namely a boundary regression layer and a classification layer, to obtain the region candidate frame set;
step 4, selecting a detection frame with the highest confidence level in the frames to be detected by using an improved soft-NMS algorithm;
step 5, cutting and dividing the detected target area with the highest confidence coefficient;
step 6, inputting the cut image;
step 7, extracting convolution features from the input image using two ResNet-50 networks, each with the last fully connected layer removed;
step 8, the first network uses ResNet-50 pre-trained on ImageNet and adds the designed attention module SCA between the conv2 and conv3 convolution blocks to obtain a weighted feature map;
step 9, the second sub-network uses ResNet-50 without pre-training and adds the designed attention module SCA between the conv4 and conv5 convolution blocks to obtain a weighted feature map (steps 7 to 11 are illustrated by the code sketch following this list);
step 10, obtaining bilinear feature vectors by bilinear pooling operation on the weighted feature maps in the steps 8 and 9;
step 11, inputting the bilinear feature vectors into a softmax classifier to obtain the category of the image;
step 12 inputs the test data set and calculates the accuracy of the model classification.
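As an illustration only and not part of the original disclosure, steps 7 to 11 can be sketched roughly as follows in PyTorch. The class name SCBCNN, the constructor argument sca_module_cls, and the mapping of torchvision's resnet50 stages layer1 to layer4 onto the conv2_x to conv5_x blocks are assumptions made for this sketch; the SCA module itself is sketched later in this description.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class SCBCNN(nn.Module):
    """Sketch of the two-branch SC-B-CNNs classifier (steps 7 to 11).
    sca_module_cls is assumed to build a spatial-channel attention module
    for a given number of channels (see the SCA sketch later on)."""
    def __init__(self, num_classes, sca_module_cls):
        super().__init__()
        # Branch A: ResNet-50 pre-trained on ImageNet, final FC layer dropped,
        # SCA inserted between the conv2 and conv3 blocks (layer1/layer2 here).
        rA = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.branchA = nn.Sequential(
            rA.conv1, rA.bn1, rA.relu, rA.maxpool,
            rA.layer1, sca_module_cls(256),    # SCA after conv2_x (256 channels)
            rA.layer2, rA.layer3, rA.layer4)
        # Branch B: ResNet-50 without pre-training, SCA between conv4 and conv5 blocks.
        rB = models.resnet50(weights=None)
        self.branchB = nn.Sequential(
            rB.conv1, rB.bn1, rB.relu, rB.maxpool,
            rB.layer1, rB.layer2,
            rB.layer3, sca_module_cls(1024),   # SCA after conv4_x (1024 channels)
            rB.layer4)
        self.fc = nn.Linear(2048 * 2048, num_classes)

    def forward(self, x):
        fA = self.branchA(x)                   # weighted feature map f_A: (N, 2048, h, w)
        fB = self.branchB(x)                   # weighted feature map f_B: (N, 2048, h, w)
        n, _, h, w = fA.shape
        # Bilinear pooling: outer product of f_A and f_B averaged over all locations.
        b = torch.einsum('nchw,ndhw->ncd', fA, fB) / (h * w)
        logits = self.fc(b.reshape(n, -1))
        return torch.softmax(logits, dim=1)    # class probabilities (step 11)
```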
The invention extracts image features through the RPN (Region Proposal Network) and completes the selection of candidate boxes. The picture is taken as input, VGG-19 is used to extract coarse features of the image to be detected, and the output of the RPN is the set of regions of interest obtained by convolving the feature map. To prevent overfitting, the RPN is optimized using the improved soft-NMS, selecting the regions in which higher-confidence targets are located. For the preset regions, anchors with 3 scales and 3 aspect ratios are selected, i.e. 9 kinds of anchors are generated; at each sliding-window position the classification layer outputs 18 confidence values and the regression layer outputs the position information of 36 target regions of interest, yielding more accurate candidate regions. The target is parameterized according to the boundary coordinates with the following formulas:
t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a
t_w = log(w/w_a),  t_h = log(h/h_a)
t*_x = (x* - x_a)/w_a,  t*_y = (y* - y_a)/h_a
t*_w = log(w*/w_a),  t*_h = log(h*/h_a)
where x, y, w, h respectively denote the center coordinates and the width and height of the predicted bounding box; t_i denotes the parameterization of the object boundary coordinates; t*_i denotes the annotation information associated with the positive anchor; x_a, y_a, w_a, h_a respectively denote the center coordinates and the width and height of the anchor box; and x*, y*, w*, h* respectively denote the center coordinates and the width and height of the labeled ground-truth box.
All detection boxes are sorted according to their scores (the score produced by the classifier is a probability value representing the probability that the current detection box contains the target to be detected). The detection box A with the highest score is selected and a threshold b is set. The IoU (Intersection over Union) between box A and each remaining detection box is calculated; boxes whose IoU with A exceeds the threshold b have a high overlap rate and are deleted. Boxes that do not overlap the current box, or whose overlap area is very small (IoU less than the threshold b), are kept; the unprocessed boxes are then re-sorted, the box with the largest score is again selected, the IoU values between the remaining boxes and this largest box are calculated, and boxes whose IoU exceeds the threshold are deleted again. This process is iterated until all boxes have been processed, and the final detection result is output.
The candidate boxes extracted by the RPN are highly overlapping. To reduce redundancy, the improved soft-NMS is used for optimization based on the classification scores of the detection boxes. When the score of a detection box is larger than the threshold t, the detection box is put into the final detection result set. When regions overlap, the score of the detection box is multiplied by a decay function, which effectively reduces the error probability and improves the detection accuracy. The specific calculation is as follows:
f_i ← f_i · d(IoU(A, b_i)) when detection box b_i overlaps the selected box A; otherwise f_i is kept unchanged,
where f_i is the score corresponding to the i-th detection box b_i, d(·) is the decay function, and t is the threshold.
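A minimal Python sketch of this procedure is given below. The linear decay (1 - IoU) is assumed for the decay function d(·), the (x1, y1, x2, y2) box format and the score threshold are illustrative, and the function is a simplification rather than the patent's exact implementation.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def soft_nms(boxes, scores, t=0.5, score_thresh=0.001):
    """Soft-NMS: keep the highest-scoring box, decay the scores of overlapping
    boxes instead of removing them outright, and repeat until no boxes remain."""
    dets = list(zip(boxes, scores))
    keep = []
    while dets:
        m = max(range(len(dets)), key=lambda i: dets[i][1])  # highest current score
        best_box, best_score = dets.pop(m)
        keep.append((best_box, best_score))
        remaining = []
        for box, s in dets:
            o = iou(best_box, box)
            if o >= t:
                s = s * (1.0 - o)           # assumed linear decay of the score
            if s > score_thresh:             # drop boxes whose score falls too low
                remaining.append((box, s))
        dets = remaining
    return keep
```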
The SC-B-CNNs network architecture provided by the invention can be represented by a quadruple B = (f_A, f_B, P, C). Bilinear features are obtained by a bilinear combination through the outer-product operation, and the calculation formula is:
b = f_A^T · f_B
where f_A and f_B are the feature functions containing the added attention block SCA, P is the pooling function, and C is the classification function.
The feature outputs at each location are combined using bilinear pooling. The bilinear pooling operation of the input image I at location l is defined as:
bilinear(l, I, f_A, f_B) = f_A(l, I)^T f_B(l, I)
where f_A and f_B are the outputs of the two feature extraction functions of the B-CNNs.
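For illustration, the pooled bilinear feature of one image can be computed from the two feature maps as in the following PyTorch sketch; the helper name and the averaging over locations are choices made for the sketch.

```python
import torch

def bilinear_pool(fA, fB):
    """Bilinear pooling of two feature maps of shape (C_A, H, W) and (C_B, H, W):
    the outer product f_A(l)^T f_B(l) is taken at every location l and averaged
    over all locations, giving a C_A x C_B bilinear feature that is then flattened."""
    cA, h, w = fA.shape
    cB = fB.shape[0]
    a = fA.reshape(cA, h * w)        # each column is the descriptor at one location
    b = fB.reshape(cB, h * w)
    pooled = a @ b.t() / (h * w)     # sum of per-location outer products
    return pooled.flatten()          # bilinear feature vector for the classifier
```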
First, the feature map extracted by the feature function is taken as the original input G, with G ∈ R^{w×h×c}, where w × h denotes the two-dimensional spatial size of G and c denotes the number of channels. A feature map F is generated by a 1 × 1 convolution; F is dimensionality-reduced using Global Average Pooling and assigned weights by a fully connected layer with parameter W_fc; the feature map is then compressed along the channel direction into a single channel (w × h × 1) through a convolution operation, and a sigmoid activation function is adopted to generate the spatial attention map A_s, A_s ∈ R^{w×h×1}. The process of spatial attention extraction is expressed as:
A_s = σ(f^{7×7}(W_fc(GAP(F))))
where f^{7×7} represents the size of the convolution kernel, σ() represents the sigmoid activation function, and W_fc represents the fully connected layer with parameter W_fc.
Then, the spatial attention map A_s is fused with the original input F by element-wise dot multiplication to obtain the spatial attention feature F_s:
F_s = A_s ⊙ F
The global spatial information is then compressed into channel-wise descriptive feature information. The feature map F_s is compressed along the spatial dimension w × h to generate the global compressed feature vector z_c of the current feature map; the specific calculation formula is:
z_c = f_sq(u_c) = (1/(w × h)) Σ_{i=1}^{w} Σ_{j=1}^{h} u_c(i, j)
where f_sq() denotes the compression operation and u_c denotes the feature map of the c-th channel.
Then an excitation operation is carried out: by learning the weight parameters, the nonlinear correlations between channels are found. The weight value of each channel of the feature map is obtained through two fully connected layers, and the weighted feature map is taken as the input of the next network layer. The weight assignment of the channels is calculated as:
A_c = f_eq(z, W) = σ(W_s2 × tanh(W_s1 × z_c))
where f_eq() represents the excitation operation, z represents the global compressed feature vector, σ() represents the sigmoid activation function, and tanh() represents the tanh activation function.
After the weight distribution vector of the feature map is obtained by the above operations, a simple gating with sigmoid activation is used to obtain the feature map F_sc with weight distribution; the calculation process is:
F_sc = A_c ⊙ u_c
where A_c is the feature vector of the weight distribution, u_c denotes the feature map of the c-th channel, and ⊙ denotes element-wise dot multiplication.
The purpose of using two fully connected layers is to ensure the consistency of input and output. The first fully connected layer reduces the channel dimension to 1/16 of the original; after the tanh activation function, a second fully connected layer restores the original input dimension.
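Putting the spatial and channel steps together, an SCA-style module can be sketched in PyTorch as below. This is an interpretation of the formulas above rather than the authors' code: the 7 × 7 convolution, the tanh and sigmoid activations, and the 1/16 channel reduction follow the text, while the channel-wise mean used to form the single-channel input of the 7 × 7 convolution is a simplification of the W_fc and global-average-pooling weighting described earlier.

```python
import torch
import torch.nn as nn

class SCA(nn.Module):
    """Spatial-Channel Attention sketch: a 1x1 convolution produces F, a spatial
    attention map A_s reweights F, and an SE-style squeeze/excitation
    (FC -> tanh -> FC -> sigmoid, reduction 16) reweights the channels to give F_sc."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.conv1x1 = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv7x7 = nn.Conv2d(1, 1, kernel_size=7, padding=3)   # spatial branch
        self.fc1 = nn.Linear(channels, channels // reduction)       # reduce to 1/16
        self.fc2 = nn.Linear(channels // reduction, channels)       # restore dimension

    def forward(self, g):                       # g: (N, C, H, W), the input feature map G
        f = self.conv1x1(g)                     # feature map F
        # Spatial attention A_s: compress channels to one map, 7x7 conv, sigmoid.
        a_s = torch.sigmoid(self.conv7x7(f.mean(dim=1, keepdim=True)))
        f_s = a_s * f                           # F_s = A_s (element-wise product) F
        # Squeeze: average over the spatial dimension w x h to obtain z_c.
        z = f_s.mean(dim=(2, 3))                # shape (N, C)
        # Excitation: A = sigmoid(W_s2 * tanh(W_s1 * z_c)).
        a_c = torch.sigmoid(self.fc2(torch.tanh(self.fc1(z))))
        # Reweight the channels: F_sc = A_c (element-wise product) u_c.
        return f_s * a_c.view(a_c.size(0), -1, 1, 1)
```

Used together with the earlier sketch, SCA itself can be passed as sca_module_cls, e.g. SCBCNN(num_classes=200, sca_module_cls=SCA).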
The specific algorithm of the SCA-based bilinear CNNs is shown in FIG. 5. To demonstrate the effectiveness of the proposed method, comparative experiments were performed on three datasets, CUB-200, Stanford cars and Oxford flowers, respectively, and the results of the experiments are shown in FIG. 6. To further verify the validity and accuracy of the improved RPN network and SCA, comparative experiments were performed on the CUB-200 dataset, with the results shown in fig. 7.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (2)

1. A fine-grained image classification method based on feature fusion is characterized by comprising the following steps:
step 1: acquiring an image data set to be classified, taking partial image data to construct a training set, and forming a test set by the rest data; labeling the images in the training set to obtain class labels corresponding to the images;
step 2: extracting a feature map of each image in the training set by using a VGG-19 convolutional neural network, and obtaining a feature vector of each image in the training set through sliding window operation on the final conv5-3 feature map;
step 3: inputting the feature vector of each image in the training set into a regression layer and a classification layer to obtain a region candidate detection frame set of each image in the training set; calculating a confidence score f_i for each detection frame in the region candidate detection frame set; and selecting the detection frame with the highest confidence to cut the image, obtaining a cut image training set;
step 4: inputting the cut image training set into an SC-B-CNNs model for training;
the SC-B-CNNs model comprises a first ResNet-50 network, a second ResNet-50 network and a softmax classifier; the first ResNet-50 network is a ResNet-50 network pre-trained on ImageNet with the last fully connected layer removed, and an attention module SCA is added between the conv2 and conv3 convolution blocks of this ResNet-50 network; the second ResNet-50 network is not pre-trained, and an attention module SCA is added between its conv4 and conv5 convolution blocks;
step 4.1: respectively inputting the cut image training set into the first ResNet-50 network and the second ResNet-50 network, wherein the first ResNet-50 network outputs a first weighted feature map f_A of each image and the second ResNet-50 network outputs a second weighted feature map f_B of each image;
step 4.2: subjecting the first weighted feature map f_A and the second weighted feature map f_B of each image in the cut image training set to a bilinear pooling operation to obtain a bilinear feature vector of each image in the cut image training set;
step 4.3: inputting the bilinear feature vector of each image in the cut image training set into a softmax classifier to obtain the category of the image;
step 5: inputting the test set into the trained SC-B-CNNs model to obtain the classification result of the image data set to be classified.
2. The fine-grained image classification method based on feature fusion according to claim 1, characterized in that: the attention module SCA is used for extracting a feature map F_sc with weight distribution from an input feature map G, and comprises the following specific steps:
step 4.1.1: generating a feature map F by a 1 × 1 convolution from the feature map G input to the attention module SCA;
step 4.1.2: reducing the dimensionality of the feature map F using global average pooling and assigning weights through a fully connected layer with parameter W_fc, then compressing the feature map along the channel direction into a single channel (w × h × 1) through a convolution operation, and generating the spatial attention map A_s using a sigmoid activation function:
A_s = σ(f^{7×7}(W_fc(GAP(F))))
where G ∈ R^{w×h×c}, w is the length of the feature map G, h is the width of the feature map G, and w × h represents the two-dimensional spatial size of the feature map G; c represents the number of channels; f^{7×7} represents the size of the convolution kernel; σ() represents the sigmoid activation function;
step 4.1.3: fusing the spatial attention map A_s with the feature map F by element-wise dot multiplication to obtain the spatial attention feature F_s:
F_s = A_s ⊙ F
step 4.1.4: compressing the spatial attention feature F_s along the spatial dimension w × h to generate the global compressed feature vector z_c of the current feature map:
z_c = f_sq(u_c) = (1/(w × h)) Σ_{i=1}^{w} Σ_{j=1}^{h} u_c(i, j)
where f_sq() represents the compression operation; u_c represents the feature map of the c-th channel;
step 4.1.5: obtaining the weight value of each channel of the feature map through two fully connected layers, and obtaining the feature map F_sc with weight distribution using sigmoid activation:
F_sc = A ⊙ u_c
A = σ(W_s2 × tanh(W_s1 × z_c))
where σ() represents the sigmoid activation function and tanh() represents the tanh activation function; A is the feature vector of the weight distribution; W_s1 is the weight of the first fully connected layer; W_s2 is the weight of the second fully connected layer; u_c represents the feature map of the c-th channel; ⊙ represents element-wise dot multiplication.
CN202110179265.2A 2021-02-09 2021-02-09 Fine-grained image classification method based on feature fusion Active CN112861970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110179265.2A CN112861970B (en) 2021-02-09 2021-02-09 Fine-grained image classification method based on feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110179265.2A CN112861970B (en) 2021-02-09 2021-02-09 Fine-grained image classification method based on feature fusion

Publications (2)

Publication Number Publication Date
CN112861970A true CN112861970A (en) 2021-05-28
CN112861970B CN112861970B (en) 2023-01-03

Family

ID=75989506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110179265.2A Active CN112861970B (en) 2021-02-09 2021-02-09 Fine-grained image classification method based on feature fusion

Country Status (1)

Country Link
CN (1) CN112861970B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113393371A (en) * 2021-06-28 2021-09-14 北京百度网讯科技有限公司 Image processing method and device and electronic equipment
CN113744292A (en) * 2021-09-16 2021-12-03 安徽世绿环保科技有限公司 Garbage classification station garbage throwing scanning system
CN113869347A (en) * 2021-07-20 2021-12-31 西安理工大学 Fine-grained classification method for severe weather image
CN114067316A (en) * 2021-11-23 2022-02-18 燕山大学 Rapid identification method based on fine-grained image classification

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140050391A1 (en) * 2012-08-17 2014-02-20 Nec Laboratories America, Inc. Image segmentation for large-scale fine-grained recognition
CN108898137A (en) * 2018-05-25 2018-11-27 黄凯 A kind of natural image character identifying method and system based on deep neural network
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
CN110826558A (en) * 2019-10-28 2020-02-21 桂林电子科技大学 Image classification method, computer device, and storage medium
CN110866907A (en) * 2019-11-12 2020-03-06 中原工学院 Full convolution network fabric defect detection method based on attention mechanism
CN111210907A (en) * 2020-01-14 2020-05-29 西北工业大学 Pain intensity estimation method based on space-time attention mechanism
CN111709265A (en) * 2019-12-11 2020-09-25 深学科技(杭州)有限公司 Camera monitoring state classification method based on attention mechanism residual error network
WO2020252924A1 (en) * 2019-06-19 2020-12-24 平安科技(深圳)有限公司 Method and apparatus for detecting pedestrian in video, and server and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140050391A1 (en) * 2012-08-17 2014-02-20 Nec Laboratories America, Inc. Image segmentation for large-scale fine-grained recognition
CN108898137A (en) * 2018-05-25 2018-11-27 黄凯 A kind of natural image character identifying method and system based on deep neural network
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
WO2020252924A1 (en) * 2019-06-19 2020-12-24 平安科技(深圳)有限公司 Method and apparatus for detecting pedestrian in video, and server and storage medium
CN110826558A (en) * 2019-10-28 2020-02-21 桂林电子科技大学 Image classification method, computer device, and storage medium
CN110866907A (en) * 2019-11-12 2020-03-06 中原工学院 Full convolution network fabric defect detection method based on attention mechanism
CN111709265A (en) * 2019-12-11 2020-09-25 深学科技(杭州)有限公司 Camera monitoring state classification method based on attention mechanism residual error network
CN111210907A (en) * 2020-01-14 2020-05-29 西北工业大学 Pain intensity estimation method based on space-time attention mechanism

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
TRINH LE BA KHANH 等: "Enhancing U-Net with Spatial-Channel Attention Gate for Abnormal Tissue Segmentation in Medical Imaging", 《APPLIED SCIENCES》 *
李旭: "Research on Fine-Grained Image Classification Methods Based on Attention Mechanism", China Masters' Theses Full-text Database, Information Science and Technology
杨贞: "Image Feature Processing Technology and Applications", 31 August 2020
王亚南: "Research on Pedestrian Detection Methods Based on the RPN Network", China Masters' Theses Full-text Database, Information Science and Technology
赵浩如 等: "Research on Fine-Grained Image Classification Algorithm Based on RPN and B-CNN", Computer Applications and Software

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113393371A (en) * 2021-06-28 2021-09-14 北京百度网讯科技有限公司 Image processing method and device and electronic equipment
CN113393371B (en) * 2021-06-28 2024-02-27 北京百度网讯科技有限公司 Image processing method and device and electronic equipment
CN113869347A (en) * 2021-07-20 2021-12-31 西安理工大学 Fine-grained classification method for severe weather image
CN113744292A (en) * 2021-09-16 2021-12-03 安徽世绿环保科技有限公司 Garbage classification station garbage throwing scanning system
CN114067316A (en) * 2021-11-23 2022-02-18 燕山大学 Rapid identification method based on fine-grained image classification
CN114067316B (en) * 2021-11-23 2024-05-03 燕山大学 Rapid identification method based on fine-granularity image classification

Also Published As

Publication number Publication date
CN112861970B (en) 2023-01-03

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN110728200B (en) Real-time pedestrian detection method and system based on deep learning
CN112861970B (en) Fine-grained image classification method based on feature fusion
CN107563372B (en) License plate positioning method based on deep learning SSD frame
CN111461039B (en) Landmark identification method based on multi-scale feature fusion
CN109583483B (en) Target detection method and system based on convolutional neural network
CN114758383A (en) Expression recognition method based on attention modulation context spatial information
CN107784288A (en) A kind of iteration positioning formula method for detecting human face based on deep neural network
CN113269054B (en) Aerial video analysis method based on space-time 2D convolutional neural network
CN112861917B (en) Weak supervision target detection method based on image attribute learning
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN111898432A (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN109670555B (en) Instance-level pedestrian detection and pedestrian re-recognition system based on deep learning
CN111768415A (en) Image instance segmentation method without quantization pooling
CN112861785B (en) Instance segmentation and image restoration-based pedestrian re-identification method with shielding function
CN111652273A (en) Deep learning-based RGB-D image classification method
CN115497122A (en) Method, device and equipment for re-identifying blocked pedestrian and computer-storable medium
CN112396036A (en) Method for re-identifying blocked pedestrians by combining space transformation network and multi-scale feature extraction
CN112329771A (en) Building material sample identification method based on deep learning
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN114743126A (en) Lane line sign segmentation method based on graph attention machine mechanism network
CN114494786A (en) Fine-grained image classification method based on multilayer coordination convolutional neural network
CN116935249A (en) Small target detection method for three-dimensional feature enhancement under unmanned airport scene
CN116796248A (en) Forest health environment assessment system and method thereof
CN115170662A (en) Multi-target positioning method based on yolov3 and convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant