CN110619369B - Fine-grained image classification method based on feature pyramid and global average pooling - Google Patents

Fine-grained image classification method based on feature pyramid and global average pooling

Info

Publication number
CN110619369B
CN110619369B
Authority
CN
China
Prior art keywords: feature, local, average pooling, global average, fine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910899445.0A
Other languages
Chinese (zh)
Other versions
CN110619369A (en)
Inventor
龚声蓉
周少雄
王朝晖
应文豪
李菊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Yiyou Huiyun Software Co.,Ltd.
Original Assignee
Changshu Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changshu Institute of Technology filed Critical Changshu Institute of Technology
Priority to CN201910899445.0A
Publication of CN110619369A
Application granted
Publication of CN110619369B
Legal status: Active
Anticipated expiration

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

The invention discloses a fine-grained image classification method based on a feature pyramid and global average pooling, which comprises the following steps: step 1, inputting an image into the convolution layers of a pre-trained convolutional neural network to obtain a multi-channel feature map; step 2, passing the multi-channel feature map through a global average pooling layer to obtain a saliency map of the input image and extract the position information of the target; step 3, extracting features from the multi-channel feature map with a feature pyramid network and predicting the K local regions carrying the largest amount of information; and step 4, aggregating the local features of the K local regions with the global features obtained from the input image by the convolutional neural network to predict and output the final recognition category. The method reduces the influence of background noise, enhances the robustness of local region selection, and improves recognition accuracy.

Description

Fine-grained image classification method based on feature pyramid and global average pooling
Technical Field
The invention relates to a fine-grained image classification method, in particular to a fine-grained image classification method based on a feature pyramid and global average pooling.
Background
Fine-grained image recognition is a concept in the field of image processing. Conventional image recognition can generally only identify the broad class to which an object in an image belongs, which is called coarse-grained image recognition. A broad class usually contains many subcategories, and conventional image recognition methods cannot determine the specific subcategory to which the target belongs. Fine-grained image recognition classifies the targets in an image at a finer granularity and determines the specific subcategory within the broad class, meeting the higher image recognition requirements of different scenarios.
Early fine-grained classification methods generally relied on manual experience to design features by hand: local features such as SIFT or HOG are extracted from the image, encoded with models such as VLAD or Fisher Vector to obtain the required representation, and then classified with a shallow neural network or an SVM. The generalization ability of such models is poor.
Fine-grained image classification methods based on deep learning can be divided into two categories, strongly supervised and weakly supervised, the difference being whether manual annotation information such as bounding boxes or part annotations is used. Such methods generally proceed in three steps: first, the foreground object and several local regions in the image are obtained using annotation information or visual attention; then a deep convolutional network extracts convolutional features from each of them; finally, the features of all local regions are integrated to classify the target. Strongly supervised classification methods have poor practicability because manual annotation is costly to obtain, making it difficult to meet the demands of real applications.
Most existing fine-grained recognition methods work under weak supervision, i.e., without relying on manual annotation, but accurately locating objects and their local regions in images then becomes difficult. In real scenes the target is not necessarily centered, the surroundings may occlude it or resemble it in color, and different shooting angles or changes in the posture of the target object can cause large visual differences between images of the same category. Two problems in particular exist:
1. The selected local regions contain too much background noise. Targets are generally embedded in complex environments; in a bird recognition task, for example, the bird is usually perched among branches and heavily occluded, or leaves and trunks have an appearance similar to the target's, easily causing strong interference. Most existing methods feed the whole image directly into the model to extract features, but visual experiments show that the local regions obtained this way generally contain considerable background noise. Features extracted from these noisy regions do not belong to the target, so the classification result is often affected and the model's fine-grained recognition performance is reduced. Some methods instead extract several highly distinctive regions from the original image in an unsupervised way, e.g., with Selective Search, and then feed them into the network model for training and feature extraction.
2. The features are not robust enough. Compared with ordinary image recognition, fine-grained recognition is special: its subcategories typically have small inter-class differences, and these differences usually exist in small local regions. Current methods do not extract sufficiently robust features from the target object. Traditional hand-crafted features require expert experience to design, are unstable, and struggle to express the discriminative information in an image effectively; such methods generalize poorly, and their performance drops sharply when the operating domain changes, greatly reducing practicability. Features produced by existing deep-learning methods are mostly not targeted enough for this task: they usually extract features directly with deep networks such as VGGNet or ResNet, which works well for the target's global features but represents detailed information poorly. Since the differences between fine-grained images often lie in tiny details, the recognition performance suffers; and when the size of the target in the image varies greatly, robust features cannot be extracted adaptively, so good results cannot be achieved.
Disclosure of Invention
In view of the above defects in the prior art, the present invention provides a fine-grained image classification method based on a feature pyramid and global average pooling that reduces the noise in the target localization region at low computational overhead and improves the robustness of the features extracted from the target object.
The technical scheme of the invention is as follows: a fine-grained image classification method based on a feature pyramid and global average pooling comprises the following steps:
step 1, inputting an image into the convolution layers of a pre-trained convolutional neural network to obtain a multi-channel feature map;
step 2, passing the multi-channel feature map through a global average pooling layer to obtain a saliency map of the input image and extract the position information of the target;
step 3, extracting features from the multi-channel feature map with a feature pyramid network and predicting the K local regions carrying the largest amount of information;
and step 4, aggregating the local features of the K local regions with the global features obtained from the input image by the convolutional neural network to predict and output the final recognition category.
Further, the step 2 comprises the following steps: step 2.1, the global average pooling layer maps each feature map into a neuron, which is connected to a softmax layer for training, and the category is predicted; step 2.2, after training is finished, the weights of the most probable category corresponding to the neurons are multiplied with the respective channels of the multi-channel feature map and accumulated to obtain the saliency map.
Further, the step 3 comprises the following steps: step 3.1, inputting the feature map into a feature pyramid network to generate feature maps at N scales, where N is a natural number not less than 3; step 3.2, upsampling each upper-layer feature map obtained in step 3.1 and fusing it with the lower-layer feature map after a convolution kernel, obtaining fused feature maps at N scales; and step 3.3, selecting candidate regions of different sizes on the N-scale fused feature maps, filtering them with the bounding box generated in step 2, predicting the candidate regions, and ranking them by the activation value of the bounding box to obtain the local regions, wherein the target bounding box is generated by taking the maximum connected region in the saliency map and applying a threshold to obtain the specific position of the target.
Further, the K local regions with the largest amount of information predicted in step 3 are optimized with the ranking consistency loss, so that the local region classification prediction results and the activation values obtained by the feature pyramid network have the same ranking.
Further, the optimization with the ranking consistency loss uses a hinge loss function. Let the K local regions be R = {R_1, R_2, ..., R_K}, ranked from high to low by activation value; let the activation values predicted for them by the feature pyramid network be S = {S_1, S_2, ..., S_K}; and let the probabilities predicted for them by the convolutional neural network be P = {P_1, P_2, ..., P_K}. The ranking loss is defined as follows:

L_rank(S, P) = Σ_{(i,j): S_i > S_j} f(P_i - P_j)

where S_i and S_j are activation values, and the hinge loss function f(x) is f(x) = max{1 - x, 0}.
Compared with the prior art, the invention has the advantages that:
the invention reserves all convolution layers of the convolution neural network and replaces the last full connection layer with a full local average pooling layer (GAP), so that the network obtains excellent target positioning capability. And mapping each feature map of the last convolution layer into a neuron after the feature maps pass through the GAP, connecting the neurons with a softmax classification layer to obtain the output probability of each category, and adding the convolution layer feature maps according to the weights of the neurons of the corresponding categories to obtain the saliency maps corresponding to each category. After the saliency map is obtained, a saliency threshold is set to generate a bounding box of the target. Local region candidates for the target are then performed within this bounding box, which greatly reduces the interference of background noise on feature extraction and model classification. And the proposed method shares the convolution layer with the original feature extraction network, only one GAP layer is added, and only little calculation expense is added.
Features are extracted with a feature pyramid network. The constructed feature pyramid fuses the high-level features of low resolution but high semantics with the low-level features of high resolution but low semantics, obtaining feature maps that are both highly semantic and high-resolution; prediction is performed on the fused feature maps at multiple scales. The model thereby greatly strengthens its handling of small targets in the image at essentially no increase in computation, further improving the precision of fine-grained image recognition.
Drawings
FIG. 1 is a schematic flow chart of the overall framework of the method of the present invention.
FIG. 2 is a flow diagram illustrating how the target saliency map is obtained using global average pooling.
Fig. 3 is a schematic diagram of a feature pyramid structure.
FIG. 4 is a diagram illustrating the target localization results of the present invention on the CUB-200-2011 dataset.
Detailed Description
The present invention is further illustrated by the following examples, which are not to be construed as limiting the invention thereto.
The general framework of the fine-grained image classification method based on the feature pyramid and the global average pooling according to the present embodiment is shown in fig. 1. The method comprises the following specific steps:
step 1, inputting an image into the convolution layers of a pre-trained convolutional neural network to obtain a multi-channel feature map;
step 2, passing the multi-channel feature map through a global average pooling layer to obtain a saliency map of the input image and extract the position information of the target;
step 3, extracting features from the multi-channel feature map with a feature pyramid network and predicting the K local regions carrying the largest amount of information;
and step 4, aggregating the local features of the K local regions with the global features obtained from the input image by the convolutional neural network to predict and output the final recognition category.
In step 2, the fully connected layer of the base network ResNet-50 is replaced by a global average pooling layer and all convolution layers are retained; the class of the image is preliminarily predicted over the ImageNet-1k categories, and a saliency map is obtained by the class activation mapping method. The saliency map displays the position of the target in the image in the form of activation values: the higher the activation value, the more likely the target is contained there. The specific position of the target is obtained by applying a threshold to the maximum connected region in the saliency map, generating the target bounding box; the resulting bounding-box region contains little background noise. The bounding box obtained in this way is further used in step 3 to filter the candidate local regions. The procedure for obtaining the saliency map is shown in fig. 2, and minimal sketches of the mapping and of the box extraction are given after the sub-steps below. Step 2 may further comprise the steps of:
step 2.1, the multi-channel feature map passes through the global average pooling layer, which maps each feature map into a neuron; the neurons are connected to a softmax layer for training, and the category is predicted;
and step 2.2, after training is finished, the weights of the most probable category corresponding to the neurons are multiplied with the respective channels of the multi-channel feature map and accumulated to obtain the saliency map of the target.
The feature pyramid exploits the pyramidal feature hierarchy of the convolutional network: the low-resolution, high-semantic features at the top of the pyramid are fused with the high-resolution, low-semantic features at the bottom, yielding features that carry high semantic information while retaining more detail, and local regions are predicted independently on the feature maps of the different scales. The structure of the feature pyramid in step 3 is shown in fig. 3, and step 3 further comprises the following steps:
step 3.1, inputting the feature map into the feature pyramid network to generate feature maps at three scales;
step 3.2, upsampling each upper-layer feature map by a factor of two and fusing it with the lower-layer feature map after a 1 × 1 convolution kernel, obtaining fused feature maps at three scales (see the fusion sketch after this list);
and step 3.3, selecting candidate regions of different sizes on the three-scale fused feature maps, filtering them with the bounding box generated in step 2, predicting them, and ranking them by activation value.
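A sketch of the top-down fusion of steps 3.1-3.2, assuming the three feature maps are the C3-C5 stage outputs of ResNet-50 (512, 1024 and 2048 channels); the 256-channel width and nearest-neighbor upsampling are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    """Fuse low-resolution/high-semantic maps with high-resolution/low-semantic
    maps: each upper map is upsampled x2 and added to the next lower map after
    a 1 x 1 convolution (steps 3.1-3.2); prediction then runs per scale."""
    def __init__(self, in_channels=(512, 1024, 2048), width=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, width, kernel_size=1) for c in in_channels])

    def forward(self, c3, c4, c5):
        p5 = self.lateral[2](c5)
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode='nearest')
        p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2, mode='nearest')
        return p3, p4, p5      # fused feature maps at three scales
```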
At this point, each image has a set of candidate local regions and their activation values, obtained by filtering with the feature pyramid network and the saliency extraction network. The K local regions with the highest activation values are selected, scaled to 224 × 224, fed into the ResNet-50 model again for feature extraction, and finally classified by a fully connected layer (a sketch of this selection follows). To optimize the selected local regions, the method uses the ranking consistency loss, so that the classification predictions of the local regions have the same ranking as the activation values obtained from the feature pyramid network, making the selected regions maximally discriminative. A hinge loss function is introduced to optimize the model parameters and select the optimal local regions.
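A hedged sketch of this top-K selection and rescaling, assuming candidate boxes in (x1, y1, x2, y2) pixel coordinates and K = 4; the patent text fixes neither the box format nor K here.

```python
import torch
import torch.nn.functional as F

def extract_topk_regions(image, boxes, scores, k=4, size=224):
    """Select the K candidate regions with the highest activation values and
    rescale each crop to size x size for the second ResNet-50 pass.
    image: (3, H, W); boxes: (N, 4); scores: (N,) activation values."""
    topk = scores.topk(min(k, scores.numel())).indices
    crops = []
    for x1, y1, x2, y2 in boxes[topk].round().long().tolist():
        crop = image[:, y1:y2, x1:x2].unsqueeze(0)     # (1, 3, h, w) crop
        crops.append(F.interpolate(crop, size=(size, size),
                                   mode='bilinear', align_corners=False))
    return torch.cat(crops)                            # (K, 3, 224, 224)
```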
Let the K local regions be R = {R_1, R_2, ..., R_K}, ranked from high to low by activation value, and let the activation values obtained by feature pyramid network prediction be S = {S_1, S_2, ..., S_K}. The probabilities obtained for the K local regions by the ResNet-50 network are P = {P_1, P_2, ..., P_K}. The hinge loss function can be viewed as a pairwise ranking loss: it requires that for elements S_i and S_j of S, if S_i > S_j then the same order P_i > P_j also holds in P, otherwise a penalty is imposed. The ranking loss in this method is defined as follows:
L_rank(S, P) = Σ_{(i,j): S_i > S_j} f(P_i - P_j)
wherein the hinge loss function f (x) is defined as:
f(x)=max{1-x,0}
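The ranking loss above can be realized, for instance, as the following PyTorch sketch over all pairs, assuming S and P are length-K tensors; the pairwise-matrix formulation is one possible implementation, not necessarily the patent's own code.

```python
import torch

def ranking_consistency_loss(S: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    """For every pair with S_i > S_j, penalize f(P_i - P_j) with the margin-1
    hinge f(x) = max{1 - x, 0}, so that P preserves the ordering of S.
    S: (K,) activation values from the feature pyramid network.
    P: (K,) probabilities from the ResNet-50 pass over the local regions."""
    diff_P = P.unsqueeze(1) - P.unsqueeze(0)    # (K, K) matrix of P_i - P_j
    higher = S.unsqueeze(1) > S.unsqueeze(0)    # mask of pairs with S_i > S_j
    hinge = torch.clamp(1.0 - diff_P, min=0.0)  # f(x) = max{1 - x, 0}
    return hinge[higher].sum()
```

During training this term is added to the classification losses described next.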
For model training, the model parameters are optimized using as the total loss the sum of the ranking loss of the K local regions, the classification loss of the K local regions on ResNet-50, and the classification loss of the input image. ResNet-50 is used as the base network throughout, with shared parameters. At test time, the predicted category of each input image is obtained from the classification results of the input image and of the K local regions on ResNet-50.
The demonstration experiments of the invention use the following datasets: CUB-200-2011 and Stanford Cars.
CUB-200-2011 is a bird dataset and currently the most common and classical dataset in the field of fine-grained image recognition. It contains 11788 bird images in 200 categories: 5994 training images and 5794 test images, i.e., approximately 30 training images and 11-30 test images per bird species.
Stanford Cars is a vehicle dataset from Professor Fei-Fei Li's group at Stanford University, USA, and is one of the most commonly used datasets for fine-grained image recognition. It contains 16185 vehicle images divided into 196 categories by brand, year and model: 8144 training images and 8041 test images, with each vehicle type on average comprising 24-81 training images and 24-83 test images. The details of the above datasets are given in the following table:
Dataset         Categories   Training images   Test images   Total images
CUB-200-2011    200          5994              5794          11788
Stanford Cars   196          8144              8041          16185
In addition, the experimental hardware environment: Ubuntu 16.04, Tesla P100 graphics card with 12 GB of video memory, Core(TM) i7 processor at 3.4 GHz, 16 GB of RAM.
The code running environment is as follows: deep learning framework PyTorch 0.4.1, Python 3.5.
The experimental results are as follows:
Accuracy was selected as the evaluation index. The different methods compared are trained and evaluated under the same experimental environment.
The deeper ResNet-50 network is used as the backbone. It is pre-trained on the ImageNet-1k dataset, which saves a large amount of initial parameter training time and reduces model overfitting. During training, SGD is used as the optimizer and the learning rate follows a multi-step schedule: the initial learning rate is 0.001 and drops to 1/10 of its value after the 60th and 100th iterations. The weight decay is set to 10^-4, the momentum to 0.9, and the training batch size to 16. Cross-entropy loss is used as the classification loss function. Images in the dataset are pre-cropped to 448 × 448.
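A sketch of this training configuration, written against a current PyTorch/torchvision API rather than the PyTorch 0.4.1 used in the experiments; treating the 60/100 milestones as epochs is an assumption where the text says iterations.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision.models import resnet50

model = resnet50(pretrained=True)                     # pre-trained on ImageNet-1k
criterion = nn.CrossEntropyLoss()                     # cross-entropy classification loss
optimizer = optim.SGD(model.parameters(), lr=0.001,   # initial learning rate 0.001
                      momentum=0.9, weight_decay=1e-4)
# multi-step schedule: learning rate drops to 1/10 after milestones 60 and 100
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 100], gamma=0.1)
# training batch size 16; inputs pre-cropped to 448 x 448 as stated above
```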
To verify the effectiveness of the target localization method presented herein, experiments were first performed on the CUB-200-2011 bird dataset. This dataset was chosen because the environments of bird targets are generally complex: besides the targets being small, the birds appear in varied postures such as flying, perching on trees, or swimming, so occlusion, posture change, and similar-colored backgrounds cause strong interference, making accurate localization harder than on the vehicle dataset Stanford Cars. The target localization results obtained by the method of the invention are shown in fig. 4. The first row shows the original images resized to 448 × 448, the second row the obtained saliency maps, and the last row the generated target bounding boxes. In the first column the target is located among a large number of branches; in the third column the color of the tree is very similar to that of the target's body; both cause strong interference. It can be seen that the target saliency maps and bounding boxes obtained by the method of the invention are accurate.
In addition, the method of the present invention was verified on the CUB-200-2011 and Stanford Cars datasets. The ResNet-50 convolutional neural network is used as the base network of the model. ResNet-50 has 50 convolutional layers; its residual modules adopt a bottleneck structure with skip connections, giving stronger feature extraction than VGGNet. Because fine-grained image datasets are generally small, direct training easily overfits and degrades performance, so the model is pre-trained on the large-scale ImageNet-1k dataset, which accelerates early training and makes the model less likely to fall into a poor local optimum.
To improve practicability, the method uses no additional annotation information: target localization is achieved under weak supervision by global average pooling, yielding the bounding box of the target object. To improve the model's representation of local detail, a feature pyramid network fuses the feature maps output by the ResNet-50 network. After the activation values of the candidate regions are obtained, the K most activated regions are selected and fed into the ResNet-50 network again for category prediction. Redundant local regions are then removed with an NMS algorithm, the ranking consistency loss of the local regions is computed to optimize their selection, and the selected local regions are finally combined with the whole-image classification result for prediction. The results on the CUB-200-2011 and Stanford Cars datasets are shown in Table 1. The recognition accuracy of the method on both datasets is higher than that of several currently popular methods, and on the CUB-200-2011 dataset in particular it has a clear advantage over other methods.
Table 1. Results on the CUB-200-2011 and Stanford Cars datasets
With the global average pooling method, the target saliency map is obtained well and the target position is determined, so the local regions selected in the next step contain less background noise and the computational cost is reduced. More robust features are then extracted with the feature pyramid network: this module hierarchically fuses the multi-scale features, strengthening the semantics of the low-level features so that the network model can capture more detailed information and find more discriminative local regions, which finally improves the recognition performance of the model. The quantitative experimental results on the CUB-200-2011 and Stanford Cars datasets demonstrate the effectiveness of the method.

Claims (4)

1. A fine-grained image classification method based on a feature pyramid and global average pooling is characterized by comprising the following steps:
step 1, inputting an image into the convolution layers of a pre-trained convolutional neural network to obtain a multi-channel feature map;
step 2, passing the multi-channel feature map through a global average pooling layer to obtain a saliency map of the input image and extract the position information of the target;
step 3, extracting features from the multi-channel feature map with a feature pyramid network and predicting the K local regions carrying the largest amount of information; the step 3 comprises the following steps: step 3.1, inputting the feature map into a feature pyramid network to generate feature maps at N scales, where N is a natural number not less than 3; step 3.2, upsampling each upper-layer feature map obtained in step 3.1 and fusing it with the lower-layer feature map after a convolution kernel, obtaining fused feature maps at N scales; step 3.3, selecting candidate regions of different sizes on the N-scale fused feature maps, filtering them with the bounding box generated in step 2, predicting them and ranking them by the activation value of the bounding box to obtain the local regions, wherein the bounding box is generated by taking the maximum connected region in the saliency map and applying a threshold to obtain the specific position of the target;
and step 4, aggregating the local features of the K local regions with the global features obtained from the input image by the convolutional neural network to predict and output the final recognition category.
2. The fine-grained image classification method based on the feature pyramid and global average pooling according to claim 1, wherein the step 2 comprises the steps of: step 2.1, the global average pooling layer maps each feature map into a neuron, which is connected to a softmax layer for training, and the category is predicted; step 2.2, after training is finished, the weights of the most probable category corresponding to the neurons are multiplied with the respective channels of the multi-channel feature map and accumulated to obtain the saliency map.
3. The fine-grained image classification method based on the feature pyramid and global average pooling according to claim 1, wherein the K local regions with the largest amount of information predicted in step 3 are optimized with the ranking consistency loss, so that the local region classification prediction results and the activation values obtained by the feature pyramid network have the same ranking.
4. The fine-grained image classification method based on the feature pyramid and global average pooling according to claim 3, wherein the optimization with the ranking consistency loss uses a hinge loss function: the K local regions are R = {R_1, R_2, ..., R_K}, ranked from high to low by activation value; the activation values obtained for the K local regions by feature pyramid network prediction are S = {S_1, S_2, ..., S_K}; and the probabilities obtained for the K local regions by convolutional neural network prediction are P = {P_1, P_2, ..., P_K}; the ranking loss is defined as follows:

L_rank(S, P) = Σ_{(i,j): S_i > S_j} f(P_i - P_j)

where S_i and S_j are activation values, and the hinge loss function f(x) is f(x) = max{1 - x, 0}.
CN201910899445.0A 2019-09-23 2019-09-23 Fine-grained image classification method based on feature pyramid and global average pooling Active CN110619369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910899445.0A CN110619369B (en) 2019-09-23 2019-09-23 Fine-grained image classification method based on feature pyramid and global average pooling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910899445.0A CN110619369B (en) 2019-09-23 2019-09-23 Fine-grained image classification method based on feature pyramid and global average pooling

Publications (2)

Publication Number Publication Date
CN110619369A CN110619369A (en) 2019-12-27
CN110619369B (en) 2020-12-11

Family

ID=68923922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910899445.0A Active CN110619369B (en) 2019-09-23 2019-09-23 Fine-grained image classification method based on feature pyramid and global average pooling

Country Status (1)

Country Link
CN (1) CN110619369B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291767B (en) * 2020-02-12 2023-04-28 中山大学 Fine granularity identification method, terminal equipment and computer readable storage medium
CN111340046A (en) * 2020-02-18 2020-06-26 上海理工大学 Visual saliency detection method based on feature pyramid network and channel attention
CN111291819B (en) * 2020-02-19 2023-09-15 腾讯科技(深圳)有限公司 Image recognition method, device, electronic equipment and storage medium
CN113361529A (en) * 2020-03-03 2021-09-07 北京四维图新科技股份有限公司 Image semantic segmentation method and device, electronic equipment and storage medium
CN111461181B (en) * 2020-03-16 2021-09-07 北京邮电大学 Vehicle fine-grained classification method and device
CN111507215B (en) * 2020-04-08 2022-01-28 常熟理工学院 Video target segmentation method based on space-time convolution cyclic neural network and cavity convolution
CN111428689B (en) * 2020-04-20 2022-07-01 重庆邮电大学 Face image feature extraction method based on multi-pool information fusion
CN111832573B (en) * 2020-06-12 2022-04-15 桂林电子科技大学 Image emotion classification method based on class activation mapping and visual saliency
CN112016617B (en) * 2020-08-27 2023-12-01 中国平安财产保险股份有限公司 Fine granularity classification method, apparatus and computer readable storage medium
CN111985572B (en) * 2020-08-27 2022-03-25 中国科学院自动化研究所 Fine-grained image identification method of channel attention mechanism based on feature comparison
CN112215239A (en) * 2020-09-15 2021-01-12 浙江工业大学 Retinal lesion fine-grained grading method and device
CN112257758A (en) * 2020-09-27 2021-01-22 浙江大华技术股份有限公司 Fine-grained image recognition method, convolutional neural network and training method thereof
CN112528058B (en) * 2020-11-23 2022-09-02 西北工业大学 Fine-grained image classification method based on image attribute active learning
CN112508910A (en) * 2020-12-02 2021-03-16 创新奇智(深圳)技术有限公司 Defect extraction method and device for multi-classification defect detection
CN112446354A (en) * 2020-12-14 2021-03-05 浙江工商大学 Fine-grained image classification method based on multi-scale saliency map positioning
CN112686242B (en) * 2020-12-29 2023-04-18 昆明理工大学 Fine-grained image classification method based on multilayer focusing attention network
CN115393453A (en) * 2021-05-10 2022-11-25 京东科技控股股份有限公司 Image processing method and device and electronic equipment
CN113378883B (en) * 2021-05-12 2024-01-23 山东科技大学 Fine-grained vehicle classification method based on channel grouping attention model
CN113536973B (en) * 2021-06-28 2023-08-18 杭州电子科技大学 Traffic sign detection method based on saliency
CN113807362B (en) * 2021-09-03 2024-02-27 西安电子科技大学 Image classification method based on interlayer semantic information fusion depth convolution network
CN115661429B (en) * 2022-11-11 2023-03-10 四川川锅环保工程有限公司 System and method for identifying defects of boiler water wall pipe and storage medium


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9818048B2 (en) * 2015-01-19 2017-11-14 Ebay Inc. Fine-grained categorization
CN108764133B (en) * 2018-05-25 2020-10-20 北京旷视科技有限公司 Image recognition method, device and system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109215034A (en) * 2018-07-06 2019-01-15 成都图必优科技有限公司 A kind of Weakly supervised image, semantic dividing method for covering pond based on spatial pyramid
CN109376576A (en) * 2018-08-21 2019-02-22 中国海洋大学 The object detection method for training network from zero based on the intensive connection of alternately update
CN110009679A (en) * 2019-02-28 2019-07-12 江南大学 A kind of object localization method based on Analysis On Multi-scale Features convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Automatic building recognition from high-resolution imagery combining dilated convolutional residual networks and pyramid pooling representation; Qiao Wenfan et al.; Geography and Geo-Information Science; 2018-09-30; Vol. 34, No. 5; pp. 56-62 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4209937A1 (en) * 2022-01-10 2023-07-12 Samsung Electronics Co., Ltd. Method and apparatus with object recognition

Also Published As

Publication number Publication date
CN110619369A (en) 2019-12-27

Similar Documents

Publication Publication Date Title
CN110619369B (en) Fine-grained image classification method based on feature pyramid and global average pooling
CN110428428B (en) Image semantic segmentation method, electronic equipment and readable storage medium
CN109934293B (en) Image recognition method, device, medium and confusion perception convolutional neural network
CN109447034B (en) Traffic sign detection method in automatic driving based on YOLOv3 network
CN107609601B (en) Ship target identification method based on multilayer convolutional neural network
CN110837836B (en) Semi-supervised semantic segmentation method based on maximized confidence
CN108304873B (en) Target detection method and system based on high-resolution optical satellite remote sensing image
CN110414377B (en) Remote sensing image scene classification method based on scale attention network
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
Endres et al. Category-independent object proposals with diverse ranking
CN107209873B (en) Hyper-parameter selection for deep convolutional networks
CN108509978A (en) The multi-class targets detection method and model of multi-stage characteristics fusion based on CNN
CN110263786B (en) Road multi-target identification system and method based on feature dimension fusion
CN112131978B (en) Video classification method and device, electronic equipment and storage medium
CN107683469A (en) A kind of product classification method and device based on deep learning
CN111652317B (en) Super-parameter image segmentation method based on Bayes deep learning
CN110276248B (en) Facial expression recognition method based on sample weight distribution and deep learning
CN105184298A (en) Image classification method through fast and locality-constrained low-rank coding process
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN110892409A (en) Method and apparatus for analyzing images
CN110633727A (en) Deep neural network ship target fine-grained identification method based on selective search
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
CN112861917A (en) Weak supervision target detection method based on image attribute learning
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210423

Address after: 215000 Building No. 1, Kechuang Park, Taihu New Town, Wujiang, No. 18, Suzhou River Road, Wujiang District, Jiangsu Province

Patentee after: Jiangsu Yiyou Huiyun Software Co.,Ltd.

Address before: 215500, No. 99, South Third Ring Road, Changshu City, Suzhou, Jiangsu

Patentee before: CHANGSHU INSTITUTE OF TECHNOLOGY
