CN110555461A - scene classification method and system based on multi-structure convolutional neural network feature fusion - Google Patents

scene classification method and system based on multi-structure convolutional neural network feature fusion Download PDF

Info

Publication number
CN110555461A
CN110555461A CN201910702273.3A CN201910702273A
Authority
CN
China
Prior art keywords
vgg
feature
neural network
fusion
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910702273.3A
Other languages
Chinese (zh)
Inventor
刘峰
戴向娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences
Priority to CN201910702273.3A
Publication of CN110555461A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K 9/00 Methods or arrangements for recognising patterns
    • G06K 9/62 Methods or arrangements for pattern recognition using electronic means
    • G06K 9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K 9/6256 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging, boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K 9/00 Methods or arrangements for recognising patterns
    • G06K 9/62 Methods or arrangements for pattern recognition using electronic means
    • G06K 9/6267 Classification techniques
    • G06K 9/6268 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Computing arrangements based on biological models using neural network models
    • G06N 3/04 Architectures, e.g. interconnection topology
    • G06N 3/0454 Architectures, e.g. interconnection topology using a combination of multiple neural nets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Computing arrangements based on biological models using neural network models
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images

Abstract

The invention provides a scene classification method and system based on multi-structure convolutional neural network feature fusion. The method comprises: selecting the CaffeNet, VGG-s and VGG-f models and pre-training them on the ImageNet data set; selecting an original scene image set for scene classification; fine-tuning the pre-trained CaffeNet, VGG-s and VGG-f models with the original scene image set to obtain the networks CaffeNet-t, VGG-s-t and VGG-f-t; using CaffeNet-t, VGG-s-t and VGG-f-t respectively as feature extractors to process the original scene image set, extracting the features of each image and obtaining the feature sets F Caffe-t, F VGG-s-t and F VGG-f-t from the penultimate fully connected layer of each network; fusing the three feature sets through a serial feature fusion strategy to obtain a fusion feature set; training an extreme learning machine classifier with the fusion feature set to obtain the final classifier; and inputting the scene image to be tested into the final classifier for scene classification.

Description

Scene classification method and system based on multi-structure convolutional neural network feature fusion
Technical Field
The invention relates to the technical field of digital image processing, in particular to remote sensing image processing, and specifically to a scene classification method and system based on multi-structure convolutional neural network feature fusion.
Background
With the development of satellite remote sensing technology, remote sensing image scene classification has become an active research topic. Its purpose is to assign extracted remote sensing image sub-regions covering various ground types or ground objects to different semantic categories, and it has been widely used in practical remote sensing applications such as land resource management and city planning. Learning an efficient image representation is the core of the remote sensing image scene classification task. Due to the high intra-class differences and high inter-class similarities of actual scene images, feature coding methods based on low-level hand-designed features or on unsupervised feature learning can only generate mid-level image features with limited representation capability, which fundamentally limits classification performance.
Recently, with the development of deep learning, and in particular convolutional neural networks, CNNs have shown surprising performance in object recognition and detection. Many researchers have also applied them to remote sensing image scene classification and achieved good classification performance. Although current methods can further improve classification performance, one of their limitations is that only a single convolutional neural network structure is used to extract scene image features, ignoring the complementarity of the features extracted by different network structures. This restricts the practical application of convolutional neural networks to a certain extent; how to better apply convolutional neural networks in a remote sensing image classification model so as to improve classification accuracy is therefore a main problem to be studied and solved.
Disclosure of Invention
The technical problem to be solved by the invention is that the classification accuracy in current remote sensing image processing is not high. To address this technical defect, the invention provides a scene classification method and system based on multi-structure convolutional neural network feature fusion.
The scene classification method based on multi-structure convolutional neural network feature fusion comprises the following steps:
S1, selecting the CaffeNet, VGG-s and VGG-f models, and pre-training them on the ImageNet data set;
S2, selecting an original scene image set for scene classification;
S3, fine-tuning the pre-trained CaffeNet, VGG-s and VGG-f models with the original scene image set to obtain the networks CaffeNet-t, VGG-s-t and VGG-f-t;
S4, using CaffeNet-t, VGG-s-t and VGG-f-t respectively as feature extractors to process the original scene image set and extract the features of each image, obtaining the feature sets F Caffe-t, F VGG-s-t and F VGG-f-t from the penultimate fully connected layer of each network;
S5, fusing the three feature sets F Caffe-t, F VGG-s-t and F VGG-f-t through a serial feature fusion strategy to obtain a fusion feature set;
S6, training the extreme learning machine classifier with the fusion feature set to obtain the final classifier;
S7, inputting the scene image to be tested into the final classifier for scene classification.
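The data flow of steps S1 to S7 can be sketched as below. This is a hedged illustration only: `extract_fc7` and `serial_fuse` are hypothetical helper names, and the 4096-dimensional penultimate-layer features are faked with random vectors instead of running the fine-tuned CaffeNet-t, VGG-s-t and VGG-f-t networks:

```python
import numpy as np

def extract_fc7(extractor_seed, images):
    # Stand-in for one fine-tuned CNN's penultimate fully connected
    # layer: each image yields a 4096-dimensional feature vector.
    r = np.random.default_rng(extractor_seed)
    return r.standard_normal((len(images), 4096))

def serial_fuse(feature_sets):
    # Serial feature fusion: concatenate the per-network features
    # along the feature axis (S5).
    return np.concatenate(feature_sets, axis=1)

images = list(range(10))                     # placeholder scene images (S2)
nets = ["CaffeNet-t", "VGG-s-t", "VGG-f-t"]  # fine-tuned networks (S3)
features = [extract_fc7(i, images) for i, _ in enumerate(nets)]  # S4
fused = serial_fuse(features)                # fusion feature set (S5)
print(fused.shape)  # (10, 12288): one fused vector per image
```

In the patent, the fused feature set would then train the extreme learning machine classifier (S6) before classifying unseen images (S7).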
Further, the fine-tuning of the CaffeNet, VGG-s and VGG-f models in step S3 specifically comprises:
S31, setting the number of fine-tuning iterations N, the learning rate α and the batch size mini_batch;
S32, forward propagation training: calculating the actual classification effect of the network structure under the current coefficients, where the forward process is:
x_(i+1) = f_i(u_i),
u_i = W_i·x_i + b_i,
where x_i is the input of the i-th layer, W_i is the weight vector of the i-th layer acting on the input data, b_i is the additive bias vector of the i-th layer, f_i(·) denotes the activation function of the i-th layer, and u_i is the result of the convolution operation on the input;
S33, back propagation training: continuously iterating and updating the coefficients by comparing the network output with the true labels so that the output approaches the expected value, where the update process is:
W_i ← W_i − α·∂L(W,b)/∂W_i,
b_i ← b_i − α·∂L(W,b)/∂b_i,
where the learning rate α is the control factor of the back propagation strength and L(W,b) is the loss function;
S34, repeating steps S32 and S33 N times according to the iteration number N set in S31.
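Steps S32 to S34 describe one gradient-descent pass per layer. The following numpy sketch runs those updates for a single toy layer; the ReLU activation, the squared-error loss standing in for L(W, b), and the toy numbers are all illustrative assumptions, since the patent specifies neither f_i nor the loss:

```python
import numpy as np

alpha = 0.05                                # learning rate alpha set in S31
relu = lambda u: np.maximum(u, 0.0)         # assumed activation f_i

x = np.array([[1.0], [-0.5], [2.0], [0.5]]) # layer input x_i (toy values)
y = np.array([[0.3], [0.8], [0.5]])         # toy targets for this layer
W = np.zeros((3, 4))                        # weight vector W_i
b = 0.1 * np.ones((3, 1))                   # additive bias vector b_i

def loss(W, b):
    # Assumed squared-error loss standing in for the unspecified L(W, b).
    return 0.5 * float(np.sum((relu(W @ x + b) - y) ** 2))

before = loss(W, b)
for _ in range(200):                        # repeat N times (S34)
    u = W @ x + b                           # u_i = W_i * x_i + b_i   (S32)
    out = relu(u)                           # x_(i+1) = f_i(u_i)
    grad_u = (out - y) * (u > 0)            # error signal            (S33)
    W -= alpha * grad_u @ x.T               # W_i <- W_i - alpha * dL/dW_i
    b -= alpha * grad_u                     # b_i <- b_i - alpha * dL/db_i

print(before > loss(W, b), round(loss(W, b), 6))  # True 0.0
```

The same forward/backward pair, applied to every layer of the pre-trained networks, is what turns CaffeNet, VGG-s and VGG-f into CaffeNet-t, VGG-s-t and VGG-f-t.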
Further, in step S4, the feature set extracted from the penultimate fully connected layer of CaffeNet-t is F Caffe-t, that extracted from the penultimate fully connected layer of VGG-s-t is F VGG-s-t, and that extracted from the penultimate fully connected layer of VGG-f-t is F VGG-f-t; the feature dimensions are all 4096.
Further, the serial feature fusion strategy in step S5 specifically comprises:
S51, using the KPCA dimension reduction method to reduce each of the 4096-dimensional feature sets F Caffe-t, F VGG-s-t and F VGG-f-t separately, the feature dimensions after reduction all being 2048;
S52, fusing the reduced features with the serial feature fusion method, the dimension of the fused features being 6144;
S53, performing dimension reduction on the fused feature vectors once more with the KPCA method, and taking the resulting fusion features, of dimension 4096, as the final representation of the image.
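The reduce-fuse-reduce strategy of S51 to S53 can be sketched with a minimal hand-rolled kernel PCA. An RBF kernel is assumed, since the patent does not name one, and the dimensions are scaled down (64 per net instead of 4096, 32 instead of 2048, 64 instead of the final 4096) so the toy example runs quickly:

```python
import numpy as np

def kpca_reduce(X, dim, gamma=None):
    # Minimal kernel PCA with an RBF kernel (kernel choice assumed).
    n = X.shape[0]
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # center kernel matrix
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:dim]           # keep top-dim components
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return Kc @ alphas                           # projected features

rng = np.random.default_rng(0)
n = 200                                          # toy number of images
# Random stand-ins for the three penultimate-layer feature sets.
f_caffe, f_vggs, f_vggf = (rng.standard_normal((n, 64)) for _ in range(3))

reduced = [kpca_reduce(f, 32) for f in (f_caffe, f_vggs, f_vggf)]  # S51
fused = np.concatenate(reduced, axis=1)                            # S52
final = kpca_reduce(fused, 64)                                     # S53
print(fused.shape, final.shape)  # (200, 96) (200, 64)
```

Reducing each set before concatenation keeps the intermediate fused vector small, which is the stated rationale for the reduce-then-fuse-then-reduce order.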
The scene classification system based on multi-structure convolutional neural network feature fusion comprises: a processor and a storage device; the processor loads and executes the instructions and data in the storage device to implement any of the above scene classification methods based on multi-structure convolutional neural network feature fusion.
Compared with the prior art, the invention has the beneficial effects that:
1. Aiming at the respective advantages and disadvantages of the convolutional neural network and the extreme learning machine, a remote sensing scene classification framework is provided that uses convolutional neural networks as feature extractors and an extreme learning machine as the classifier, so that the strengths of both can be fully exploited and classification accuracy improved.
2. The reduce-then-fuse-then-reduce feature fusion method removes redundant information and noise to the greatest extent and lowers the dimension of the final feature vector, which speeds up classifier training while improving classification accuracy.
3. By fusing features from multiple convolutional neural network structures, the fused features make full use of the complementarity of the features extracted by the different structures, effectively improve the discriminability of the feature vectors, significantly improve classification performance, and provide better feature expression capability and classification accuracy for remote sensing image scenes.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a scene classification method based on multi-structure convolutional neural network feature fusion according to the present invention;
FIG. 2 is an AID data set example image;
Fig. 3 is a confusion matrix that classifies AID datasets.
Detailed Description
For a clearer understanding of the technical features, objects and effects of the present invention, embodiments of the invention will now be described in detail with reference to the accompanying drawings.
As shown in FIG. 1, the scene classification method based on multi-structure convolutional neural network feature fusion includes:
S1, selecting the CaffeNet, VGG-s and VGG-f models, and pre-training them on the ImageNet data set;
S2, selecting an original scene image set for scene classification;
S3, fine-tuning the pre-trained CaffeNet, VGG-s and VGG-f models with the original scene image set to obtain the networks CaffeNet-t, VGG-s-t and VGG-f-t;
S4, using CaffeNet-t, VGG-s-t and VGG-f-t respectively as feature extractors to process the original scene image set and extract the features of each image, obtaining the feature sets F Caffe-t, F VGG-s-t and F VGG-f-t from the penultimate fully connected layer of each network;
S5, fusing the three feature sets F Caffe-t, F VGG-s-t and F VGG-f-t through a serial feature fusion strategy to obtain a fusion feature set;
S6, training the extreme learning machine classifier with the fusion feature set to obtain the final classifier;
S7, inputting the scene image to be tested into the final classifier for scene classification.
The step S3 fine-tuning of the CaffeNet, VGG-s and VGG-f models specifically comprises the following steps:
S31, setting the number of fine-tuning iterations N, the learning rate α and the batch size mini_batch;
S32, forward propagation training: calculating the actual classification effect of the network structure under the current coefficients, where the forward process is:
x_(i+1) = f_i(u_i),
u_i = W_i·x_i + b_i,
where x_i is the input of the i-th layer, W_i is the weight vector of the i-th layer acting on the input data, b_i is the additive bias vector of the i-th layer, f_i(·) denotes the activation function of the i-th layer, and u_i is the result of the convolution operation on the input;
S33, back propagation training: continuously iterating and updating the coefficients by comparing the network output with the true labels so that the output approaches the expected value, where the update process is:
W_i ← W_i − α·∂L(W,b)/∂W_i,
b_i ← b_i − α·∂L(W,b)/∂b_i,
where the learning rate α is the control factor of the back propagation strength and L(W,b) is the loss function;
S34, repeating steps S32 and S33 N times according to the iteration number N set in S31.
In step S4, the feature set extracted from the penultimate fully connected layer of CaffeNet-t is F Caffe-t, that extracted from the penultimate fully connected layer of VGG-s-t is F VGG-s-t, and that extracted from the penultimate fully connected layer of VGG-f-t is F VGG-f-t; the feature dimensions are all 4096.
The serial feature fusion strategy in step S5 specifically comprises:
S51, using the KPCA dimension reduction method to reduce each of the 4096-dimensional feature sets F Caffe-t, F VGG-s-t and F VGG-f-t separately, the feature dimensions after reduction all being 2048;
S52, fusing the reduced features with the serial feature fusion method, the dimension of the fused features being 6144;
S53, performing dimension reduction on the fused feature vectors once more with the KPCA method, and taking the resulting fusion features, of dimension 4096, as the final representation of the image.
An embodiment of the invention is as follows:
1. The selected original scene image set is the AID data set, which contains 30 remote sensing scene categories: Airport, Bare Land, Baseball Field, Beach, Bridge, Center, Church, Commercial, Dense Residential, Desert, Farmland, Forest, Industrial, Meadow, Medium Residential, Mountain, Park, Parking, Playground, Pond, Port, Railway Station, Resort, River, School, Sparse Residential, Square, Stadium, Storage Tanks and Viaduct, for 10000 images in total; an example image of each category is shown in FIG. 2. The number of images per category varies from 220 to 420, the image size is 600 × 600 pixels, and the spatial resolution varies from about 8 meters to about 0.5 meters;
2. Dividing the image data set: 20% of the images in each class are randomly selected as the training set (the original scene image set) and the remaining 80% are used as the test set (the scene images to be tested);
3. Selecting 3 convolutional neural networks CaffeNet, VGG-s and VGG-f with different structures as experimental models according to the difference of the number and the size of convolutional layer filters;
4. Fine-tuning the three convolutional neural networks CaffeNet, VGG-s and VGG-f, pre-trained on the ImageNet data set, with the training set for 500 training iterations, obtaining the fine-tuned networks CaffeNet-t, VGG-s-t and VGG-f-t;
5. Using CaffeNet-t, VGG-s-t and VGG-f-t respectively as feature extractors to process the original scene image set, obtaining the features F Caffe-t, F VGG-s-t and F VGG-f-t from the penultimate fully connected layer of each network;
6. Fusing the features F Caffe-t, F VGG-s-t and F VGG-f-t extracted by the differently structured convolutional neural networks using the KPCA dimension reduction method and the serial feature fusion strategy, and taking the fused features as the final representation of the original image;
7. Carrying out the final classification of the fused features with an Extreme Learning Machine (ELM) classifier, as follows:
(1) dividing the fused features into a training feature set and a test feature set according to the training/test split of the original data;
(2) training the extreme learning machine classifier with the training feature set;
(3) classifying the test feature set with the trained extreme learning machine classifier;
(4) calculating the final classification accuracy.
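Step 7's classifier can be sketched as a generic minimal extreme learning machine; the hidden-layer size, ridge regularization and two-blob toy data below are all assumptions for illustration, not the inventors' implementation:

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: a random, untrained hidden layer
    followed by output weights solved in closed form by ridge-regularized
    least squares (a sketch, not the patent's exact classifier)."""
    def __init__(self, n_hidden=256, reg=1e-3, seed=0):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)   # random projection + tanh

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        T = np.eye(n_classes)[y]              # one-hot target matrix
        # beta = (H'H + reg*I)^(-1) H'T: the only trained parameters
        self.beta = np.linalg.solve(
            H.T @ H + self.reg * np.eye(self.n_hidden), H.T @ T)
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

# Two well-separated Gaussian blobs stand in for fused scene features.
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((50, 8)) + 3,
               rng.standard_normal((50, 8)) - 3])
y = np.array([0] * 50 + [1] * 50)
acc = float((ELM(seed=2).fit(X, y).predict(X) == y).mean())
print(acc >= 0.99)  # the blobs are trivially separable
```

Because only the output weights beta are solved (no back propagation), training the ELM on fixed fused features is very fast, which is the appeal of this pipeline relative to fine-tuning a CNN end to end.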
The confusion matrix obtained in the experiment is shown in FIG. 3, where the horizontal axis is the true value and the vertical axis is the predicted value.
As can be seen from FIG. 3, the accuracy of the method exceeds 95% on 13 classes, and the overall accuracy of 92.54% is better than that of existing state-of-the-art methods, which demonstrates the effectiveness of the method for remote sensing image scene classification. Some categories with small inter-class differences, such as Dense Residential, Medium Residential and Sparse Residential, can also be classified accurately. The main confusion occurs between School and Commercial, and between Resort and Park. As shown in FIG. 2, School has a similar image distribution to Commercial, and Resort and Park share similar objects and image textures, such as green belts and buildings. These classes are therefore easily confused.
Table 1 lists the classification accuracy of this method alongside the accuracy obtained when a single convolutional neural network structure is used for classification or as a feature extractor. As can be seen from the table, fusing the fully connected layer features of 3 simple CNN structures yields better classification performance than any single CNN, which indicates that exploiting the complementarity of features extracted by different CNN structures can significantly improve classification performance. In addition, combining the convolutional neural networks with an extreme learning machine further improves classification accuracy.
TABLE 1
While the present invention has been described with reference to the embodiments shown in the drawings, the invention is not limited to those embodiments, which are illustrative rather than restrictive; those skilled in the art may make various changes and modifications without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (5)

1. The scene classification method based on multi-structure convolutional neural network feature fusion is characterized by comprising the following steps:
S1, selecting the CaffeNet, VGG-s and VGG-f models, and pre-training them on the ImageNet data set;
S2, selecting an original scene image set for scene classification;
S3, fine-tuning the pre-trained CaffeNet, VGG-s and VGG-f models with the original scene image set to obtain the networks CaffeNet-t, VGG-s-t and VGG-f-t;
S4, using CaffeNet-t, VGG-s-t and VGG-f-t respectively as feature extractors to process the original scene image set and extract the features of each image, obtaining the feature sets F Caffe-t, F VGG-s-t and F VGG-f-t from the penultimate fully connected layer of each network;
S5, fusing the three feature sets F Caffe-t, F VGG-s-t and F VGG-f-t through a serial feature fusion strategy to obtain a fusion feature set;
S6, training the extreme learning machine classifier with the fusion feature set to obtain the final classifier;
S7, inputting the scene image to be tested into the final classifier for scene classification.
2. The scene classification method based on multi-structure convolutional neural network feature fusion of claim 1, wherein the fine-tuning of the CaffeNet, VGG-s and VGG-f models in step S3 specifically comprises:
S31, setting the number of fine-tuning iterations N, the learning rate α and the batch size mini_batch;
S32, forward propagation training: calculating the actual classification effect of the network structure under the current coefficients, where the forward process is:
x_(i+1) = f_i(u_i),
u_i = W_i·x_i + b_i,
where x_i is the input of the i-th layer, W_i is the weight vector of the i-th layer acting on the input data, b_i is the additive bias vector of the i-th layer, f_i(·) denotes the activation function of the i-th layer, and u_i is the result of the convolution operation on the input;
S33, back propagation training: continuously iterating and updating the coefficients by comparing the network output with the true labels so that the output approaches the expected value, where the update process is:
W_i ← W_i − α·∂L(W,b)/∂W_i,
b_i ← b_i − α·∂L(W,b)/∂b_i,
where the learning rate α is the control factor of the back propagation strength and L(W,b) is the loss function;
S34, repeating steps S32 and S33 N times according to the iteration number N set in S31.
3. The scene classification method based on multi-structure convolutional neural network feature fusion of claim 1, wherein in step S4, the feature set extracted from the penultimate fully connected layer of CaffeNet-t is F Caffe-t, that extracted from the penultimate fully connected layer of VGG-s-t is F VGG-s-t, and that extracted from the penultimate fully connected layer of VGG-f-t is F VGG-f-t, the feature dimensions all being 4096.
4. The scene classification method based on multi-structure convolutional neural network feature fusion of claim 1, wherein the serial feature fusion strategy in step S5 specifically comprises:
S51, using the KPCA dimension reduction method to reduce each of the 4096-dimensional feature sets F Caffe-t, F VGG-s-t and F VGG-f-t separately, the feature dimensions after reduction all being 2048;
S52, fusing the reduced features with the serial feature fusion method, the dimension of the fused features being 6144;
S53, performing dimension reduction on the fused feature vectors once more with the KPCA method, and taking the resulting fusion features, of dimension 4096, as the final representation of the image.
5. The scene classification system based on multi-structure convolutional neural network feature fusion is characterized by comprising: a processor and a storage device; the processor loads and executes the instructions and data in the storage device to implement the scene classification method based on multi-structure convolutional neural network feature fusion according to any one of claims 1 to 4.
CN201910702273.3A 2019-07-31 2019-07-31 scene classification method and system based on multi-structure convolutional neural network feature fusion Pending CN110555461A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910702273.3A CN110555461A (en) 2019-07-31 2019-07-31 scene classification method and system based on multi-structure convolutional neural network feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910702273.3A CN110555461A (en) 2019-07-31 2019-07-31 scene classification method and system based on multi-structure convolutional neural network feature fusion

Publications (1)

Publication Number Publication Date
CN110555461A true CN110555461A (en) 2019-12-10

Family

ID=68736927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910702273.3A Pending CN110555461A (en) 2019-07-31 2019-07-31 scene classification method and system based on multi-structure convolutional neural network feature fusion

Country Status (1)

Country Link
CN (1) CN110555461A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291900A (en) * 2020-03-05 2020-06-16 支付宝(杭州)信息技术有限公司 Method and device for training risk recognition model


Similar Documents

Publication Publication Date Title
CN106845529B (en) Image feature identification method based on multi-view convolution neural network
Thai et al. Image classification using support vector machine and artificial neural network
CN108537192B (en) Remote sensing image earth surface coverage classification method based on full convolution network
CN106845401B (en) Pest image identification method based on multi-space convolution neural network
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110414377B (en) Remote sensing image scene classification method based on scale attention network
CN103927531A (en) Human face recognition method based on local binary value and PSO BP neural network
CN107644221A (en) Convolutional neural networks traffic sign recognition method based on compression of parameters
CN109165682B (en) Remote sensing image scene classification method integrating depth features and saliency features
CN111259905A (en) Feature fusion remote sensing image semantic segmentation method based on downsampling
CN110059878B (en) Photovoltaic power generation power prediction model based on CNN LSTM and construction method thereof
CN112784779A (en) Remote sensing image scene classification method based on feature pyramid multilevel feature fusion
CN111460980A (en) Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN110705374A (en) Transmission line defect identification method based on improved RetinaNet network
CN110555461A (en) scene classification method and system based on multi-structure convolutional neural network feature fusion
CN110263644B (en) Remote sensing image classification method, system, equipment and medium based on triplet network
CN108446312B (en) Optical remote sensing image retrieval method based on deep convolution semantic net
CN113505792A (en) Multi-scale semantic segmentation method and model for unbalanced remote sensing image
CN113688867A (en) Cross-domain image classification method
CN107529647B (en) Cloud picture cloud amount calculation method based on multilayer unsupervised sparse learning network
CN113256546A (en) Depth map completion method based on color map guidance
CN112164077A (en) Cell example segmentation method based on bottom-up path enhancement
CN110633706B (en) Semantic segmentation method based on pyramid network
CN112395953A (en) Road surface foreign matter detection system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination