CN106991382A - Remote sensing scene classification method - Google Patents

Remote sensing scene classification method

Info

Publication number
CN106991382A
CN106991382A (application CN201710147637.7A)
Authority
CN
China
Prior art keywords
image
remote sensing
classification
scale
pyramid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710147637.7A
Other languages
Chinese (zh)
Inventor
刘青山
杭仁龙
葛玲玲
宋慧慧
孙玉宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN201710147637.7A priority Critical patent/CN106991382A/en
Publication of CN106991382A publication Critical patent/CN106991382A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/259Fusion by voting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing scene classification method comprising the following steps: generating multi-scale images; extracting multi-scale depth features; fusing convolutional features; and aggregating multi-scale classification results. The invention proposes an adaptive deep pyramid matching (ADPM) model: the multi-scale images are fed into a convolutional neural network with spatial pyramid pooling to extract depth features; the depth features extracted from all convolutional layers are fused and sent to an SVM classifier to obtain classification results; and the multi-scale results are aggregated to provide more information for remote sensing scene classification. Compared with the spatial relationship pyramid (PSR), local detector (Partlets) and semi-supervised projection (SSEP) methods under identical experimental conditions, the method of the invention improves remote sensing scene classification performance and yields more accurate classification results.

Description

Remote sensing scene classification method
Technical Field
The invention belongs to the technical field of image information processing, and relates to a remote sensing scene classification method.
Background
With the development of remote sensing technology, large numbers of high-resolution earth observation images are acquired from satellites and airplanes. Unlike other images, remote sensing scenes exhibit some special characteristics; for example, the objects in a scene vary in size, color, and orientation. In applications such as land resource management and urban planning, remote sensing scene classification is a fundamental task and an important research topic, and automatically and accurately interpreting such large image libraries has become an urgent need.
Over the past few years, a number of feature representation models have been proposed for scene classification. One of the most common is the bag of visual words, which generally includes the following three steps: 1) extracting bottom-level visual features of the image, such as Scale-Invariant Feature Transform (SIFT) descriptors and Histograms of Oriented Gradients (HOG); 2) forming a visual vocabulary by clustering the features with k-means or other methods; 3) mapping each visual feature to its nearest word and generating a mid-level feature representation through a word histogram. This model and its variants have been extensively studied in the remote sensing field.
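As an illustration of these three steps, the following Python sketch builds bag-of-visual-words histograms with scikit-learn's k-means; the descriptor dimensions, image count, and vocabulary size are hypothetical placeholders rather than values from the invention.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Step 1: one array of local descriptors (e.g., SIFT-like) per image.
descriptors = [rng.normal(size=(200, 128)) for _ in range(10)]

# Step 2: cluster all descriptors into a visual vocabulary.
vocab = KMeans(n_clusters=50, n_init=10, random_state=0).fit(np.vstack(descriptors))

# Step 3: map each descriptor to its nearest word and histogram the words.
def bow_histogram(desc):
    words = vocab.predict(desc)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

mid_level = np.array([bow_histogram(d) for d in descriptors])
```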
Although the visual bag of words is somewhat effective in remote sensing scene classification, it provides an unordered set of local descriptors and does not take into account spatial information. To overcome this drawback, a spatial pyramid matching model is developed. The model first segments the original image into different levels of resolution. Second, for each level of resolution, a histogram of local features is extracted from each space. Finally, the spatial histogram is represented by a weighted pyramid matching kernel. Since remote sensing images generally do not have an absolute frame of reference, the relative spatial arrangement of image elements becomes important. Therefore, it is proposed to represent photometric and geometric information of an image with a spatial pyramid co-occurrence model. Unlike segmenting images into uniform cells, the spatial pyramid co-occurrence model uses random spatial segmentation to describe various image layouts.
All of the above methods are based on manually extracted features, which rely heavily on expert experience and domain knowledge. Furthermore, such features make it difficult to achieve an optimal balance between discriminability and robustness, mainly because the details of the real data are not taken into account. Deep learning algorithms, especially convolutional neural networks, have shown great potential in solving this problem, because high-level semantic features can be learned automatically and hierarchically from the original images; this has attracted more and more attention in the remote sensing community.
However, it is difficult to apply convolutional neural networks directly to remote sensing scene image classification, because millions of parameters must be trained while the available training samples are few. Many related studies have shown that features extracted from convolutional neural networks can be used as generic descriptors. Thus, image representations learned by neural networks from large-scale annotated data such as ImageNet can be effectively transferred to a broad range of visual recognition tasks with a limited amount of training data. With this in mind, relevant studies have validated the feasibility of remote sensing scene classification using ImageNet-pre-trained convolutional neural networks; adopting a pre-trained convolutional neural network and fine-tuning it on remote sensing scene data yields impressive classification performance. At present, the generalization capability of features extracted from the fully-connected layers of convolutional neural networks has been evaluated on remote sensing scene classification, with state-of-the-art results reported on public remote sensing scene data sets.
Although the problem of overfitting can be alleviated by transfer learning, some problems remain in remote sensing scene classification based on convolutional neural networks. First, most approaches utilize only the last fully-connected layer as the feature for subsequent classification. It is not reasonable to simply discard the features of the preceding convolutional layers, as these may be beneficial to the classification goal. In fact, features extracted from convolutional layers are more generic than those extracted from fully-connected layers, and may therefore be more suitable for transfer learning. In addition, convolutional-layer features contain more spatial information than fully-connected-layer activations, which facilitates image classification. Recently, the importance of convolutional-layer features has been recognized, but existing methods use only the last convolutional layer and ignore the others.
It is also a notable problem that objects of interest often have different scales in different remote sensing scenes, and even a single scene may contain objects of different sizes. However, the most popular convolutional neural networks require a fixed-size input image (e.g., 227 × 227 pixels). A common solution is to warp or crop the original remote sensing image to a predefined size, which inevitably results in a loss of valid discriminative information.
Inspired by the spatial pyramid model, we regard the features of all convolutional layers as a multi-resolution representation of the input image, and the pyramid matching kernel is then used to integrate them into a unified representation. Unlike the spatial pyramid model, we use deep features instead of low-level descriptors, and the optimal fusion weights between different convolutional layers are learned from the data itself rather than predefined. Feeding multi-scale images into the convolutional neural network reduces the information loss caused by a fixed input size and allows complementary information to be learned from different scales. Considering the computational cost of learning multi-scale depth features, we select a convolutional neural network with spatial pyramid pooling as our underlying deep network: adding a spatial pyramid pooling layer before the fully-connected layers allows the input image to be of arbitrary size. A trained spatial pyramid pooling network can therefore extract multi-scale features from multi-scale input images, facilitating the classification of remote sensing scenes.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art and provide a remote sensing scene classification method that fully utilizes the advantages of multi-scale depth feature extraction and the adaptive deep pyramid matching model, classifies remote sensing scenes better, and achieves better classification performance and classification accuracy.
The remote sensing scene classification method of the invention comprises the following steps:
step 1), generating images of different scales N×N from the remote sensing image to be classified by a deformation method, wherein N can take several values according to the size of the image;
step 2), sending the multi-scale images into a convolutional neural network with spatial pyramid pooling for training, so as to extract multi-scale depth features;
step 3), for the input image of each scale, applying the adaptive deep pyramid matching model to fuse the feature representations extracted from all convolutional layers;
step 4), sending the feature representations learned from each scale image into a classifier to obtain classification results, and then integrating the multiple results of all scales using a majority voting strategy, thereby correctly classifying the remote sensing image scene.
In order to avoid the loss of effective discriminative information, the invention further adopts the following improved scheme: the remote sensing scene image to be classified in step 1) is generated at different scales, such as 128 × 128, 192 × 192, 227 × 227, 256 × 256 and 384 × 384, by the deformation method.
Advantageous effects
First, under the same experimental conditions, the classification accuracy of the method is higher than that of the spatial relationship pyramid (PSR), local detector (Partlets) and semi-supervised projection (SSEP) methods;
second, integrating the multiple results of all scales with a majority voting strategy provides more discriminative information and improves the classification accuracy.
Drawings
FIG. 1 is a basic flow chart of the remote sensing scene classification method of the present invention;
FIG. 2 is a system architecture of the multi-scale depth feature extraction process in the remote sensing image classification method of the present invention;
FIG. 3 is a flow chart of the adaptive deep pyramid matching method of the present invention;
FIG. 4 is a histogram of the per-class accuracy of the method of the present invention versus the spatial relationship pyramid (PSR) and local detector (Partlets) methods;
FIG. 5 is a histogram of the per-class accuracy of the method of the present invention versus the semi-supervised projection (SSEP) method.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the accompanying drawings:
the idea of the invention is to fully utilize the advantages of multi-scale depth feature extraction and a self-adaptive depth pyramid matching model, fully mine all convolutional layer feature information of a convolutional neural network, and integrate a plurality of results of all scales by adopting a majority voting strategy, so that the remote sensing scene can be better classified, the classification performance is better, and the classification accuracy is improved.
The basic flow of the method of the invention is shown in fig. 1, and specifically comprises the following steps:
Step 1), generating multi-scale images: the remote sensing scene image to be classified is transformed by the deformation method into a plurality of images of different scales N×N, yielding a multi-scale image set for the image.
The value of N can be determined according to factors such as the spatial resolution of the sensor and the size of the target objects in the remote sensing image. In a specific implementation, to avoid losing discriminative information, the original image is retained and the image is additionally warped into several multi-scale versions, forming a multi-scale image set. Taking a 256 × 256 original image as an example, N may be 128, 192, 227 and 256, respectively.
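A minimal Python sketch of this step is given below, assuming the Pillow library is available; the scale list mirrors the 256 × 256 example above, and generate_multiscale is an illustrative helper name rather than part of the invention.

```python
# Minimal sketch of step 1: warp one remote sensing image to several scales.
from PIL import Image

def generate_multiscale(path, scales=(128, 192, 227, 256)):
    """Return a dict mapping N to the image warped (resized) to N x N."""
    image = Image.open(path).convert("RGB")
    return {n: image.resize((n, n), Image.BILINEAR) for n in scales}
```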
Step 2), extracting multi-scale depth features: the multi-scale images are sent into the convolutional neural network with spatial pyramid pooling for training, so as to extract the multi-scale depth features.
The architecture for extracting multi-scale depth features, shown in fig. 2, comprises five convolutional layers, a spatial pyramid pooling layer, two fully-connected layers, and a softmax layer. Similar to spatial pyramid matching, the features are mapped into increasingly finer sub-regions, and the features within each sub-region are pooled by max pooling. Assume each feature map after the last convolutional layer is of size a × a and is divided into n × n sub-regions; spatial pyramid pooling can then be regarded as sliding-window pooling with a window size of a/n and a stride of a/n. Here we choose a three-level spatial pyramid pooling configuration, with n × n equal to 1 × 1, 2 × 2 and 4 × 4 respectively. The final output of spatial pyramid pooling concatenates the pooling results of the three levels into one vector, producing a fixed-length representation regardless of the size of the input image. The input images of different scales share one spatial pyramid pooling network.
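The pooling scheme can be sketched in a few lines of numpy; here the sub-region boundaries are computed with rounded bin edges (an assumption, so that a need not be divisible by n), and the output length depends only on the channel count:

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a (channels, a, a) feature map over a 3-level pyramid."""
    c, a, _ = fmap.shape
    pooled = []
    for n in levels:
        edges = np.linspace(0, a, n + 1).astype(int)  # n x n sub-regions
        for i in range(n):
            for j in range(n):
                region = fmap[:, edges[i]:edges[i + 1], edges[j]:edges[j + 1]]
                pooled.append(region.max(axis=(1, 2)))  # max pooling per region
    return np.concatenate(pooled)

# Fixed-length output (c * 21) regardless of the spatial size a:
print(spatial_pyramid_pool(np.random.rand(256, 13, 13)).shape)  # (5376,)
```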
The key to multi-scale depth feature extraction is the training of the network. To ensure the effectiveness of training, the network is pre-trained on the ImageNet 2012 data set, the weight parameters of the first five convolutional layers are transferred and fixed, and the spatial pyramid pooling network is then fine-tuned with remote sensing scene training samples.
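The freeze-then-fine-tune recipe can be sketched in PyTorch on a toy stand-in network; the layer sizes and class count below are illustrative assumptions, not the architecture of fig. 2.

```python
import torch
import torch.nn as nn

class TinySPPNet(nn.Module):
    """Toy stand-in: a convolutional trunk plus a fully-connected head."""
    def __init__(self, num_classes=21):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveMaxPool2d(1), nn.Flatten())
        self.classifier = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                                        nn.Linear(64, num_classes))

    def forward(self, x):
        return self.classifier(self.features(x))

model = TinySPPNet()
for p in model.features.parameters():  # transfer and fix the convolutional trunk
    p.requires_grad = False
optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad),
                            lr=1e-3, momentum=0.9)  # fine-tune the head only
```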
Step 3), fusing convolutional features: for the input image of each scale, the feature representations extracted from all convolutional layers are fused using the adaptive deep pyramid matching model.
The feature representations extracted from all convolutional layers are fused using the adaptive deep pyramid matching model. As shown in the adaptive deep pyramid matching flow chart of fig. 3, the feature representations extracted from all convolutional layers are formed into histogram representations through the bag of visual words; the optimal fusion weights between all convolutional layers are learned from the data itself, rather than defined in advance; and weighting with these optimal fusion weights yields the histogram of the fused features extracted from all convolutional layers.
The fusion of convolutional features uses the adaptive deep pyramid matching model of the method. Assume a three-dimensional matrix F_{1,l} ∈ R^{n_l×n_l×p} represents the l-th layer feature maps of an image I_1; then, at each coordinate (i,j), 1 ≤ i ≤ n_l, 1 ≤ j ≤ n_l, f_{1,l}^{(i,j)} denotes the p-dimensional feature of the corresponding local block of image I_1. In this way we obtain the n_l × n_l local feature vectors of the l-th layer of image I_1. We use the k-means method to cluster all features into a vocabulary containing D centers, C = {c_1, …, c_D}, and assign each feature f_{1,l}^{(i,j)} to its nearest visual word c_d ∈ C. Then F_{1,l} can be expressed as a histogram representation H_{1,l} = [h_{1,l}(1), …, h_{1,l}(D)] as follows:

h_{1,l}(d) = Σ_{i,j} δ(f_{1,l}^{(i,j)}, c_d)    (1)

where δ(f_{1,l}^{(i,j)}, c_d) = 1 indicates that the nearest visual word of feature f_{1,l}^{(i,j)} is c_d, and δ(f_{1,l}^{(i,j)}, c_d) = 0 indicates that c_d is not the nearest visual word of feature f_{1,l}^{(i,j)}. Finally, the deep pyramid matching kernel of images I_1 and I_2 is as follows:

K(I_1, I_2) = Σ_{l=1}^{L} ω_l κ(H_{1,l}, H_{2,l}), with κ(H_{1,l}, H_{2,l}) = Σ_{d=1}^{D} min(h_{1,l}(d), h_{2,l}(d))    (2)

where L represents the total number of convolutional layers and ω_l is the fusion weight of the l-th layer, with ω_l ≥ 0 and Σ_{l=1}^{L} ω_l = 1.
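Under the assumption, written into formula (2) above, that histogram intersection is the per-layer matching function, the fused kernel is a few lines of Python:

```python
import numpy as np

def intersection(h1, h2):
    """Histogram intersection kernel between two BoW histograms."""
    return np.minimum(h1, h2).sum()

def deep_pyramid_kernel(hists1, hists2, weights):
    """Weighted fusion over L convolutional layers (weights >= 0, sum to 1)."""
    return sum(w * intersection(h1, h2)
               for w, h1, h2 in zip(weights, hists1, hists2))
```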
for remote sensing scene classification, the method also needsLabel information of the training image is taken into account. Thus, instead of using predefined values, the optimal weights ω are adaptively learned from the training data itselfl. The kernel matrix K of the training data should be close to the ideal matrix Y. Element K in the kernel matrix Ki,jIs defined as an image I1And I2The deep pyramid matches the kernel. Element Yi,jImage label y is represented by 1i=yjOn the contrary, element Yi,j0 denotes the image label yi≠yj
The objective function of the adaptive deep pyramid matching model in the invention is as follows:

min_ω ||K − Y||_F² + λ Σ_{l=1}^{L} ω_l²,  s.t. ω_l ≥ 0, Σ_{l=1}^{L} ω_l = 1    (3)

where ||K − Y||_F² = tr((K − Y)^T (K − Y)) represents the sum of squared distances between the matrices K and Y. The regularization term, composed of all the weights ω_l, prevents overfitting.
By using K = Σ_{l=1}^{L} ω_l K_l instead of K, where K_l is the kernel matrix of the l-th layer, one can deduce ||K − Y||_F² = ω^T A ω − 2 b^T ω + c, where the element of the matrix A is A_{ij} = tr(K_i^T K_j), the element of the vector b is b_j = tr(Y^T K_j), and c = tr(Y^T Y). Then the objective function can be transformed into a typical quadratic programming problem:

min_ω ω^T (A + λI) ω − 2 b^T ω,  s.t. ω_l ≥ 0, Σ_{l=1}^{L} ω_l = 1    (4)
after the quadratic programming optimal solution omega is obtained, the depth pyramid matching kernel matrix K of the training data can be calculated.
Step 4), aggregating multi-scale classification results: the fused features learned from each scale image are sent into a classifier to obtain classification results, and the multiple results of all scales are then integrated using a majority voting strategy, giving the correct classification of the remote sensing scene image.
The invention adopts the following specific implementation: the fused features learned from each scale image are represented by the deep pyramid matching kernel matrix K; the matrix K is sent into a support vector machine classifier for classification, and the classification results from all scales are integrated by majority voting to obtain the final classification result.
The classifier in the method of the invention is a support vector machine, and the following briefly describes the classification model of the support vector machine.
First, the basic principle and training process of the two-class SVM classifier are briefly described. Given an annotated set {(x_i, y_i)}_{i=1}^{N}, where x_i ∈ R^d and y_i ∈ {−1, 1}: x_i is the underlying visual feature vector of a sample, y_i is its class label (positive samples are labeled 1 and negative samples −1), and R^d is the d-dimensional vector space over the real field R. The samples are mapped into a high-dimensional space using a nonlinear mapping, as follows:
Φ: R^d → F, x → Φ(x)    (5)
where F is the mapped high dimensional space and Φ is the corresponding mapping function. The decision function is represented in the form:
g(x)=w·Φ(x)+b (6)
accordingly, the support vector machine classification surface can be written as:
w·Φ(x)+b=0 (7)
where w is the weight vector and b is the offset constant.
Points falling on the two hyperplanes w·Φ(x) + b = ±1 are called support vectors; the distance from a support vector to the classification surface is called the classification margin, and its size is 1/||w||. The size of the classification margin reflects the generalization capability of the classifier, so we want to maximize the margin of the classifier:

min_{w,b} (1/2)||w||²    (8)
s.t. y_i(w·Φ(x_i) + b) ≥ 1, i = 1, …, N
The classification surface of the support vector machine is obtained from the solution of the above formula. Solving the quadratic programming problem in the above formula by the Lagrange multiplier method gives:

w = Σ_{i=1}^{N} α_i y_i Φ(x_i)    (9)

where x_i is a support vector, and y_i and α_i are respectively the class label and the Lagrange coefficient corresponding to that support vector. The output of a sample x from the two-class SVM classifier is then:

f(x) = w·Φ(x) + b = Σ_{i=1}^{N} α_i y_i Φ(x_i)^T Φ(x) + b    (10)
the kernel function is utilized to avoid the display expression of the nonlinear mapping, and the output of the image sample obtained by the two-class SVM classifier can be rewritten as follows:
wherein K (·) is a kernel function, and K (x)i,x)=Φ(xi)TΦ (x), superscript T denotes the transpose matrix. According to the above formula, for any one of the samples that is standard, if the value of f (x) is greater than 0, the class of the sample is labeled 1, and if the value of f (x) is less than 0, the class is labeled-1.
Each two-class classifier generates a classification hyperplane; the distance from the fused features learned for each scale image to each classification hyperplane is calculated, and the image is assigned to the class with the largest distance. The multiple results of all scales are then integrated using the majority voting strategy, correctly classifying the remote sensing image scene.
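The per-scale aggregation itself reduces to a majority vote; a minimal sketch:

```python
from collections import Counter

def majority_vote(per_scale_labels):
    """Return the class predicted most often across the scales of one image."""
    return Counter(per_scale_labels).most_common(1)[0][0]

print(majority_vote(["harbor", "harbor", "beach", "harbor"]))  # harbor
```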
To facilitate understanding of the technical solution of the present invention, two specific examples are given below.
The first embodiment applies the technical scheme provided by the invention to the classification of the 21-Class Land-Use remote sensing data set. The data set was manually extracted from aerial orthoimagery downloaded from the United States Geological Survey (USGS) National Map. It includes 21 land-use and land-cover classes: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium-density residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks and tennis court. Each class contains 100 RGB images with a spatial resolution of one foot (about 0.3 m) and an image size of 256 × 256 pixels. Using the remote sensing scene classification method based on adaptive deep pyramid matching of the invention, the depth features of the multi-scale images extracted from the convolutional neural network are fused and sent to the classifier, yielding the classification of the remote sensing scene images.
In this embodiment, a support vector machine (SVM) is selected as the classification model, and to verify the effectiveness of the invention, the classification results are compared with the spatial relationship pyramid (PSR) and local detector (Partlets) methods, respectively. N×N images of different scales are generated from the remote sensing scene image to be classified by the deformation method; the multi-scale images are sent into a convolutional neural network with spatial pyramid pooling for training to extract multi-scale depth features; for the input image of each scale, the adaptive deep pyramid matching model fuses the feature representations extracted from all convolutional layers; the feature representations learned from each scale image are sent into the classifier to obtain classification results; and the multiple results of all scales are integrated using a majority voting strategy, correctly classifying the remote sensing scene image.
The classification process of this embodiment is specifically as follows:
1. generating a multi-scale image:
and reserving 256 × 256 original images, generating images with the dimensions of 128 × 128, 192 × 192 and 227 × 227 by the remote sensing scene image to be classified through a deformation method, and forming a group of multi-scale image sets of the images.
2. Extracting multi-scale depth features:
To ensure the effectiveness of training, the network is pre-trained using 227 × 227 remote sensing scene images as input. The data set is randomly divided into a training set and a test set: the training set is used to fine-tune the fully-connected layers of the spatial pyramid pooling network, and the test set is used to evaluate the performance of the classifier. To reduce the influence of random selection, each algorithm is repeated on ten different training/test splits of the data set. Similar to spatial pyramid matching, the spatial pyramid pooling network maps the features into increasingly finer sub-regions and pools the features in each sub-region by max pooling. Assume each feature map after the last convolutional layer is of size a × a and is divided into n × n sub-regions; spatial pyramid pooling can then be regarded as sliding-window pooling with a window size of a/n and a stride of a/n. Here a three-level spatial pyramid pooling configuration is chosen, with n × n equal to 1 × 1, 2 × 2 and 4 × 4 respectively. The final output of spatial pyramid pooling concatenates the pooling results of the three levels into one vector, producing a fixed-length representation regardless of the size of the input image. The input images of different scales share one spatial pyramid pooling network. The multi-scale images are then sent into the convolutional neural network with spatial pyramid pooling for training, so as to extract the multi-scale depth features.
3. Fusing convolutional features:
For the input image of each scale, feature representations are extracted from all convolutional layers, and a visual codebook is formed with k-means over the features at every pixel of the convolutional-layer feature maps. f_{1,l}^{(i,j)} denotes the p-dimensional feature of a local block of the l-th layer of the image, c_d denotes the visual word nearest to f_{1,l}^{(i,j)}, and the l-th-layer features of image I_1 are mapped to the histogram H_{1,l}. The feature representations extracted from all convolutional layers are thus formed into histogram representations through the bag of visual words.
The feature representations extracted from all convolutional layers are fused using the adaptive deep pyramid matching model. The deep pyramid matching kernel of images I_1 and I_2 is K(I_1, I_2) = Σ_{l=1}^{L} ω_l κ(H_{1,l}, H_{2,l}), where L represents the total number of convolutional layers and ω_l is the fusion weight of the l-th layer, with ω_l ≥ 0 and Σ_{l=1}^{L} ω_l = 1. The regularization parameter λ in objective function (3) of the adaptive deep pyramid matching model, which prevents overfitting, is empirically set to 0.5. The optimal fusion weights among all convolutional layers are learned from the data itself, and weighting with them yields the histograms of the fused features extracted from all convolutional layers.
4. Aggregating the multi-scale classification results:
The deep features of the multi-scale images, with the histogram intersection kernel, are fed into a classifier to obtain classification results; this can be implemented using the LIBSVM software package. The multiple results of all scales are integrated using the majority voting strategy, finally completing the classification of the remote sensing scene images.
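The embodiment uses the LIBSVM package; as an equivalent stand-in, the sketch below feeds a precomputed intersection kernel to scikit-learn's SVC. The toy histograms, labels, and split are fabricated for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
hists = rng.random((60, 50))            # toy per-image fused histograms
y = np.repeat([0, 1, 2], 20)            # toy scene labels
K = np.minimum(hists[:, None, :], hists[None, :, :]).sum(-1)  # intersection kernel

idx = rng.permutation(60)
train, test = idx[:40], idx[40:]
svm = SVC(kernel="precomputed").fit(K[np.ix_(train, train)], y[train])
pred = svm.predict(K[np.ix_(test, train)])  # test rows vs. training columns
```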
To verify the effect of the method, the remote sensing scene classification method based on adaptive deep pyramid matching provided by the invention is compared with the spatial relationship pyramid (PSR) method and the local detector (Partlets) method.
FIG. 4 shows histograms of the per-class accuracy of the method of the present invention and of the spatial relationship pyramid (PSR) and local detector (Partlets) methods. As can be seen from the figure, compared with the other two methods, the classification method of the invention achieves the highest accuracy on 15 of the 21 classes, showing that the method of the present invention achieves higher classification accuracy.
Table 1 shows a comparison of the classification accuracy of the three classification methods.

TABLE 1 Classification accuracy comparison

Method              Classification accuracy (%)
PSR                 89.10
Partlets            91.33
ADPM-192            92.67
ADPM-227            92.04
ADPM-256            93.52
Multi-scale ADPM    94.86
As can be seen from Table 1, the classification accuracy of the method of the invention is significantly higher than that of the other two classification methods; in particular, the multi-scale method that fuses the classification results improves accuracy by nearly 4% over the other methods. In addition, the results at different scales differ, and the classification accuracy of the multi-scale method with fused classification results is clearly higher than that of any single-scale method.
In conclusion, compared with the spatial relationship pyramid (PSR) and local detector (Partlets) methods, the method of the invention has obvious advantages in both classification performance and classification accuracy.
The second embodiment applies the technical scheme provided by the invention to the classification of the 19-Class Satellite Scene remote sensing data set. This data set consists of 19 scene classes: airport, beach, bridge, commercial area, desert, farmland, football field, forest, industrial area, meadow, mountain, park, parking lot, pond, port, railway station, residential area, river and viaduct. Each class has 50 images of size 600 × 600 pixels, extracted from larger satellite images using Google Earth software. Using the remote sensing scene classification method based on adaptive deep pyramid matching of the invention, the depth features of the multi-scale images extracted from the convolutional neural network are fused and sent to the classifier, yielding the classification of the remote sensing scene images.
In this embodiment, a support vector machine (SVM) is selected as the classification model, and to verify the effectiveness of the invention, the classification results are compared with the semi-supervised projection (SSEP) method. N×N images of different scales are generated from the remote sensing scene image to be classified by the deformation method; the multi-scale images are sent into a convolutional neural network with spatial pyramid pooling for training to extract multi-scale depth features; for the input image of each scale, the adaptive deep pyramid matching model fuses the feature representations extracted from all convolutional layers; the feature representations learned from each scale image are sent into the classifier to obtain classification results; and the multiple results of all scales are integrated using a majority voting strategy, correctly classifying the remote sensing scene image.
The classification process of this embodiment is specifically as follows:
1. generating a multi-scale image:
and (2) reserving an original image with the size of 600 × 600, generating images with the scale sizes of 128 × 128, 192 × 192, 227 × 227, 256 × 256 and 384 × 384 by the remote sensing scene image to be classified through a deformation method, and forming a group of multi-scale image sets of the images.
2. Extracting multi-scale depth features:
To ensure the effectiveness of training, the network is pre-trained using the remote sensing scene images as input. The data set is randomly divided into a training set and a test set: the training set is used to fine-tune the fully-connected layers of the spatial pyramid pooling network, and the test set is used to evaluate the performance of the classifier. To reduce the influence of random selection, each algorithm is repeated on ten different training/test splits of the data set. Similar to spatial pyramid matching, the spatial pyramid pooling network maps the features into increasingly finer sub-regions and pools the features in each sub-region by max pooling. Assume each feature map after the last convolutional layer is of size a × a and is divided into n × n sub-regions; spatial pyramid pooling can then be regarded as sliding-window pooling with a window size of a/n and a stride of a/n. Here a three-level spatial pyramid pooling configuration is chosen, with n × n equal to 1 × 1, 2 × 2 and 4 × 4 respectively. The final output of spatial pyramid pooling concatenates the pooling results of the three levels into one vector, producing a fixed-length representation regardless of the size of the input image. The input images of different scales share one spatial pyramid pooling network. The multi-scale images are then sent into the convolutional neural network with spatial pyramid pooling for training, so as to extract the multi-scale depth features.
3. Fusing convolutional features:
For the input image of each scale, feature representations are extracted from all convolutional layers, and a visual codebook is formed with k-means over the features at every pixel of the convolutional-layer feature maps. f_{1,l}^{(i,j)} denotes the p-dimensional feature of a local block of the l-th layer of the image, c_d denotes the visual word nearest to f_{1,l}^{(i,j)}, and the l-th-layer features of image I_1 are mapped to the histogram H_{1,l}. The feature representations extracted from all convolutional layers are thus formed into histogram representations through the bag of visual words.
The feature representations extracted from all convolutional layers are fused using the adaptive deep pyramid matching model. The deep pyramid matching kernel of images I_1 and I_2 is K(I_1, I_2) = Σ_{l=1}^{L} ω_l κ(H_{1,l}, H_{2,l}), where L represents the total number of convolutional layers and ω_l is the fusion weight of the l-th layer, with ω_l ≥ 0 and Σ_{l=1}^{L} ω_l = 1. The regularization parameter λ in objective function (3) of the adaptive deep pyramid matching model, which prevents overfitting, is empirically set to 0.5. The optimal fusion weights among all convolutional layers are learned from the data itself, and weighting with them yields the histograms of the fused features extracted from all convolutional layers.
4. Aggregating the multi-scale classification results:
The deep features of the multi-scale images, with the histogram intersection kernel, are fed into a classifier to obtain classification results; this can be implemented using the LIBSVM software package. The multiple results of all scales are integrated using the majority voting strategy, finally completing the classification of the remote sensing scene images.
To verify the effect of the method, the remote sensing scene classification method based on adaptive deep pyramid matching of the invention is compared with the semi-supervised projection (SSEP) method.
FIG. 5 shows histograms of the per-class accuracy of the method of the present invention and of the semi-supervised projection (SSEP) method. As can be seen from the figure, compared with the semi-supervised projection (SSEP) method, the classification method of the invention achieves the highest accuracy on 14 of the 19 classes, showing that the method of the present invention achieves higher classification accuracy.
Table 2 shows a comparison of the classification accuracy of the classification methods.

TABLE 2 Classification accuracy comparison

Method              Classification accuracy (%)
SSEP                73.82
ADPM-227            82.14
ADPM-256            83.71
ADPM-384            81.91
Multi-scale ADPM    84.67
As can be seen from Table 2, the classification accuracy of the method of the invention is significantly higher than that of the semi-supervised projection (SSEP) method; in particular, the multi-scale method that fuses the classification results improves accuracy by nearly 8% over the other methods. In addition, the results at different scales differ, and the classification accuracy of the multi-scale method with fused classification results is clearly higher than that of any single-scale method.
In conclusion, compared with the semi-supervised ensemble projection (SSEP) method, the method of the invention has obvious advantages in both classification performance and classification accuracy.

Claims (2)

1. A remote sensing scene classification method, characterized by comprising the following steps:
step 1), generating images of different scales N×N from the remote sensing scene image to be classified by a deformation method, wherein N can take several values according to the size of the image;
step 2), sending the multi-scale images into a convolutional neural network with spatial pyramid pooling for training, so as to extract multi-scale depth features;
step 3), for the input image of each scale, fusing the feature representations extracted from all convolutional layers by using an adaptive deep pyramid matching model;
step 4), sending the feature representations learned from each scale image into a classifier to obtain classification results, and then integrating the multiple results of all scales using a majority voting strategy, thereby correctly classifying the remote sensing scene image.
2. The remote sensing scene classification method according to claim 1, characterized in that the remote sensing scene image to be classified in step 1) is generated at different scales, such as 128 × 128, 192 × 192, 227 × 227, 256 × 256 and 384 × 384, by the deformation method.
CN201710147637.7A 2017-03-13 2017-03-13 Remote sensing scene classification method Pending CN106991382A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710147637.7A CN106991382A (en) 2017-03-13 2017-03-13 Remote sensing scene classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710147637.7A CN106991382A (en) 2017-03-13 2017-03-13 Remote sensing scene classification method

Publications (1)

Publication Number Publication Date
CN106991382A true CN106991382A (en) 2017-07-28

Family

ID=59412104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710147637.7A Pending CN106991382A (en) Remote sensing scene classification method

Country Status (1)

Country Link
CN (1) CN106991382A (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107578003A (en) * 2017-08-29 2018-01-12 中国科学院遥感与数字地球研究所 A kind of remote sensing images transfer learning method based on GEOGRAPHICAL INDICATION image
CN107644426A (en) * 2017-10-12 2018-01-30 中国科学技术大学 Image, semantic dividing method based on pyramid pond encoding and decoding structure
CN107958219A (en) * 2017-12-06 2018-04-24 电子科技大学 Image scene classification method based on multi-model and Analysis On Multi-scale Features
CN108491757A (en) * 2018-02-05 2018-09-04 西安电子科技大学 Remote sensing image object detection method based on Analysis On Multi-scale Features study
CN108537121A (en) * 2018-03-07 2018-09-14 中国科学院西安光学精密机械研究所 Self-adaptive remote sensing scene classification method based on meteorological environment parameter and image information fusion
CN109508582A (en) * 2017-09-15 2019-03-22 中国公路工程咨询集团有限公司 The recognition methods of remote sensing image and device
CN109508580A (en) * 2017-09-15 2019-03-22 百度在线网络技术(北京)有限公司 Traffic lights recognition methods and device
CN109978071A (en) * 2019-04-03 2019-07-05 西北工业大学 Hyperspectral image classification method based on data augmentation and Multiple Classifier Fusion
CN110287962A (en) * 2019-05-20 2019-09-27 平安科技(深圳)有限公司 Remote Sensing Target extracting method, device and medium based on superobject information
CN110321866A (en) * 2019-07-09 2019-10-11 西北工业大学 Remote sensing images scene classification method based on depth characteristic Sparse Least
CN110555461A (en) * 2019-07-31 2019-12-10 中国地质大学(武汉) scene classification method and system based on multi-structure convolutional neural network feature fusion
CN110633640A (en) * 2019-08-13 2019-12-31 杭州电子科技大学 Method for identifying complex scene by optimizing PointNet
CN110717553A (en) * 2019-06-20 2020-01-21 江苏德劭信息科技有限公司 Traffic contraband identification method based on self-attenuation weight and multiple local constraints
CN110781926A (en) * 2019-09-29 2020-02-11 武汉大学 Support vector machine multi-spectral-band image analysis method based on robust auxiliary information reconstruction
CN111340750A (en) * 2018-12-18 2020-06-26 詹宝珠 Convolutional neural network analysis method and electronic device
CN111639672A (en) * 2020-04-23 2020-09-08 中国科学院空天信息创新研究院 Deep learning city functional area classification method based on majority voting
CN111860207A (en) * 2020-06-29 2020-10-30 中山大学 Multi-scale remote sensing image ground object classification method, system, device and medium
CN112016596A (en) * 2020-08-10 2020-12-01 西安科技大学 Evaluation method for farmland soil fertility based on convolutional neural network
CN112149582A (en) * 2020-09-27 2020-12-29 中国科学院空天信息创新研究院 Hyperspectral image material identification method and system
CN112766083A (en) * 2020-12-30 2021-05-07 中南民族大学 Remote sensing scene classification method and system based on multi-scale feature fusion
CN113724381A (en) * 2021-07-23 2021-11-30 广州市城市规划勘测设计研究院 Dynamic three-dimensional scene rapid reconstruction method based on high-resolution remote sensing image
CN114638272A (en) * 2022-05-19 2022-06-17 之江实验室 Identity recognition method and device based on fingertip pulse wave signals
CN114664048A (en) * 2022-05-26 2022-06-24 环球数科集团有限公司 Fire monitoring and fire early warning method based on satellite remote sensing monitoring

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258214A (en) * 2013-04-26 2013-08-21 南京信息工程大学 Remote sensing image classification method based on image block active learning
CN103413142A (en) * 2013-07-22 2013-11-27 中国科学院遥感与数字地球研究所 Remote sensing image land utilization scene classification method based on two-dimension wavelet decomposition and visual sense bag-of-word model
CN105069481A (en) * 2015-08-19 2015-11-18 西安电子科技大学 Multi-label natural scene classification method based on spatial pyramid and sparse coding
CN105956532A (en) * 2016-04-25 2016-09-21 大连理工大学 Traffic scene classification method based on multi-scale convolution neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103258214A (en) * 2013-04-26 2013-08-21 南京信息工程大学 Remote sensing image classification method based on image block active learning
CN103413142A (en) * 2013-07-22 2013-11-27 中国科学院遥感与数字地球研究所 Remote sensing image land utilization scene classification method based on two-dimension wavelet decomposition and visual sense bag-of-word model
CN105069481A (en) * 2015-08-19 2015-11-18 西安电子科技大学 Multi-label natural scene classification method based on spatial pyramid and sparse coding
CN105956532A (en) * 2016-04-25 2016-09-21 大连理工大学 Traffic scene classification method based on multi-scale convolution neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QINGSHAN LIU ET AL.: "Adaptive Deep Pyramid Matching for Remote Sensing Scene Classification", IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING *

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107578003B (en) * 2017-08-29 2020-04-14 中国科学院遥感与数字地球研究所 Remote sensing image transfer learning method based on geographic marking image
CN107578003A (en) * 2017-08-29 2018-01-12 中国科学院遥感与数字地球研究所 A kind of remote sensing images transfer learning method based on GEOGRAPHICAL INDICATION image
US11037005B2 (en) 2017-09-15 2021-06-15 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for identifying traffic light
CN109508582A (en) * 2017-09-15 2019-03-22 中国公路工程咨询集团有限公司 The recognition methods of remote sensing image and device
CN109508580A (en) * 2017-09-15 2019-03-22 百度在线网络技术(北京)有限公司 Traffic lights recognition methods and device
CN109508580B (en) * 2017-09-15 2022-02-25 阿波罗智能技术(北京)有限公司 Traffic signal lamp identification method and device
CN107644426A (en) * 2017-10-12 2018-01-30 中国科学技术大学 Image, semantic dividing method based on pyramid pond encoding and decoding structure
CN107958219A (en) * 2017-12-06 2018-04-24 电子科技大学 Image scene classification method based on multi-model and Analysis On Multi-scale Features
CN108491757A (en) * 2018-02-05 2018-09-04 西安电子科技大学 Remote sensing image object detection method based on Analysis On Multi-scale Features study
CN108491757B (en) * 2018-02-05 2020-06-16 西安电子科技大学 Optical remote sensing image target detection method based on multi-scale feature learning
CN108537121A (en) * 2018-03-07 2018-09-14 中国科学院西安光学精密机械研究所 Self-adaptive remote sensing scene classification method based on meteorological environment parameter and image information fusion
CN108537121B (en) * 2018-03-07 2020-11-03 中国科学院西安光学精密机械研究所 Self-adaptive remote sensing scene classification method based on meteorological environment parameter and image information fusion
CN111340750B (en) * 2018-12-18 2023-08-08 詹宝珠 Convolutional neural network analysis method and electronic device
CN111340750A (en) * 2018-12-18 2020-06-26 詹宝珠 Convolutional neural network analysis method and electronic device
CN109978071A (en) * 2019-04-03 2019-07-05 西北工业大学 Hyperspectral image classification method based on data augmentation and Multiple Classifier Fusion
WO2020232905A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Superobject information-based remote sensing image target extraction method, device, electronic apparatus, and medium
CN110287962A (en) * 2019-05-20 2019-09-27 平安科技(深圳)有限公司 Remote Sensing Target extracting method, device and medium based on superobject information
CN110287962B (en) * 2019-05-20 2023-10-27 平安科技(深圳)有限公司 Remote sensing image target extraction method, device and medium based on super object information
CN110717553A (en) * 2019-06-20 2020-01-21 江苏德劭信息科技有限公司 Traffic contraband identification method based on self-attenuation weight and multiple local constraints
CN110717553B (en) * 2019-06-20 2023-08-04 江苏德劭信息科技有限公司 Traffic contraband identification method based on self-attenuation weight and multiple local constraints
CN110321866A (en) * 2019-07-09 2019-10-11 西北工业大学 Remote sensing images scene classification method based on depth characteristic Sparse Least
CN110321866B (en) * 2019-07-09 2023-03-24 西北工业大学 Remote sensing image scene classification method based on depth feature sparsification algorithm
CN110555461A (en) * 2019-07-31 2019-12-10 中国地质大学(武汉) scene classification method and system based on multi-structure convolutional neural network feature fusion
CN110633640A (en) * 2019-08-13 2019-12-31 杭州电子科技大学 Method for identifying complex scene by optimizing PointNet
CN110781926B (en) * 2019-09-29 2023-09-19 武汉大学 Multi-spectral band image analysis method of support vector machine based on robust auxiliary information reconstruction
CN110781926A (en) * 2019-09-29 2020-02-11 武汉大学 Support vector machine multi-spectral-band image analysis method based on robust auxiliary information reconstruction
CN111639672B (en) * 2020-04-23 2023-12-19 中国科学院空天信息创新研究院 Deep learning city function classification method based on majority voting
CN111639672A (en) * 2020-04-23 2020-09-08 中国科学院空天信息创新研究院 Deep learning city functional area classification method based on majority voting
CN111860207A (en) * 2020-06-29 2020-10-30 中山大学 Multi-scale remote sensing image ground object classification method, system, device and medium
CN111860207B (en) * 2020-06-29 2023-10-24 中山大学 Multi-scale remote sensing image ground object classification method, system, device and medium
CN112016596A (en) * 2020-08-10 2020-12-01 西安科技大学 Evaluation method for farmland soil fertility based on convolutional neural network
CN112016596B (en) * 2020-08-10 2024-04-09 西安科技大学 Farmland soil fertility evaluation method based on convolutional neural network
CN112149582A (en) * 2020-09-27 2020-12-29 中国科学院空天信息创新研究院 Hyperspectral image material identification method and system
CN112766083A (en) * 2020-12-30 2021-05-07 中南民族大学 Remote sensing scene classification method and system based on multi-scale feature fusion
CN112766083B (en) * 2020-12-30 2023-10-27 中南民族大学 Remote sensing scene classification method and system based on multi-scale feature fusion
CN113724381B (en) * 2021-07-23 2022-06-28 广州市城市规划勘测设计研究院 Dynamic three-dimensional scene rapid reconstruction method based on high-resolution remote sensing image
CN113724381A (en) * 2021-07-23 2021-11-30 广州市城市规划勘测设计研究院 Dynamic three-dimensional scene rapid reconstruction method based on high-resolution remote sensing image
CN114638272A (en) * 2022-05-19 2022-06-17 之江实验室 Identity recognition method and device based on fingertip pulse wave signals
CN114664048A (en) * 2022-05-26 2022-06-24 环球数科集团有限公司 Fire monitoring and fire early warning method based on satellite remote sensing monitoring

Similar Documents

Publication Publication Date Title
CN106991382A (en) Remote sensing scene classification method
US10984532B2 (en) Joint deep learning for land cover and land use classification
Hua et al. Recurrently exploring class-wise attention in a hybrid convolutional and bidirectional LSTM network for multi-label aerial image classification
Yao et al. Semantic annotation of high-resolution satellite images via weakly supervised learning
Lian et al. Road extraction methods in high-resolution remote sensing images: A comprehensive review
Gong et al. Superpixel-based difference representation learning for change detection in multispectral remote sensing images
US10922589B2 (en) Object-based convolutional neural network for land use classification
Xia et al. Spectral–spatial classification for hyperspectral data using rotation forests with local feature extraction and Markov random fields
Kavzoglu Object-oriented random forest for high resolution land cover mapping using quickbird-2 imagery
Wu et al. A scene change detection framework for multi-temporal very high resolution remote sensing images
Cheng et al. Multi-class geospatial object detection and geographic image classification based on collection of part detectors
Zhao et al. Dirichlet-derived multiple topic scene classification model for high spatial resolution remote sensing imagery
Tang et al. Improving image classification with location context
Li et al. A new accuracy assessment method for one-class remote sensing classification
CN110728295B (en) Semi-supervised landform classification model training and landform graph construction method
CN104408483B (en) SAR texture image classification methods based on deep neural network
Tao et al. Scene context-driven vehicle detection in high-resolution aerial images
Abid et al. UCL: Unsupervised Curriculum Learning for water body classification from remote sensing imagery
Yee et al. DeepScene: Scene classification via convolutional neural network with spatial pyramid pooling
CN113223042B (en) Intelligent acquisition method and equipment for remote sensing image deep learning sample
CN105574545B (en) The semantic cutting method of street environment image various visual angles and device
Liu et al. Learning multi-scale deep features for high-resolution satellite image classification
Alhichri et al. Tile‐Based Semisupervised Classification of Large‐Scale VHR Remote Sensing Images
Han et al. The edge-preservation multi-classifier relearning framework for the classification of high-resolution remotely sensed imagery
Singh et al. Semantically guided geo-location and modeling in urban environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170728