CN111368776A

CN111368776A - High-resolution remote sensing image classification method based on deep ensemble learning

Info

Publication number: CN111368776A
Application number: CN202010173481.1A
Authority: CN
Inventors: 席江波; 聂聪冲; 孙悦鸣; 姜万冬
Original assignee: Changan University
Current assignee: Changan University
Priority date: 2020-03-13
Filing date: 2020-03-13
Publication date: 2020-07-03
Anticipated expiration: 2040-03-13
Also published as: CN111368776B

Abstract

The invention discloses a high-resolution remote sensing image classification method based on deep ensemble learning. The idea is as follows: first, pixel brightness values are used as classification features to classify the image with a fully-connected network; second, object-oriented segmentation is performed, and a convolution block centered on the center of gravity of each segment is extracted for classification with a convolutional neural network; third, all original images are cut into image blocks, and one-vs-all multi-class classification is performed with a U-Net fully convolutional network; finally, a fully-connected network is trained on the classification results of the three deep-network base classifiers to combine their probabilities, thereby achieving better classification performance. By effectively combining spectral and spatial information, the deep ensemble learning method integrates the advantages of the base classifiers and achieves higher per-class and overall classification accuracy on high-resolution remote sensing images than a single classifier.

Description

High-resolution remote sensing image classification method based on deep ensemble learning

Technical Field

The invention belongs to the technical field of remote sensing image analysis, and particularly relates to a high-resolution remote sensing image classification method based on deep ensemble learning.

Background

Remote sensing image classification has evolved from visual interpretation to automatic computer classification. Traditional computer classification mainly comprises pixel-based supervised and unsupervised classification, which relies mainly on spectral features and classifies according to differences in pixel brightness values. As the spatial resolution of remote sensing images improves, pixel brightness values within the same ground-object class vary widely, and traditional methods based on pixel brightness values can no longer meet the classification accuracy required for high-resolution remote sensing images.

Since the convolutional neural network designed by Alex Krizhevsky won the ILSVRC visual recognition competition in 2012, deep learning has surged in popularity; convolutional neural networks have been applied ever more widely, and their accuracy has risen as networks have grown deeper. When a convolutional neural network is applied to high-spatial-resolution remote sensing image classification, convolutional and pooling layers extract features automatically, and the weights are corrected continuously according to the training loss, so that effective features suited to the specific scene are extracted and then passed to a classifier. Compared with traditional computer classification, this approach focuses on feature extraction in the early stage rather than on improving classification accuracy by optimizing classifier parameters.

However, for a high-resolution remote sensing image, any single classifier has its limitations: it cannot combine the spectral and spatial information of the image, so classification accuracy leaves room for improvement.

Disclosure of Invention

To address the shortcomings of traditional high-resolution remote sensing image classification methods, the invention provides a high-resolution remote sensing image classification method based on deep ensemble learning. The method performs deep ensemble learning over three methods: pixel brightness value classification based on a fully-connected network, object-oriented classification based on a convolutional neural network, and remote sensing image classification based on a fully convolutional network. It effectively combines the spectral and spatial information of the high-resolution remote sensing image, improves both the per-class and the overall classification accuracy, and achieves better classification performance.

In order to achieve the technical purpose, the invention is realized by adopting the following technical scheme.

A high-resolution remote sensing image classification method based on deep ensemble learning comprises the following steps:

step 1, acquiring a training set and a test set; respectively constructing a fully-connected network model based on pixel brightness value classification, a convolutional neural network model based on object-oriented classification and a fully convolutional network model based on remote sensing image classification;

the samples in the training set and the test set are high-resolution remote sensing images and their corresponding DSM data, where DSM denotes a digital surface model;

step 2, training the fully-connected network model based on pixel brightness value classification, the convolutional neural network model based on object-oriented classification and the fully convolutional network model based on remote sensing image classification on the training set to obtain the three corresponding base classifiers; testing each of the three base classifiers on the test set and outputting the corresponding class probability values;

step 3, placing a fully-connected network at the output ends of the three base classifiers obtained in step 2, taking the class probability values output by the three base classifiers as input feature data and the label map as the target class, and training the fully-connected network to obtain a deep ensemble learning network;

step 4, classifying the samples in the test set with the deep ensemble learning network and outputting the corresponding predicted class labels.

Compared with the prior art, the invention has the following advantages:

(1) The invention can construct different base classifiers for a specific application, obtain good accuracy with each base classifier, and then integrate the base classifiers through a suitable combination strategy to obtain better classification accuracy than a single classifier.

(2) The method performs deep ensemble learning over three methods: pixel brightness value classification based on a fully-connected network, object-oriented classification based on a convolutional neural network, and remote sensing image classification based on a fully convolutional network. It effectively combines the spectral and spatial information of the high-resolution remote sensing image, improves both the per-class and the overall classification accuracy, and achieves better classification performance.

Drawings

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

FIG. 1 is a schematic diagram of a deep ensemble learning network structure according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a fully-connected network model based on pixel brightness value classification according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a convolutional neural network model based on object-oriented classification according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a fully convolutional network model based on remote sensing image classification according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a segment and a center of gravity according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of the results of classification using only the fully-connected network based on pixel brightness value classification according to an embodiment of the present invention, where (a) is the label map, (b) is the corresponding predicted classification result, and (c) is the classification result after median filtering;

FIG. 7 is a schematic diagram of the results of classification using only the convolutional neural network based on object-oriented classification according to an embodiment of the present invention, where (a) is the label map and (b) is the corresponding predicted classification result;

FIG. 8 is a schematic diagram of the results of classification using only the fully convolutional network based on remote sensing image classification according to an embodiment of the present invention, where (a) is the label map and (b) is the corresponding predicted classification result;

FIG. 9 is a schematic diagram of the results of remote sensing image classification using the deep ensemble learning network according to an embodiment of the present invention, where (a) is the label map and (b) is the corresponding predicted classification result.

Detailed Description

Referring to FIG. 1, a high-resolution remote sensing image classification method based on deep ensemble learning includes the following steps:

Specifically, the acquired high-resolution remote sensing images and their corresponding DSM data are used as samples to form a training set and a test set, and the training set contains no fewer samples than the test set.

The fully-connected network model based on pixel brightness value classification consists of N fully-connected layers, where the first N-1 layers use the ReLU function as the activation function and the last layer uses the Softmax function; a Dropout layer before the last fully-connected layer provides regularization to prevent overfitting. In the embodiment of the invention, a 7-layer fully-connected neural network is built; the network structure is shown in FIG. 2.
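
As an illustration of the layer arrangement described above, the following is a minimal inference-only sketch, not the embodiment's implementation. It uses three layers instead of seven to stay short; the layer widths, the random weights, the five output classes and the helper name `random_layers` are all assumptions. Dropout is omitted because it is only active during training.

```python
import math
import random


def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """One fully-connected layer; weights[j][i] connects input i to output j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(pixel_features, layers):
    """ReLU after every layer except the last, which uses Softmax."""
    activations = pixel_features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


def random_layers(sizes, seed=0):
    """Hypothetical helper: small random weights for layer sizes such as [4, 16, 5]."""
    rng = random.Random(seed)
    return [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


# Four normalized bands (near infrared, red, green, DSM) in,
# one probability per class out (five classes assumed).
layers = random_layers([4, 16, 16, 5])
probabilities = forward([0.42, 0.31, 0.27, 0.65], layers)
```

The output vector sums to 1 and can be read as the per-pixel class probability values that this base classifier contributes to the ensemble.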

The convolutional neural network model based on object-oriented classification comprises a plurality of convolutional layers, max-pooling layers, fully-connected layers and an output layer, where each convolutional layer is followed by one max-pooling layer, the last pooling layer is followed by the plurality of fully-connected layers, and the last fully-connected layer is followed by the output layer; the activation function of every convolutional layer and fully-connected layer is the ReLU function, and the activation function of the output layer is the Softmax function; a Dropout layer before the last fully-connected layer provides regularization.

In the embodiment of the invention, a convolutional network comprising four convolutional layers, pooling layers and three fully-connected layers is built. The four convolutional layers have 128, 96, 64 and 64 filters respectively, each of size 5 × 5, initialized with He normal initialization and using the ReLU activation function. Each convolutional layer is followed by a max-pooling layer with a 3 × 3 window and a stride of 1. The first and second fully-connected layers each have 128 neurons; a Dropout layer after the second fully-connected layer provides regularization and is followed by an output layer of 6 neurons that uses the Softmax activation function. The network structure is shown in FIG. 3.

The fully convolutional network model based on remote sensing image classification consists of a plurality of binary classifiers, each of which comprises convolutional layers, Dropout regularization layers and pooling layers, together with the convolutional layers, fusion layers and upsampling layers that mirror them symmetrically.

The invention trains with the U-Net network: the 15 images, together with their DSMs and label maps, are cut into 512 × 512 image blocks and the data are then expanded; the architecture shown in FIG. 4 is used to train 5 binary classifiers, and the one-vs-all method then yields the multi-class classification result.
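
The text does not state how the outputs of the binary classifiers are merged into one multi-class result. A common one-vs-all rule, assumed in the sketch below, assigns each pixel to the class whose binary classifier reports the highest foreground probability; the function name and the data layout are illustrative only.

```python
def one_vs_all(binary_probabilities):
    """Assign each pixel to the class whose binary classifier is most confident.

    binary_probabilities[k][r][c] is the foreground probability that the
    binary classifier for class k gives to the pixel at row r, column c.
    Ties go to the lowest class index (an arbitrary choice of this sketch).
    """
    n_classes = len(binary_probabilities)
    rows = len(binary_probabilities[0])
    cols = len(binary_probabilities[0][0])
    label_map = []
    for r in range(rows):
        row_labels = []
        for c in range(cols):
            scores = [binary_probabilities[k][r][c] for k in range(n_classes)]
            row_labels.append(scores.index(max(scores)))
        label_map.append(row_labels)
    return label_map


# Three classes on a 2 x 2 image, purely illustrative numbers.
example = [
    [[0.90, 0.20], [0.10, 0.30]],  # classifier for class 0
    [[0.05, 0.70], [0.20, 0.30]],  # classifier for class 1
    [[0.30, 0.10], [0.80, 0.60]],  # classifier for class 2
]
labels = one_vs_all(example)
```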

2.1, preprocessing the training set to obtain a preprocessed training set;

wherein, the preprocessing comprises unbalanced sample processing, data expansion and normalization;

Specifically, the unbalanced sample processing is as follows. In remote sensing image classification, a severe imbalance in the number of samples per class in the training data set is an important factor affecting the classification result. In practice, the effect of unbalanced samples can be reduced or eliminated at both the data level and the algorithm level. At the data level, small classes can be resampled: starting from the original samples of a small class, a new sample is formed by randomly selecting, for each attribute feature, one value from the value space of that feature over all samples of the class, which increases the amount of small-class sample data. Alternatively, large classes can be down-sampled, that is, a fraction of their samples is randomly discarded to keep the class counts balanced, although this affects the generalization ability of the model. At the algorithm level, the number of samples of each class is counted first, corresponding weight coefficients are set, and different sampling proportions are tried for each class, so as to adjust the penalty cost of misclassifying small-class samples.
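
The data-level resampling of a small class can be sketched as follows. This is one reading of the description, in which every attribute of a new sample is drawn independently from the values observed for that attribute within the class; the function names, the fixed seed and the target count are assumptions.

```python
import random


def synthesize_sample(class_samples, rng):
    """Build one new sample by drawing each attribute from the values that
    attribute takes across all existing samples of the class."""
    n_attributes = len(class_samples[0])
    return [rng.choice([sample[i] for sample in class_samples])
            for i in range(n_attributes)]


def oversample(class_samples, target_count, seed=0):
    """Add synthesized samples until the class holds target_count samples."""
    rng = random.Random(seed)
    result = [list(sample) for sample in class_samples]
    while len(result) < target_count:
        result.append(synthesize_sample(class_samples, rng))
    return result


# Three samples of a small class, each with four normalized band values.
car_samples = [[0.1, 0.2, 0.3, 0.4],
               [0.5, 0.6, 0.7, 0.8],
               [0.9, 1.0, 0.0, 0.2]]
balanced = oversample(car_samples, target_count=10)
```

Because attributes are drawn independently, a synthesized sample may combine values that never occurred together in the original data, which is worth keeping in mind when the bands are strongly correlated.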

The data expansion is as follows: the original images in the training set are horizontally flipped (mirrored) or randomly rotated. Horizontal flipping doubles the original data volume; for random rotation, a maximum angle is specified, as many random angles as the required expansion multiple are generated within that angle, and the original image is rotated by each of them.
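
A minimal sketch of the two expansion operations follows. Images are plain nested lists here, and the rotation itself is left to an image library; only the generation of the random angles is shown. The range [0, max_angle] and the function names are assumptions. The same flip or rotation has to be applied to the DSM and the label map so that they stay aligned with the image.

```python
import random


def horizontal_flip(image):
    """Mirror a 2-D image (a list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def random_rotation_angles(max_angle, multiple, seed=0):
    """Draw `multiple` rotation angles uniformly from [0, max_angle] degrees."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, max_angle) for _ in range(multiple)]


image = [[1, 2, 3],
         [4, 5, 6]]
flipped = horizontal_flip(image)
angles = random_rotation_angles(max_angle=30.0, multiple=4)
```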

Normalization: for the three-band (near infrared, red and green) color images of the QB (QuickBird) remote sensing satellite data, the gray values all lie between 0 and 255, so normalization simply scales the gray values from 0-255 to 0-1. For aerial images, three-band (near infrared, red and green) color images and single-band DSMs are used; the three color bands are normalized as above. For DSM data, all values are first examined to find the minimum value and the width of the value range; the minimum value is then subtracted from each pixel value and the result is divided by the width of the value range, so that all pixel values are normalized to between 0 and 1. For example, if the values are all found to lie between 200 and 300, the minimum value 200 is subtracted from every pixel value and the result is divided by the range width 100, which normalizes all pixel values to between 0 and 1.
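
The two normalization rules can be written down directly. The sketch below works on one array at a time, whereas the minimum and range of the DSM may in practice be taken over the whole data set; the handling of a completely flat DSM is an assumption.

```python
def normalize_8bit(band):
    """Scale 8-bit gray values from 0-255 to 0-1."""
    return [[value / 255.0 for value in row] for row in band]


def normalize_dsm(dsm):
    """Min-max normalization: (value - minimum) / (maximum - minimum)."""
    flat = [value for row in dsm for value in row]
    lowest = min(flat)
    value_range = max(flat) - lowest
    if value_range == 0:
        # Flat surface: mapping everything to 0 is an assumption of this sketch.
        return [[0.0 for _ in row] for row in dsm]
    return [[(value - lowest) / value_range for value in row] for row in dsm]


# Heights between 200 and 300, as in the example in the text.
dsm = [[200, 250],
       [275, 300]]
normalized = normalize_dsm(dsm)
```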

2.2, training the fully-connected network model based on pixel brightness value classification on the preprocessed training set, the specific process being as follows:

taking each pixel of each sample in the preprocessed training set as an input unit, the normalized brightness values of the pixel in the four bands (near infrared, red, green and DSM) form the input feature vector; the network parameters are updated and optimized by stochastic gradient descent until all samples in the preprocessed training set have been input, yielding the base classifier corresponding to the fully-connected network.

The base classifier corresponding to the fully-connected network is tested on the test set as follows: each original image in the test set is taken as a test sample, the pixel brightness values of the four bands (near infrared, red, green and DSM) of each pixel of the test sample form the input feature vector, and the corresponding class probability values are output.

Because this embodiment uses the pixel as the classification unit, intra-class heterogeneity may produce many isolated pixels in the result, making the classification map fragmented; the classification result is therefore smoothed by median filtering with a 9 × 9 window.
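
The smoothing step can be illustrated with a small median filter over a label map. This is a sketch under assumptions: the text does not say how image borders are handled, so the window is simply clipped to the image here, and `median_low` is used so that the result is always one of the labels present in the window.

```python
from statistics import median_low


def median_filter(label_map, window=9):
    """Replace each pixel by the median of its window x window neighbourhood."""
    half = window // 2
    rows, cols = len(label_map), len(label_map[0])
    smoothed = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Clip the window at the image border (assumption of this sketch).
            neighbours = [label_map[rr][cc]
                          for rr in range(max(0, r - half), min(rows, r + half + 1))
                          for cc in range(max(0, c - half), min(cols, c + half + 1))]
            out_row.append(median_low(neighbours))
        smoothed.append(out_row)
    return smoothed


# One isolated pixel of class 3 inside a region of class 0.
noisy = [[0] * 5 for _ in range(5)]
noisy[2][2] = 3
cleaned = median_filter(noisy, window=3)
```

In the small example a 3 × 3 window is enough to remove the isolated pixel; the embodiment uses a 9 × 9 window on full-size images.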

Illustratively, the training data are the feature row vectors of the first 10 of the 15 images, and the test data are the feature row vectors of the last 5 images.

Model training and testing: in the training phase, the batch size is set to 256 and the number of epochs to 50; gradient descent is performed by stochastic gradient descent with a learning rate of 10^-4 and a learning rate decay of 10^-6. Finally, training is performed on 10 images and testing on another image. The results are shown in FIG. 6 and Table 1.

The classification accuracy of the network is evaluated mainly by comparing the classification result with the reference label map to measure quantitatively how accurate the result is. Commonly used evaluation indexes include overall accuracy, accuracy, the F1 score and so on; the invention uses the F1 score and the accuracy as indexes.

The F1 score is the harmonic mean of precision and recall, defined as:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where Precision denotes the precision rate,

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

and Recall denotes the recall rate,

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

TP is a true positive, i.e., the prediction is positive and the label is also positive; FP is a false positive, i.e., the prediction is positive and the label is negative; FN is a false negative, i.e., the prediction is negative and the label is positive.

ACC denotes the accuracy, defined as:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP, FP and FN are as defined above, and TN is a true negative, i.e., the prediction is negative and the label is also negative.
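
For reference, the per-class counts TP, FP, TN and FN and the two indexes built from them can be computed from a predicted map and a reference label map as sketched below. The function names are illustrative, and returning 0 when a denominator is zero is an assumption of the sketch.

```python
def confusion_counts(predicted, reference, target_class):
    """Count TP, FP, TN and FN for one class, treating it as the positive class."""
    tp = fp = tn = fn = 0
    for pred_row, ref_row in zip(predicted, reference):
        for pred, ref in zip(pred_row, ref_row):
            if pred == target_class and ref == target_class:
                tp += 1
            elif pred == target_class:
                fp += 1
            elif ref == target_class:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision TP/(TP+FP) and recall TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(tp, fp, tn, fn):
    """Share of pixels whose positive/negative decision is correct."""
    return (tp + tn) / (tp + fp + tn + fn)


predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
tp, fp, tn, fn = confusion_counts(predicted, reference, target_class=1)
```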

TABLE 1 Classification results of the fully-connected network based on pixel brightness values

As can be seen from FIG. 6 and Table 1, the fully-connected network model identifies building outlines accurately, but because of large intra-class differences, chimneys or skylights on building roofs are misclassified as cars. After data expansion the car class is roughly balanced in number with large classes such as buildings and impervious surfaces, and because car colors are diverse, many impervious-surface pixels are misclassified as cars (hence the low accuracy for cars in Table 1). Low vegetation and trees are seriously confused: their spectral characteristics are similar, so DSM assistance is needed to tell them apart, but because ground height varies over a large area, the DSM value of trees is not always greater than that of low vegetation across the whole image. The three classes of buildings, impervious surfaces and cars are essentially not confused with the two classes of low vegetation and trees.

2.2, training the convolutional neural network model based on object-oriented classification by adopting the preprocessed training set, wherein the specific process is as follows:

2.2.1, performing multi-scale segmentation on the label maps of the preprocessed samples in the training set to obtain a plurality of label map segmentation blocks, where the gray value within each segmentation block is uniform; in the embodiment of the invention, eCognition software is used to perform multi-scale segmentation of the label map, and the segmentation result is exported as a TIFF raster image, with a segmentation scale parameter of 20, a shape index of 0.4 and a compactness of 0.5.

2.2.2, extracting the center of gravity of each label map segmentation block, where the abscissa of the center of gravity is the sum of the abscissas of all pixels in the segmentation block divided by the number of pixels in the block, rounded down, and the ordinate is the sum of the ordinates of all pixels in the block divided by the number of pixels in the block, rounded down. The centers of gravity extracted from segmentation blocks of different sizes are shown in FIG. 5, in which each shaded region is a segmentation block and the black squares are the extracted centers of gravity. As the figure shows, segmentation blocks of different sizes have different center-of-gravity coordinates.

2.2.3, taking an image block of size a × b centered on the center of gravity of each label map segmentation block to obtain a plurality of a × b × k image blocks, that is, extracting a plurality of convolution blocks;

where k is the number of channels input to the convolutional layer.

2.2.4, training the convolutional neural network with each image block as input data, using the RMSProp (root mean square propagation) optimization method for gradient descent to optimize the parameters of the convolutional neural network.
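
The center-of-gravity and block extraction steps can be sketched as follows. Coordinates are written as (row, column) rather than abscissa and ordinate, and the image is a nested list of shape rows × columns × k. The text does not say what happens when a block would extend past the image border; shifting the window back inside the image is an assumption of this sketch.

```python
def centroid(pixels):
    """Center of gravity of a segment: mean row and mean column, rounded down."""
    n = len(pixels)
    row = sum(r for r, _ in pixels) // n
    col = sum(c for _, c in pixels) // n
    return row, col


def extract_block(image, center, height, width):
    """Cut a height x width block around `center` from a rows x cols x k image.

    Near the border the window is shifted so that it stays inside the image
    (assumption); the image must be at least height x width.
    """
    rows, cols = len(image), len(image[0])
    top = min(max(center[0] - height // 2, 0), rows - height)
    left = min(max(center[1] - width // 2, 0), cols - width)
    return [row[left:left + width] for row in image[top:top + height]]


# A 6 x 6 image with k = 4 channels; each pixel stores its own position.
image = [[[r, c, 0, 0] for c in range(6)] for r in range(6)]
segment = [(2, 3), (2, 4), (3, 3), (3, 4), (4, 4)]
center = centroid(segment)
block = extract_block(image, center, height=4, width=4)
```

In the embodiment the blocks are 32 × 32 × 4; the small sizes above only keep the example readable.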

The base classifier corresponding to the convolutional neural network is tested on the test set as follows: multi-scale segmentation is performed on the original test images in the test set to obtain a plurality of test image segmentation blocks; center-of-gravity extraction and convolution block extraction are performed on the test data to obtain the input data for the network test; the input data are fed to the trained base classifier, which outputs the corresponding class probability values.

Illustratively, in the embodiment of the invention, training is performed on the segmentation blocks of 10 images and testing on another image; the convolution block size is 32 × 32 × 4. In the training phase, the batch size is set to 12 and the number of epochs to 100; gradient descent uses the RMSProp optimizer with a learning rate of 10^-3 and a learning rate decay of 10^-5. The test results are shown in FIG. 7 and Table 2.

TABLE 2 Classification results of the convolutional neural network based on object-oriented classification

As can be seen from FIG. 7 and Table 2, in the convolutional neural network based on object-oriented classification, confusion between roof skylights and the car class is greatly reduced because spatial information contributes to the classification features; however, for buildings such as the one in the lower-left part of the middle of the image, the roof skylight is still misclassified as a car because of its large area. In this method the expanded car class is smaller relative to the large classes, so far fewer building pixels are misclassified as cars, but fewer cars are detected as well. Confusion between low vegetation and trees is not improved by this method. Compared with extracting blocks by a sliding window, however, object-oriented segmentation reduces confusion at class boundaries and reduces computational redundancy.

2.3, training the fully convolutional network model based on remote sensing image classification on the preprocessed training set, the specific process being as follows:

2.3.1, cutting the preprocessed images in the training set to a fixed size to obtain image blocks; for example, all training images are cut into 512 × 512 image blocks, and other block sizes can be chosen according to the available GPU memory.

2.3.2, setting up as many binary classification models as there are classes in the label maps corresponding to the samples in the preprocessed training set, one binary classifier per label class; converting each label map into the corresponding binary label maps; and training the corresponding binary classifier on each binary label map to obtain the corresponding network parameters. With 5 classes, 5 U-Net networks are trained to obtain classification parameters for the different classes.

Each binary label map contains only one target class, and all remaining classes are background; each binary classifier is a U-Net network.
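
The conversion of a multi-class label map into one binary label map per class can be sketched as below. Class identifiers are assumed to be the integers 0 to 4, with 1 marking the target class and 0 the background; the function names are illustrative.

```python
def to_binary_label_map(label_map, target_class):
    """Keep target_class as foreground (1); every other class becomes background (0)."""
    return [[1 if label == target_class else 0 for label in row]
            for row in label_map]


def one_binary_map_per_class(label_map, class_ids):
    """One binary label map for each class, keyed by class identifier."""
    return {class_id: to_binary_label_map(label_map, class_id)
            for class_id in class_ids}


label_map = [[0, 1, 2],
             [3, 4, 1]]
binary_maps = one_binary_map_per_class(label_map, range(5))
```

Each of the resulting maps would serve as the training target of one binary U-Net classifier.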

The base classifier corresponding to the fully convolutional network is tested on the test set as follows: the original test images in the test set are cut into a plurality of image blocks, which are the input data of the base classifier; the input data are fed to the trained base classifier corresponding to the fully convolutional network, which outputs the corresponding class probability values.
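
Cutting an image into fixed-size blocks amounts to choosing the top-left corner of each block. The sketch below computes those corners; the text does not say what happens when the image size is not a multiple of the block size, so shifting the last block back to end at the image edge (with some overlap) is an assumption.

```python
def tile_origins(length, tile):
    """Start offsets of blocks of size `tile` covering an axis of `length` pixels.

    If `length` is not a multiple of `tile`, the last block is shifted back so
    that it ends at the image edge (assumption). Requires length >= tile.
    """
    origins = list(range(0, length - tile + 1, tile))
    if origins[-1] + tile < length:
        origins.append(length - tile)
    return origins


def tile_boxes(rows, cols, tile=512):
    """(top, left) corners of all tile x tile blocks covering a rows x cols image."""
    return [(top, left)
            for top in tile_origins(rows, tile)
            for left in tile_origins(cols, tile)]


boxes = tile_boxes(1024, 1200, tile=512)
```

The class probabilities predicted for each block can then be written back to the same corners to rebuild a full-size probability map.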

Illustratively, the original label map has 5 categories. It is converted into 5 binary label maps, each of which keeps only one class while all other classes become background; these 5 binary label maps are used to train 5 U-Net networks, yielding classification parameters for the different classes. The network design is shown in FIG. 4; on the basis of the original network architecture, regularization layers are added to the down-sampling path to improve the generalization ability of the model. The test results are shown in FIG. 8 and Table 3.

TABLE 3 Classification results of the fully convolutional network based on remote sensing image classification

As can be seen from FIG. 8 and Table 3, with the classification method based on the fully convolutional network, the classification accuracy of every class except cars improves, because classification is done in binary fashion and a dedicated classifier is built for each class, which makes the method more targeted. For the car class, however, data expansion was not applied, and negative samples far outnumber positive samples in the binary classification, so almost no car pixels are classified correctly in the result.

Specifically, as shown in FIG. 1, a fully-connected network is placed at the output ends of the three base classifiers, and the class probabilities that the three base classifiers output for each pixel are used as training features, so that the fully-connected network is trained on a test image from the predicted probabilities and the label map.
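
How the three sets of class probabilities become input features for the combining network can be sketched as a per-pixel concatenation. Concatenation is an assumption, since the text only says that the probabilities are used as features; the sketch also assumes that every pixel inherits the probabilities of the segment it belongs to in the object-oriented classifier, and that each classifier reports five classes.

```python
def stack_probabilities(fc_probs, cnn_probs, fcn_probs):
    """Concatenate, pixel by pixel, the class probabilities of the three base classifiers.

    Each argument holds one probability vector per pixel, in the same pixel order.
    """
    return [list(a) + list(b) + list(c)
            for a, b, c in zip(fc_probs, cnn_probs, fcn_probs)]


# One pixel, five classes per base classifier (illustrative numbers).
fc_probs = [[0.70, 0.10, 0.10, 0.05, 0.05]]
cnn_probs = [[0.60, 0.20, 0.10, 0.05, 0.05]]
fcn_probs = [[0.90, 0.02, 0.03, 0.03, 0.02]]
features = stack_probabilities(fc_probs, cnn_probs, fcn_probs)
```

Each stacked vector, paired with the label of the same pixel in the label map, forms one training example for the fully-connected combining network.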

The test images in the test set are then classified with the deep ensemble learning network trained in step 3 to obtain the predicted class labels. In this way the advantages of each method are integrated to produce the final classification result. The ensemble prediction results of the invention are shown in FIG. 9 and Table 4.

TABLE 4 Classification results of remote sensing images based on the deep ensemble learning network

As can be seen from FIG. 9 and Table 4, for low vegetation and high vegetation the base classifiers already perform well before integration, so the classification accuracy improves little after integration. However, the method combines the strengths of the pixel-based and object-oriented methods on cars with the strength of the fully convolutional network on buildings, shows no obvious weakness in the classification accuracy of any individual class, and improves the overall classification accuracy to some extent. The confusion between building roofs and cars that arises in both the pixel-based and object-oriented methods is essentially eliminated, and the confusion between parts of buildings and impervious surfaces is reduced.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention; thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A high-resolution remote sensing image classification method based on deep ensemble learning is characterized by comprising the following steps:

2. The deep ensemble learning-based high-resolution remote sensing image classification method according to claim 1, wherein the fully-connected network model based on pixel brightness value classification is composed of N fully-connected layers, wherein the first N-1 layers respectively adopt a ReLU function as an activation function, and the last layer adopts a Softmax function as an activation function; the Dropout layer is used for regularization before the last fully connected layer to prevent overfitting.

3. The method for classifying high-resolution remote sensing images based on deep ensemble learning according to claim 1, wherein the convolutional neural network model based on object-oriented classification comprises a plurality of convolutional layers, max-pooling layers, fully-connected layers and an output layer, wherein one max-pooling layer is arranged behind each convolutional layer, the plurality of fully-connected layers are connected behind the last pooling layer, and the output layer is arranged behind the last fully-connected layer; the activation function of each convolutional layer and of each fully-connected layer is a ReLU function, and the activation function of the output layer is a Softmax function; a Dropout layer is used for regularization before the last fully-connected layer.

4. The method for classifying high-resolution remote sensing images based on deep ensemble learning according to claim 1, wherein the fully convolutional network model based on remote sensing image classification is composed of a plurality of binary classifiers, and each binary classifier comprises convolutional layers, Dropout regularization layers and pooling layers, together with the convolutional layers, fusion layers and upsampling layers that are symmetrical with them.

5. The method for classifying high-resolution remote sensing images based on deep ensemble learning according to claim 1, wherein the step 2 comprises the following substeps:

2.1, preprocessing the training set to obtain a preprocessed training set;

wherein the preprocessing comprises unbalanced sample processing, data expansion and normalization;

2.2, training a full-connection network model based on pixel brightness value classification by adopting the preprocessed training set to obtain a base classifier corresponding to the full-connection network; testing the base classifier corresponding to the full-connection network by adopting a test set, and outputting corresponding class probability;

2.3, training the convolutional neural network model based on the object-oriented classification by adopting the preprocessed training set to obtain a base classifier corresponding to the convolutional neural network; testing a base classifier corresponding to the convolutional neural network by adopting a test set, and outputting a corresponding class probability value;

2.4, training a complete convolution network model based on remote sensing image classification by adopting the preprocessed training set to obtain a base classifier corresponding to the complete convolution network; and testing the base classifier corresponding to the complete convolution network by adopting the test set, and outputting a corresponding class probability value.

6. The high-resolution remote sensing image classification method based on deep ensemble learning according to claim 5, wherein the unbalanced sample processing specifically comprises: resampling (oversampling) the data of the small-class samples, downsampling the large-class data, or adopting different sampling proportions for different classes;

the data expansion is as follows: horizontally flipping or randomly rotating the original images in the training set;

the normalization is as follows: for the color images of the near infrared band, the red band and the green band of the remote sensing satellite data, the gray value is scaled from 0-255 to 0-1;

for aerial images, which comprise color images of three bands (near infrared, red and green) and DSM data of one band, the color images of the three bands are scaled from gray values of 0-255 to 0-1; for the DSM data, all values are first counted to find the minimum value and the value interval, then the minimum value is subtracted from each pixel value and the result is divided by the length of the value interval; finally, all pixel values are normalized to between 0 and 1.
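The normalization and data expansion of claim 6 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names are hypothetical, the rotation is restricted to multiples of 90 degrees as an assumption, and a constant DSM is mapped to zeros to avoid division by zero. Unbalanced sample processing is not covered here.

```python
import numpy as np

def scale_8bit(image):
    """Scale 8-bit gray values (0-255) to the range 0-1."""
    return image.astype(np.float32) / 255.0

def normalize_dsm(dsm):
    """Subtract the minimum and divide by the length of the value interval."""
    dsm = dsm.astype(np.float32)
    lo = dsm.min()
    span = dsm.max() - lo
    if span == 0:
        # Assumption: a constant DSM maps to all zeros.
        return np.zeros_like(dsm)
    return (dsm - lo) / span

def augment(image, label, rng):
    """Flip horizontally or rotate an image together with its label map."""
    if rng.random() < 0.5:
        return image[:, ::-1], label[:, ::-1]
    k = int(rng.integers(1, 4))  # 90, 180 or 270 degrees (assumed)
    return np.rot90(image, k, axes=(0, 1)), np.rot90(label, k, axes=(0, 1))

rng = np.random.default_rng(0)
irrg = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)  # near infrared, red, green
dsm = rng.uniform(250.0, 300.0, size=(8, 8))                 # synthetic heights
label = rng.integers(0, 6, size=(8, 8))

# Stack the three scaled color bands and the normalized DSM into a 4-channel input.
x = np.dstack([scale_8bit(irrg), normalize_dsm(dsm)])
aug_x, aug_y = augment(x, label, rng)
```

The image and its label map are transformed together so that every pixel keeps its class after expansion.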

7. The method for classifying high-resolution remote sensing images based on deep ensemble learning according to claim 5, wherein the training of the fully-connected network model based on pixel brightness value classification is performed by using the preprocessed training set, and the specific process is as follows:

8. The deep ensemble learning-based high-resolution remote sensing image classification method according to claim 5, wherein the training of the convolutional neural network model based on object-oriented classification is performed by using a preprocessed training set, and the specific process is as follows:

2.2.1, performing multi-scale segmentation on the label graph of the preprocessed sample in the training set to correspondingly obtain a plurality of label graph segmentation blocks; wherein, the gray value of the pixel in each partition block is the same;

2.2.2, extracting the center of gravity of each label graph segmentation block; wherein the abscissa of the center of gravity is the sum of the abscissas of all pixels in the segmentation block divided by the number of pixels in the segmentation block, with the result rounded down; the ordinate of the center of gravity is the sum of the ordinates of all pixels in the segmentation block divided by the number of pixels in the segmentation block, with the result rounded down;

wherein k is the number of channels of the convolutional layer;

2.2.4, training the convolutional neural network by taking each image block as input data, and performing gradient descent with the root-mean-square propagation (RMSProp) optimization method to optimize the parameters of the convolutional neural network.
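The center-of-gravity rule of step 2.2.2 can be sketched as below. The sketch assumes the segmentation result is available as a 2-D integer array with one identifier per segmentation block; the function name and the toy array are hypothetical.

```python
import numpy as np

def segment_centroids(segments):
    """Return {segment_id: (x, y)} with centers of gravity rounded down.

    segments: 2-D integer array holding one id per segmentation block.
    """
    centroids = {}
    for seg_id in np.unique(segments):
        rows, cols = np.nonzero(segments == seg_id)
        # Sum of the coordinates divided by the pixel count, rounded down.
        cx = int(cols.sum() // cols.size)
        cy = int(rows.sum() // rows.size)
        centroids[int(seg_id)] = (cx, cy)
    return centroids

segments = np.array([
    [1, 1, 2, 2, 2],
    [1, 1, 2, 2, 2],
    [3, 3, 3, 2, 2],
])
centroids = segment_centroids(segments)
```

For the toy array, block 2 has column sum 25 and row sum 7 over 8 pixels, so its center of gravity is (3, 0) after rounding down.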

9. The deep ensemble learning-based high-resolution remote sensing image classification method according to claim 5, wherein the training of the complete convolution network model based on remote sensing image classification is performed by using a preprocessed training set, and the specific process is as follows:

step 2.3.1, cutting the preprocessed images in the training set according to a preset size to obtain image blocks;

step 2.3.2, setting a corresponding number of binary classification models according to the classes of the label graphs corresponding to the samples in the preprocessed training set, wherein one label graph class corresponds to one binary classifier; converting each label graph into a corresponding binary label graph;

the binary label graph only comprises one target class, and the rest classes are backgrounds; the binary classifier is a U-Net network.
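Steps 2.3.1 and 2.3.2 can be illustrated with a short sketch that cuts an array into blocks of a preset size and converts a multi-class label graph into one binary label graph per class. The function names are hypothetical, and discarding pixels beyond the last full block is an assumption, since the claims do not say how image borders are handled.

```python
import numpy as np

def cut_tiles(image, tile):
    """Cut an (H, W, ...) array into non-overlapping tile x tile blocks.

    Assumption: pixels beyond the last full block are discarded.
    """
    h, w = image.shape[:2]
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]

def to_binary_label_maps(label, class_ids):
    """Return one binary map per class: 1 for the target class, 0 for background."""
    return {c: (label == c).astype(np.uint8) for c in class_ids}

label = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [2, 2, 2, 1],
    [0, 0, 1, 1],
])
binary_maps = to_binary_label_maps(label, class_ids=[0, 1, 2])
tiles = cut_tiles(label, tile=2)
```

Each entry of `binary_maps` would serve as the training target of one binary classifier, so three classes yield three binary label graphs.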