CN112990365A

CN112990365A - Training method of deep learning model for semantic segmentation of remote sensing image

Info

Publication number: CN112990365A
Application number: CN202110434453.5A
Authority: CN
Inventors: 吴敦; 高广; 孙华; 应良中; 费佳宁; 卞婷玉
Original assignee: Baolue Technology Zhejiang Co ltd
Current assignee: Baolue Technology Zhejiang Co ltd
Priority date: 2021-04-22
Filing date: 2021-04-22
Publication date: 2021-06-18
Anticipated expiration: 2041-04-22
Also published as: CN112990365B

Abstract

The invention relates to a training method of a deep learning model for semantic segmentation of remote sensing images, comprising the following steps: cropping remote sensing images and their corresponding labels to generate a training sample set; constructing a deep learning model architecture; training a sample filtering model on the training sample set; predicting the samples with the sample filtering model and comparing each prediction result with the label of the corresponding sample to select high-quality samples, which form a high-quality sample set; alternately training on the training sample set and the high-quality sample set to obtain a plurality of trained prediction models, which form a model pool; and acquiring any target remote sensing image, inputting it into the model pool, and predicting the elements in the target remote sensing image with the model pool.

Description

Training method of deep learning model for semantic segmentation of remote sensing image

Technical Field

The invention relates to the technical field of remote sensing image interpretation, in particular to a training method of a deep learning model for remote sensing image semantic segmentation.

Background

Remote sensing images are an important data source for natural resource investigation, monitoring and management. The remote sensing image semantic segmentation based on deep learning can greatly improve the precision and efficiency of remote sensing interpretation, and can provide more advanced technical means for improving the natural resource investigation, supervision and decision making capability and establishing an efficient and perfect natural resource remote sensing monitoring service system.

Compared with traditional semantic segmentation methods such as texton forests and random forests, existing semantic segmentation methods based on deep learning can obtain high-precision, high-efficiency and automatic extraction results. However, the predictive performance of a deep learning model depends strongly on the quality of the samples provided and on the parameters and optimization used during training. First, a deep learning algorithm engineer needs to provide finely labeled samples for model training, but remote sensing images carry a large amount of information, have strong correlation between pixels and are difficult to interpret, so labeling results are easily influenced by the subjectivity of the annotator; finely labeled remote sensing samples are therefore costly and hard to obtain. Second, for the model to achieve good precision and generalization capability, the algorithm engineer needs to follow the whole training process and adjust parameters in time, so that the neural network converges and training stops without wasting excessive resources; the training process is therefore time-consuming and labor-intensive. Third, when a model is used to predict a target image whose data are not in the same data domain as the sample data, the limited generalization capability of a single model leads to poor prediction results.

Disclosure of Invention

The invention aims to solve the technical problem of providing a training method of a deep learning model for remote sensing image semantic segmentation, which can simplify the building process of the deep learning model, accelerate the training speed of the deep learning model, increase the generalization capability of the model and reduce the parameter adjustment work of a deep learning algorithm engineer in the training process, and is particularly suitable for the condition of low labeling quality of a remote sensing image training sample.

The technical scheme adopted by the invention is that a training method of a deep learning model for remote sensing image semantic segmentation comprises the following steps:

S1. cropping the remote sensing image and the corresponding label to generate a training sample set;

S2. constructing a deep learning model architecture based on the encoder-decoder;

S3. training a sample filtering model on the training sample set through the deep learning model architecture;

S4. predicting the samples in the training sample set by using the sample filtering model to obtain the prediction results of the samples in the training sample set, comparing the prediction results of the samples in the training sample set with the labels corresponding to the samples in the training sample set, selecting high-quality samples with high labeling quality in the training sample set, and forming a high-quality sample set;

S5. alternately training on the training sample set and the high-quality sample set by using the deep learning model architecture, and obtaining a trained prediction model by using a specific learning rate adjusting scheme and an optimizer;

S6. repeating the steps S4-S5 to obtain a plurality of trained prediction models, and forming a model pool from the trained prediction models;

S7. acquiring any one target remote sensing image, inputting the target remote sensing image into the model pool, predicting elements in the target remote sensing image by using a model parallel integration mode based on the model pool, and obtaining a binary image result of the target remote sensing image by adopting an averaging method.

The invention has the beneficial effects that: the training method of the deep learning model for the remote sensing image semantic segmentation simplifies the building process of the deep learning model, accelerates the training speed of the deep learning model, increases the generalization capability of the model, reduces the parameter adjustment work of a deep learning algorithm engineer in the training process, and is particularly suitable for the condition of low labeling quality of a remote sensing image training sample.

Preferably, in step S2, the encoder-decoder based deep learning model architecture consists of an encoder and a decoder. The encoder is a pre-trained model obtained by training on a data set with a large number of samples, so it generalizes well under model transfer. Model transfer is a form of transfer learning: the weights learned by the pre-trained model in the source data domain are applied to target-domain data of small volume, which provides weight initialization and speeds up the training of the model on the target-domain data.

The decoder is a convolutional neural network, and can generate a high-dimensional output sequence by using the features extracted by the encoder so as to obtain a binary image result.

Preferably, in step S3, several data augmentation schemes are used in the process of training the sample filtering model, which improves the generalization capability of the model and helps avoid overfitting.

The data augmentation schemes include mirror padding of the image border, random cropping, random brightness and contrast, random gamma, random information loss, motion blur, median blur, Gaussian noise, horizontal flip, vertical flip, and random scaling and rotation.

Preferably, in step S4, high-quality samples are selected according to the intersection over union (IoU) between the label of each sample in the training sample set and the corresponding prediction result; a sample is a high-quality sample only if this IoU is greater than a predetermined threshold.
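The selection rule can be illustrated with a minimal sketch. It assumes binary masks stored as nested lists of 0/1. The function names `iou` and `select_high_quality`, the sample identifiers, and the convention that two empty masks score 1.0 are illustrative assumptions; the 0.8 threshold is the value used in the embodiment described later. IoU is the intersection over union, i.e. the "intersection ratio" referred to above.

```python
def iou(label, pred):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = union = 0
    for row_l, row_p in zip(label, pred):
        for l, p in zip(row_l, row_p):
            inter += 1 if (l and p) else 0
            union += 1 if (l or p) else 0
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


def select_high_quality(samples, threshold):
    """Return ids of samples whose label/prediction IoU exceeds the threshold.

    `samples` is a list of (sample_id, label_mask, predicted_mask) tuples.
    """
    return [sid for sid, label, pred in samples if iou(label, pred) > threshold]


samples = [
    ("tile_a", [[1, 1], [0, 0]], [[1, 1], [0, 0]]),  # prediction matches label
    ("tile_b", [[1, 1], [1, 1]], [[1, 0], [0, 0]]),  # prediction covers 1 of 4 pixels
]
high_quality = select_high_quality(samples, threshold=0.8)
print(high_quality)
```

A sample whose label disagrees with the filtering model's prediction is treated as likely mislabeled and is left out of the high-quality set, while remaining in the full training sample set.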

Preferably, in step S5, several data augmentation schemes are used in the process of alternately training on the training sample set and the high-quality sample set with the deep learning model architecture, which improves the generalization capability of the model and helps avoid overfitting.

Preferably, in step S5, the learning rate adjustment scheme is a cosine annealing-based learning rate adjustment scheme, and the specific steps of using the learning rate adjustment scheme and the optimizer to obtain the trained prediction model include:

SA1. setting a maximum learning rate and a minimum learning rate, and training for 1.5 cycles with 13 rounds as a half cycle, 39 rounds in total;

SA2. reducing the maximum learning rate to one fifth of its original value, and training for 1 round with linear warm-up of the learning rate;

SA3. training for a half cycle, 13 rounds in total, with the cosine annealing learning rate adjustment scheme.
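Steps SA1 to SA3 can be sketched as a per-round learning rate table. This is one possible reading, not a definitive implementation: it assumes that consecutive half cycles alternate direction (maximum to minimum, then back up, then down again), that the single warm-up round starts from the minimum learning rate, and that one value per round is sufficient. The function names and the default values 0.002 and 0.0001 (taken from the embodiment described later) are illustrative.

```python
import math


def cosine(start, end, t):
    """Half-cosine interpolation from `start` (t = 0) to `end` (t = 1)."""
    return end + 0.5 * (start - end) * (1 + math.cos(math.pi * t))


def build_schedule(lr_max=0.002, lr_min=0.0001, half=13):
    """Return one learning rate per round for SA1 + SA2 + SA3."""
    lrs = []
    # SA1: 1.5 cycles = 3 half cycles (assumed order: down, up, down).
    for start, end in [(lr_max, lr_min), (lr_min, lr_max), (lr_max, lr_min)]:
        lrs += [cosine(start, end, r / half) for r in range(half)]
    # SA2: the peak is reduced to one fifth; one warm-up round that is
    # assumed to start at lr_min and climb toward the reduced peak.
    reduced = lr_max / 5
    lrs.append(lr_min)
    # SA3: one more half cycle, annealing from the reduced peak.
    lrs += [cosine(reduced, lr_min, r / half) for r in range(half)]
    return lrs


lrs = build_schedule()
print(len(lrs))  # 39 + 1 + 13 rounds
```

In practice the warm-up would usually be interpolated per iteration within the round rather than held at one value; the table above only fixes the round boundaries.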

The optimizer is a Lookahead optimizer whose inner optimizer is Adam; this optimizer makes the model converge faster, generalize better, and be more robust to changes in hyperparameters.
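The interplay between Lookahead and its inner Adam optimizer can be sketched on a single scalar weight. This is a toy illustration under stated assumptions, not the training code of the invention: the synchronization period `k`, the interpolation factor `alpha`, the Adam constants and the quadratic test objective are all illustrative values that do not come from the source.

```python
import math


def lookahead_adam(grad, w0, steps, lr=0.1, k=5, alpha=0.5,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar objective with Lookahead wrapped around Adam."""
    slow = fast = w0      # slow weights and fast (inner) weights
    m = v = 0.0           # Adam first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(fast)
        # Inner optimizer: one standard Adam update on the fast weights.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        fast -= lr * m_hat / (math.sqrt(v_hat) + eps)
        # Every k inner steps the slow weights move part of the way toward
        # the fast weights, and the fast weights restart from there.
        if t % k == 0:
            slow += alpha * (fast - slow)
            fast = slow
    return slow


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w = lookahead_adam(lambda w: 2 * (w - 3), w0=0.0, steps=500)
print(w)
```

The slow weights act as a smoothed trajectory of the inner optimizer, which is the usual explanation for the reduced sensitivity to hyperparameters mentioned above.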

Preferably, in step S5, the specific step of alternately training the training sample set and the high-quality sample set by using the deep learning model architecture includes:

training on the high-quality sample set with random-cropping data augmentation, with a crop size of 384 × 384, for 5 rounds;

training on the training sample set with random-cropping data augmentation, with a crop size of 384 × 384, for 2 rounds;

training on the high-quality sample set with random-cropping data augmentation, with a crop size of 480 × 480, for 2 rounds;

training on the training sample set with random-cropping data augmentation, with a crop size of 512 × 512, for 2 rounds;

training on the training sample set with random-cropping data augmentation, with a crop size of 480 × 480, for 1 round;

training on the high-quality sample set with random-cropping data augmentation, with a crop size of 512 × 512, for 1 round.

Alternate training combined with random-cropping data augmentation can effectively enhance the generalization capability of the model and accelerate the training of the model.
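The six stages listed above amount to a small schedule table. The sketch below encodes that list as data and expands it into one entry per round; the names `SCHEDULE` and `expand` and the dataset keys are illustrative assumptions.

```python
from collections import Counter

# (sample set, crop size in pixels, rounds), in the order listed above.
SCHEDULE = [
    ("high_quality", 384, 5),
    ("training", 384, 2),
    ("high_quality", 480, 2),
    ("training", 512, 2),
    ("training", 480, 1),
    ("high_quality", 512, 1),
]


def expand(schedule):
    """Yield one (sample set, crop size) pair per training round."""
    for dataset, crop, rounds in schedule:
        for _ in range(rounds):
            yield dataset, crop


rounds = list(expand(SCHEDULE))
per_set = Counter(dataset for dataset, _ in rounds)
print(len(rounds))  # 13 rounds, i.e. one half cycle
print(per_set)
```

Expanding the table gives 13 rounds, which matches the half cycle used by the learning rate adjustment scheme, so one pass through the table lines up with one half cycle of the schedule.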

Preferably, in step S7, the averaging method averages the predictions of the models in the model pool on the target remote sensing image, which reduces the bias of the prediction result of any single prediction model and thereby improves the IoU achieved on the target remote sensing image by the models integrated in parallel.
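The averaging step can be sketched as follows. The sketch assumes that every model in the pool outputs a per-pixel probability map of the same size and that the averaged map is binarized at 0.5; both the probability-map representation and the threshold are assumptions, as is the function name `ensemble_average`.

```python
def ensemble_average(prob_maps, threshold=0.5):
    """Average per-pixel probabilities over all models, then binarize."""
    n = len(prob_maps)
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    binary = []
    for i in range(height):
        row = []
        for j in range(width):
            mean = sum(pm[i][j] for pm in prob_maps) / n
            row.append(1 if mean >= threshold else 0)
        binary.append(row)
    return binary


# Three hypothetical model outputs for a 2 x 2 image.
maps = [
    [[0.9, 0.2], [0.6, 0.1]],
    [[0.8, 0.4], [0.3, 0.2]],
    [[0.7, 0.3], [0.9, 0.0]],
]
result = ensemble_average(maps)
print(result)
```

In the bottom-left pixel one model votes below the threshold, yet the average still lands above it, which shows how averaging dampens the error of a single model.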

Drawings

FIG. 1 is a flow chart of a training method of a deep learning model for semantic segmentation of remote sensing images according to the invention;

FIG. 2 is a schematic structural diagram of a deep learning model architecture based on an encoder-decoder according to the present invention;

FIG. 3 is a diagram illustrating the learning rate of the learning rate adjustment scheme of the present invention;

FIG. 4 is a schematic diagram of a training method for alternately training a training sample set and a high-quality sample set by using a deep learning model architecture according to the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings in combination with specific embodiments so that those skilled in the art can practice the invention with reference to the description, and the scope of the invention is not limited to the specific embodiments.

It will be understood by those skilled in the art that in the present disclosure, the terms "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship indicated in the drawings for ease of description and simplicity of description, and do not indicate or imply that the referenced device or element must have a particular orientation, be constructed and operated in a particular orientation, and thus, the above terms should not be construed as limiting the present invention.

The invention relates to a training method of a deep learning model for semantic segmentation of remote sensing images, which comprises the following steps of:

S1. cropping the remote sensing image and a corresponding label to generate a training sample set;

S2. constructing a deep learning model architecture based on an encoder-decoder;

S3. training a sample filtering model on the training sample set obtained in the step S1 by adopting the deep learning model architecture in the step S2;

S4. predicting the samples in the training sample set in the step S1 by using the sample filtering model in the step S3 to obtain a prediction result of the samples in the training sample set, comparing the prediction result of the samples in the training sample set with corresponding labels of the samples in the training sample set, selecting high-quality samples with high labeling quality in the training sample set, and forming a high-quality sample set;

S5. alternately training on the training sample set in the step S1 and the high-quality sample set in the step S4 by using the deep learning model architecture in the step S2, and obtaining a trained prediction model by using a learning rate adjusting scheme and an optimizer;

S6. repeating the steps S4-S5 to obtain a plurality of trained prediction models, and forming a model pool from the trained prediction models;

S7. obtaining any one target remote sensing image, predicting elements in the target remote sensing image by using a model parallel integration mode based on the model pool, and obtaining a binary image result of the target remote sensing image by adopting an averaging method.

In step S2, the deep learning model architecture based on the encoder-decoder consists of an encoder and a decoder. The encoder is a pre-trained model trained on a data set with a large number of samples, and therefore generalizes well under model transfer. Model transfer is a form of transfer learning: the weights learned by the pre-trained model in the source data domain are applied to target-domain data of smaller volume, which provides weight initialization and speeds up the training of the model on the target-domain data.

In step S3, a large number of data augmentation schemes are used in training the sample filtering model, which improves the generalization capability of the model and helps avoid overfitting.

The data augmentation schemes include mirror padding of the image border, random cropping, random sharpening, random brightness and contrast, random gamma, random information loss, motion blur, median blur, Gaussian noise, horizontal flip, vertical flip, and random scaling and rotation.
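How such schemes might be combined per sample can be sketched as a sampling policy. The groups and probabilities below are those given in steps (3-1) to (3-6) of the embodiment later in this description; drawing one candidate uniformly when a group fires, and all identifier names, are assumptions. The sketch only selects transform names and does not apply any image operation.

```python
import random

# (candidate transforms, probability that the group is applied).
POLICY = [
    (["horizontal_flip", "vertical_flip"], 1.0),
    (["gaussian_noise"], 0.2),
    (["random_brightness_contrast", "random_gamma"], 1.0),
    (["random_sharpen", "gaussian_blur", "motion_blur"], 1.0),
    (["random_information_loss"], 0.2),
    (["random_rotate_scale"], 1.0),
]


def sample_transforms(policy, rng):
    """Pick the transforms to apply to one sample."""
    chosen = []
    for candidates, prob in policy:
        # rng.random() lies in [0, 1), so a probability of 1.0 always fires.
        if rng.random() < prob:
            # Assumption: candidates within a group are equally likely.
            chosen.append(rng.choice(candidates))
    return chosen


rng = random.Random(0)
for _ in range(3):
    print(sample_transforms(POLICY, rng))
```

Each sample therefore receives between four and six transforms: the four groups with a probability of 100% always contribute one, and the two groups with a probability of 20% contribute occasionally.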

In step S4, high-quality samples are selected based on the intersection over union (IoU) between the label of each sample in the training sample set and the corresponding prediction result; a sample is a high-quality sample only if this IoU is greater than a predetermined threshold.

In step S5, data augmentation schemes are used during the alternate training with the deep learning model architecture, which improves the generalization capability of the model and helps avoid overfitting.

The optimizer is a Lookahead optimizer whose inner optimizer is Adam; this optimizer makes the model converge faster, generalize better, and be more robust to changes in hyperparameters.

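Preferably, in step S5, the step of alternately training on the training sample set and the high-quality sample set using the deep learning model architecture includes:

training on the high-quality sample set with random-cropping data augmentation, with a crop size of 384 × 384, for 5 rounds;

training on the training sample set with random-cropping data augmentation, with a crop size of 384 × 384, for 2 rounds;

training on the high-quality sample set with random-cropping data augmentation, with a crop size of 480 × 480, for 2 rounds;

training on the training sample set with random-cropping data augmentation, with a crop size of 512 × 512, for 2 rounds;

training on the training sample set with random-cropping data augmentation, with a crop size of 480 × 480, for 1 round;

training on the high-quality sample set with random-cropping data augmentation, with a crop size of 512 × 512, for 1 round; the 13 rounds in total form a half cycle.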

Alternate training combined with random-cropping data augmentation can effectively enhance the generalization capability of the model and accelerate the training of the model.

In step S7, the averaging method averages the predictions of the models in the model pool on the target remote sensing image, which reduces the bias of the prediction result of any single prediction model and thereby improves the IoU achieved on the target remote sensing image by the models after parallel integration.

In the following embodiment, the element to be identified in the remote sensing image is water bodies; the data source is high-resolution remote sensing image data with a resolution of 0.5 m, in RGB three-channel format. The specific steps for extracting water bodies with the method of the invention are as follows:

(1) a remote sensing image of 10000 × 10000 pixels is divided into a plurality of sample pictures of 492 × 492 pixels in JPG format; the labels corresponding to the remote sensing image are converted into a single-channel binary image, which is cut into label pictures of 492 × 492 pixels in JPG format, each label picture being pixel-aligned with the sample picture at the corresponding position;

(2) as shown in FIG. 2, an encoder-decoder based deep learning model architecture is constructed; preferably, an EfficientNet-b4 model pre-trained on ImageNet is used as the encoder, and a LinkNet convolutional neural network is used as the decoder;

(3) a sample filtering model is trained on the training sample set through the deep learning model architecture; during training, the Lookahead optimizer is used and the learning rate adjustment scheme shown in FIG. 3 is applied for one cycle, preferably with a maximum learning rate of 0.002 and a minimum learning rate of 0.0001, for 26 rounds in total; in each round, data augmentation schemes are applied with specified probabilities, the preferred schemes comprising:

(3-1) one of horizontal flip and vertical flip, with a probability of 100%;

(3-2) random Gaussian noise, with a probability of 20%;

(3-3) one of random brightness and contrast or random gamma, with a probability of 100%;

(3-4) one of random sharpening, Gaussian blur and motion blur, with a probability of 100%;

(3-5) random information loss, with a probability of 20%;

(3-6) random rotation and scaling, with a probability of 100%;

(4) the training sample set is predicted with the sample filtering model, and, by comparing the prediction result of each sample with its corresponding label, samples whose IoU exceeds 80% are added to the high-quality sample set;

(5) alternately training on a training sample set and a high-quality sample set by using a deep learning model architecture, wherein an alternate training scheme is shown in fig. 4, and the method specifically comprises the following steps:

(5-1) training on the high-quality sample set with random-cropping data augmentation, preferably with a crop size of 384 × 384 and a batch size of 16, for 5 rounds;

(5-2) training on the training sample set with random-cropping data augmentation, preferably with a crop size of 384 × 384 and a batch size of 16, for 2 rounds;

(5-3) training on the high-quality sample set with random-cropping data augmentation, preferably with a crop size of 480 × 480 and a batch size of 16, for 2 rounds;

(5-4) training on the training sample set with random-cropping data augmentation, preferably with a crop size of 512 × 512 and a batch size of 8, for 2 rounds;

(5-5) training on the high-quality sample set with random-cropping data augmentation, preferably with a crop size of 480 × 480 and a batch size of 8, for 1 round;

(5-6) training on the training sample set with random-cropping data augmentation, preferably with a crop size of 512 × 512 and a batch size of 8, for 1 round; the 13 rounds in total form a half training cycle;

(6) the learning rate adjustment scheme shown in fig. 3 is used in the alternate training process, and the specific steps include:

(6-1) preferably, the maximum learning rate is 0.002 and the minimum learning rate is 0.0001, with 13 rounds as a half training cycle; training preferably runs for 1.5 cycles, 39 rounds in total;

(6-2) reducing the maximum learning rate, preferably to 0.0004, and training for 1 round with linear warm-up of the learning rate;

(6-3) training for a half cycle, 13 rounds in total, with the cosine annealing learning rate adjustment scheme; Lookahead is used as the optimizer throughout, and the trained prediction model is obtained after 53 rounds of training in total;

(7) repeating the steps (4) to (6) to obtain 3 trained prediction models to form a model pool;

(8) the target remote sensing image is predicted by model parallel integration based on the model pool, and the prediction result for the water body is obtained by the averaging method.
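The tiling in step (1) can be sketched as a computation of tile boxes. Because 10000 is not a multiple of 492, some rule is needed for the leftover strip at the right and bottom edges; the source does not state one, so the sketch assumes that only full, non-overlapping tiles are kept. The function name `tile_boxes` is illustrative.

```python
def tile_boxes(width, height, tile=492):
    """Return (left, top, right, bottom) boxes of full, non-overlapping tiles."""
    boxes = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            boxes.append((left, top, left + tile, top + tile))
    return boxes


boxes = tile_boxes(10000, 10000)
print(len(boxes))  # 20 x 20 full tiles
print(boxes[-1])   # last full tile; pixels beyond 9840 are left out
```

Applying the same list of boxes to the image and to its single-channel label image keeps each label picture pixel-aligned with its sample picture, as step (1) requires.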

In FIG. 4, the training set is the sample set fed to the model, where A is the high-quality sample set and B is the training sample set; the image size is the sample size after random cropping; and the batch size is the number of sample pictures in each training batch.

Claims

1. A training method of a deep learning model for semantic segmentation of remote sensing images, characterized by comprising the following steps:

S1. cropping the remote sensing image and a corresponding label to generate a training sample set;

S2. constructing a deep learning model architecture based on an encoder-decoder;

S3. training a sample filtering model on the training sample set through the deep learning model architecture;

S4. predicting the samples in the training sample set by using the sample filtering model to obtain a prediction result of the samples in the training sample set, comparing the prediction result of the samples in the training sample set with labels corresponding to the samples in the training sample set, selecting high-quality samples with high labeling quality in the training sample set, and forming a high-quality sample set;

S5. alternately training on the training sample set and the high-quality sample set by using the deep learning model architecture, and obtaining a trained prediction model by using a learning rate adjusting scheme and an optimizer;

S6. repeating the steps S4-S5 to obtain a plurality of trained prediction models, and forming a model pool from the trained prediction models;

S7. obtaining any one target remote sensing image, predicting elements in the target remote sensing image by using a model parallel integration mode based on the model pool, and obtaining a binary image result of the target remote sensing image by adopting an averaging method.

2. The training method of the deep learning model for the remote sensing image semantic segmentation according to claim 1, characterized in that: in step S2, the encoder-decoder based deep learning model architecture consists of an encoder and a decoder, the encoder is a pre-trained model trained on a training sample set having a plurality of samples, and the decoder is a convolutional neural network.

3. The training method of the deep learning model for the remote sensing image semantic segmentation according to claim 1, characterized in that: in step S3, several data augmentation schemes are used in training the sample filtering model.

4. The training method of the deep learning model for the remote sensing image semantic segmentation according to claim 1, characterized in that: in step S5, several data augmentation schemes are used in the process of alternately training on the training sample set and the high-quality sample set using the deep learning model architecture.

5. The training method of the deep learning model for the remote sensing image semantic segmentation according to claim 3 or 4, characterized in that: the data augmentation schemes include mirror padding of the image border, random cropping, random sharpening, random brightness and contrast, random gamma, random information loss, motion blur, median blur, Gaussian noise, horizontal flip, vertical flip, and random scaling and rotation.

6. The training method of the deep learning model for the remote sensing image semantic segmentation according to claim 1, characterized in that: in step S4, high-quality samples are selected according to the intersection over union (IoU) between the label of each sample in the training sample set and the corresponding prediction result, and a sample is a high-quality sample only if this IoU is greater than a predetermined threshold.

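7. The training method of the deep learning model for the remote sensing image semantic segmentation according to claim 5, characterized in that: in step S5, the learning rate adjustment scheme is a cosine-annealing-based learning rate adjustment scheme, and the specific steps of using the learning rate adjustment scheme and the optimizer to obtain the trained prediction model include:

SA1. setting a maximum learning rate and a minimum learning rate, and training for 1.5 cycles with 13 rounds as a half cycle, 39 rounds in total;

SA2. reducing the maximum learning rate to one fifth of its original value, and training for 1 round with linear warm-up of the learning rate;

SA3. training for a half cycle, 13 rounds in total, with the cosine annealing learning rate adjustment scheme.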

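8. The training method of the deep learning model for the remote sensing image semantic segmentation according to claim 7, characterized in that: in step S5, the step of alternately training on the training sample set and the high-quality sample set using the deep learning model architecture includes:

training on the high-quality sample set with random-cropping data augmentation, with a crop size of 384 × 384, for 5 rounds;

training on the training sample set with random-cropping data augmentation, with a crop size of 384 × 384, for 2 rounds;

training on the high-quality sample set with random-cropping data augmentation, with a crop size of 480 × 480, for 2 rounds;

training on the training sample set with random-cropping data augmentation, with a crop size of 512 × 512, for 2 rounds;

training on the training sample set with random-cropping data augmentation, with a crop size of 480 × 480, for 1 round;

training on the high-quality sample set with random-cropping data augmentation, with a crop size of 512 × 512, for 1 round.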