CN112884791A

CN112884791A - Method for constructing large-scale remote sensing image semantic segmentation model training sample set

Info

Publication number: CN112884791A
Application number: CN202110140509.6A
Authority: CN
Inventors: 丁忆; 文力; 胡艳; 李朋龙; 马泽忠; 肖禾; 张泽烈; 王亚林; 敖影; 范文武; 王小攀; 刘建; 刘旭蕾; 郑中; 陈阳
Original assignee: Chongqing Geographic Information And Remote Sensing Application Center
Current assignee: Chongqing Geographic Information And Remote Sensing Application Center
Priority date: 2021-02-02
Filing date: 2021-02-02
Publication date: 2021-06-01
Anticipated expiration: 2041-02-02
Also published as: CN112884791B

Abstract

The invention discloses a method for constructing a large-scale remote sensing image semantic segmentation model training sample set, which is characterized in that existing remote sensing image vector data and a multi-stage remote sensing image are registered, and a primary sample set is automatically cut and extracted through a sliding window algorithm according to the density of image spots; extracting the characteristics of each image in the primary sample set, classifying by adopting a clustering algorithm, and removing samples with poor image quality to obtain an intermediate sample set; and inputting the intermediate sample set into the semantic segmentation model in batches for iterative optimization training, predicting the samples after each iterative optimization is completed, and eliminating error samples in the intermediate sample set to obtain a target sample set. The remarkable effects are as follows: the generation of a whole image and a mask with a large occupied space can be avoided, the sliding times of a sliding window are reduced, and the extraction speed and the data quality of a sample are improved; the purity of the correct sample in the sample set is improved, and the cost for manufacturing a large-scale sample set is greatly reduced.

Description

Method for constructing large-scale remote sensing image semantic segmentation model training sample set

Technical Field

The invention relates to the technical field of remote sensing image feature extraction, remote sensing image semantic segmentation and sample set manufacturing, in particular to a method for constructing a large-scale remote sensing image semantic segmentation model training sample set.

Background

The high-resolution train satellite is successfully launched, PB-level remote sensing image data can be provided every year, meanwhile, a large amount of vector data are marked for ground objects by using the remote sensing images in projects such as 'national soil survey for the third time' and 'geographical national condition monitoring', if the data are effectively utilized to form a sample set with a uniform format and an ultra-large magnitude, a deep learning model with higher precision and higher generalization capability can be trained, and important support is provided for related projects and scientific research work. The currently mainstream methods for making a remote sensing image sample set mainly include the following two methods:

firstly, manually and directly carrying out visual interpretation and marking on a remote sensing image to manufacture a sample;

secondly, a sample set is manufactured by utilizing the existing manual interpretation vector result and the remote sensing image strictly corresponding to the manual interpretation vector result through an automatic cutting method.

Although the sample sets manufactured by the two methods have high precision, the sample set manufactured by the first method consumes a large amount of manpower and material resources, is extremely low in efficiency, and is difficult to construct a million-level sample set; in the second method, the vector and the remote sensing image must be strictly corresponding, i.e. the vector is obtained by sketching the image, but the strictly corresponding data is very limited, and the requirement of large-scale sample set production cannot be met, if the vector and other images are used for producing samples, the problems that the sample set obtained by automatic cutting has wrong samples caused by cloud coverage, shadow shielding and the like due to time phase change, image quality and other factors, and the sample label precision is not accurate due to incomplete matching of the vector and the image can be caused. Both methods cannot be used for large-scale sample set production, the first method causes the sample set production cost to be too high, and the second method causes the sample set quality to be difficult to guarantee.

Disclosure of Invention

Aiming at the defects of the prior art, the invention aims to provide a method for constructing a large-scale remote sensing image semantic segmentation model training sample set, which can register the existing vector data with a multi-period remote sensing image, automatically cut and extract samples, improve the manufacturing efficiency of the sample set and the quality precision of the sample set by optimizing the sample cutting and extracting process, introducing the methods of automatic detection and elimination of wrong samples and iterative optimization and purification of the sample set, and can be applied to the fields of remote sensing image semantic segmentation, remote sensing image data set manufacturing and the like.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

a method for constructing a large-scale remote sensing image semantic segmentation model training sample set is characterized by comprising the following steps:

step 1, registering existing remote sensing image vector data with a multi-stage remote sensing image, and automatically cutting and extracting a primary sample set through a sliding window algorithm;

in the extraction process of the primary sample set, when the distribution of the pattern spots in the remote sensing image vector data after registration is sparse, the steps are as follows:

a1, acquiring the four-to-range of each image spot, and expanding a buffer area with random size to the periphery on the basis of the four-to-range;

step A2, rasterizing the image spot range after the buffer area is expanded to obtain a mask of a target ground object in the area;

step A3, sliding a sliding window with fixed size on the mask, and calculating the ratio between the foreground and the background;

a4, when the ratio is larger than the set threshold, cutting the data of the area on the image and the mask as the image and the label;

step A5, integrating the cut data and the labels to obtain a primary sample set;

when the distribution of the pattern spots in the remote sensing image vector data after registration is dense, the method comprises the following steps:

b1, rasterizing the vector data of the remote sensing image to obtain a mask covering the whole remote sensing image;

step B2, using a sliding window with fixed size to slide on the remote sensing image and the mask simultaneously to extract data as an image and a label;

b3, integrating the cut data and the label to obtain a primary sample set;

step 2, extracting the characteristics of each image in the primary sample set, classifying by adopting a clustering algorithm, and removing samples with poor image quality to obtain an intermediate sample set;

and 3, inputting the intermediate sample set into the semantic segmentation model in batches for iterative optimization training, predicting the samples after each iterative optimization is completed, and eliminating error samples in the intermediate sample set to obtain a target sample set.

Further, the range size of the buffer area is 1-2 times of that of the sliding window.

Further, the step of obtaining the intermediate sample set in step 2 is:

2.1, respectively extracting texture features and color features of each image in the primary sample set;

2.2, fusing the texture features and the color features in a direct splicing mode to obtain a fusion feature set of the image;

and 2.3, clustering the fusion feature set of the image through a clustering algorithm, analyzing the category generated by clustering the primary sample set, deleting the samples containing cloud and shadow occlusion in the primary sample set, and reserving the samples containing the target ground objects.

Further, in step 2.1, GLCM is used to extract the texture features, and a color histogram is used to extract the color features.

Further, the clustering algorithm in step 2.3 adopts a density clustering algorithm, which specifically comprises the following steps:

step 2.3.1, setting a domain distance threshold value among elements, setting a minimum contained point number in a range taking the domain distance threshold value as a neighborhood radius, and if the number of elements contained in the neighborhood radius of one element is not less than the minimum contained point number, taking the element as a core object;

step 2.3.2, finding out a core object in the fusion feature set of the image, and adding the core object into the core object set;

step 2.3.3, randomly selecting an element which is not accessed in the core object set, firstly marking the element as accessed, then marking the type of the element, and finally adding the non-core object which is not accessed in the neighborhood radius into the seed set;

step 2.3.4, judging whether the seed set is empty, if so, finishing the generation of the cluster and entering step 2.3.6, otherwise, entering step 2.3.5;

2.3.5, randomly selecting a seed from the seed set, judging that the seed is a non-core object, if so, adding an object in the radius of the seed field into the seed set, and skipping to the step 2.3.4;

and 2.3.6, judging whether all the elements in the core object set are accessed, if so, finishing clustering the fusion feature set of the images, otherwise, skipping to the step 2.3.3.

Further, the semantic segmentation model is any one of an FCN network model, a SegNet network model, a cavity convolution model, a deep Lab series model, a RefineNet model and a PSPNet model.

Further, the step of obtaining the target sample set in step 3 is as follows:

step 3.1, training the selected semantic segmentation model by adopting an intermediate sample set;

step 3.2, predicting the samples in the middle sample set by adopting the trained semantic segmentation model, and comparing the prediction result with the true value of the samples to obtain the prediction accuracy of each sample;

3.3, deleting samples with prediction accuracy rates smaller than a preset coincidence rate threshold value;

and 3.4, judging whether the accuracy of the model is not less than a preset accuracy threshold, if so, finishing the manufacture of the target sample set, and otherwise, returning to the step 3.1.

Further, the training step of the semantic segmentation model in step 3.1 is as follows:

step 3.1.1, selecting a model, a loss function and an optimizer, and setting related parameters;

step 3.1.2, inputting samples in the intermediate sample set into the selected model in batches, extracting image features through downsampling by utilizing a cavity convolution structure, and upsampling the feature map to obtain an output result;

step 3.1.3, calculating the error between the output result and the label;

step 3.1.4, calculating the gradient of each parameter in the model, and optimizing the parameters in a back propagation mode;

and 3.1.5, judging whether the model is converged, if so, stopping training, and obtaining the accuracy rate of the model, otherwise, skipping to the step 3.1.2.

Further, the calculation formula of the error between the output result in step 3.1.3 and the label is:

L_fl＝-(1-y′*y)^γlog(y′)，

wherein y' is a model output result, y is a model input result, and gamma is a hyper-parameter.

Compared with the prior art, the invention has the following beneficial effects:

(1) the image cutting process is improved, only the area near the image spots is rasterized, when the image spots are sparsely distributed, the generation of the whole image and a mask with a large occupied space can be avoided, the sliding times of a sliding window are reduced, and the extraction speed of a sample is improved.

(2) The samples with poor image quality in the sample set are automatically screened out through a feature extraction algorithm and an automatic clustering algorithm, the samples with cloud shielding and shadow shielding are removed, and the data quality of the sample set is improved.

(3) The sample set is purified in an iterative optimization semantic segmentation model mode, wrong samples with inaccurate labels are automatically screened out, the purity of correct samples in the sample set is improved, and the labor cost for manufacturing the sample set is greatly reduced.

(4) The method can fully utilize the existing vector results, correspondingly extract data and labels from the vectors and the multiple remote sensing images, and ensure the quality of the sample set by a series of methods, thereby reducing the cost for manufacturing a large-scale sample set.

Drawings

FIG. 1 is a flow chart of a method of the present invention;

FIG. 2 is a schematic diagram of a sample for screening for poor image quality according to the present invention;

FIG. 3 is a schematic representation of the purification of a sample set using DeepLabV3+ according to the present invention.

Detailed Description

The following provides a more detailed description of the embodiments and the operation of the present invention with reference to the accompanying drawings.

As shown in fig. 1, a method for constructing a large-scale remote sensing image semantic segmentation model training sample set includes the following specific steps:

in this example, in order to avoid generating a mask with a large occupied space and a whole image when the distribution of the image patches is sparse, in the extraction process of the primary sample set, respective processing is performed according to the density of the image patches to reduce the sliding frequency of the sliding window and improve the extraction speed of the sample.

When the distribution of the pattern spots in the remote sensing image vector data after registration is sparse, the steps are as follows:

a1, obtaining the range of four to four of each image spot, and expanding a buffer area with random size to the periphery on the basis of the range of four to four, wherein the range of the buffer area is 1-2 times of that of the sliding window to ensure that the size of the mask after rasterization is larger than that of the sliding window, and the buffer area is used for ensuring that the size of the mask is larger than that of the sliding window and simultaneously retaining the context information around the image spot;

step A3, slide on mask with fixed size sliding window (1024 × 1024), and pass

Calculate the ratio between foreground and background, Counts_targetNumber of foreground pixels, Counts_backgrounIs the background pixel number;

a4, when the ratio is larger than the set threshold 0, cutting the image and the data of the area on the mask as the image and the label;

step B2, using a sliding window (1024 multiplied by 1024) with fixed size to simultaneously slide and extract data on the remote sensing image and the mask as an image and a label;

b3, integrating the cut data and the label to obtain a primary sample set;

step 2, extracting features of each image in the primary sample set, classifying the images by adopting a clustering algorithm, removing samples with poor image quality, and obtaining an intermediate sample set, wherein the method comprises the following specific steps as shown in fig. 2:

step 2.1, because the texture features and the color features help to detect the cloud and the shadow in the image, respectively for each image I in the primary sample set_iAnd calculating the texture characteristic r by using a GLCM algorithm according to the following formula_i ^G：

r(a,b|d,θ)＝{(a,b)|f(a,b)＝i,f(a+da,b+db)＝j；a,b＝0,1,2,...,N-1}；

Wherein d is the relative distance expressed by the number of pixels, theta is the angle of the calculation direction, (a, b) are the pixel coordinates of the image, and N is the number of gray levels of the image;

and extracting image I through a color histogram_iColor characteristic r of_i ^H；

Step 2.2, fusing the texture features r in a direct splicing mode_i ^GAnd color characteristics r_i ^HObtaining a fusion characteristic set r of the image_i；

Step 2.3, clustering the fusion feature set of the image through a DBSCAN (density clustering) algorithm, analyzing the class generated by clustering the primary sample set, deleting the samples containing cloud shielding and shadow shielding in the primary sample set, and reserving the samples containing the target ground objects, wherein the method specifically comprises the following steps:

step 2.3.1, initializing the class k to 0, setting the domain distance threshold epsilon between elements to 10, and calculating the distance between elements according to the following formula:

wherein n is the number of elements.

Meanwhile, the minimum contained point MinPts in the range taking the domain distance threshold epsilon as the neighborhood radius is set, the value of the minimum contained point MinPts is set to be 100, and if the number of elements contained in the neighborhood radius by one element is not less than the minimum contained point MinPts, the element r_iIs a core object;

step 2.3.2, traversing the fusion feature set r of the image_iIf the core object in the fusion feature set of the image is found, adding the core object into a core object set omega;

step 2.3.3, randomly selecting an element I which is not accessed in the core object set omega, firstly marking the element I as accessed, then marking the type of the element I as k, and finally adding the non-core object which is not accessed in the neighborhood radius into the seed set Seeds;

step 2.3.4, judging whether the seed set Seeds are empty, if so, finishing the generation of the cluster and entering step 2.3.6, otherwise, entering step 2.3.5;

2.3.5, randomly selecting a seed from the seed set Seeds, judging that the seed is a non-core object, adding an object within the radius of the seed field into the seed set if the seed is the non-core object, and skipping to the step 2.3.4;

and 2.3.6, judging whether all the elements in the core object set are accessed, if so, finishing clustering the fusion feature set of the image, otherwise, making k equal to k +1, and jumping to the step 2.3.3.

In the specific implementation process, the semantic segmentation model is any one of an FCN network model, a SegNet network model, a hole convolution model, a deep lab series model, a reflonenet model and a PSPNet model. In this embodiment, a deplaybv 3+ model in a deplab series is selected as a semantic segmentation model M, the model M is used to purify the intermediate sample set, and samples with low labeling precision are removed, as shown in fig. 3, the specific steps are as follows:

step 3.1, training the selected semantic segmentation model by adopting an intermediate sample set, wherein the training step is as follows:

3.1.1, selecting a model, wherein the size of an input image is 1024 × 1024 × 3, the Focal local is used as a Loss function, Adam is selected as an optimizer, and related parameters are set, wherein the initial learning rate is set to be 0.001, the attenuation of the learning rate is set to be 0.9, the iteration times of the model is set to be 500, and the number of input images in each batch is set to be 64;

step 3.1.2, inputting samples in the middle sample set into the selected model M in batches, utilizing a cavity convolution structure to extract image characteristics through the following formula downsampling,

where, rate represents the void convolution expansion rate and w [ k ] represents the kth parameter of the convolution kernel. After down sampling, up sampling the characteristic graph to obtain an output result y';

step 3.1.3, calculating the error loss between the output result and the label by the following formula:

L_fl＝-(1-y′*y)^γlog(y′)，

wherein y' is the model output result, y is the model input result, and γ is the hyper-parameter, set to 0.5.

Step 3.1.4, calculating the gradient of each parameter in the model

Parameters are optimized in a back propagation mode;

Step 3.2, predicting the samples in the middle sample set by adopting the trained semantic segmentation model, and comparing the prediction result with the true value of the sample to obtain the coincidence rate of the prediction result and the true value of each sample;

calculating the coincidence rate of the prediction result and the sample truth value by the following formula:

p_i＝f(M(D_i),l_i),i∈[i,n]，

wherein p is_iFor prediction accuracy, n is the number of samples in the intermediate sample set, M (D)_i) To predict the result, l_iThe sample true value is shown, f is an evaluation index, PA (Pixel accuracy) is used as the evaluation index in the experiment, and the calculation formula of PA is as follows:

step 3.3, delete coincidence rate p_iSamples smaller than a preset coincidence rate threshold value alpha;

and 3.4, judging whether the accuracy of the model is not less than a preset accuracy threshold value precision, if so, finishing the manufacture of the target sample set, and otherwise, returning to the step 3.1.

In this example, the initial value of the overlap ratio threshold α is 25%, and the final value is 85%; the model accuracy threshold precision is 90%.

The invention can utilize the existing vector results and combine with the multi-stage remote sensing images to construct a large-scale semantic segmentation model training sample set. The invention screens the samples with poor image quality and inaccurate labels in the sample set by using a clustering algorithm and an iterative optimization method, improves the data quality of the sample set, and most of the operations are automatically completed by using a computer program, thereby avoiding manual one-by-one screening and effectively reducing the cost for manufacturing large-scale sample sets.

The technical solution provided by the present invention is described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims

1. A method for constructing a large-scale remote sensing image semantic segmentation model training sample set is characterized by comprising the following steps:

b3, integrating the cut data and the label to obtain a primary sample set;

2. The method for constructing the large-scale remote sensing image semantic segmentation model training sample set according to claim 1, characterized in that: the range size of the buffer area is 1-2 times of that of the sliding window.

3. The method for constructing the large-scale remote sensing image semantic segmentation model training sample set according to claim 1, characterized in that: the step 2 of obtaining the intermediate sample set comprises the following steps:

4. The method for constructing the large-scale remote sensing image semantic segmentation model training sample set according to claim 3, characterized in that: in step 2.1, GLCM is adopted to extract the texture features, and a color histogram is adopted to extract the color features.

5. The method for constructing the large-scale remote sensing image semantic segmentation model training sample set according to claim 3 or 4, characterized in that: the clustering algorithm in the step 2.3 adopts a density clustering algorithm, and the specific steps are as follows:

6. The method for constructing the large-scale remote sensing image semantic segmentation model training sample set according to claim 1, characterized in that: the semantic segmentation model is any one of an FCN network model, a SegNet network model, a cavity convolution model, a deep Lab series model, a RefineNet model and a PSPNet model.

7. The method for constructing the large-scale remote sensing image semantic segmentation model training sample set according to claim 1 or 6, characterized in that: the step of obtaining the target sample set in step 3 is as follows:

8. The method for constructing the large-scale remote sensing image semantic segmentation model training sample set according to claim 7, characterized in that: the training step of the semantic segmentation model in step 3.1 is as follows:

step 3.1.3, calculating the error between the output result and the label;

9. The method for constructing the large-scale remote sensing image semantic segmentation model training sample set according to claim 8, characterized in that: the calculation formula of the error between the output result and the label in step 3.1.3 is:

L_fl＝-(1-y′*y)^γlog(y′)，