CN110211138B - Remote sensing image segmentation method based on confidence points - Google Patents
Remote sensing image segmentation method based on confidence points
- Publication number
- CN110211138B CN110211138B CN201910494015.0A CN201910494015A CN110211138B CN 110211138 B CN110211138 B CN 110211138B CN 201910494015 A CN201910494015 A CN 201910494015A CN 110211138 B CN110211138 B CN 110211138B
- Authority
- CN
- China
- Prior art keywords
- image
- green
- pixel
- red
- test set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30181—Earth observation
Abstract
The invention discloses a remote sensing image segmentation method based on confidence points, which mainly addresses the low segmentation accuracy of the prior art on high-resolution remote sensing images. The method comprises the following steps: (1) constructing a convolutional neural network; (2) generating two training sets; (3) generating two test sets; (4) predicting the ground-object class labels of the test sets; (5) marking each confidence-point pixel; (6) correcting the building category; (7) updating the vegetation category; and (8) obtaining the final ground-object class labels. The method improves the segmentation accuracy of the vegetation and building categories and achieves good segmentation results on both low-resolution and high-resolution remote sensing images.
Description
Technical Field
The invention belongs to the technical field of image processing, and further relates to a remote sensing image segmentation method based on confidence points in the technical field of remote sensing image processing. The method can be used for segmenting the high-resolution multiband remote sensing image acquired by the satellite to obtain the segmentation image with the ground object class label.
Background
Remote sensing image segmentation is the technique and process of dividing a remote sensing image into several specific regions with distinctive properties. At present, deep-learning methods such as U-Net, PSPNet and DeepLab are widely adopted in engineering practice. These methods use a neural network to extract features from the remote sensing image and predict the class of each pixel with the trained network, finally obtaining a segmented image with class labels. Such prior-art methods segment low-resolution public data sets well, but the results are unsatisfactory for high-resolution remote sensing images such as WorldView-3 satellite imagery.
The patent application of Chang'an University, "A region-based multi-feature fusion high-resolution remote sensing image segmentation method" (application No. 201610643629.7, publication No. 106296680A), discloses a region-based multi-feature-fusion segmentation method for high-resolution remote sensing images. The method first segments the initial remote sensing image, then computes the texture-feature, spectral-feature and shape-feature distances between any neighboring segmented regions of the initial segmentation, and finally merges related regions with the RAG and NNG methods. Although the method achieves high segmentation accuracy and execution efficiency on 1-meter-resolution GF-2 (Gaofen-2) remote sensing imagery, it segments the image using only the spectral, texture and shape feature distances and does not exploit the information of the other bands of the remote sensing image, so its segmentation accuracy for the vegetation category is low. In addition, because of building shadows and occlusion, its segmentation accuracy for the building category is also low.
The patent application of Hohai University, "Multiband high-resolution remote sensing image segmentation method based on gray-level co-occurrence matrix" (application No. 20131056019.2, publication No. 103578110A), discloses a gray-level-co-occurrence-matrix-based segmentation method. The method segments each band image separately with the rainfall watershed transform and then superimposes the per-band segmentation results. Finally, a region-merging strategy based on multiband spectral information merges the fragmented regions in the segmentation result, achieving image segmentation. Although the method overcomes over-segmentation and under-segmentation and has good segmentation accuracy and stability, it is tailored to 2.5-meter-resolution three-band panchromatic-multispectral fusion imagery of the Shanghai region of China and thus generalizes poorly to high-resolution remote sensing images; moreover, its gray-level co-occurrence matrix segmentation does not consider building shadows and occlusion, so its segmentation accuracy for the building category is low.
Disclosure of Invention
The purpose of the invention is to provide a remote sensing image segmentation method based on confidence points that addresses the above defects of the prior art. The method has high segmentation efficiency and achieves good segmentation results on high-resolution remote sensing images, particularly for the vegetation and building categories.
The idea of the invention is as follows: construct two training sets and two test sets; build a LinkNet network and set the parameters of each layer; train it to obtain two trained networks; feed the two test sets into the two trained networks to predict the ground-object class labels of the test sets; and finally obtain the ground-object class labels of the segmented remote sensing image by marking confidence-point pixels, updating the vegetation category and correcting the building category.
In order to achieve the purpose, the method comprises the following specific steps:
(1) Constructing a LinkNet neural network:
(1a) Building a 14-layer LinkNet neural network;
(1b) Setting parameters of each layer in a LinkNet neural network;
(2) Two training sets were generated:
(2a) Selecting at least 20000 images from a remote sensing image data set to form a basic training set, wherein 10000 red, green and blue RGB images are selected, and the rest 10000 images are multispectral images; the ground feature type label of each pixel of each image in the basic training set is one of five ground feature type labels of a road, a viaduct, a building, vegetation and the ground, each red, green and blue RGB image corresponds to one satellite shooting area, each multispectral image comprises eight-waveband information, and the satellite shooting area of each multispectral image is the same as the satellite shooting area of each red, green and blue RGB image;
(2b) Synthesizing all red, green and blue (RGB) images in the basic training set into an enhanced red, green and blue (RGB) image by using a synthesis method, wherein the enhanced red, green and blue (RGB) image is used as a first training set;
(2c) Synthesizing all multispectral images in the basic training set into a multiband pseudo-color image by using a synthesis method to serve as a second training set;
(3) Training a neural network:
(3a) Inputting the first training set into a LinkNet network for training to obtain a trained first segmentation network;
(3b) Inputting the second training set into a LinkNet network for training to obtain a trained second segmentation network;
(4) Two test sets were generated:
(4a) Selecting at least 6000 images from the remote sensing image data set to form a basic test set, wherein 3000 red, green and blue (RGB) images are selected, and the rest 3000 images are multispectral images; the ground feature type label of each pixel of each image in the basic test set is one of five ground feature type labels of a road, an overpass, a building, vegetation and the ground, each red, green and blue RGB image corresponds to one satellite shooting area, each multispectral image comprises eight-waveband information, and the satellite shooting area of each multispectral image is the same as the satellite shooting area of each red, green and blue RGB image;
(4b) Synthesizing the red, green and blue RGB images in the basic test set into an enhanced red, green and blue RGB image by using a synthesis method to serve as a first test set;
(4c) Synthesizing the multispectral images in the basic test set into a multiband pseudo-color image serving as a second test set by using a synthesis method;
(5) Predicting the ground-object class labels of the test sets:
(5a) Sequentially inputting each image of the first test set into a first segmentation network, and outputting a ground object class label of each pixel in each image of the first test set and a ground object class label probability value of each pixel in each image;
(5b) Sequentially inputting each image of the second test set into a second segmentation network, and outputting the ground object class label of each pixel in each image in the second test set and the ground object class label probability value of each pixel in each image;
(6) Labeling each confidence point pixel:
traversing each pixel in each image in the first test set; marking all pixels whose ground-object class label probability value is greater than 0.9 as confidence-point pixels;
(7) Correcting the building type:
traversing each pixel whose ground-object class label in the first test set is building; if a confidence-point pixel exists within the 5×5 neighborhood of the pixel, its label is kept as building, otherwise its label is changed to ground; obtaining the ground-object class label of each pixel in each image in the updated first test set;
(8) Updating the vegetation category:
(8a) Multiplying the average value of pixel values of coastline waveband Coastal corresponding to all the pixels with the vegetation in the second test set by 0.8 to obtain a threshold value of coastline waveband Coastal, multiplying the average value of pixel values of Yellow waveband Yellow corresponding to all the pixels with the vegetation in the second test set by 0.8 to obtain a threshold value of Yellow waveband Yellow, and multiplying the average value of pixel values of near infrared secondary waveband NIR2 corresponding to all the pixels with the vegetation in the second test set by 0.2 to obtain a threshold value of near infrared secondary waveband NIR 2;
(8b) Traversing each pixel in each image in the second test set, updating the ground object class labels of all pixels of which the probability value of the ground object class label is less than or equal to 0.5, the pixel value corresponding to the coastline wave band Coastal is less than the threshold value of the coastline wave band Coastal, the pixel value of the Yellow wave band Yellow is less than the threshold value of the Yellow wave band Yellow, and the pixel value of the near infrared secondary wave band NIR2 is greater than the threshold value of the near infrared secondary wave band NIR2 into vegetation, and keeping the original ground object class labels of the rest pixels unchanged; obtaining a ground object type label of each pixel in each image in the updated second test set;
(9) Obtaining a final ground object class label:
(9a) Replacing the vegetation type label of each pixel in each image in the updated first test set by using the vegetation type label of each pixel in each image in the updated second test set to obtain the surface feature type label of each pixel in each image in the first test set after vegetation type replacement;
(9b) And the feature class label of each pixel in each image in the first test set after vegetation class replacement is used as the feature class label after the remote sensing image is segmented.
Compared with the prior art, the invention has the following advantages:
firstly, the invention updates the vegetation ground-object class labels output by the segmentation network using three thresholds on the Coastal, Yellow and NIR2 bands, overcoming the low vegetation segmentation accuracy of prior-art methods that segment the remote sensing image with only spectral, texture and shape feature distances; even scattered vegetation can be segmented, improving the segmentation accuracy of the vegetation category.
Secondly, because the invention trains the LinkNet neural network on both the enhanced RGB images and the multiband pseudo-color images, it overcomes the prior-art limitation of segmenting low-resolution public data sets well while generalizing poorly to high-resolution remote sensing imagery; the invention therefore segments both low-resolution and high-resolution remote sensing images well.
Thirdly, the invention utilizes the confidence point pixel to correct the ground feature type label of the building output by the segmentation network, thereby overcoming the defect of low segmentation precision of the concrete building type caused by the existence of the shadow and the shielding problem of the building in the prior art and improving the segmentation precision of the building type.
Drawings
FIG. 1 is an overall flow chart of the present invention.
The specific implementation mode is as follows:
step 1, constructing a neural network.
Building a 14-layer LinkNet neural network;
the structure of the LinkNet neural network with 14 layers is as follows in sequence: input layer → 1 st convolution layer → 1 st largest pooling layer → 1 st encoder → 2 nd encoder → 3 rd encoder → 4 th decoder → 3 rd decoder → 2 nd decoder → 1 st full convolution layer → 2 nd full convolution layer; wherein, the output of the 1 st coder is connected with the input of the 1 st decoder, the output of the 2 nd coder is connected with the input of the 2 nd decoder, and the output of the 3 rd coder is connected with the input of the 3 rd decoder.
The structure of the encoder is as follows in sequence: input layer → 1 st encoding convolution layer → 2 nd encoding convolution layer → 3 rd encoding convolution layer → 4 th encoding convolution layer; wherein, the input layer is connected with the output of the 2 nd coding convolution layer, and the input of the 3 rd coding convolution layer is connected with the output of the 4 th coding convolution layer; the decoder structure is as follows: input layer → 1 st decoded convolutional layer → 1 st decoded full convolutional layer → 2 nd decoded convolutional layer.
Setting parameters of each layer in a LinkNet neural network;
the parameters of each layer in the LinkNet neural network are set as follows: the convolution kernel size of the 1st convolution layer is set to 7 × 7 with stride 2; the pooling window of the 1st max-pooling layer is set to 3 × 3 with stride 2; the convolution kernel sizes of the 1st and 2nd full convolution layers are set to 3 × 3 and 2 × 2 respectively, both with stride 2; the convolution kernel size of the 2nd convolution layer is set to 3 × 3 with stride 1.
Wherein the parameter settings of the encoder are as follows: setting the sizes of convolution kernels of the 1 st, 2 nd, 3 th and 4 th coding convolution layers as 3 x 3, and sequentially setting the step sizes as 2, 1 and 1; the parameters of the decoder are as follows: setting the sizes of convolution kernels of the 1 st decoding convolution layer and the 2 nd decoding convolution layer as 1 x 1, and setting the step size as 1; the convolution kernel size of the 1 st decoded full convolution layer is set to 3 x 3 and the step size is set to 2.
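The strides above determine how the spatial resolution shrinks through the encoders and is restored by the decoders. The walkthrough below is a sketch under stated assumptions: the 256×256 input size, the padding values, and the `output_padding` of the transposed convolutions are not given in the patent and are chosen here so that the output matches the input size, following the standard LinkNet design.

```python
def conv_out(n, k, s, p):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def deconv_out(n, k, s, p, op=0):
    """Spatial size after a stride-s transposed (full) convolution."""
    return (n - 1) * s - 2 * p + k + op

size = 256                                        # assumed input resolution
size = conv_out(size, k=7, s=2, p=3)              # 1st convolution layer -> 128
size = conv_out(size, k=3, s=2, p=1)              # 1st max-pooling layer -> 64
for _ in range(4):                                # encoders 1-4 (first conv has stride 2)
    size = conv_out(size, k=3, s=2, p=1)          # 32, 16, 8, 4
for _ in range(4):                                # decoders 4-1 (stride-2 full convolution)
    size = deconv_out(size, k=3, s=2, p=1, op=1)  # 8, 16, 32, 64
size = deconv_out(size, k=3, s=2, p=1, op=1)      # 1st full convolution layer -> 128
size = conv_out(size, k=3, s=1, p=1)              # 2nd convolution layer -> 128 (placement assumed)
size = deconv_out(size, k=2, s=2, p=0)            # 2nd full convolution layer -> 256
```

With these assumptions the output label map has the same spatial size as the input image, which is what per-pixel segmentation requires.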
And 2, generating two training sets.
Selecting at least 20000 images from a remote sensing image data set to form a basic training set, wherein 10000 red, green and blue RGB images are selected, and the rest 10000 images are multispectral images; the ground feature type label of each pixel of each image in the basic training set is one of five ground feature type labels of a road, a viaduct, a building, vegetation and the ground, each red, green and blue RGB image corresponds to one satellite shooting area, each multispectral image comprises eight-waveband information, and the satellite shooting area of each multispectral image is the same as the satellite shooting area of each red, green and blue RGB image;
the eight bands are: the Coastal band, Blue band, Green band, Yellow band, Red band, Red-edge band, near-infrared band 1 (NIR1) and near-infrared band 2 (NIR2).
Synthesizing all red, green and blue RGB images in the basic training set into an enhanced red, green and blue RGB image by using a synthesis method to serve as a first training set;
the enhanced red, green and blue RGB image combines the original red, green and blue RGB image and the information of the near infrared two NIR2 wave bands, enhances the information of the remote sensing image such as contrast, detail texture and the like, and has good detection effect on the ground, buildings, elevated roads and other categories.
And synthesizing all multispectral images in the basic training set into a multiband pseudo-color image as a second training set by using a synthesis method.
The multiband pseudo-color image combines the Coastal, Yellow and NIR2 bands of the multispectral image and highlights the vegetation category information.
The synthesis method comprises the following steps:
Step 1, synthesizing the green channel of each RGB image in the basic training set or basic test set with the corresponding NIR2 band information into the green channel of the enhanced RGB image according to the following formula:
N=G*w+R*(1-w)
wherein N represents the green channel of the enhanced RGB image, G represents the green channel of the RGB image in the basic training set or basic test set, * represents multiplication, w is the weight balancing the two channels and controls the degree of enhancement (its value is 0.8), and R represents the NIR2 band of the multispectral image corresponding to that RGB image.
Step 2, replacing the green channel of the corresponding RGB image in the basic training set or basic test set with the green channel computed in step 1, obtaining the enhanced RGB image.
Step 3, forming all enhanced RGB images into the first training set or the first test set.
Step 4, normalizing the values of the Coastal, Yellow and NIR2 bands of each multispectral image, multiplying each normalized value by 255, and replacing the red channel R, green channel G and blue channel B of the corresponding RGB image in the basic training set or basic test set with the three products, respectively; the three updated channels form the multiband pseudo-color image.
Step 5, forming all multiband pseudo-color images into the second training set or the second test set.
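The synthesis steps above can be sketched in NumPy. This is a minimal illustration: the function names are ours, `rgb` is assumed to be an H×W×3 array with the single-band inputs as H×W arrays, and min-max normalization is assumed for step 4 (the patent does not specify the normalization):

```python
import numpy as np

def enhance_rgb(rgb, nir2, w=0.8):
    """Steps 1-2: replace the green channel with N = G*w + NIR2*(1-w), w = 0.8."""
    out = rgb.astype(np.float32).copy()
    out[..., 1] = rgb[..., 1].astype(np.float32) * w + nir2.astype(np.float32) * (1.0 - w)
    return out

def pseudo_color(coastal, yellow, nir2):
    """Step 4: normalize the Coastal, Yellow and NIR2 bands to [0, 255] and
    stack them as the R, G and B channels of the multiband pseudo-color image."""
    def norm255(band):
        band = band.astype(np.float32)
        rng = band.max() - band.min()
        return (band - band.min()) / rng * 255.0 if rng > 0 else np.zeros_like(band)
    return np.stack([norm255(coastal), norm255(yellow), norm255(nir2)], axis=-1)
```

Applying these two functions to every image of the basic training (or test) set yields the first and second training (or test) sets, respectively.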
and 3, training a neural network.
Inputting the first training set into a LinkNet network for training to obtain a trained first segmentation network;
and inputting the second training set into a LinkNet network for training to obtain a trained second segmentation network.
LinkNet is a network architecture proposed by Chaurasia and Culurciello in the 2017 paper "LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"; it focuses mainly on the speed of semantic segmentation while achieving good segmentation quality.
And 4, generating two test sets.
Selecting at least 6000 images from the remote sensing image data set to form a basic test set, wherein 3000 red, green and blue (RGB) images are selected, and the rest 3000 images are multispectral images; the ground feature type label of each pixel of each image in the basic test set is one of five ground feature type labels of a road, an overpass, a building, vegetation and the ground, each red, green and blue RGB image corresponds to one satellite shooting area, each multispectral image comprises eight-waveband information, and the satellite shooting area of each multispectral image is the same as the satellite shooting area of each red, green and blue RGB image;
synthesizing the red, green and blue RGB images in the basic test set into an enhanced red, green and blue RGB image by using a synthesis method to serve as a first test set;
synthesizing the multispectral image in the basic test set into a multiband pseudo-color image as a second test set by using a synthesis method;
the synthesis was as described in step 2.
And 5, predicting the ground feature class labels of the test set.
Sequentially inputting each image of the first test set into a first segmentation network, and outputting a ground object class label of each pixel in each image in the first test set and a ground object class label probability value of each pixel in each image;
sequentially inputting each image of the second test set into a second segmentation network, and outputting the ground object class label of each pixel in each image in the second test set and the ground object class label probability value of each pixel in each image;
the feature type label probability value is that each image is input to a segmentation network, the segmentation network outputs five output values of each image, wherein each pixel of each image corresponds to a road, an overhead bridge, a building, vegetation and the ground, normalization operation is carried out on the five output values, the maximum value of the five normalized output values is taken, and the maximum value is used as the feature type label probability value of the pixel.
And 6, marking each confidence point pixel.
Traversing each pixel in each image in the first test set; all pixels whose ground-object class label probability value is greater than 0.9 are marked as confidence-point pixels.
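Steps 5 and 6 can be illustrated with a small NumPy sketch. The patent only says the five per-pixel outputs undergo a "normalization operation"; a softmax is assumed here, and the array shapes and variable names are illustrative:

```python
import numpy as np

def softmax(scores, axis=-1):
    """Normalize raw per-class scores into probabilities along the class axis."""
    e = np.exp(scores - scores.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# scores: H x W x 5 network outputs (road, overpass, building, vegetation, ground)
scores = np.random.randn(4, 4, 5)
probs = softmax(scores)          # per-pixel class probabilities
labels = probs.argmax(axis=-1)   # predicted ground-object class label
prob = probs.max(axis=-1)        # ground-object class label probability value
confident = prob > 0.9           # boolean mask of confidence-point pixels
```

The `confident` mask is what the subsequent building-correction step consumes.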
And 7, correcting the building type.
Traversing each pixel whose ground-object class label in the first test set is building; if a confidence-point pixel exists within the 5×5 neighborhood of the pixel, its label is kept as building, otherwise its label is changed to ground; obtaining the ground-object class label of each pixel in each image in the updated first test set;
because the solar elevation and shooting angle vary between satellite acquisitions, shadows and occlusion appear around buildings, and the segmentation results in these difficult areas are not accurate enough; the segmentation network therefore produces scattered, fragmented results in such areas, and the building category is corrected with the confidence-point pixels.
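A direct (unoptimized) NumPy sketch of this correction follows; the label indices `BUILDING` and `GROUND` are arbitrary placeholders, not values from the patent:

```python
import numpy as np

BUILDING, GROUND = 2, 4  # assumed label indices

def correct_buildings(label, confident, win=5):
    """Keep a 'building' pixel only if a confidence-point pixel lies in its
    win x win neighborhood; otherwise relabel it as ground."""
    h, w = label.shape
    r = win // 2
    out = label.copy()
    for i in range(h):
        for j in range(w):
            if label[i, j] != BUILDING:
                continue  # only building pixels are traversed
            patch = confident[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = BUILDING if patch.any() else GROUND
    return out
```

In practice the per-pixel loop would be replaced by a windowed maximum filter, but the loop form mirrors the traversal described in step 7.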
And 8, updating the vegetation category.
Multiplying the average value of pixel values of all the pixels with the vegetation type in the second test set, which correspond to a coastline waveband Coastal, by 0.8 to obtain a coastline waveband Coastal threshold value, multiplying the average value of pixel values of all the pixels with the vegetation type in the second test set, which correspond to a Yellow waveband Yellow by 0.8 to obtain a Yellow waveband Yellow threshold value, and multiplying the average value of pixel values of all the pixels with the vegetation type in the second test set, which correspond to a near infrared second waveband NIR2 by 0.2 to obtain a near infrared second waveband NIR2 threshold value;
traversing each pixel in each image in the second test set, updating the ground object class labels of all pixels of which the probability value of the ground object class label is less than or equal to 0.5, the pixel value corresponding to the coastline wave band Coastal is less than the threshold value of the coastline wave band Coastal, the pixel value of the Yellow wave band Yellow is less than the threshold value of the Yellow wave band Yellow, and the pixel value of the near infrared secondary wave band NIR2 is greater than the threshold value of the near infrared secondary wave band NIR2 into vegetation, and keeping the original ground object class labels of the rest pixels unchanged; obtaining a ground object type label of each pixel in each image in the updated second test set;
since segmenting the vegetation region with only the three band thresholds may be affected by noise, such as objects whose color is similar to vegetation, introducing the ground-object class label probability value avoids this problem.
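Step 8 can be sketched as follows, assuming per-band H×W arrays and an arbitrary `VEGETATION` label index (the patent does not fix the index values):

```python
import numpy as np

VEGETATION = 3  # assumed label index

def update_vegetation(label, prob, coastal, yellow, nir2):
    """Derive per-band thresholds from pixels already labeled vegetation,
    then relabel low-confidence pixels that satisfy all three band tests."""
    veg = label == VEGETATION
    t_coastal = coastal[veg].mean() * 0.8   # Coastal band threshold
    t_yellow = yellow[veg].mean() * 0.8     # Yellow band threshold
    t_nir2 = nir2[veg].mean() * 0.2         # NIR2 band threshold
    candidate = ((prob <= 0.5)
                 & (coastal < t_coastal)
                 & (yellow < t_yellow)
                 & (nir2 > t_nir2))
    out = label.copy()
    out[candidate] = VEGETATION             # relabel; all other pixels keep their labels
    return out
```

The `prob <= 0.5` guard is exactly the probability-value condition that filters out vegetation-colored noise.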
And 9, obtaining a final ground object type label.
Replacing the vegetation type label of each pixel in each image in the updated first test set by using the vegetation type label of each pixel in each image in the updated second test set to obtain a ground feature type label of each pixel in each image in the first test set after vegetation type replacement;
and the ground feature class label of each pixel in each image in the first test set after vegetation class replacement is used as the ground feature class label after remote sensing image segmentation.
Because the vegetation category is segmented more accurately in each image of the second test set, while the other categories, such as the ground, buildings and viaducts, are segmented better in each image of the first test set, the two segmentation results are fused to obtain the final, accurate ground-object class labels of the segmented remote sensing image.
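One plausible reading of the fusion in step 9, sketched in NumPy: the patent's wording is ambiguous about pixels labeled vegetation only in the first result, and here such pixels take the second result's label; the `VEGETATION` index is again an assumed placeholder:

```python
import numpy as np

VEGETATION = 3  # assumed label index

def fuse(label_rgb, label_multi):
    """Adopt the vegetation decisions of the second (multispectral) result and
    keep every other class from the first (enhanced-RGB) result."""
    out = label_rgb.copy()
    out[label_multi == VEGETATION] = VEGETATION
    disagree = (label_rgb == VEGETATION) & (label_multi != VEGETATION)
    out[disagree] = label_multi[disagree]  # drop vegetation the 2nd result rejects
    return out
```

After this fusion, the vegetation mask of the output coincides with that of the second result, while all remaining pixels carry the first result's labels.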
Claims (6)
1. A remote sensing image segmentation method based on confidence points, characterized in that a neural network is used to predict ground-object class labels, the class labels obtained from the two views of the dual-data model are fused, and confidence points are used to correct the building category in the segmentation map, the method comprising the following steps:
(1) Constructing a LinkNet neural network:
(1a) Building a 14-layer LinkNet neural network;
(1b) Setting parameters of each layer in a LinkNet neural network;
(2) Two training sets were generated:
(2a) Selecting at least 20000 images from a remote sensing image data set to form a basic training set, wherein 10000 red, green and blue RGB images are selected, and the rest 10000 images are multispectral images; the ground feature type label of each pixel of each image in the basic training set is one of five ground feature type labels of a road, a viaduct, a building, vegetation and the ground, each red, green and blue RGB image corresponds to one satellite shooting area, each multispectral image comprises eight-waveband information, and the satellite shooting area of each multispectral image is the same as the satellite shooting area of each red, green and blue RGB image;
(2b) Synthesizing all red, green and blue (RGB) images in the basic training set into an enhanced red, green and blue (RGB) image by using a synthesis method, wherein the enhanced red, green and blue (RGB) image is used as a first training set;
(2c) Synthesizing all multispectral images in the basic training set into a multiband pseudo-color image by using a synthesis method to serve as a second training set;
(3) Training a neural network:
(3a) Inputting the first training set into a LinkNet network for training to obtain a trained first segmentation network;
(3b) Inputting the second training set into a LinkNet network for training to obtain a trained second segmentation network;
(4) Two test sets were generated:
(4a) Selecting at least 6000 images from the remote sensing image data set to form a basic test set, wherein 3000 red, green and blue (RGB) images are selected, and the rest 3000 images are multispectral images; the ground feature type label of each pixel of each image in the basic test set is one of five ground feature type labels of a road, an overpass, a building, vegetation and the ground, each red, green and blue RGB image corresponds to one satellite shooting area, each multispectral image comprises eight-waveband information, and the satellite shooting area of each multispectral image is the same as the satellite shooting area of each red, green and blue RGB image;
(4b) Synthesizing the red, green and blue RGB images in the basic test set into an enhanced red, green and blue RGB image by using a synthesis method to serve as a first test set;
(4c) Synthesizing the multispectral images in the basic test set into a multiband pseudo-color image serving as a second test set by using a synthesis method;
(5) Predicting the ground-object class labels of the test sets:
(5a) Sequentially inputting each image of the first test set into a first segmentation network, and outputting a ground object class label of each pixel in each image of the first test set and a ground object class label probability value of each pixel in each image;
(5b) Sequentially inputting each image of the second test set into a second segmentation network, and outputting the ground object class label of each pixel in each image in the second test set and the ground object class label probability value of each pixel in each image;
(6) Labeling each confidence point pixel:
traversing each pixel in each image in the first test set, and marking every pixel whose ground feature label probability value is greater than 0.9 as a confidence point pixel;
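Step (6) above reduces to a simple threshold on the per-pixel probability map. A minimal sketch, assuming the probabilities arrive as a NumPy array (the function name and the example values are illustrative, not from the patent):

```python
import numpy as np

def mark_confidence_points(prob_map, threshold=0.9):
    """Step (6): mark pixels whose ground-feature label probability
    strictly exceeds the threshold as confidence-point pixels.

    prob_map: (H, W) array of per-pixel label probabilities.
    Returns a boolean (H, W) mask of confidence points.
    """
    return prob_map > threshold

probs = np.array([[0.95, 0.40],
                  [0.91, 0.88]])
mask = mark_confidence_points(probs)
```

Only the two pixels above 0.9 (0.95 and 0.91) are marked; a pixel at exactly 0.9 would not be, since the claim says "more than 0.9".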
(7) Correcting the building category:
traversing each pixel in the first test set whose ground feature class label is building; for each such pixel, if a confidence point pixel exists within the 5 × 5 window of pixel points centered on it, keeping its label as building, and otherwise changing its label to ground; obtaining the ground feature class label of each pixel in each image in the updated first test set;
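Step (7) can be sketched as a neighborhood test over the confidence-point mask. The integer label codes below are assumptions for illustration; the patent only names the categories:

```python
import numpy as np

BUILDING, GROUND = 2, 4  # hypothetical integer codes for the label names

def correct_buildings(labels, conf_mask, win=5):
    """Step (7): keep a 'building' pixel only if a confidence-point pixel
    lies inside the win x win window centred on it; otherwise relabel
    the pixel as 'ground'."""
    r = win // 2
    out = labels.copy()
    for i, j in zip(*np.where(labels == BUILDING)):
        window = conf_mask[max(0, i - r):i + r + 1,
                           max(0, j - r):j + r + 1]
        if not window.any():
            out[i, j] = GROUND
    return out
```

An isolated building pixel with no confidence point anywhere in its 5 × 5 window is demoted to ground, while one adjacent to (or itself) a confidence point is kept.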
(8) Updating the vegetation category:
(8a) Multiplying by 0.8 the mean pixel value of the coastline band Coastal over all pixels labeled as vegetation in the second test set to obtain the Coastal threshold; multiplying by 0.8 the mean pixel value of the yellow band Yellow over the same pixels to obtain the Yellow threshold; and multiplying by 0.2 the mean pixel value of the near-infrared second band NIR2 over the same pixels to obtain the NIR2 threshold;
(8b) Traversing each pixel in each image in the second test set, and changing to vegetation the ground feature class label of every pixel whose label probability value is less than or equal to 0.5, whose Coastal pixel value is below the Coastal threshold, whose Yellow pixel value is below the Yellow threshold, and whose NIR2 pixel value is above the NIR2 threshold, while keeping the original labels of the remaining pixels unchanged; obtaining the ground feature class label of each pixel in each image in the updated second test set;
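Steps (8a) and (8b) can be sketched with vectorized band comparisons; the integer vegetation code and the array shapes are illustrative assumptions:

```python
import numpy as np

VEGETATION = 3  # hypothetical integer code for the vegetation label

def vegetation_thresholds(coastal, yellow, nir2, veg_mask):
    """Step (8a): band thresholds from the mean band values over pixels
    currently labelled vegetation (factors 0.8, 0.8, 0.2 per the claim)."""
    return (coastal[veg_mask].mean() * 0.8,
            yellow[veg_mask].mean() * 0.8,
            nir2[veg_mask].mean() * 0.2)

def update_vegetation(labels, probs, coastal, yellow, nir2, thresholds):
    """Step (8b): relabel low-confidence pixels that are spectrally
    vegetation-like (low Coastal, low Yellow, high NIR2)."""
    t_c, t_y, t_n = thresholds
    cond = (probs <= 0.5) & (coastal < t_c) & (yellow < t_y) & (nir2 > t_n)
    out = labels.copy()
    out[cond] = VEGETATION
    return out
```

Note that the update only touches pixels the network was unsure about (probability ≤ 0.5); confident predictions survive even if their spectra look vegetation-like.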
(9) Obtaining the final ground feature class label:
(9a) Replacing the vegetation class label of each pixel in each image in the updated first test set with the vegetation class label of the corresponding pixel in the updated second test set, obtaining the ground feature class label of each pixel in each image in the first test set after vegetation class replacement;
(9b) Taking the ground feature class label of each pixel in each image in the first test set after vegetation class replacement as the ground feature class label of the segmented remote sensing image.
2. The remote sensing image segmentation method based on confidence points as claimed in claim 1, wherein the structure of the 14-layer LinkNet neural network in step (1a) is as follows in sequence: input layer → 1st convolutional layer → 1st max pooling layer → 1st encoder → 2nd encoder → 3rd encoder → 4th encoder → 4th decoder → 3rd decoder → 2nd decoder → 1st decoder → 1st full convolutional layer → 2nd convolutional layer → 2nd full convolutional layer; wherein the output of the 1st encoder is connected to the input of the 1st decoder, the output of the 2nd encoder is connected to the input of the 2nd decoder, and the output of the 3rd encoder is connected to the input of the 3rd decoder;
the structure of each encoder is as follows in sequence: input layer → 1st encoding convolutional layer → 2nd encoding convolutional layer → 3rd encoding convolutional layer → 4th encoding convolutional layer; wherein the encoder input is connected to the output of the 2nd encoding convolutional layer, and the input of the 3rd encoding convolutional layer is connected to the output of the 4th encoding convolutional layer; the structure of each decoder is: input layer → 1st decoding convolutional layer → 1st decoding full convolutional layer → 2nd decoding convolutional layer.
3. The remote sensing image segmentation method based on confidence points as claimed in claim 1, wherein the parameters of each layer in the LinkNet neural network in step (1b) are set as follows: the convolution kernel of the 1st convolutional layer is set to 7 × 7 with stride 2; the pooling window of the 1st max pooling layer is set to 3 × 3 with stride 2; the convolution kernels of the 1st and 2nd full convolutional layers are set to 3 × 3 and 2 × 2 respectively, each with stride 2; the convolution kernel of the 2nd convolutional layer is set to 3 × 3 with stride 1;
wherein the parameters of the encoder are set as follows: the convolution kernels of the 1st, 2nd, 3rd and 4th encoding convolutional layers are all set to 3 × 3, with strides 2, 1, 1 and 1 respectively; the parameters of the decoder are set as follows: the convolution kernels of the 1st and 2nd decoding convolutional layers are set to 1 × 1 with stride 1; the convolution kernel of the 1st decoding full convolutional layer is set to 3 × 3 with stride 2.
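The kernel and stride settings above determine the feature-map sizes through the standard convolution output-size formula. A small sketch applied to the first two layers; the 256 × 256 input size and the padding values are assumptions, since the claim fixes only kernels and strides:

```python
def conv_out(size, kernel, stride, pad=0):
    """Spatial output size of a convolution or pooling layer:
    floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# First two layers of the claimed network on a hypothetical 256x256 input:
after_conv1 = conv_out(256, kernel=7, stride=2, pad=3)          # 7x7 conv, stride 2
after_pool1 = conv_out(after_conv1, kernel=3, stride=2, pad=1)  # 3x3 max-pool, stride 2
```

With "same"-style padding each stride-2 layer halves the spatial size (256 → 128 → 64), which is what lets the stride-2 decoding full convolutional layers restore the resolution on the way back up.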
4. The remote sensing image segmentation method based on confidence points as claimed in claim 1, wherein the eight-band information in steps (2a) and (4a) is: coastline band Coastal, blue band Blue, green band Green, yellow band Yellow, red band Red, red edge band Red Edge, near-infrared first band NIR1, and near-infrared second band NIR2.
5. The remote sensing image segmentation method based on confidence points as claimed in claim 1, wherein the synthesis method in steps (2b), (2c), (4b) and (4c) comprises the following steps:
firstly, synthesizing the green channel of each RGB image in the basic training set or basic test set with the corresponding near-infrared second band NIR2 information according to the following formula:
N = G * w + R * (1 - w)
wherein N represents the green channel of the enhanced RGB image, G represents the green channel of the RGB image in the basic training set or basic test set, * represents multiplication, w represents the weight of the green channel G and is taken as 0.8, and R represents the near-infrared second band NIR2 of the multispectral image corresponding to that RGB image;
secondly, replacing the green channel of the corresponding RGB image in the basic training set or basic test set with the green channel N of the enhanced RGB image, obtaining the enhanced RGB image;
thirdly, forming the first training set or first test set from all the enhanced RGB images;
fourthly, normalizing the values of the coastline band Coastal, the yellow band Yellow and the near-infrared second band NIR2 in each multispectral image, multiplying each normalized value by 255, replacing the red channel R, green channel G and blue channel B of the corresponding RGB image in the basic training set or basic test set with the three products respectively, and forming the multiband pseudo-color image from the three updated channels;
fifthly, forming the second training set or second test set from all the multiband pseudo-color images.
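The two band-mixing operations of the synthesis method can be sketched as follows. Min-max normalization in the fourth step is an assumption, since the claim only says "normalize"; function names are illustrative:

```python
import numpy as np

def enhance_green(G, nir2, w=0.8):
    """First synthesis step: blend the RGB green channel with the NIR2
    band, N = G*w + NIR2*(1-w), with w = 0.8 per the claim."""
    return G * w + nir2 * (1 - w)

def pseudo_color(coastal, yellow, nir2):
    """Fourth synthesis step: normalise the Coastal/Yellow/NIR2 bands,
    scale to [0, 255], and stack them as the R/G/B channels of the
    multiband pseudo-colour image (min-max normalisation assumed)."""
    def norm255(band):
        band = band.astype(float)
        return (band - band.min()) / (band.max() - band.min()) * 255.0
    return np.dstack([norm255(coastal), norm255(yellow), norm255(nir2)])
```

With w = 0.8 the enhanced green channel keeps 80% of the visible green signal and injects 20% NIR2, which boosts the contrast of vegetation (strongly NIR-reflective) without discoloring the rest of the scene.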
6. The remote sensing image segmentation method based on confidence points as claimed in claim 1, wherein the ground feature label probability value in steps (5a) and (5b) is obtained as follows: each image is input to the segmentation network, which outputs, for each pixel, five values corresponding to road, overpass, building, vegetation and ground; the five output values are normalized, and the maximum of the five normalized values is taken as the ground feature label probability value of that pixel.
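Claim 6 can be sketched per pixel as below. The claim only says the five outputs are "normalized"; softmax is assumed here as one common choice, and the class ordering is taken from the claim text:

```python
import numpy as np

CLASSES = ("road", "overpass", "building", "vegetation", "ground")

def label_probability(outputs):
    """Normalise the five per-class network outputs (softmax assumed)
    and take the maximum as the pixel's label probability; the argmax
    gives the predicted ground-feature class."""
    e = np.exp(outputs - outputs.max())  # shift for numerical stability
    p = e / e.sum()
    return CLASSES[int(p.argmax())], float(p.max())

label, prob = label_probability(np.array([0.1, 0.2, 4.0, 0.3, 0.1]))
```

A pixel like this one, with probability above 0.9, would later be marked as a confidence point in step (6) of claim 1.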
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910494015.0A CN110211138B (en) | 2019-06-08 | 2019-06-08 | Remote sensing image segmentation method based on confidence points |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110211138A CN110211138A (en) | 2019-09-06 |
CN110211138B true CN110211138B (en) | 2022-12-02 |
Family
ID=67791544
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910494015.0A Active CN110211138B (en) | 2019-06-08 | 2019-06-08 | Remote sensing image segmentation method based on confidence points |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110211138B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401275B (en) * | 2020-03-20 | 2022-11-25 | 内蒙古工业大学 | Information processing method and device for identifying grassland edge |
CN112800915B (en) * | 2021-01-20 | 2023-06-27 | 北京百度网讯科技有限公司 | Building change detection method, device, electronic equipment and storage medium |
CN113139550B (en) * | 2021-03-29 | 2022-07-12 | 山东科技大学 | Remote sensing image coastline extraction method based on deep semantic segmentation network |
CN113284171B (en) * | 2021-06-18 | 2023-04-07 | 成都天巡微小卫星科技有限责任公司 | Vegetation height analysis method and system based on satellite remote sensing stereo imaging |
CN113538280B (en) * | 2021-07-20 | 2022-03-04 | 江苏天汇空间信息研究院有限公司 | Remote sensing image splicing method for removing color lines and incomplete images based on matrix binarization |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103971115A (en) * | 2014-05-09 | 2014-08-06 | 中国科学院遥感与数字地球研究所 | Automatic extraction method for newly-increased construction land image spots in high-resolution remote sensing images based on NDVI and PanTex index |
CN105427309A (en) * | 2015-11-23 | 2016-03-23 | 中国地质大学(北京) | Multiscale hierarchical processing method for extracting object-oriented high-spatial resolution remote sensing information |
CN106683112A (en) * | 2016-10-10 | 2017-05-17 | 中国交通通信信息中心 | High-resolution image-based road region building change extraction method |
CN106709948A (en) * | 2016-12-21 | 2017-05-24 | 浙江大学 | Quick binocular stereo matching method based on superpixel segmentation |
CN107392925A (en) * | 2017-08-01 | 2017-11-24 | 西安电子科技大学 | Remote sensing image terrain classification method based on super-pixel coding and convolutional neural networks |
CN107392130A (en) * | 2017-07-13 | 2017-11-24 | 西安电子科技大学 | Classification of Multispectral Images method based on threshold adaptive and convolutional neural networks |
CN108573276A (en) * | 2018-03-12 | 2018-09-25 | 浙江大学 | A kind of change detecting method based on high-resolution remote sensing image |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9704068B2 (en) * | 2012-06-22 | 2017-07-11 | Google Inc. | System and method for labelling aerial images |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110211138B (en) | Remote sensing image segmentation method based on confidence points | |
CN109558806B (en) | Method for detecting high-resolution remote sensing image change | |
CN110969088B (en) | Remote sensing image change detection method based on significance detection and deep twin neural network | |
CN111274865A (en) | Remote sensing image cloud detection method and device based on full convolution neural network | |
CN108229425A (en) | A kind of identifying water boy method based on high-resolution remote sensing image | |
CN108416784B (en) | Method and device for rapidly extracting boundary of urban built-up area and terminal equipment | |
CN107564017B (en) | Method for detecting and segmenting urban high-resolution remote sensing image shadow | |
CN111832518B (en) | Space-time fusion-based TSA remote sensing image land utilization method | |
CN110751727A (en) | Synthetic image construction method based on Landsat long-time sequence | |
CN110717921B (en) | Full convolution neural network semantic segmentation method of improved coding and decoding structure | |
CN114494821B (en) | Remote sensing image cloud detection method based on feature multi-scale perception and self-adaptive aggregation | |
CN114005042A (en) | Remote sensing image urban building extraction method based on shadow compensation and U-net | |
CN109961105B (en) | High-resolution remote sensing image classification method based on multitask deep learning | |
CN116343053B (en) | Automatic solid waste extraction method based on fusion of optical remote sensing image and SAR remote sensing image | |
CN102063700A (en) | Satellite remote sensing image generating method and system | |
CN102855627A (en) | City remote sensing image shadow detection method based on spectral characteristic and topological relation | |
Van de Voorde et al. | Extraction of land use/land cover related information from very high resolution data in urban and suburban areas | |
CN111539965A (en) | Urban surface water domestic high-resolution remote sensing self-adaptive extraction method cooperating with multi-source information | |
CN115205713A (en) | Method for recovering details of scenery color and texture in shadow area of remote sensing image of unmanned aerial vehicle | |
CN107507151B (en) | Multispectral remote sensing image real color restoration method and system | |
CN102222235A (en) | Object-oriented hyperspectral classification processing method based on object integration height information | |
CN107564024B (en) | SAR image aggregation region extraction method based on single-side aggregation line segment | |
CN109060813A (en) | Earth and stone material gradation automatic testing method based on image optimization classification | |
CN107705330A (en) | Visibility intelligence estimating and measuring method based on road camera | |
CN110751699B (en) | Color reconstruction method of optical remote sensing image based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||