CN114972759A - Remote sensing image semantic segmentation method based on hierarchical contour cost function - Google Patents

Remote sensing image semantic segmentation method based on hierarchical contour cost function

Info

Publication number
CN114972759A
CN114972759A (application CN202210675935.4A)
Authority
CN
China
Prior art keywords: contour, network, remote sensing, convolution, layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210675935.4A
Other languages
Chinese (zh)
Inventor
韩振 (Han Zhen)
吕宁 (Lü Ning)
陈晨 (Chen Chen)
原昊 (Yuan Hao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210675935.4A
Publication of CN114972759A

Classifications

    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06V 10/34: Smoothing or thinning of the pattern; morphological operations; skeletonisation
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/13: Satellite images


Abstract

The invention discloses a remote sensing image semantic segmentation method based on a hierarchical contour cost function, used for segmenting high-resolution remote sensing images. The implementation steps are: 1. generating a training set; 2. constructing an Inception-v3 U-Net segmentation network; 3. training the Inception-v3 U-Net network; 4. predicting the remote sensing image. The invention constructs the training network Inception-v3 U-Net, which reduces the amount of computation and the number of parameters and improves training efficiency. The invention constructs a hierarchical contour cost function to supervise the network loss, strengthening the model's ability to segment the foreground contour; the contour judgment range is refined by sequentially dilating with a convolution kernel and then subtracting the erosion, which improves the accuracy of contour classification. Meanwhile, the invention assigns pairwise corresponding, mutually complementary hyper-parameters to background and foreground contour levels at the same relative contour distance, so as to achieve accurate segmentation of the contour.

Description

Remote sensing image semantic segmentation method based on hierarchical contour cost function
Technical Field
The invention belongs to the technical field of image processing, and further relates to a remote sensing image semantic segmentation method based on a hierarchical contour cost function within the technical field of image segmentation. The method can be used to segment remote sensing images, and thereby to carry out the remote sensing image-spot interpretation task in automated production and construction projects.
Background
With the rapid development of remote sensing technology comes a large demand for interpreting image data. Interpreting a remote sensing image requires identifying regions or objects by segmenting out objects and regions of different classes and assigning the same label to the same class. At present, engineering practice adopts deep neural network techniques such as U-Net for remote sensing image segmentation: the neural network extracts features of the remote sensing image, and the trained network predicts the class of each pixel, finally yielding a segmentation map with class labels. However, existing methods can only handle relatively simple tasks effectively; because the segmented contour edges are neither clear nor smooth enough, the accuracy of subsequent interpretation suffers, so that in complex scenes the interpretation results still have to be checked and corrected by specialists. In practice, the cost functions mainly used by current semantic segmentation networks cannot adequately improve the accuracy with which the model segments the target contour.
Shandong Fengshi Information Technology Co., Ltd., in the patent document "A remote sensing image segmentation method and system based on edge auxiliary information" (patent application No. 202111094364.7, application publication No. CN113920311A), discloses a remote sensing image segmentation method based on edge auxiliary information. The implementation steps are: first, a remote sensing image is preprocessed to obtain a number of local images; next, the local images are predicted with a remote sensing image segmentation model that uses cross entropy as the segmentation and auxiliary cost function and ResNet as the backbone network, producing per-pixel class predictions; finally, an edge feature map and a body feature map are obtained through upsampling, concatenation and difference operations and fused into a final feature map, thereby improving remote sensing segmentation precision. The method still has shortcomings: because cross entropy is used directly as the cost function during segmentation, the model attends more to the accuracy of background segmentation and neglects the foreground, the boundary and the internal texture cannot be effectively distinguished, and edge processing is rough. Moreover, using ResNet as the backbone network increases the number of training parameters and lowers computational efficiency.
Chen Z., Zhou H., Lai J. et al., in the published article "Boundary Loss: Boundary-Aware Learning for Salient Object Segmentation" (IEEE Transactions on Image Processing, 2021, 30:431-443), propose an image segmentation method using a contour cost function. During training, a pre-trained VGG-16 network first classifies the image to obtain multi-layer feature maps, which are then fed into a decoder composed of residual blocks; a weight matrix that increases the proportion of pixels within the contour range is adopted as the contour cost function, helping to learn the boundary distinction between a salient object and the background. The drawback of this method is that its contour cost function judges the contour only coarsely, so contours of different classes can be judged incorrectly. Moreover, the weight matrices obtained for different classes are identical, different directions of the contour cannot be accurately distinguished, and the treatment of those directions lacks specificity, so contour accuracy is lost.
Disclosure of Invention
The invention aims to provide a remote sensing image semantic segmentation method based on a hierarchical contour cost function that addresses the defects of the prior art: the foreground being neglected when edges are segmented in remote sensing image semantic segmentation, the boundary and the internal texture not being effectively distinguished, the large number of parameters and the large amount of computation, the misjudgment of contours of different classes, and the inaccurate contours of segmented objects.
The idea for achieving this purpose is to construct a U-Net segmentation network with Inception-v3 as its backbone, i.e. an Inception-v3 U-Net that segments the remote sensing image while decomposing large convolutions into small ones to reduce the number of network parameters. The invention constructs a hierarchical contour cost function to supervise the network loss, paying more attention to the accuracy of boundary processing during training and strengthening the model's ability to segment the foreground contour. The contour judgment range is refined by sequentially dilating with a convolution kernel and then subtracting the erosion, so that pixels at different positions and under different labels receive different weight parameters, improving the accuracy of contour classification. The invention assigns pairwise corresponding, mutually complementary hyper-parameters to the contour levels in the background and foreground directions at the same relative contour distance, so that the complementary weights form adversarial learning during training, thereby achieving accurate segmentation of the contour. The network is trained on a dataset of high-resolution remote sensing images to obtain the final remote sensing image semantic segmentation model.
The method comprises the following specific steps:
step 1, generating a training set:
step 1.1, randomly selecting at least 20 high-resolution remote sensing images with a balanced foreground-to-background ratio and their corresponding label images; cropping each high-resolution image and its corresponding label image into patches of 224 × 224 pixels;
step 1.2, selecting the cropped label images whose foreground pixels account for more than 10% of the patch, together with the remote sensing images corresponding to them, to form a training set;
step 2, constructing the Inception-v3 U-Net segmentation network:
step 2.1, constructing 1 convolution module:
building a convolution module formed by connecting a first convolution layer and a second convolution layer in series;
setting the convolution kernel sizes of the first and second convolution layers to 1 × 1, the stride to 1 and the padding to 1;
step 2.2, constructing an up-sampling sub-network:
building, as the decoder, an up-sampling sub-network formed by sequentially connecting a first up-sampling module, a second up-sampling module, a third up-sampling module and a CBR module in series; the first to third up-sampling modules have the same structure, and the structure of each is, in order: a first convolution layer, a second convolution layer, a third convolution layer, a BatchNorm layer, an activation layer and an up-sampling layer;
the convolution kernels of the first to third convolution layers are all set to 3 × 3, the strides are all set to 1, the padding is all set to 1, the negative-part slope of the activation layer is set to 0.2, the activation layer is implemented with the LeakyReLU function, and the up-sampling layer performs 2× nearest-neighbor up-sampling;
the structure of the CBR module is, in order: a convolution layer, a BatchNorm layer, an activation layer; the convolution kernel size of the convolution layer is set to 3 × 3, the stride to 1, the padding to 1, the negative-part slope of the activation layer to 0.2, and the activation layer is implemented with the LeakyReLU function;
step 2.3, using concatenation, connecting the inputs of the three up-sampling modules in the up-sampling sub-network with the outputs of the three Inception modules in the Inception-v3 network, and connecting the output of the input module with the input of the CBR module, to form an Inception-v3 network with a skip-connection structure;
step 2.4, sequentially connecting the Inception-v3 network with the skip-connection structure, the convolution module and the up-sampling sub-network in series to form the Inception-v3 U-Net network;
step 3, training the Inception-v3 U-Net network:
inputting the training set into the Inception-v3 U-Net network, and iteratively updating the parameters of each layer in the network by gradient descent until the total cost function converges, to obtain the trained Inception-v3 U-Net network;
step 4, predicting the remote sensing image:
step 4.1, sequentially cropping all remote sensing images to be predicted into 224 × 224 patches, and numbering the cropped patches;
step 4.2, sequentially inputting the numbered patches into the trained Inception-v3 U-Net network to obtain segmentation results of the cropped remote sensing images;
step 4.3, sequentially stitching the segmentation results of the cropped remote sensing images according to their numbers to obtain the final segmentation result.
Compared with the prior art, the invention has the following advantages:
First, the hierarchical contour cost function supervises the loss of the segmentation network during training, which overcomes the prior-art problems that the foreground is neglected in remote sensing image semantic segmentation and that the accuracy of foreground segmentation is reduced; foreground segmentation accuracy is thus assured, and the boundary and the internal texture can be effectively distinguished.
Second, the invention constructs the training network Inception-v3 U-Net, taking the Inception-v3 model structure as the backbone of the encoder, which overcomes the prior-art problems of large parameter counts and difficult model training; the amount of computation and the number of parameters are greatly reduced, training is more efficient, and the method is easy to popularize and apply.
Third, the invention refines the contour judgment range of the hierarchical contour cost function by sequential dilation and erosion with a 2 × 2 convolution kernel, which overcomes the prior-art problem of misjudging contours of different classes caused by coarse contour judgment; the processing of contour information is refined and the accuracy of the contour classification result is improved.
Fourth, the hyper-parameters assigned in the hierarchical contour cost function to the contour levels in the background and foreground directions at the same relative contour distance correspond pairwise and are mutually complementary, which overcomes the prior-art problem that contours in different directions are not treated specifically; the segmentation result obtained by the method therefore has a more accurate contour.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the Inception-v3 U-Net network of the present invention;
FIG. 3 is a simulation diagram of segmenting a remote sensing image from the Massachusetts Buildings dataset with the present invention.
Detailed Description
The implementation steps of the invention are described in further detail below with reference to FIG. 1 and the embodiment.
Step 1, generating a training set.
Step 1.1: the embodiment of the invention randomly selects, from the Massachusetts Buildings dataset, 20 high-resolution remote sensing images with balanced foreground-to-background ratios, together with their corresponding label images; the resolution of each image is 1500 × 1500 pixels. Each high-resolution image and its corresponding label image are cropped one by one into patches of 224 × 224 pixels.
Step 1.2: the cropped label images whose foreground pixels account for more than 10% of the patch, together with the remote sensing images corresponding to them, are selected to form the training set.
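As a concrete illustration of steps 1.1 and 1.2 only (this sketch is not part of the patent disclosure), the following minimal NumPy code tiles each image/label pair into 224 × 224 patches on a non-overlapping grid (the remainder at the right and bottom edges of a 1500 × 1500 image is discarded) and keeps only patches whose foreground ratio exceeds 10%. It assumes binary {0, 1} label arrays; the function name and interface are illustrative, not taken from the patent.

import numpy as np

def make_training_set(images, labels, tile=224, fg_min=0.10):
    # images[k]: (H, W, 3) array; labels[k]: (H, W) array with values in {0, 1}
    samples = []
    for img, lab in zip(images, labels):
        h, w = lab.shape
        for y in range(0, h - tile + 1, tile):
            for x in range(0, w - tile + 1, tile):
                lab_patch = lab[y:y + tile, x:x + tile]
                # the mean of a {0, 1} mask is exactly the foreground pixel ratio
                if lab_patch.mean() > fg_min:
                    samples.append((img[y:y + tile, x:x + tile], lab_patch))
    return samples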
Step 2, constructing the Inception-v3 U-Net network.
The Inception-v3 U-Net network constructed by the invention is described in further detail with reference to FIG. 2.
Step 2.1, constructing the Inception-v3 network.
The encoder of the invention adopts the prior-art Inception-v3 network, whose structure is formed by connecting 1 input module and 3 Inception modules of the same structure in series. The structure of the input module is, in order: a first convolution layer, a first pooling layer, a first LocalRespNorm layer, a second convolution layer, a third convolution layer, a second LocalRespNorm layer and a second pooling layer.
The convolution kernel size of the first convolution layer is set to 7 × 7, that of the second convolution layer to 1 × 1 and that of the third convolution layer to 3 × 3; the stride of the first convolution layer is set to 2, the strides of the second and third convolution layers to 1, and the padding to 'same'. The first and second pooling layers use max pooling with a 3 × 3 pooling window and a stride of 2.
Step 2.2, constructing 1 convolution module.
A convolution module is built by connecting a first convolution layer and a second convolution layer in series. The convolution kernel sizes of both convolution layers are set to 1 × 1, the stride to 1 and the padding to 1.
Step 2.3, constructing the up-sampling sub-network.
An up-sampling sub-network, used as the decoder, is built by sequentially connecting a first, a second and a third up-sampling module and a CBR module in series. The first to third up-sampling modules have the same structure; the structure of each is, in order: a first convolution layer, a second convolution layer, a third convolution layer, a BatchNorm layer, an activation layer and an up-sampling layer.
The convolution kernel sizes of the first to third convolution layers are all set to 3 × 3, the strides to 1 and the padding to 1. The negative-part slope of the activation layer is set to 0.2, and the activation layer is implemented with the LeakyReLU function. The up-sampling layer performs 2× nearest-neighbor up-sampling.
The structure of the CBR module is, in order: a convolution layer, a BatchNorm layer, an activation layer. The convolution kernel size of the convolution layer is set to 3 × 3, the stride to 1 and the padding to 1. The negative-part slope of the activation layer is set to 0.2, and the activation layer is implemented with the LeakyReLU function.
Step 2.4: using concatenation, the inputs of the three up-sampling modules in the up-sampling sub-network are connected with the outputs of the three Inception modules in the Inception-v3 network, and the output of the input module is connected with the input of the CBR module, forming an Inception-v3 network with a skip-connection structure.
Step 2.5: the Inception-v3 network with the skip-connection structure, the convolution module and the up-sampling sub-network are sequentially connected in series to form the Inception-v3 U-Net network.
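A minimal PyTorch sketch of the architecture just described follows, offered for illustration only. The Inception-v3 backbone is replaced by a stand-in encoder (ToyEncoder) that merely exposes a stem output and three stage outputs at the expected resolutions, since wiring the real torchvision Inception-v3 intermediate features is beyond a sketch; the channel widths, the reuse of the deepest stage output both through the bottleneck and as a skip, and the use of padding 0 for the 1 × 1 bottleneck convolutions (the patent text specifies padding 1) are simplifying assumptions made so the tensor shapes align.

import torch
import torch.nn as nn

class UpBlock(nn.Module):
    # Up-sampling module: three 3x3 convs (stride 1, padding 1), BatchNorm,
    # LeakyReLU with negative slope 0.2, then 2x nearest-neighbor up-sampling.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2),
            nn.Upsample(scale_factor=2, mode="nearest"),
        )

    def forward(self, x):
        return self.body(x)

class CBR(nn.Module):
    # CBR module: 3x3 conv, BatchNorm, LeakyReLU(0.2).
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.body(x)

class ToyEncoder(nn.Module):
    # Stand-in for the Inception-v3 backbone: returns the input-module (stem)
    # feature map and three stage feature maps at 1x, 1/2, 1/4 and 1/8 resolution.
    def __init__(self, ch=(32, 64, 128, 256)):
        super().__init__()
        self.stem = nn.Conv2d(3, ch[0], 7, stride=1, padding=3)
        self.s1 = nn.Conv2d(ch[0], ch[1], 3, stride=2, padding=1)
        self.s2 = nn.Conv2d(ch[1], ch[2], 3, stride=2, padding=1)
        self.s3 = nn.Conv2d(ch[2], ch[3], 3, stride=2, padding=1)

    def forward(self, x):
        f0 = self.stem(x)
        f1 = self.s1(f0)
        f2 = self.s2(f1)
        f3 = self.s3(f2)
        return f0, f1, f2, f3

class InceptionV3UNet(nn.Module):
    # Encoder -> 1x1 bottleneck convolution module -> decoder with skip
    # connections by concatenation, ending in a CBR head, as in FIG. 2.
    def __init__(self, encoder, ch=(32, 64, 128, 256)):
        super().__init__()
        self.encoder = encoder
        self.bottleneck = nn.Sequential(   # padding 0 here; the patent text says 1
            nn.Conv2d(ch[3], ch[3], 1, stride=1, padding=0),
            nn.Conv2d(ch[3], ch[3], 1, stride=1, padding=0),
        )
        self.up3 = UpBlock(ch[3] + ch[3], ch[2])   # skip from the deepest stage
        self.up2 = UpBlock(ch[2] + ch[2], ch[1])
        self.up1 = UpBlock(ch[1] + ch[1], ch[0])
        self.head = CBR(ch[0] + ch[0], 1)          # skip from the stem output

    def forward(self, x):
        f0, f1, f2, f3 = self.encoder(x)
        d = self.up3(torch.cat([self.bottleneck(f3), f3], dim=1))  # 1/8 -> 1/4
        d = self.up2(torch.cat([d, f2], dim=1))                    # 1/4 -> 1/2
        d = self.up1(torch.cat([d, f1], dim=1))                    # 1/2 -> 1/1
        return torch.sigmoid(self.head(torch.cat([d, f0], dim=1)))

net = InceptionV3UNet(ToyEncoder())
print(net(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1, 224, 224])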
Step 3, training the Inception-v3 U-Net network.
The training set is input into the Inception-v3 U-Net network, and the parameters of each layer are iteratively updated by gradient descent until the total cost function converges, yielding the trained Inception-v3 U-Net network.
The total cost function L is as follows:
L = (1/N)·Σ_(j=1..N) L_GCL(j)
where L represents the total cost function, N represents the total number of samples in the training set (N is set to 20 in the embodiment of the invention), Σ(·) represents the summation operation, j is the index of a sample in the training set, and L_GCL(j) represents the hierarchical contour cost function of the j-th sample in the training set.
The hierarchical contour cost function L_GCL(j) is as follows:
L_GCL(j) = -Σ_i M_gc^i·[y_i·log(ŷ_i) + (1 - y_i)·log(1 - ŷ_i)]
where M_gc^i represents the weight of the i-th pixel in the hierarchical contour weight matrix M_gc of the j-th sample in the training set, y_i and ŷ_i represent the ground-truth value and the predicted value of the i-th pixel of the j-th sample respectively, and log(·) denotes the base-2 logarithm.
The hierarchical contour weight matrix M_gc is as follows:
[equation rendered only as an image in the source: M_gc combines, over the levels g = -G, …, G, the Gaussian-smoothed level bands Gauss(Γ_g) weighted by the hyper-parameters K_g, together with the contour-range term δ]
where G represents the number of inward hierarchical levels of the contour matrix, the interior of the target area being taken as the positive direction of the contour matrix and the exterior as the negative direction, and g is the index of a level in the contour matrix. Gauss(·) denotes the Gaussian convolution function and K_g the hyper-parameter of the g-th level contour range [its defining expression is rendered only as an image in the source]; K represents a hyper-parameter controlling the weight of the whole contour, with K ≥ 1. Γ represents the division result of the contour at each level, and δ represents the determination result of the contour range. In the embodiment of the invention, G is 3, the kernel of the Gaussian convolution is set to 2 × 2 with variance 1.5, and K is 4.
The division result Γ of the contour at each level is obtained by the following formula (reconstructed from the stated values of α; the original expression is rendered only as an image in the source):
Γ_g = (Y;S)^(g+1) - (Y;S)^(g) for g > 0;  Γ_0 = (Y;S)^(+1) - (Y;S)^(-1);  Γ_g = (Y;S)^(g) - (Y;S)^(g-1) for g < 0
where Y represents a label image in the training set, S represents the convolution kernel (structuring element) used when dilating or eroding the label image, and (Y;S)^α represents a dilation or erosion of the label image: when α is positive it denotes dilation, and when α is negative it denotes erosion; α is set to g+1, g, +1, -1 and g-1 respectively.
The determination result δ of the contour range is obtained by the following formula:
δ = 255·One - ((Y;S)^(+1) - (Y;S)^(-1))
where One represents an all-ones matrix, whose size is set to 224 × 224 in the embodiment of the invention.
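The expressions for M_gc, K_g and Γ survive only as images in the source, so the sketch below is an assumption rather than the patent's exact formulas: level bands are taken as differences of successive dilations/erosions of the label with a 2 × 2 structuring element, smoothed by a Gaussian with σ = 1.5, and weighted by a pair of level hyper-parameters derived from K; the pairing k_out·k_in = K is invented here to mirror the "pairwise corresponding and complementary" description. The weighted base-2 binary cross-entropy then follows the stated definition of L_GCL. All function names are illustrative.

import numpy as np
from scipy import ndimage

def morph(y, alpha, size=2):
    # (Y;S)^alpha: |alpha| dilations for alpha > 0, |alpha| erosions for alpha < 0
    s = np.ones((size, size), dtype=bool)
    op = ndimage.binary_dilation if alpha > 0 else ndimage.binary_erosion
    return op(y.astype(bool), structure=s, iterations=abs(alpha)).astype(np.float32)

def hierarchical_contour_weights(y, G=3, K=4.0, sigma=1.5):
    # Assumed form of M_gc: a base weight of 1 plus Gaussian-smoothed,
    # K_g-weighted level bands inside (g < 0) and outside (g > 0) the contour.
    M = np.ones_like(y, dtype=np.float32)
    for g in range(1, G + 1):
        outer = morph(y, g + 1) - morph(y, g)      # g-th band outside the object
        inner = morph(y, -g) - morph(y, -(g + 1))  # g-th band inside the object
        k_out = K ** (1.0 - g / G)                 # assumed complementary pair:
        k_in = K ** (g / G)                        # k_out * k_in == K for every g
        M += k_out * ndimage.gaussian_filter(outer, sigma)
        M += k_in * ndimage.gaussian_filter(inner, sigma)
    return M

def gcl_loss(y, y_hat, M, eps=1e-7):
    # Hierarchical contour cost: M-weighted binary cross-entropy with base-2
    # logs, averaged over pixels (the patent sums over the pixels of sample j).
    bce = y * np.log2(y_hat + eps) + (1 - y) * np.log2(1 - y_hat + eps)
    return float(-np.mean(M * bce))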
Step 4, predicting the remote sensing image.
Step 4.1: all remote sensing images to be predicted are sequentially cropped into 224 × 224 patches, and the cropped patches are numbered.
Step 4.2: the numbered patches are sequentially input into the trained Inception-v3 U-Net network to obtain segmentation results for the cropped remote sensing images.
Step 4.3: the segmentation results of the cropped remote sensing images are stitched together in numbered order to obtain the final segmentation result.
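Steps 4.1 to 4.3 amount to tile, predict and stitch. The sketch below, again illustrative rather than part of the disclosure, assumes a network like the one sketched after step 2.5 that maps a (1, 3, 224, 224) tensor to per-pixel foreground probabilities, and an input whose sides are multiples of 224 (pad beforehand otherwise).

import numpy as np
import torch

def predict_tiled(net, image, tile=224):
    # image: (H, W, 3) float array with H and W multiples of tile
    net.eval()
    h, w, _ = image.shape
    out = np.zeros((h, w), dtype=np.float32)
    with torch.no_grad():
        for y in range(0, h, tile):          # step 4.1: numbered tiles in raster order
            for x in range(0, w, tile):
                patch = image[y:y + tile, x:x + tile]
                inp = torch.from_numpy(patch).permute(2, 0, 1)[None].float()
                prob = net(inp)[0, 0].numpy()       # step 4.2: segment one tile
                out[y:y + tile, x:x + tile] = prob  # step 4.3: stitch in order
    return (out > 0.5).astype(np.uint8)      # final binary segmentation map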
The effects of the present invention can be further illustrated by the following simulation experiments.
1. Simulation experiment conditions:
The hardware platform of the simulation experiment is: an Intel i7-6850K processor at 3.60 GHz, 64 GB of memory, and an NVIDIA TITAN Xp graphics card with 12 GB of video memory.
The software platform of the simulation experiment is: the Ubuntu operating system and Python 3.6.
The data used in the simulation experiment are 20 randomly selected groups from the Massachusetts Buildings remote sensing dataset, which consists of 151 aerial images of the Boston area; each image is 1500 × 1500 pixels, and the foreground of the dataset comprises buildings of various kinds.
2. Simulation experiment content and result analysis:
the simulation experiment of the invention is carried out by adopting the method of the invention and the ablation experiment method of the prior art according to the following steps respectively.
The ablation experimental method is characterized in that binary cross entropy and a contour cost function are respectively adopted as cost functions to train the increment-v 3U-Net network.
Step A: the data in the Massachusetts dataset are cropped to 224 × 224, the standard image size of this experiment.
Step B: the 4464 remote sensing images corresponding to cropped label images whose foreground pixel proportion exceeds 10% are selected as samples, forming the overall dataset of the experiment.
Step C: the samples in the overall dataset are randomly shuffled; 3960 samples are assigned to the training set, 144 to the validation set and 360 to the test set.
Step D: the training set is input into the Inception-v3 U-Net network, which is trained for 30 epochs in total. During training, the performance of the current model is evaluated on the validation set with the evaluation indices every 5 epochs.
Step E: after training, the performance of the model with the best validation result is evaluated on the test set.
The effect of the invention is further described with reference to the simulation diagram of FIG. 3.
FIG. 3(a) is a remote sensing image from the test set. FIG. 3(b) is the label image corresponding to FIG. 3(a). FIG. 3(c) is the result of segmenting the remote sensing image using binary cross entropy as the cost function. FIG. 3(d) is the result of segmenting the remote sensing image using the contour cost function as the cost function. FIG. 3(e) is the result of segmenting the remote sensing image with the method of the invention.
As can be seen from FIGS. 3(c), 3(d) and 3(e), compared with the segmentation results of the two ablation methods, the segmentation result of the invention has a clearer foreground contour segmentation and a more precise contour.
To verify that the segmentation effect of the invention is superior to that of the two ablation methods, the segmentation results are evaluated with three indices, namely Recall, accuracy (Acc) and intersection-over-union (IoU), computed by the following formulas; all results are collected in Table 1.
The formula for Recall is:
Recall = TP / (TP + FN)
where TP is the number of samples that are actually positive and predicted positive, and FN is the number of samples that are actually positive but predicted negative.
The formula for the accuracy Acc is:
Acc = (TP + TN) / (TP + TN + FP + FN)
where FP is the number of samples that are actually negative but predicted positive, and TN is the number of samples that are actually negative and predicted negative.
The formula for the intersection-over-union IoU is:
IoU = TP / (TP + FP + FN)
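For completeness, the three indices can be computed from pixel-wise confusion counts as in the short sketch below (the function name is illustrative):

import numpy as np

def evaluate(pred, truth):
    # Pixel-wise Recall, Acc and IoU, with the building class as positive.
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    recall = tp / (tp + fn)
    acc = (tp + tn) / (tp + tn + fp + fn)
    iou = tp / (tp + fp + fn)
    return recall, acc, iou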
Table 1. Quantitative comparison of the segmentation results of the method of the invention and the ablation methods in the simulation experiment
[Table 1 is rendered only as an image in the source; for the method of the invention it reports Recall 67.81%, Acc 89.68% and IoU 57.00%.]
Larger values of Recall, Acc and IoU in Table 1 indicate more accurate segmentation. Table 1 shows that the method achieves a Recall of 67.81%, an Acc of 89.68% and an IoU of 57.00%; all three indices are higher than those of the 2 ablation methods, which proves the effectiveness of the proposed hierarchical contour cost function and shows that it can effectively improve the accuracy of remote sensing image segmentation.

Claims (2)

1. A remote sensing image semantic segmentation method based on a hierarchical contour cost function, characterized in that an Inception-v3 U-Net segmentation network is constructed and the loss of the segmentation network is supervised with a hierarchical contour cost function; the segmentation method comprises the following specific steps:
step 1, generating a training set:
step 1.1, randomly selecting at least 20 high-resolution remote sensing images with a balanced foreground-to-background ratio and their corresponding label images; cropping each high-resolution image and its corresponding label image into patches of 224 × 224 pixels;
step 1.2, selecting the cropped label images whose foreground pixels account for more than 10% of the patch, together with the remote sensing images corresponding to them, to form a training set;
step 2, constructing the Inception-v3 U-Net segmentation network:
step 2.1, constructing 1 convolution module:
building a convolution module formed by connecting a first convolution layer and a second convolution layer in series;
setting the convolution kernel sizes of the first and second convolution layers to 1 × 1, the stride to 1 and the padding to 1;
step 2.2, constructing an up-sampling sub-network:
building, as the decoder, an up-sampling sub-network formed by sequentially connecting a first up-sampling module, a second up-sampling module, a third up-sampling module and a CBR module in series; the first to third up-sampling modules have the same structure, and the structure of each is, in order: a first convolution layer, a second convolution layer, a third convolution layer, a BatchNorm layer, an activation layer and an up-sampling layer;
the convolution kernels of the first to third convolution layers are all set to 3 × 3, the strides are all set to 1, the padding is all set to 1, the negative-part slope of the activation layer is set to 0.2, the activation layer is implemented with the LeakyReLU function, and the up-sampling layer performs 2× nearest-neighbor up-sampling;
the structure of the CBR module is, in order: a convolution layer, a BatchNorm layer, an activation layer; the convolution kernel size of the convolution layer is set to 3 × 3, the stride to 1, the padding to 1, the negative-part slope of the activation layer to 0.2, and the activation layer is implemented with the LeakyReLU function;
step 2.3, using concatenation, connecting the inputs of the three up-sampling modules in the up-sampling sub-network with the outputs of the three Inception modules in the Inception-v3 network, and connecting the output of the input module with the input of the CBR module, to form an Inception-v3 network with a skip-connection structure;
step 2.4, sequentially connecting the Inception-v3 network with the skip-connection structure, the convolution module and the up-sampling sub-network in series to form the Inception-v3 U-Net network;
step 3, training the Inception-v3 U-Net network:
inputting the training set into the Inception-v3 U-Net network, and iteratively updating the parameters of each layer in the network by gradient descent until the total cost function converges, to obtain the trained Inception-v3 U-Net network;
step 4, predicting the remote sensing image:
step 4.1, sequentially cropping all remote sensing images to be predicted into 224 × 224 patches, and numbering the cropped patches;
step 4.2, sequentially inputting the numbered patches into the trained Inception-v3 U-Net network to obtain segmentation results of the cropped remote sensing images;
step 4.3, sequentially stitching the segmentation results of the cropped remote sensing images according to their numbers to obtain the final segmentation result.
2. The remote sensing image semantic segmentation method based on a hierarchical contour cost function according to claim 1, characterized in that the total cost function in step 3 is as follows:
L = (1/N)·Σ_(j=1..N) L_GCL(j)
where L represents the total cost function, N represents the total number of samples in the training set, Σ(·) represents the summation operation, j is the index of a sample in the training set, and L_GCL(j) represents the hierarchical contour cost function of the j-th sample in the training set;
the hierarchical contour cost function L_GCL(j) is as follows:
L_GCL(j) = -Σ_i M_gc^i·[y_i·log(ŷ_i) + (1 - y_i)·log(1 - ŷ_i)]
where M_gc^i represents the weight of the i-th pixel in the hierarchical contour weight matrix M_gc of the j-th sample in the training set, y_i and ŷ_i represent the ground-truth value and the predicted value of the i-th pixel of the j-th sample respectively, and log(·) denotes the base-2 logarithm;
the hierarchical contour weight matrix M_gc is as follows:
[equation rendered only as an image in the source: M_gc combines, over the levels g = -G, …, G, the Gaussian-smoothed level bands Gauss(Γ_g) weighted by the hyper-parameters K_g, together with the contour-range term δ]
where G represents the number of inward hierarchical levels of the contour matrix, the interior of the target area being taken as the positive direction of the contour matrix and the exterior as the negative direction, g is the index of a level in the contour matrix, Gauss(·) denotes the Gaussian convolution function, K_g represents the hyper-parameter of the g-th level contour range, Γ represents the division result of the contour at each level, and δ represents the determination result of the contour range;
the division result Γ of the contour at each level is obtained by the following formula (reconstructed from the stated values of α; the original expression is rendered only as an image in the source):
Γ_g = (Y;S)^(g+1) - (Y;S)^(g) for g > 0;  Γ_0 = (Y;S)^(+1) - (Y;S)^(-1);  Γ_g = (Y;S)^(g) - (Y;S)^(g-1) for g < 0
where Y represents a label image in the training set, S represents the convolution kernel used when dilating or eroding the label image, and (Y;S)^α represents a dilation or erosion of the label image: when α is positive it denotes dilation, and when α is negative it denotes erosion;
the determination result δ of the contour range is obtained by the following formula:
δ = 255·One - ((Y;S)^(+1) - (Y;S)^(-1))
where One represents an all-ones matrix.
CN202210675935.4A (filed 2022-06-15): Remote sensing image semantic segmentation method based on hierarchical contour cost function. Status: Pending. Published as CN114972759A.

Priority Applications (1)

CN202210675935.4A, priority and filing date 2022-06-15: Remote sensing image semantic segmentation method based on hierarchical contour cost function

Publications (1)

Publication number: CN114972759A, published 2022-08-30

Family ID: 82963891

Family Applications (1)

CN202210675935.4A (pending, published as CN114972759A): Remote sensing image semantic segmentation method based on hierarchical contour cost function

Country Status (1)

CN: CN114972759A


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115810139A (en) * 2022-12-16 2023-03-17 西北民族大学 Target area identification method and system of SPECT image
CN115810139B (en) * 2022-12-16 2023-09-01 西北民族大学 Target area identification method and system for SPECT image
CN116030080A (en) * 2023-02-03 2023-04-28 北京博睿恩智能科技有限公司 Remote sensing image instance segmentation method and device
CN116030080B (en) * 2023-02-03 2023-08-22 北京博睿恩智能科技有限公司 Remote sensing image instance segmentation method and device
CN116310883A (en) * 2023-05-17 2023-06-23 山东建筑大学 Agricultural disaster prediction method based on remote sensing image space-time fusion and related equipment
CN116310883B (en) * 2023-05-17 2023-10-20 山东建筑大学 Agricultural disaster prediction method based on remote sensing image space-time fusion and related equipment

Similar Documents

Publication number and title
CN110136154B (en) Remote sensing image semantic segmentation method based on full convolution network and morphological processing
CN109919108B (en) Remote sensing image rapid target detection method based on deep hash auxiliary network
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN108664971B (en) Pulmonary nodule detection method based on 2D convolutional neural network
CN114120102A (en) Boundary-optimized remote sensing image semantic segmentation method, device, equipment and medium
CN114972759A (en) Remote sensing image semantic segmentation method based on hierarchical contour cost function
CN107808138B (en) Communication signal identification method based on FasterR-CNN
CN112949783B (en) Road crack detection method based on improved U-Net neural network
CN111833322B (en) Garbage multi-target detection method based on improved YOLOv3
CN114897779A (en) Cervical cytology image abnormal area positioning method and device based on fusion attention
CN111461213A (en) Training method of target detection model and target rapid detection method
CN113012177A (en) Three-dimensional point cloud segmentation method based on geometric feature extraction and edge perception coding
CN114627383A (en) Small sample defect detection method based on metric learning
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN113362277A (en) Workpiece surface defect detection and segmentation method based on deep learning
CN113221956A (en) Target identification method and device based on improved multi-scale depth model
CN115880529A (en) Method and system for classifying fine granularity of birds based on attention and decoupling knowledge distillation
CN116012709B (en) High-resolution remote sensing image building extraction method and system
CN112613354A (en) Heterogeneous remote sensing image change detection method based on sparse noise reduction self-encoder
CN111860465A (en) Remote sensing image extraction method, device, equipment and storage medium based on super pixels
CN114419078B (en) Surface defect region segmentation method and device based on convolutional neural network
CN113971764B (en) Remote sensing image small target detection method based on improvement YOLOv3
CN113177563B (en) Post-chip anomaly detection method integrating CMA-ES algorithm and sequential extreme learning machine
CN114332107A (en) Improved tunnel lining water leakage image segmentation method
CN112348062A (en) Meteorological image prediction method, meteorological image prediction device, computer equipment and storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination