CN114663439A - Remote sensing image land and sea segmentation method

Info

Publication number
CN114663439A
CN114663439A
Authority
CN
China
Prior art keywords
sea
land
sensing image
remote sensing
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210280187.XA
Other languages
Chinese (zh)
Inventor
Guo Haitao
Lu Jun
Gong Zhihui
Yan Xiaodong
Zhang Heng
Lin Yuzhun
Liu Xiangyun
Gao Hui
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN202210280187.XA
Publication of CN114663439A
Legal status: Pending

Classifications

    • G06T 7/10 Image analysis: Segmentation; Edge detection
    • G06T 7/13 Image analysis: Edge detection
    • G06N 3/045 Neural networks: Combinations of networks
    • G06N 3/08 Neural networks: Learning methods
    • G06T 2207/10032 Image acquisition modality: Satellite or aerial image; Remote sensing
    • G06T 2207/20081 Special algorithmic details: Training; Learning
    • G06T 2207/20084 Special algorithmic details: Artificial neural networks [ANN]


Abstract

The invention relates to a sea-land segmentation method for remote sensing images, belonging to the technical field of remote sensing image processing. The method adopts an encoder-decoder architecture: a multilayer encoding module based on the Res2Net network extracts features at different scales, and squeeze-and-attention modules enhance the feature maps at each scale to strengthen information at weak sea-land boundaries. During training, a deep supervision strategy trains the outputs of the different decoding modules separately, which strengthens the network's ability to learn target boundary information and improves the accuracy of sea-land boundary segmentation. Experiments on two groups of remote sensing image data sets covering different coast types show that the method obtains more accurate sea-land segmentation results and clearer, more complete sea-land boundaries.

Description

Remote sensing image land and sea segmentation method
Technical Field
The invention relates to a remote sensing image sea-land segmentation method, and belongs to the technical field of remote sensing image processing.
Background
With the rapid development of remote sensing technology, data acquisition means have diversified, and the spatial, temporal, spectral and radiometric resolutions of remote sensing images continue to improve, providing ample data support for the study of large-scale coastal areas. Effectively distinguishing ocean from land on remote sensing images, that is, achieving fast and accurate sea-land segmentation, has important application value for coastline extraction, island and reef identification, offshore target detection and the like. Traditional sea-land segmentation methods include threshold segmentation, active contour models, region growing, and Markov random field-based methods. They rely mainly on differences between sea and land in gray level, texture and similar cues, so they can obtain good results when the gray-level contrast at the sea-land boundary is obvious and the waterline shape is simple; however, they are easily disturbed by noise, their results must be regulated by manually set parameters, and their robustness is poor.
Deep learning, particularly convolutional neural networks (CNNs), has surpassed traditional methods in image classification, object detection, semantic segmentation and related fields, and the advent of the fully convolutional network (FCN) drew attention to CNN-based semantic segmentation. Most semantic segmentation networks proposed in recent years follow the FCN design principle. SegNet, U-Net and their variants (e.g., UNet++) are typical: they employ an encoding-decoding structure consisting of an encoding path for feature extraction and a decoding path that restores the resolution of the feature maps, and they fully utilize the semantic information of each layer to obtain more detailed segmentation results. PSPNet uses a pyramid pooling module to extract multi-scale information of an image, while Deeplabv3+ applies atrous (dilated) convolution to semantic segmentation and proposes the Atrous Spatial Pyramid Pooling (ASPP) module, which samples with atrous convolutions of different rates in parallel to better capture multi-scale image information. The Dual Attention Network (DANet), the Criss-Cross attention Network (CCNet) and others introduce attention mechanisms into semantic segmentation, so that regions with similar features produce the same response through correlation measurement, strengthening the learning of region-specific features and the use of effective information. In addition, the Bilateral Segmentation Network (BiSeNet), BiSeNetV2 and similar designs balance the speed and accuracy of semantic segmentation to achieve real-time segmentation.
The rapid development of semantic segmentation networks provides sufficient theoretical support for sea-land segmentation of remote sensing images with CNNs. Researchers have constructed deep networks based on the encoding-decoding architecture, combined with post-processing to eliminate holes in the prediction results. Some built DeepUNet, a network deeper than U-Net, from ResNet's residual block (Res_Block), designing DownBlock and UpBlock modules to replace the convolutional layers in the encoding-decoding structure, and obtained better results than the original U-Net in optical remote sensing sea-land segmentation. Others used Res_Block to construct Res-UNet and post-processed the segmentation results with a conditional random field (CRF) model and morphological operations. Pourba et al. built a network of suitable depth on a standard U-shaped structure, aggregating multi-scale context information through densely connected residual blocks to achieve end-to-end sea-land segmentation. The sea-land segmentation task also concerns the accuracy of boundary segmentation, and some scholars have proposed multi-task frameworks that improve segmentation accuracy by adding network branches. For example, the multitask network SeNet performs sea-land segmentation and edge detection simultaneously, but it contains a large number of standard convolutions, occupying considerable storage space and running time. A multitask network combined with edge information has also been proposed, which extends a branch-structured edge network from the encoding-decoding structure and trains boundary semantic information in parallel with the segmentation network to obtain spatially consistent segmentation with good boundary localization. Dysoma et al. reduced the number of convolutional layers of the spatial path in BiSeNet to suit the characteristics of SAR images, and proposed an edge-enhancement loss strategy to improve the segmentation capability of the model.
Coastline types in China are complex and varied: the spectral, textural and shape characteristics of land objects differ among coastline types, and weak boundaries (silt coastlines) alternate with strong boundaries (artificial coastlines). Existing research can obtain good segmentation results when the sea-land boundary is simple, but cannot handle sea-land segmentation in complex scenes. Moreover, the task pays particular attention to boundary segmentation, yet sea-land boundary pixels occupy a low proportion of a remote sensing image, causing a sample imbalance problem. As a result, the accuracy of existing networks at the sea-land boundary is hard to guarantee, and evaluating only the region segmentation results, as most studies do, cannot reflect a network's ability to extract boundaries.
Disclosure of Invention
The invention aims to provide a remote sensing image sea-land segmentation method to solve the problem that the segmentation at the sea-land boundary of the current remote sensing image is inaccurate.
The invention provides a remote sensing image sea-land segmentation method to solve this technical problem, comprising the following steps (a minimal code sketch of these steps is given after the list):
1) obtaining remote sensing images and labelling the remote sensing image data set to form corresponding training set data;
2) establishing a remote sensing image sea-land segmentation model, wherein the model adopts an encoder-decoder structure; the encoder consists of multiple layers of encoding modules, each extracting features of the remote sensing image at a different scale; the decoder comprises decoding modules corresponding to the encoding modules, each fusing the output features of the corresponding encoding module with the up-sampled output of the preceding decoding module; the features processed by each decoding module are up-sampled to the size of the original image and fused, and edge detection is performed on the fused feature map;
3) training the established sea-land segmentation model with the training set data, using a deep supervision strategy to train each decoding module separately, and constructing the total loss function of the model from the loss functions of the individual decoding modules;
4) acquiring the remote sensing image to be segmented and inputting it into the trained model to realize its sea-land segmentation.
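A minimal training-and-inference sketch of steps 1)-4) follows. MSDSUNet, train_loader and test_image are illustrative placeholders, and deep_supervision_loss refers to the loss sketch given later in this description; none of this is the patent's reference implementation.

```python
import torch

# Step 2: build the encoder-decoder sea-land segmentation model (assumed class).
model = MSDSUNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for image, mask in train_loader:                 # step 1: labelled training tiles
    side_logits, fused_logits = model(image)     # side outputs + fused output
    loss = deep_supervision_loss(side_logits, fused_logits, mask)  # step 3
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Step 4: segment a new image with the trained model.
model.eval()
with torch.no_grad():
    sea_land_mask = torch.sigmoid(model(test_image)[1]) > 0.5
```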
The invention adopts an encoder-decoder architecture and uses a multilayer encoding module to extract features of different sizes, improving the representation of the remote sensing image; during training, a deep supervision strategy trains the outputs of the different decoding modules separately, which strengthens the network's ability to learn target boundary information and improves the accuracy of sea-land boundary segmentation.
Furthermore, the multilayer encoding module adopts a Res2Net network comprising 5 layers of encoding modules: the first layer comprises a convolution layer and a max-pooling layer for extracting features from the input remote sensing image, and each remaining layer adopts residual blocks to process the output of the previous layer.
Further, the residual block adopts Res2_Block.
The invention divides the feature map inside the Res2_Block into several channel groups and designs residual-like connections between the groups, so that the network improves its multi-scale representation capability at a finer-grained level.
Furthermore, the remote sensing image sea-land segmentation model also comprises a compression and attention module, and the output result of each layer of coding module is processed by the corresponding compression and attention module.
The invention utilizes the compression and attention module to promote useful features and restrain features which are not useful for the current task, and the extraction capability of the network to the weak boundary features is enhanced by giving greater weight to the feature map at the weak sea-land boundary.
Further, each layer decoding module includes one Res2_ Block and one Upsample for gradually restoring the feature map to the original input image size.
Further, the loss function during the training of the remote sensing image sea-land segmentation model is:

$$l = \sum_{m=1}^{M} W_{side}^{(m)} \, l_{side}^{(m)} + W_{fuse} \, l_{fuse}$$

wherein M is equal to the number of layers of the decoding module and is also the number of layers of the encoding module; $l_{side}^{(m)}$ represents the loss function of the m-th layer decoding module; $l_{fuse}$ represents the loss function after the fusion of the decoding modules of each layer; $W_{side}^{(m)}$ and $W_{fuse}$ respectively represent the weight of the loss function of the m-th layer decoding module and the weight of the fused loss function.
Further, the loss function of each layer of decoding module and the fused loss function both adopt the BCE loss function of the semantic segmentation binary classification task.
Further, the BCE loss function is:

$$l_{BCE} = -\sum_{(r,c)}^{(H,W)} \Big[ P_{G(r,c)} \log P_{S(r,c)} + \big(1 - P_{G(r,c)}\big) \log\big(1 - P_{S(r,c)}\big) \Big]$$

wherein (r, c) represents the coordinates of a pixel, H and W represent the height and width of the image, and $P_{G(r,c)}$ and $P_{S(r,c)}$ respectively represent the true value and the predicted value of the pixel.
Drawings
FIG. 1 is a schematic structural diagram of a sea-land segmentation model of a remote sensing image adopted by the present invention;
FIG. 2a is a block diagram of the residual block of ResNet;
FIG. 2b is a block diagram of the residual block of Res2Net employed in the present invention;
FIG. 3a is a block diagram of an SE module;
FIG. 3b is a block diagram of the SA module employed in the present invention;
FIG. 4a is the training Data and corresponding sample labels for the first scene in Data1 in the experimental example of the present invention;
FIG. 4b shows training Data and corresponding sample labels for a second scenario in Data1 in an example of the present invention;
FIG. 4c is the training Data and corresponding sample labels for the first scene in Data2 in the experimental example of the present invention;
FIG. 4d shows training Data and corresponding sample labels for the second scenario in Data2 in the experimental example of the present invention;
FIG. 5 is a graph showing a comparison of the segmentation results of the present invention and other conventional segmentation models in the experimental examples.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings.
The method adopts a network model with an encoding-decoding structure as the sea-land segmentation model for remote sensing images. The novel backbone network Res2Net serves as the encoder to extract multi-scale image features, and the features extracted at each layer are processed by a squeeze-and-attention module to enhance the network's ability to extract weak boundaries. During decoding, the upsampled (Upsample) feature map of each layer serves as a side output of the network, and the side outputs are merged to realize multi-level feature fusion; a deep supervision strategy is applied to each feature fusion result. Finally, edge detection is performed on the sea-land segmentation result output by the network to obtain the waterline, realizing sea-land segmentation of the remote sensing image.
1. Acquiring a remote sensing image data set and labelling it to form corresponding training set data.
In this embodiment, the sea and land categories of the remote sensing images in the data set are labelled pixel by pixel to generate corresponding label images, and the generated label images are used as the training set.
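As a minimal sketch, one way to turn such per-pixel annotations into binary label masks is shown below; the grayscale coding (black = land, gray = sea) follows the sample labels described for FIG. 4, and the exact pixel values are assumptions.

```python
import numpy as np

def make_label(annotation):
    """annotation: uint8 grayscale label image (0 = land, nonzero gray = sea).
    Returns a binary mask with 1 for sea and 0 for land."""
    return (annotation > 0).astype(np.uint8)
```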
2. Establishing the remote sensing image sea-land segmentation model.
The invention establishes the remote sensing image sea-land segmentation model shown in FIG. 1. The model is a Multi-Scale Deep Supervision U-shaped Network (MSDSUNet) comprising an encoder and a decoder: the encoder adopts a Res2Net network together with squeeze-and-attention modules; the decoder comprises 5 layers of decoding modules corresponding to the encoder's 5 layers of output features at different scales, each decoding module consisting of a Res2_Block and an Upsample.
A convolutional neural network can extract more abstract features by continually increasing its depth and parameters, but deepening the network causes problems such as vanishing or exploding gradients and network degradation, making the network hard to converge; the ResNet design is adopted to overcome this. The residual block structure of ResNet is shown in FIG. 2a: a skip connection between input and output allows input information to be passed directly to later layers, simplifying network learning. Building on this, to improve multi-scale representation at a finer-grained level, the invention adopts the Res2Net network to extract multi-scale features.
The encoder in this embodiment uses a Res2Net-50 network. Taking an input image of size 512 × 512 × 3 as an example, the per-layer details are shown in Table 1. The encoder comprises 5 encoding modules and 5 squeeze-and-attention modules, one attention module per encoding module. The five encoding modules are Encoder_1, Encoder_2, Encoder_3, Encoder_4 and Encoder_5, and the 5 squeeze-and-attention modules are SA_1, SA_2, SA_3, SA_4 and SA_5. Encoder_1 consists of a convolution layer and a max-pooling layer, while Encoder_2, Encoder_3, Encoder_4 and Encoder_5 adopt Res2_Block.
TABLE 1
[Table 1: per-layer configuration of the Res2Net-50 encoder; reproduced as an image in the original publication and not recoverable here.]
The residual unit structure (Res2_Block) is shown in FIG. 2b. After a 1 × 1 convolutional layer, the feature map is divided into s subsets, denoted $x_i$ with $i \in \{1, 2, 3, \ldots, s\}$. Each subset $x_i$ has the same spatial size, but its number of channels is 1/s that of the original input feature map. Except for $x_1$, each $x_i$ has a corresponding 3 × 3 convolutional layer, denoted $K_i$, whose output for $x_i$ is denoted $y_i$; the subset $x_i$ is added to the output $y_{i-1}$ of $K_{i-1}$ before being fed into $K_i$. Thus $y_i$ is defined as:

$$y_i = \begin{cases} x_i, & i = 1 \\ K_i(x_i), & i = 2 \\ K_i(x_i + y_{i-1}), & 2 < i \le s \end{cases} \qquad (1)$$

It follows that each 3 × 3 convolution kernel $K_i$ in Res2_Block receives the feature information of all preceding feature map subsets $\{x_j, j \le i\}$, so that after a 3 × 3 convolution the subset $x_j$ yields an output with a larger receptive field than $x_j$ itself. To fuse information at different scales, all $y_i$ are concatenated and fused with a 1 × 1 convolution. This split-and-merge strategy lets the convolutional layers process the feature map more effectively: the output of Res2_Block contains different receptive fields, which benefits multi-scale feature extraction and lets the network capture local and global image features at a finer granularity.
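The following is a minimal PyTorch sketch of a Res2_Block consistent with equation (1); the scale s = 4, the channel arithmetic and the omission of batch normalization are illustrative assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class Res2Block(nn.Module):
    def __init__(self, channels, scale=4):
        super().__init__()
        assert channels % scale == 0
        self.scale = scale
        width = channels // scale
        self.conv_in = nn.Conv2d(channels, channels, 1, bias=False)
        # One 3x3 conv K_i per subset x_2..x_s (x_1 is passed through unchanged).
        self.convs = nn.ModuleList([
            nn.Conv2d(width, width, 3, padding=1, bias=False)
            for _ in range(scale - 1)
        ])
        self.conv_out = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        identity = x
        xs = torch.chunk(self.conv_in(x), self.scale, dim=1)  # split into s subsets
        ys = [xs[0]]                                          # y_1 = x_1
        for i in range(1, self.scale):
            inp = xs[i] if i == 1 else xs[i] + ys[-1]         # x_i + y_{i-1}
            ys.append(self.convs[i - 1](inp))                 # y_i = K_i(...)
        out = self.conv_out(torch.cat(ys, dim=1))             # 1x1 fusion of all y_i
        return out + identity                                 # ResNet-style skip
```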
The input image passes through the convolution and max-pooling layers of Encoder_1 and then sequentially through the four encoding layers Encoder_2, Encoder_3, Encoder_4 and Encoder_5, and the output of each encoding layer enters its corresponding squeeze-and-attention module.
SENet (Squeeze-and-Excitation Networks) proposed the earliest channel attention mechanism, which is divided into two parts: compression (Squeeze) and excitation (Excitation). As shown in FIG. 3a, given an input feature map $X \in R^{C \times H \times W}$, where C, H and W respectively denote the number of channels, the height and the width, X is compressed into a channel descriptor by global average pooling, the descriptor is mapped to channel weights $\omega$ through two fully connected layers, and $\omega$ is multiplied channel by channel with the residual feature map $X_{res}$ to recalibrate the channels of the feature map. The output $X_{out}$ of the SE module can be expressed as:

$$\omega = \mathrm{Sigmoid}\big(\omega_2 \, \sigma(\omega_1 \, \mathrm{AvP}(X))\big) \qquad (2)$$

$$X_{out} = \omega * X_{res} + X_{res} \qquad (3)$$

wherein Sigmoid(·) is the sigmoid function, σ(·) is the ReLU activation function, $\omega_1$ and $\omega_2$ are the parameters of the two fully connected layers, and AvP(X) represents the global average pooling operation on X.
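A minimal PyTorch sketch of the SE module in equations (2)-(3) follows; the channel reduction ratio of 16 in the two fully connected layers is a common convention and an assumption here.

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)  # omega_1
        self.fc2 = nn.Linear(channels // reduction, channels)  # omega_2
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        n, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))                                # AvP(X): squeeze
        w = torch.sigmoid(self.fc2(self.relu(self.fc1(w))))   # excitation, eq. (2)
        w = w.view(n, c, 1, 1)
        return w * x + x                                      # eq. (3): recalibrate
```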
The SA module expands the re-weighting channel of the SE module into two parts, compression (Squeeze) and attention (Attention), as shown in FIG. 3b. The Squeeze part is consistent with the SE module, while the Attention part introduces two convolutional layers with 3 × 3 kernels to gather non-local features and then, according to the importance of each feature channel, promotes useful features and suppresses features of little use to the current task. The output of the feature map X after the two convolutional layers of the weighting channel is upsampled to the size of $X_{res}$ and denoted $X_{attn}$; $X_{attn}$ is multiplied channel by channel with $X_{res}$ and then added channel by channel to $X_{attn}$ to obtain the output $X_{out}$, defined as follows:

$$X_{attn} = \mathrm{Up}\big(\mathrm{Conv2}(\sigma(\mathrm{Conv1}(X)))\big) \qquad (4)$$

$$X_{out} = X_{attn} * X_{res} + X_{attn} \qquad (5)$$

wherein Conv1(·) and Conv2(·) represent the two convolutional layers of the weighting channel and Up(·) represents the upsampling function. The compression and attention module of the invention adopts the SA module so that feature maps at weak sea-land boundaries are given larger weights, enhancing the network's ability to extract weak boundary features.
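A minimal PyTorch sketch of the SA module in equations (4)-(5) follows; the spatial squeeze via 2 × 2 average pooling before the two 3 × 3 convolutions is an assumption (the equations only require that the attention branch be upsampled back to the size of the input).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAModule(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AvgPool2d(2)                           # squeeze (assumed)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x_res):
        a = self.pool(x_res)
        a = self.conv2(F.relu(self.conv1(a)))                 # Conv2(sigma(Conv1(.)))
        x_attn = F.interpolate(a, size=x_res.shape[2:],
                               mode='bilinear', align_corners=False)  # Up(.)
        return x_attn * x_res + x_attn                        # eq. (5)
```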
Corresponding to the 5 encoding layers, the decoder in this embodiment also employs 5 decoding modules, each consisting of a Res2_Block and an Upsample, which gradually restore the feature map to the original input image size; each Upsample layer doubles the feature map size. Each decoding layer is fused with the feature map output by the SA module of the corresponding level, so as to better exploit the detail information of shallow feature maps and the semantic information of deep feature maps, thereby generating more accurate sea-land segmentation results. In this embodiment, the feature map of each decoding layer is up-sampled to the size of the original image as a side output (Side-Output), and the side outputs are concatenated along the channel dimension (Concat) as the final output of the network, realizing multi-level feature fusion and improving the accuracy of sea-land segmentation. The structural details of the MSDSUNet decoding layers are shown in Table 2.
TABLE 2
[Table 2: structural details of the MSDSUNet decoding layers; reproduced as an image in the original publication and not recoverable here.]
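A minimal PyTorch sketch of one decoding level and the side-output fusion described above is given below, reusing the Res2Block sketch from the encoder section; the channel arithmetic and the 1-channel side outputs are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderLevel(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        # out_ch must be divisible by the Res2Block scale (4 in the sketch above).
        self.reduce = nn.Conv2d(in_ch + skip_ch, out_ch, 1)
        self.block = Res2Block(out_ch)            # from the earlier sketch
        self.side = nn.Conv2d(out_ch, 1, 1)       # 1-channel side output

    def forward(self, deep, skip, out_size):
        # Upsample the deeper decoder feature to the SA skip feature's size.
        deep = F.interpolate(deep, size=skip.shape[2:], mode='bilinear',
                             align_corners=False)
        feat = self.block(self.reduce(torch.cat([deep, skip], dim=1)))
        # Side output: project to 1 channel and upsample to the original size.
        side = F.interpolate(self.side(feat), size=out_size,
                             mode='bilinear', align_corners=False)
        return feat, side

# Fusing M side outputs into the final prediction:
#   fused = nn.Conv2d(M, 1, 1)(torch.cat(side_outputs, dim=1))
```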
3. Training the constructed remote sensing image sea-land segmentation model with a deep supervision strategy.
The invention adopts a deep supervision strategy (DS) and computes a loss for each side output separately; that is, the output of each decoding module is trained individually, and each side-output loss contributes to the final loss function that supervises the training of the network. The total loss is defined as:

$$l = \sum_{m=1}^{M} W_{side}^{(m)} \, l_{side}^{(m)} + W_{fuse} \, l_{fuse} \qquad (6)$$

wherein $l_{side}^{(m)}$ denotes the loss of the m-th side output, $l_{fuse}$ denotes the loss of the final fused output, and $W_{side}^{(m)}$ and $W_{fuse}$ represent the weights of the corresponding losses. Each loss term uses the BCE loss function of the semantic segmentation binary classification task, defined as:

$$l_{BCE} = -\sum_{(r,c)}^{(H,W)} \Big[ P_{G(r,c)} \log P_{S(r,c)} + \big(1 - P_{G(r,c)}\big) \log\big(1 - P_{S(r,c)}\big) \Big] \qquad (7)$$

wherein (r, c) represents the coordinates of a pixel, H and W represent the height and width of the image, and $P_{G(r,c)}$ and $P_{S(r,c)}$ respectively represent the true value and the predicted value at the pixel. During training, the network weights are continuously learned so that the value of l tends to its minimum, achieving network convergence.
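A minimal PyTorch sketch of the deeply supervised objective in equations (6)-(7) follows; equal weights of 1 for all loss terms are an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def deep_supervision_loss(side_logits, fused_logits, target,
                          w_side=None, w_fuse=1.0):
    """side_logits: list of M side-output logit maps, each N x 1 x H x W.
    fused_logits: logit map of the fused output, same shape.
    target: float tensor of the ground-truth sea/land mask in {0., 1.}."""
    w_side = w_side or [1.0] * len(side_logits)
    # BCE on the fused output (l_fuse) ...
    loss = w_fuse * F.binary_cross_entropy_with_logits(fused_logits, target)
    # ... plus BCE on every side output (l_side^(m)).
    for w, logits in zip(w_side, side_logits):
        loss = loss + w * F.binary_cross_entropy_with_logits(logits, target)
    return loss
```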
The sea-land segmentation model is trained in this way with the training set formed in step 1, yielding the trained sea-land segmentation model.
4. Segmenting the remote sensing image to be segmented with the trained model.
Through the process, a trained sea-land segmentation model can be obtained, a remote sensing image to be segmented containing a sea-land boundary is obtained, the obtained remote sensing image is input into the sea-land segmentation model for segmentation, and a sea-land boundary segmentation result of the remote sensing image can be obtained.
To verify the effectiveness of the sea-land segmentation network adopted by the invention, two groups of public remote sensing sea-land segmentation data sets were selected for experiments: the remote sensing images of area A are recorded as Data set 1 (Data1) and those of area B as Data set 2 (Data2). Because remote sensing images cover wide areas and dense prediction tasks such as semantic segmentation demand substantial computing resources, the large-format images of both data sets were cut into tiles for training and prediction. The details of the two data sets are shown in Table 3.
TABLE 3
[Table 3: details of the two sea-land segmentation data sets; reproduced as images in the original publication and not recoverable here.]
Part of the training data and sample labels of the two data sets are shown in FIGS. 4a-4d: FIGS. 4a and 4b show training data and corresponding sample labels for two scenes in Data1, and FIGS. 4c and 4d for two scenes in Data2. Black pixels in the labels represent land areas and gray pixels represent sea areas; rivers and lakes on land are treated as the land category. In addition, edge detection is applied to the training labels to obtain a sea-land waterline with a width of 1 pixel, in preparation for evaluating the accuracy of edge detection in the next step.
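A minimal sketch of deriving such a 1-pixel-wide waterline from a binary label mask is shown below; morphological erosion via OpenCV is one straightforward edge detector, and its use here is an assumption rather than the patent's prescribed method.

```python
import cv2
import numpy as np

def waterline(mask):
    """mask: uint8 array with 1 = sea, 0 = land. Returns a 1-px boundary map."""
    kernel = np.ones((3, 3), np.uint8)
    eroded = cv2.erode(mask, kernel, iterations=1)
    return mask - eroded   # pixels removed by one erosion form the boundary
```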
The performance of the present invention and existing networks is evaluated on the sea-land segmentation data sets from two aspects: region segmentation and boundary detection. The F1 score (F1-score), Mean Intersection over Union (MIoU) and Mean Absolute Error (MAE) are used as evaluation indexes for region segmentation, and the boundary F1 score (F1-score-b) is used as the evaluation index for boundary detection.
(1) F1-score. F1-score is the harmonic mean of precision and recall; the higher the F1 value, the more accurate the network's sea-land segmentation. Here F1 refers to the average over the two categories, sea and land, and is expressed as:
$$P = \frac{TP}{TP + FP} \qquad (8)$$

$$R = \frac{TP}{TP + FN} \qquad (9)$$

$$F1 = \frac{1}{n} \sum_{i=1}^{n} \frac{2 P_i R_i}{P_i + R_i} \qquad (10)$$

In the formulas, TP (true positive) denotes a positive class judged as positive, FP (false positive) a negative class judged as positive, FN (false negative) a positive class judged as negative, and TN (true negative) a negative class judged as negative; P and R are the precision and recall, and n is the number of segmentation classes, with n = 2 in this test.
(2) MIoU. The IoU is the ratio of the intersection to the union of the target's actual and predicted regions, and MIoU is the average of the per-class IoU values:

$$\mathrm{MIoU} = \frac{1}{n} \sum_{i=1}^{n} \frac{TP_i}{TP_i + FP_i + FN_i} \qquad (11)$$
(3) MAE computes the mean absolute error between the prediction and the ground-truth label pixel by pixel, so it better reflects the actual prediction error; the smaller the MAE value, the closer the network's prediction is to the ground-truth label map:

$$\mathrm{MAE} = \frac{1}{H \times W} \sum_{i=1}^{H \times W} \left| P_i - y_i \right| \qquad (12)$$

wherein $P_i$ denotes the segmentation result output by the network at pixel i, and $y_i$ is the corresponding ground-truth label.
(4) F1-score-b measures the agreement between predicted boundary pixels and ground-truth boundary pixels within a tolerance of β pixels:

$$F1\text{-}score\text{-}b = \frac{2 P_\beta R_\beta}{P_\beta + R_\beta} \qquad (13)$$

wherein $P_\beta$ and $R_\beta$ respectively denote the precision and recall of boundary pixels within β pixels; β is set to 3 in this experiment. F1-score-b serves as an evaluation standard for boundary segmentation quality; edges are extracted from the segmentation results and the ground-truth labels to obtain segmentation boundaries with a width of 1 pixel.
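The following is a minimal NumPy/OpenCV sketch of the four measures in equations (8)-(13); implementing the β-pixel tolerance of F1-score-b via dilation is an assumption consistent with the definition above, not necessarily the authors' exact implementation.

```python
import cv2
import numpy as np

def f1_miou_mae(pred, gt):
    """pred, gt: binary uint8 masks (1 = sea, 0 = land)."""
    f1s, ious = [], []
    for cls in (0, 1):                        # average over the two classes
        tp = np.sum((pred == cls) & (gt == cls))
        fp = np.sum((pred == cls) & (gt != cls))
        fn = np.sum((pred != cls) & (gt == cls))
        p, r = tp / (tp + fp + 1e-9), tp / (tp + fn + 1e-9)
        f1s.append(2 * p * r / (p + r + 1e-9))
        ious.append(tp / (tp + fp + fn + 1e-9))
    mae = np.mean(np.abs(pred.astype(float) - gt.astype(float)))
    return np.mean(f1s), np.mean(ious), mae

def boundary_f1(pred_edge, gt_edge, beta=3):
    """pred_edge, gt_edge: 1-px boundary maps (e.g., from the waterline() sketch)."""
    kernel = np.ones((2 * beta + 1, 2 * beta + 1), np.uint8)
    p = np.sum(pred_edge & cv2.dilate(gt_edge, kernel)) / (np.sum(pred_edge) + 1e-9)
    r = np.sum(gt_edge & cv2.dilate(pred_edge, kernel)) / (np.sum(gt_edge) + 1e-9)
    return 2 * p * r / (p + r + 1e-9)
```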
To verify the effectiveness of the adopted segmentation network, it is compared with U-Net, Deeplabv3+, U2-Net and RAUNet. U-Net and Deeplabv3+ are classic semantic segmentation methods; RAUNet adopts an encoding-decoding structure and designs an attention augmentation module (AAM) to fuse multi-level features and capture global context information; U2-Net uses a two-level nested U-structure with the designed ReSidual U-blocks (RSU) so that the network captures richer feature information from both shallow and deep layers. In the experiments, the comparison models such as U-Net are implemented from their published source code, and all models are retrained on the two sea-land segmentation data sets for comparison.
The experiments use the PyTorch machine learning framework under Windows, with an Intel(R) Xeon(R) E-2176G CPU and a GTX 2080 Ti GPU with 11 GB of video memory. Every network is trained in the same environment with identical training parameters: the Adam optimizer is selected, the batch size is set to 4, the initial learning rate is set to 0.0001, and training runs for a total of 50 epochs.
The prediction results of the proposed segmentation network and the comparison networks on the two data sets are compared below. To compare the sea-land segmentation capability of the methods more comprehensively, two typical scenes are selected from Data set 1 (Data1), denoted scene 1 and scene 2, and two representative scenes from Data set 2 (Data2), denoted scene 3 and scene 4. Each scene image, its label and each network's prediction are shown in FIG. 5, with scene 1, scene 2, scene 3 and scene 4 arranged from left to right. In addition, the sea-land boundaries produced by the networks are overlaid on the original images to show more intuitively their effectiveness in the sea-land segmentation task.
Scene 1 explores each network's extraction capability at weak sea-land boundaries such as silt and estuaries. As shown in the leftmost column of FIG. 5, at the silt-laden sea-land boundary in the upper white frame, U-Net, Deeplabv3+, U2-Net and RAUNet cannot classify sea and land correctly, whereas the proposed segmentation network overcomes the silt interference and obtains an accurate result at the weak sea-land boundary. The lower white frame contains a large estuary, and the part between the estuary and the first bridge is conventionally defined as sea area: U-Net, Deeplabv3+ and U2-Net misclassify this entire area as land, RAUNet classifies only part of the sea area correctly, and the proposed network's result at the river mouth is more consistent with the convention. Scene 2 includes slender hydraulic structures such as harbors and breakwaters and tests the networks' sea-land segmentation of artificial coasts with complex boundaries. As shown in the second column of FIG. 5, the white frames show that U-Net and RAUNet extract parts of the breakwater area but their segmentation is still not accurate enough, while U2-Net and Deeplabv3+ misclassify the breakwater entirely as sea area; this indicates that fusing high-level and low-level semantic information can improve segmentation performance to some extent but is not ideal for boundaries and details. The proposed network uses a deep supervision strategy during training, effectively retains the edge details of the breakwater, and extracts relatively complete and continuous sea-land boundaries in artificial coast areas with complex boundaries, yielding segmentation results clearly superior to the other networks. In particular, the left white frame of the second column of FIG. 5 is surrounded by the harbor, so networks such as U-Net and Deeplabv3+ ignore the sea-area characteristics of this region and misclassify it as land, whereas the proposed segmentation network distinguishes the harbor and sea categories and obtains a more accurate sea-land segmentation result.
Scene 3 is a rocky coast area and explores each network's sea-land segmentation under a complex land background. As shown in the white-circle region of the third column of FIG. 5, the land in this area contains interference factors such as rocks and vegetation, making the segmentation background complex; the comparison networks such as U-Net cannot extract deeper semantic information, so their segmentation results show adhesion. The proposed segmentation network introduces an attention mechanism to strengthen the features of the sea-land boundary and obtains a complete and accurate segmentation result in a coastal region with a complex background. Scene 4 is a coastal image containing multiple islands and tests segmentation under complex waterline shapes. As shown in the rightmost column of FIG. 5, the sea-land boundaries of islands are usually obvious, and every network segments this type of image noticeably better than the other scenes; however, the irregular shapes and sizes of the islands easily cause sharp indentations of the waterline (white-circle region), where the comparison networks struggle with the complex boundary information and misclassify. The proposed network uses Res2Net to extract multi-scale image information, compensating for the ambiguity caused by insufficient local information, overcomes the interference of the waterline indentations, obtains more accurate sea-land segmentation results, and extracts clearer and more continuous sea-land boundaries.
The evaluation results of each network on the two data sets are shown in Table 4. On data set 1, U-Net and U2-Net obtain lower region segmentation precision in F1-score and the other region indexes, although U2-Net's deep supervision strategy makes its F1-score-b value higher than U-Net's. Deeplabv3+ achieves higher region segmentation precision than U-Net and U2-Net, but its F1-score-b is lower because it does not fully utilize the detail information of shallow feature maps. RAUNet achieves the suboptimal values among the compared networks on data set 1, with F1-score, MIoU and MAE of 98.38%, 96.82% and 0.016 respectively, and an F1-score-b of 69.25%. The proposed segmentation network outperforms the other networks on every index of data set 1: F1-score and MIoU improve by 0.71% and 1.37% over the suboptimal values obtained by RAUNet, MAE falls to 0.009, and the network's F1-score-b reaches 80.14%, an improvement of 10.89% over the suboptimal value.
On data set 2, U-Net and U2-Net again achieve lower region segmentation accuracy than the other networks. Deeplabv3+ achieves the suboptimal F1-score, MIoU and F1-score-b on data set 2, whereas RAUNet's edge segmentation on this data set is poor, with an F1-score-b of only 49.77%, far below the other networks. The proposed segmentation network attains the best value of every index on data set 2: F1-score and MIoU improve by 0.42% and 0.83% over the suboptimal values, MAE is reduced by 0.187, and the edge F value reaches 72.86%, an improvement of 17.81% over the suboptimal value.
TABLE 4
[Table 4: evaluation results of each network on the two data sets; reproduced as an image in the original publication and not recoverable here.]
To further study the effect of each module in the proposed segmentation model, the model is split and ablation experiments verify the effects of the Res2Net module, the deep supervision strategy (DS) and the SA module respectively. Table 5 gives the F1-score-b values of the ablation experiments on data set 1, where the basic network is an encoding-decoding (En_Decoder) structure and, for the ablation of Res2Net, its blocks are replaced with ResNet modules.
TABLE 5
[Table 5: F1-score-b values of the ablation experiments on data set 1; reproduced as images in the original publication and not recoverable here.]
The results show that when the proposed segmentation network uses the Res2Net module, the boundary segmentation precision improves by 10.45% over the ResNet module; after the deep supervision strategy is adopted, boundary segmentation precision improves by a further 1.03%; and the network with the SA module added (MSDSUNet) reaches a boundary segmentation precision of 80.14%, a 5.03% improvement and the best value among the configurations, demonstrating the necessity of each module for the sea-land segmentation task.
From the above analysis, it can be seen that: the attention module can improve the characteristic response of the weak sea-land boundary and has advantages in the weak sea-land boundary extraction; the deep supervision strategy can enhance the capability of learning the target boundary information by the network and improve the accuracy of edge segmentation. Therefore, the network of the invention can be suitable for the sea and land segmentation of different types of remote sensing images, and can obtain the optimal result in both the region and the edge detection result.

Claims (8)

1. A remote sensing image land and sea segmentation method is characterized by comprising the following steps:
1) obtaining remote sensing images and labelling the remote sensing image data set to form corresponding training set data;
2) establishing a remote sensing image sea-land segmentation model, wherein the model adopts an encoder-decoder structure; the encoder consists of multiple layers of encoding modules, each extracting features of the remote sensing image at a different scale; the decoder comprises decoding modules corresponding to the encoding modules, each fusing the output features of the corresponding encoding module with the up-sampled output of the preceding decoding module; the features processed by each decoding module are up-sampled to the size of the original image and fused, and edge detection is performed on the fused feature map;
3) training the established remote sensing image sea-land segmentation model with the training set data, training each layer of decoding module separately by adopting a deep supervision strategy, and constructing the total loss function of the model from the loss functions of each layer of decoding module;
4) acquiring the remote sensing image to be segmented and inputting it into the trained remote sensing image sea-land segmentation model to realize sea-land segmentation of the image.
2. The remote-sensing image sea-land segmentation method according to claim 1, wherein the multilayer coding modules adopt a Res2Net network and comprise 5 layers of coding modules, the first layer of coding module comprises a convolution layer and a maximum pooling layer and is used for extracting features of the input remote-sensing image, and the other layers of coding modules all adopt residual blocks and are used for processing output results of the previous layer of coding module.
3. A remote sensing image land and sea segmentation method as claimed in claim 2, characterized in that the residual block is Res2_Block.
4. The remote-sensing image sea-land segmentation method according to claim 2, wherein the remote-sensing image sea-land segmentation model further comprises a compression and attention module, and an output result of each layer of coding module is processed by the corresponding compression and attention module.
5. A remote sensing image sea-land segmentation method as claimed in claim 2, characterized in that each layer of decoding module comprises a Res2_Block and an Upsample for gradually restoring the feature map to the original input image size.
6. The method for sea-land segmentation of remote-sensing images according to any one of claims 1 to 5, wherein the loss function of the model for sea-land segmentation of remote-sensing images during training is as follows:
$$l = \sum_{m=1}^{M} W_{side}^{(m)} \, l_{side}^{(m)} + W_{fuse} \, l_{fuse}$$

wherein M is equal to the number of layers of the decoding module and is also the number of layers of the encoding module; $l_{side}^{(m)}$ represents the loss function of the m-th layer decoding module; $l_{fuse}$ represents the loss function after the fusion of the decoding modules of each layer; $W_{side}^{(m)}$ and $W_{fuse}$ respectively represent the weight of the loss function of the m-th layer decoding module and the weight of the fused loss function.
7. The remote sensing image sea-land segmentation method according to claim 6, wherein the loss function of each layer of decoding module and the loss function after fusion both adopt BCE loss functions of semantic segmentation binary classification tasks.
8. The remote-sensing image land and sea segmentation method according to claim 7, wherein the BCE loss function is:
$$l_{BCE} = -\sum_{(r,c)}^{(H,W)} \Big[ P_{G(r,c)} \log P_{S(r,c)} + \big(1 - P_{G(r,c)}\big) \log\big(1 - P_{S(r,c)}\big) \Big]$$

wherein (r, c) represents the coordinates of a pixel, H and W represent the height and width of the image, and $P_{G(r,c)}$ and $P_{S(r,c)}$ respectively represent the true value and the predicted value of the pixel.
CN202210280187.XA 2022-03-21 2022-03-21 Remote sensing image land and sea segmentation method Pending CN114663439A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210280187.XA CN114663439A (en) 2022-03-21 2022-03-21 Remote sensing image land and sea segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210280187.XA CN114663439A (en) 2022-03-21 2022-03-21 Remote sensing image land and sea segmentation method

Publications (1)

Publication Number Publication Date
CN114663439A true CN114663439A (en) 2022-06-24

Family

ID=82031286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210280187.XA Pending CN114663439A (en) 2022-03-21 2022-03-21 Remote sensing image land and sea segmentation method

Country Status (1)

Country Link
CN (1) CN114663439A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116342616A (en) * 2023-03-15 2023-06-27 大连海事大学 Remote sensing image sea-land segmentation method based on double-branch integrated learning
CN116342616B (en) * 2023-03-15 2023-10-27 大连海事大学 Remote sensing image sea-land segmentation method based on double-branch integrated learning
CN116594061A (en) * 2023-07-18 2023-08-15 吉林大学 Seismic data denoising method based on multi-scale U-shaped attention network
CN116594061B (en) * 2023-07-18 2023-09-22 吉林大学 Seismic data denoising method based on multi-scale U-shaped attention network
CN117312471A (en) * 2023-09-26 2023-12-29 Unit 91977 of the Chinese People's Liberation Army Sea-land attribute judging method and device for massive position points
CN117312471B (en) * 2023-09-26 2024-05-28 Unit 91977 of the Chinese People's Liberation Army Sea-land attribute judging method and device for massive position points
CN117635628A (en) * 2024-01-23 2024-03-01 武汉理工大学三亚科教创新园 Sea-land segmentation method based on context attention and boundary perception guidance
CN117635628B (en) * 2024-01-23 2024-04-09 武汉理工大学三亚科教创新园 Sea-land segmentation method based on context attention and boundary perception guidance

Similar Documents

Publication Publication Date Title
CN114663439A (en) Remote sensing image land and sea segmentation method
CN114119582B (en) Synthetic aperture radar image target detection method
CN109934200B (en) RGB color remote sensing image cloud detection method and system based on improved M-Net
CN112183258A (en) Remote sensing image road segmentation method based on context information and attention mechanism
CN112766087A (en) Optical remote sensing image ship detection method based on knowledge distillation
CN111753677B (en) Multi-angle remote sensing ship image target detection method based on characteristic pyramid structure
CN112489054A (en) Remote sensing image semantic segmentation method based on deep learning
CN116721112B (en) Underwater camouflage object image segmentation method based on double-branch decoder network
CN112560865B (en) Semantic segmentation method for point cloud under outdoor large scene
CN113255837A (en) Improved CenterNet network-based target detection method in industrial environment
CN116343045B (en) Lightweight SAR image ship target detection method based on YOLO v5
CN113052180A (en) Encoding and decoding network port image segmentation method fusing semantic flow fields
CN115512103A (en) Multi-scale fusion remote sensing image semantic segmentation method and system
CN114973011A (en) High-resolution remote sensing image building extraction method based on deep learning
CN113392711A (en) Smoke semantic segmentation method and system based on high-level semantics and noise suppression
CN112037225A (en) Marine ship image segmentation method based on convolutional neural networks
CN114821069A (en) Building semantic segmentation method for double-branch network remote sensing image fused with rich scale features
Liu et al. Two-stage underwater object detection network using swin transformer
CN114757864A (en) Multi-level fine-grained image generation method based on multi-scale feature decoupling
CN114067162A (en) Image reconstruction method and system based on multi-scale and multi-granularity feature decoupling
CN115457568A (en) Historical document image noise reduction method and system based on generation countermeasure network
CN116935332A (en) Fishing boat target detection and tracking method based on dynamic video
CN112330562A (en) Heterogeneous remote sensing image transformation method and system
CN114937154B (en) Significance detection method based on recursive decoder
CN116503755A (en) Automatic recognition analysis method for shoreline remote sensing based on cloud platform and deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination