CN115272842A - SAR image ship instance segmentation method based on global semantic boundary attention network


Info

Publication number: CN115272842A
Application number: CN202210472909.1A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 张晓玲, 柯潇, 张天文, 师君, 韦顺军
Assignee (current and original): University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China
Priority to CN202210472909.1A
Publication of CN115272842A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods


Abstract

The invention discloses a SAR ship instance segmentation method based on a global semantic boundary attention network, which solves the problem of the limited positioning capability of target boxes in the prior art. The method is based on deep learning theory and mainly comprises a global context information modeling module and a boundary attention prediction module. The global context information modeling module builds long-range dependencies by repeatedly enhancing the semantic information of the features, thereby effectively reducing background interference. The boundary attention prediction module predicts the boundary information of the target twice, thereby improving the positioning capability of the target box. The average precision (AP) of the proposed method is superior to that of existing deep-learning-based SAR ship instance segmentation methods. The method overcomes the limited target-box positioning capability of the prior art and improves the instance segmentation accuracy of ships in SAR images.

Description

SAR image ship instance segmentation method based on global semantic boundary attention network
Technical Field
The invention belongs to the technical field of synthetic aperture radar (SAR) image interpretation, and relates to a SAR image ship instance segmentation method based on a global semantic boundary attention network.
Background
Synthetic aperture radar (SAR) is a powerful sensor. By measuring the radar scattering characteristics of targets, it provides high-resolution observation images, is unaffected by light and weather, and is widely used in the surveying, transportation, ocean and remote sensing communities. Ship monitoring supports disaster relief, traffic control and fishery monitoring, and is a hotspot of current research. Compared with optical, infrared and hyperspectral sensors, SAR adapts better to the changing marine climate and is more suitable for ship monitoring. Therefore, ship surveillance using SAR is receiving increasing attention.
Traditional methods usually rely on expert experience to hand-craft features, which is time-consuming and labor-intensive and limits wider adoption. In recent years, deep-learning-based detection, classification and recognition methods have developed rapidly and have been applied extensively to pedestrian detection, face recognition, image classification, speech translation, and so on. Introducing deep learning into SAR ship instance segmentation therefore holds great application potential, and more and more scholars have studied deep-learning-based SAR ship instance segmentation methods. For example, Su et al. applied a convolutional-neural-network-based model to instance segmentation of remote sensing images, but did not consider the characteristics of SAR ships, which limits further accuracy gains. Gao et al. proposed an anchor-free instance segmentation network, but the model cannot handle complex scenes and cases. Another group proposed a collaborative-attention-based SAR ship instance segmentation method, but it still misses many small ships and inshore ships. Overall, most existing SAR ship instance segmentation methods have limited target-box positioning capability, so segmentation accuracy still needs improvement.
To solve this problem, a SAR image ship instance segmentation method based on a global semantic boundary attention network is proposed, which raises instance segmentation accuracy by improving the positioning capability of the target box. The method mainly comprises two modules for improving target-box positioning. The first is the global context information modeling module, formed by serially connecting a content-aware feature recombination sub-network, a multi-receptive-field feature extraction sub-network and a global feature self-attention sub-network. It models the long-range dependencies of the ship's surroundings through a larger receptive field, effectively reducing background interference and extracting more discriminative regional features. The second is the boundary attention prediction module, formed by serially connecting a boundary attention feature extraction sub-network, a boundary coarse positioning sub-network, a boundary fine positioning sub-network and a boundary-guided classification re-scoring sub-network. Unlike a traditional boundary regression module, it does not predict the bounding box from center-point and size outputs but from the outputs of the four boundaries. Experimental results on the HRSID dataset show that the proposed method outperforms other deep-learning-based instance segmentation methods.
Disclosure of Invention
The invention belongs to the technical field of synthetic aperture radar (SAR) image interpretation, and discloses a SAR ship instance segmentation method based on a global semantic boundary attention network, which solves the problem of limited target-box positioning capability in the prior art. The method is based on deep learning theory and mainly comprises a global context information modeling module and a boundary attention prediction module. The global context information modeling module builds long-range dependencies by repeatedly enhancing the semantic information of the features, effectively reducing background interference. The boundary attention prediction module predicts the boundary information of the target twice, improving the positioning capability of the target box. Experiments show that, on the HRSID dataset, the average precision (AP) of the proposed method is 57.3%, while the highest AP among existing deep-learning-based SAR ship instance segmentation methods is 55.4%. The proposed method therefore improves ship instance segmentation accuracy.
For the convenience of describing the present invention, the following terms are first defined:
definition 1: traditional HRSID data set acquisition method
The HRSID dataset (High-Resolution SAR Images Dataset) is a commonly used SAR image ship instance segmentation dataset. It is derived from 136 panoramic SAR images with resolutions ranging from 1 m to 5 m. Each panoramic image is cut by a sliding-window mechanism with a 25% overlap ratio into slices of 800 × 800 pixels, yielding 5604 slices in total, which contain 16951 ships. 65 percent of the slices form the training set and the remaining 35 percent form the test set. The HRSID acquisition method is detailed in "Wei S, Zeng X, Qu Q, et al. HRSID: A High-Resolution SAR Images Dataset for Ship Detection and Instance Segmentation [J]. IEEE Access, 2020, 8."
Definition 2: traditional residual backbone network construction method
The residual backbone network is a commonly used backbone network; this convolutional neural network, proposed by four researchers from Microsoft Research, won the image classification and object recognition tasks of the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Compared with conventional backbones, the residual backbone adds multiple residual connections, which reduce the probability of vanishing and exploding gradients and allow faster optimization, so it can stack many more convolutional layers. The 101-layer residual network is one of the most common variants. Specifically, it first extracts features with a 7 × 7 convolutional layer with 64 kernels and stride 2, then downsamples by a factor of two with 3 × 3 max pooling, producing the first-stage feature map. The outputs of the second, third, fourth and fifth stages are then produced by stacks of residual modules; the stages differ in the number of residual modules and in the number of convolution kernels within those modules. Thanks to the skip connections in the residual modules, the network avoids vanishing gradients, exploding gradients and degradation while stacking deeply, optimizes quickly, and extracts abstract features with stronger discriminative power. The classical residual network construction method is detailed in "K. He et al., Deep Residual Learning for Image Recognition, IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770-778."
Definition 3: traditional regional recommendation network construction method
The regional recommendation network (region proposal network) was proposed in Faster R-CNN. To address the time-consuming region proposal algorithm in the Fast R-CNN pipeline, Faster R-CNN replaces selective search with a region proposal network, and fuses the region proposal network and Fast R-CNN into one network by introducing the concept of a shared convolutional feature map, realizing fast object detection. In addition, by presetting anchor boxes of different sizes and aspect ratios, the region proposal network strengthens the multi-scale detection capability of the detector to some extent, improving detection accuracy. Specifically, the region proposal network takes a whole picture as input and outputs the positions and confidences of a series of region proposal boxes, where the confidence represents the probability that a proposal is foreground. It comprises two sub-networks. The first is the backbone shared between the region proposal network and Fast R-CNN; this mechanism, also called the shared convolutional feature map, ultimately aims to save computing resources for object detection. The second consists of an intermediate layer, a classification layer and a regression layer: the intermediate layer is essentially a fully connected layer, while the classification and regression layers are essentially convolutional layers of size 3 × 3. Note that the classification and regression layers are two parallel modules whose inputs are both the output of the intermediate layer. The second sub-network operates with a sliding-window mechanism similar to a convolution: its input at each position is a local region of the feature map. In the forward propagation stage it slides over the feature map and, for each position, computes the classification and regression information of several anchor boxes. The classic construction method is detailed in "Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149."
Definition 4: traditional region-of-interest feature extraction module construction method
The region-of-interest feature extraction module was first proposed in the Fast R-CNN paper; it extracts fixed-size local features from the feature map according to the coordinates of a region of interest. The module proposed in Fast R-CNN is RoI Pooling, whose basic idea is to obtain fixed-size local features through two quantization steps and a max-pooling operation. The two quantizations, however, often cause precision loss, so the extracted local features become misaligned with the coordinates of the region of interest. The Mask R-CNN paper therefore proposed RoI Align, whose main idea is to abandon quantization and compute feature values at the required coordinates by bilinear interpolation, keeping the extracted local features aligned with the region-of-interest coordinates. RoI Align has since become the mainstream implementation of the region-of-interest feature extraction module. Its detailed construction is given in "He K, Gkioxari G, Dollár P, et al. Mask R-CNN [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017."
Definition 5: traditional convolutional layer construction method
The convolutional layer is a basic module in deep neural networks; its basic function is to extract abstract features from the input data, facilitating the subsequent classification, regression and other task networks. A convolutional layer usually contains several convolution kernels; a kernel is a node that separately weights the values within a small rectangular region of an input feature map or picture and sums them as an output. Each kernel requires several manually specified parameters. One is the length and width of the node matrix it processes, which is the kernel size; another is the depth of the kernel, which equals the depth of the unit node matrix it produces. During convolution, each kernel slides over the input, the inner product between the kernel and the corresponding input region is computed and passed through a nonlinear function, and the results over all positions form a two-dimensional feature map. Each kernel generates one two-dimensional feature map, and the maps of all kernels are stacked into a three-dimensional feature map. In general, kernels are 3 × 3 or 5 × 5, the kernel depth is determined by the number of feature channels of the previous layer, and the number of kernels is chosen by the designer. A classic treatment appears in the survey of deep-convolutional-neural-network-based object detection in Optics and Precision Engineering, 2020, 28(05): 1152-1164.
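To make the sliding-window computation concrete, the following is a minimal sketch of a convolutional layer in PyTorch (an assumed framework; the patent does not prescribe one). The channel counts and input size are illustrative.

```python
import torch
import torch.nn as nn

# 3x3 convolution: the kernel depth follows the 64 input channels, and the
# 128 kernels each produce one 2-D feature map, stacked into the output.
conv = nn.Sequential(
    nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),  # the nonlinear function applied to each inner product
)

x = torch.randn(1, 64, 56, 56)   # (batch, channels, height, width)
y = conv(x)                      # -> (1, 128, 56, 56)
```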
Definition 6: traditional pixel recombination construction method
Pixel recombination (pixel shuffle) was first proposed for the image super-resolution task and was subsequently applied to image classification and detection tasks. It is an upsampling method that effectively enlarges a reduced feature map and can serve as a substitute for deconvolution or nearest-neighbor interpolation. The classic construction method is detailed at https://blog.csdn.net/djfjkj52/article/details/123829282.
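As a hedged illustration of the shape arithmetic behind pixel recombination, the sketch below uses PyTorch's built-in PixelShuffle (an assumed dependency); the channel count is illustrative.

```python
import torch
import torch.nn as nn

# Pixel recombination as a 2x upsampler: r**2 = 4 channels are traded
# for a 2x larger spatial resolution.
up = nn.PixelShuffle(upscale_factor=2)
x = torch.randn(1, 256, 25, 25)
y = up(x)
print(y.shape)  # torch.Size([1, 64, 50, 50])
```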
Definition 7: traditional dilated convolution layer construction method
The dilated (hole) convolution layer is similar to the standard convolutional layer, adding only a dilation-rate parameter that enlarges the sampling region of the convolution and thereby the receptive field. By enlarging the receptive field, the dilated convolution layer can extract global information to some extent, enhance the semantic information of the output feature map, and help the network distinguish targets from background interference. The classic construction method is detailed at https://blog.csdn.net/qq_30241709/article/details/88080367.
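A one-line sketch of the dilation-rate parameter, again assuming PyTorch: a 3 × 3 kernel with dilation 2 samples a 5 × 5 neighborhood at no extra parameter cost.

```python
import torch
import torch.nn as nn

# dilation=2 spreads the 3x3 taps over a 5x5 window; padding=2 keeps the size.
dilated = nn.Conv2d(256, 256, kernel_size=3, dilation=2, padding=2)
print(dilated(torch.randn(1, 256, 32, 32)).shape)  # torch.Size([1, 256, 32, 32])
```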
Definition 8: traditional concatenation operation
Concatenation is an important operation in network structure design; it combines features, fusing the features extracted by several convolutional feature extraction branches or the information of output layers, thereby strengthening the feature extraction capability of the network. The method is detailed at https://blog.csdn.net/alxe_master/article/details/80506051.
Definition 9: traditional global feature self-attention construction method
The global feature self-attention module extracts non-local features of the input. Its basic idea is that, for every input pixel, similarity weights to all other pixels are computed, and the corresponding pixels are then weighted and summed by these similarities to give the output for that pixel. Compared with a convolutional layer, the receptive field of the global feature self-attention module is larger and not restricted to a local window, so it can extract global information and enhance the semantic information of the feature map. The classical construction method is detailed in "Wang X, Girshick R, Gupta A, et al. Non-local Neural Networks."
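The sketch below renders the pairwise-similarity idea in the style of the cited non-local networks; it is a simplified, assumption-laden rendering (PyTorch assumed; the channel reduction and residual connection are illustrative choices), not the patent's exact sub-network.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    # For each pixel, similarity weights to all other pixels are computed
    # (softmax over q @ k) and used to weight-sum the value projections.
    def __init__(self, channels: int):
        super().__init__()
        inter = channels // 2
        self.theta = nn.Conv2d(channels, inter, 1)
        self.phi = nn.Conv2d(channels, inter, 1)
        self.g = nn.Conv2d(channels, inter, 1)
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.phi(x).flatten(2)                     # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, c')
        attn = torch.softmax(q @ k, dim=-1)            # (b, hw, hw) similarities
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection

block = NonLocalBlock(256)
print(block(torch.randn(1, 256, 16, 16)).shape)  # torch.Size([1, 256, 16, 16])
```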
Definition 10: traditional full-connection layer construction method
The fully connected layer is a neural network structure for further feature extraction. Unlike a convolutional layer, the numbers of its input and output nodes must be preset, and its parameter count and computation far exceed those of a convolutional layer, so fully connected layers usually appear in only certain parts of a network. A classical treatment is "Haoren Wang, Haotian Shi, Ke Lin, Chengjin Qin, Liqun Zhao, Yixiang Huang, Chengliang Liu. A high-precision arrhythmia classification method based on dual fully connected neural network [J]. Biomedical Signal Processing and Control, 2020, 58."
Definition 11: traditional convolution attention module construction method
The convolution attention module mainly comprises three parts: pooling, convolution and activation functions. Specifically, for an input feature F, channel-wise max pooling and channel-wise average pooling each produce a two-dimensional map; the two maps are concatenated and passed through a convolution and an activation function to obtain the spatial attention map M_S. The spatial attention module is expressed as

M_S(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)]))

where f^{7×7} denotes a convolution with kernel size 7 × 7 and σ denotes the sigmoid activation function. Note that, to extract features over multiple receptive fields, this method uses two spatial attention modules in parallel, with convolution kernel sizes of 7 × 7 and 3 × 3 respectively. The classical construction method is "Woo S., Park J., Lee J.Y., Kweon I.S. CBAM: Convolutional Block Attention Module. 2018."
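The formula above translates almost line for line into code. The following is a sketch (PyTorch assumed) of one spatial attention module with a configurable kernel size, so the two parallel 7 × 7 and 3 × 3 instances described in the text can both be built from it.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    # M_S(F) = sigmoid(conv([AvgPool(F); MaxPool(F)])) over the channel axis.
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        avg = f.mean(dim=1, keepdim=True)       # channel-wise average pooling
        mx, _ = f.max(dim=1, keepdim=True)      # channel-wise max pooling
        m = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return f * m                            # reweight the input feature

sa7, sa3 = SpatialAttention(7), SpatialAttention(3)   # the two parallel branches
```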
Definition 12: traditional boundary coarse positioning sub-network construction method
The boundary coarse positioning sub-network takes boundary features as input and outputs a coarse positioning of the corresponding boundary. Specifically, it divides the target space into several discrete intervals and, for a given boundary feature, outputs only the confidence of the interval to which the corresponding boundary belongs (i.e., s_x-right, s_x-left, s_y-right and s_y-left), without giving a more precise boundary regression value. Here s_x-right denotes the confidence of the vertical boundary toward the right, s_x-left the confidence of the vertical boundary toward the left, s_y-right the confidence of the horizontal boundary toward the right, and s_y-left the confidence of the horizontal boundary toward the left. The classic construction method is detailed in "Wang J, Zhang W, Cao Y, et al. Side-Aware Boundary Localization for More Precise Object Detection [J]. 2019."
Definition 13: construction method of traditional boundary fine positioning sub-network
The boundary fine positioning sub-network corrects the boundary position again on the basis of the coarse positioning. The process resembles traditional bounding-box classification regression: the prediction box from the coarse positioning serves as prior knowledge, and the coordinate and size offsets between the target box and the prediction box are output, yielding a more accurate boundary prediction. The classic construction method is detailed in "Wang J, Zhang W, Cao Y, et al. Side-Aware Boundary Localization for More Precise Object Detection [J]. 2019."
Definition 14: traditional mask subnetwork construction method
The mask sub-network is taken from Mask R-CNN. It takes the result of the boundary prediction network as input and outputs a pixel-level binary classification of the target region, distinguishing target from background at the pixel level so as to extract the edge information of the target. The classical construction method is detailed in "He K, Gkioxari G, Dollár P, et al. Mask R-CNN [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017."
Definition 15: classical Adam algorithm
The classical Adam algorithm is an extension of stochastic gradient descent that has recently seen wide use in deep learning for computer vision and natural language processing. Unlike classical stochastic gradient descent, which maintains a single learning rate for all weight updates that does not change during training, Adam maintains a learning rate for each network weight and adjusts it individually as learning progresses, computing adaptive learning rates for different parameters from estimates of the first and second moments of the gradients. The algorithm is detailed in "Kingma D., Ba J. Adam: A Method for Stochastic Optimization. 2014, arXiv:1412.6980."
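A minimal usage sketch of Adam's per-parameter adaptive updates, assuming the PyTorch optimizer API; the tiny stand-in model and hyper-parameters are illustrative only, not the patent's training configuration.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                     # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

loss = model(torch.randn(4, 10)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()   # update with bias-corrected first/second moment estimates
```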
Definition 16: conventional forward propagation method
Forward propagation is the most basic method in deep learning: it performs forward inference on the input according to the parameters and connections of the network, producing the network's output. The method is detailed at https://www.jianshu.com/p/f30c8daebebb.
Definition 17: conventional non-maxima suppression method
Non-maximum suppression (NMS) is an algorithm used in the object detection field to remove redundant detection boxes. In the forward propagation results of a classical detection network, the same target often corresponds to several detection boxes, so an algorithm is needed to select the best, highest-scoring box among them. Non-maximum suppression performs a local maximum search, discarding boxes whose overlap rate with the current maximum exceeds a threshold. The method is detailed at https://www.cnblogs.com/makefile/p/nms.html.
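A short sketch of redundant-box removal, assuming torchvision's NMS operator; the boxes, scores and the 0.5 threshold (matching step 6.1 below) are illustrative.

```python
import torch
from torchvision.ops import nms

boxes = torch.tensor([[10., 10., 60., 60.],     # (x1, y1, x2, y2)
                      [12., 12., 62., 62.],     # heavily overlaps the first box
                      [100., 100., 150., 150.]])
scores = torch.tensor([0.9, 0.8, 0.7])

keep = nms(boxes, scores, iou_threshold=0.5)
print(keep)  # tensor([0, 2]): the lower-scored overlapping box is discarded
```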
Definition 18: traditional recall rate and accuracy rate calculation method
Recall R is the proportion of all positive samples that are correctly predicted:

R = TP / (TP + FN)

Precision P is the proportion of results predicted as positive that are correct:

P = TP / (TP + FP)

where TP (true positive) denotes a positive sample predicted positive by the model, FN (false negative) a positive sample predicted negative, and FP (false positive) a negative sample predicted positive. The precision-recall curve P(R) is the function with R as the independent variable and P as the dependent variable. The computation of these quantities is described in "Li Hang. Statistical Learning Methods [M]. Beijing: Tsinghua University Press, 2012."
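The definitions above reduce to two one-line functions; this plain-Python sketch is included only to pin down the TP/FN/FP bookkeeping.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of all positive samples that were predicted positive."""
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)

print(recall(tp=80, fn=20), precision(tp=80, fp=10))  # 0.8 0.888...
```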
The invention provides a SAR ship instance segmentation method based on a global semantic boundary attention network, which comprises the following steps:
step 1, initializing a data set
Obtain the HRSID dataset according to the traditional HRSID dataset acquisition method in definition 1; denote the training set in the HRSID dataset as D_train and the test set as D_test.
Step 2, building a forward propagation network
Step 2.1, building ResNet-101 backbone network
Construct a residual network with 101 layers with the classical residual backbone network construction method in definition 2, denoted Res-101.
Step 2.2, building a regional recommendation network
Construct a regional recommendation network with the classical regional recommendation network construction method in definition 3, taking the ResNet-101 backbone network Res-101 obtained in step 2.1 as a sub-network within it; the constructed regional recommendation network is denoted RPN_0.
Step 2.3, building a feature extraction module
Construct a feature extraction module with the traditional region-of-interest feature extraction module construction method in definition 4; the constructed feature extraction module is denoted FExtract.
Step 2.4, building a global context information modeling module
First, two convolutional layers, denoted conv1 and conv2, are built with the traditional convolutional layer construction method in definition 5, and a pixel recombination module, denoted pixelshuffle, is built with the traditional pixel recombination construction method in definition 6. According to the expression

softmax(z_i) = e^{z_i} / Σ_{j=1}^{C} e^{z_j}

define a Softmax layer, denoted softmax0, where z_i is the feature value of the ith node of the input feature map and C is the number of channels of the input feature map. Connect conv1, conv2, pixelshuffle and softmax0 in series, and denote the result kplayer. According to the expression

F'_l = Σ_n Σ_m W_{l'}(n, m) · F_{(i+n, j+m)}

construct the feature recombination layer, denoted czlayer, where (i, j) are the coordinates of position l, F_{(i+n, j+m)} is the feature vector at (i+n, j+m) in F, and W_{l'}(n, m) is the weight of the recombination kernel W_{l'} at (n, m). Combining kplayer and czlayer completes the construction of the content-aware feature recombination sub-network, which is denoted card.

Then, dilated convolution layers with dilation rates of 2, 3, 4 and 5, denoted d1, d2, d3 and d4 respectively, are built with the traditional dilated convolution layer construction method in definition 7. A concatenation module, denoted concate, is built with the traditional concatenation operation in definition 8, and a convolutional layer, denoted conv3, with the traditional convolutional layer construction method in definition 5. Connect d1, d2, d3 and d4 in parallel and then in series with concate and conv3 in sequence; this completes the multi-receptive-field feature extraction sub-network, denoted mrblock.

Finally, a global feature self-attention sub-network, denoted sablock, is built with the traditional global feature self-attention construction method in definition 9.

Connect the content-aware feature recombination sub-network card, the multi-receptive-field feature extraction sub-network mrblock and the global feature self-attention sub-network sablock in series, in that order, to obtain the global context information modeling module, denoted GCB.
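As a hedged structural sketch of step 2.4 (PyTorch assumed; channel counts are guesses, and the recombination and self-attention sub-networks are abstracted behind arbitrary callables), the multi-receptive-field block and the serial GCB chaining could look as follows.

```python
import torch
import torch.nn as nn

class MRBlock(nn.Module):
    # Four parallel dilated 3x3 convolutions (rates 2,3,4,5) -> concat -> conv3.
    def __init__(self, c: int = 256):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(c, c, 3, dilation=r, padding=r) for r in (2, 3, 4, 5)])
        self.fuse = nn.Conv2d(4 * c, c, 3, padding=1)   # plays the role of conv3

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class GCB(nn.Module):
    # Serial chain: content-aware recombination -> multi-receptive-field -> self-attention.
    def __init__(self, card: nn.Module, mrblock: nn.Module, sablock: nn.Module):
        super().__init__()
        self.card, self.mrblock, self.sablock = card, mrblock, sablock

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.sablock(self.mrblock(self.card(x)))
```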
Step 2.5, setting up boundary attention prediction module
Build three fully connected layers, denoted fc1, fc2 and fc3, with the traditional fully connected layer construction method in definition 10. Connecting fc1, fc2 and fc3 in series establishes the classification branch, denoted CLBranch; the classification result output by CLBranch is denoted s.

Build a convolution attention module, denoted CBAM, with the traditional convolution attention module construction method of definition 11, and four convolutional layers, denoted conv4, conv5, conv6 and conv7, with the traditional convolutional layer construction method of definition 5. By the same expression as softmax0, define two further Softmax layers, denoted softmax1 and softmax2, where z_i is the feature value of the ith node of the input feature map and C is the number of channels. Connect conv4, softmax1 and conv5 in series, denoted branchx, and conv6, softmax2 and conv7 in series, denoted branchy. Connecting branchx and branchy in parallel and appending them to the CBAM module completes the boundary attention feature extraction sub-network, denoted baff.

Build a boundary coarse positioning sub-network with the traditional boundary coarse positioning sub-network construction method of definition 12, denoted bbcl; its four outputs are denoted s_x-right, s_x-left, s_y-right and s_y-left.

Build a boundary fine positioning sub-network with the traditional boundary fine positioning sub-network construction method of definition 13, denoted brfl.
Taking the classification result s output by CLBranch and the outputs s_x-right, s_x-left, s_y-right and s_y-left of bbcl as input, compute

s* = s · (s_x-left + s_x-right + s_y-left + s_y-right) / 4

which completes the construction of the boundary-guided classification re-scoring sub-network, denoted cbcr.
Connect baff, bbcl, brfl and cbcr in series, in that order, to complete the boundary attention prediction module, denoted BABP.
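Since the exact re-scoring formula is rendered as an equation image in the original filing, the following plain-Python sketch shows only the plausible averaging form reconstructed above (an assumption in the spirit of the cited side-aware boundary localization work).

```python
def cbcr(s: float, sx_left: float, sx_right: float,
         sy_left: float, sy_right: float) -> float:
    """Modulate the class score s by the mean of the four boundary confidences."""
    return s * (sx_left + sx_right + sy_left + sy_right) / 4.0

print(cbcr(0.9, 0.8, 0.7, 0.9, 0.6))  # 0.675
```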
Step 2.6, building a mask subnetwork
Build a mask sub-network with the traditional mask sub-network construction method of definition 14; the constructed mask sub-network is denoted MASK.
Step 2.7, building an instance segmentation cascade network
Connect the feature extraction module FExtract from step 2.3, the global context information modeling module GCB from step 2.4, the boundary attention prediction module BABP from step 2.5 and the mask sub-network MASK from step 2.6 in series, in that order, to obtain the first instance segmentation network, denoted SEG1.
Repeat steps 2.3, 2.4, 2.5 and 2.6 and connect the resulting modules and sub-networks in series in the same order to obtain the second instance segmentation network, denoted SEG2.
Repeat steps 2.3, 2.4, 2.5 and 2.6 again and connect the resulting modules and sub-networks in series to obtain the third instance segmentation network, denoted SEG3.
Connect SEG1, SEG2 and SEG3 in series, in that order; this completes the instance segmentation cascade network, denoted CASEG_0.
Step 3, training the regional recommendation network
An iteration parameter epoch is set, and an initial epoch value is 1.
Step 3.1, forward propagation is carried out on the regional recommendation network
Take the training set D_train obtained in step 1 as the input of the regional recommendation network RPN_0. According to the forward propagation method in definition 16, feed D_train into RPN_0 for computation, and record the output of RPN_0 as Result0.
Step 3.2, sampling the forward propagation result
Taking Result0 obtained in step 3.1 and the training set D_train as input, compute the IoU value of each recommendation box in Result0 according to

IoU = Area(B ∩ B_gt) / Area(B ∪ B_gt)

where B is a recommendation box and B_gt is its matched ground-truth box. Take the outputs in Result0 with IoU greater than 0.5 as positive samples, denoted Result0p, and those with IoU less than 0.5 as negative samples, denoted Result0n. Count the total number of samples in Result0n, denoted M. Manually input the number of required negative samples, denoted N, and the number of intervals into which the IoU range is equally divided, denoted n_b; the number of samples in the ith IoU interval is M_i. Set the random sampling probability of the ith interval to

p_i = N / (n_b · M_i)

and randomly sample each IoU interval; record the sampling results over all negative-sample IoU intervals as Result0ns.

Count the number of samples in the positive sample set Result0p, denoted P. Set the random sampling probability to N/P and randomly sample Result0p; record the positive sampling result as Result0ps.
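The interval-wise negative sampling of step 3.2 can be sketched as below (plain Python; the function name, the (iou, sample) pair representation and the [0, 0.5) negative IoU range are assumptions made for illustration).

```python
import random

def sample_negatives(negatives, n_required: int, n_bins: int):
    """negatives: list of (iou, sample) pairs with IoU < 0.5.
    Each IoU interval i is sampled with probability N / (n_b * M_i)."""
    bins = [[] for _ in range(n_bins)]
    width = 0.5 / n_bins
    for iou, sample in negatives:
        bins[min(int(iou / width), n_bins - 1)].append(sample)
    picked = []
    for bucket in bins:
        if not bucket:
            continue
        p = min(1.0, n_required / (n_bins * len(bucket)))   # p_i = N / (n_b * M_i)
        picked.extend(s for s in bucket if random.random() < p)
    return picked
```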
Step 3.3, training and optimizing the regional recommendation network
Taking the positive sampling result Result0ps and the negative sampling result Result0ns obtained in step 3.2 as input, train and optimize the regional recommendation network according to the classical Adam algorithm in definition 15, obtaining the trained and optimized regional recommendation network RPN_1.
Step 4, training the instance segmentation cascade network
Step 4.1, forward propagation through the instance segmentation cascade network
Take the training set D_train obtained in step 1 as the input of the instance segmentation cascade network CASEG_0. According to the traditional forward propagation method in definition 16, feed D_train into CASEG_0 for computation, and record the output of CASEG_0 as Result1.
Step 4.2, training and optimizing the instance segmentation cascade network
Taking the output Result1 of the instance segmentation cascade network CASEG_0 obtained in step 4.1 as input, train and optimize the instance segmentation cascade network according to the classical Adam algorithm in definition 15, obtaining the trained and optimized instance segmentation cascade network CASEG_1.
Step 5, alternate training is carried out
Determine whether the epoch set in step 3 equals 12. If not, let epoch = epoch + 1, RPN_0 = RPN_1 and CASEG_0 = CASEG_1, repeat steps 3.1, 3.2, 3.3, 4.1 and 4.2 in sequence, and then return to step 5 to judge the epoch again. If the epoch equals 12, record the trained regional recommendation network RPN_1 together with the trained instance segmentation cascade network CASEG_1 as the network GCBAN, and proceed to step 6.
Step 6, evaluation method
Step 6.1, forward propagation
Using the network GCBAN obtained in step 5 and the test set D_test obtained in step 1 as input, obtain the detection result, denoted R, with the traditional forward propagation method of definition 16.

Taking the detection result R as input, remove the redundant boxes in R with the traditional non-maximum suppression method in definition 17, as follows:

Step (1): mark the highest-scoring box in the detection result R as BS.

Step (2): compute the overlap rate

IoU = Area(B ∩ BS) / Area(B ∪ BS)

between every remaining box B in R and BS, and discard the boxes with IoU > 0.5.

Step (3): select the highest-scoring box among the remaining boxes as the new BS, and repeat the computing-and-discarding process of step (2) until no box can be discarded; the boxes that remain constitute the final detection result, denoted R_F.
Step 6.2, index calculation
Using the detection result R_F obtained in step 6.1 as input, compute the precision P, the recall R and the precision-recall curve P(R) of the network with the traditional recall and precision calculation method in definition 18; then, using the formula

AP = ∫_0^1 P(R) dR

compute the SAR ship instance segmentation accuracy indices AP, AP_50, AP_75, AP_S, AP_M and AP_L.
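The integral AP = ∫ P(R) dR is evaluated numerically in practice; the sketch below (NumPy assumed) uses simple trapezoidal integration, whereas COCO-style evaluation would interpolate the curve at fixed recall points.

```python
import numpy as np

def average_precision(recalls, precisions) -> float:
    """Area under the precision-recall curve P(R)."""
    r = np.asarray(recalls)
    p = np.asarray(precisions)
    order = np.argsort(r)
    return float(np.trapz(p[order], r[order]))

print(average_precision([0.0, 0.5, 1.0], [1.0, 0.8, 0.5]))  # 0.775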
The innovation of this deep-learning-based SAR ship instance segmentation method is the introduction of a global context information modeling module and a boundary attention prediction module, which together address the limited target-box positioning capability of existing deep-learning-based SAR ship instance segmentation methods. With the proposed method, the SAR image ship instance segmentation AP is 57.3%, exceeding the second-best method by 1.9 percentage points; AP_50 is 88.6%, exceeding it by 2.8 points; AP_75 is 68.9%, exceeding it by 2.0 points; AP_S is 57%, exceeding it by 2.1 points; AP_M is 64.3%, exceeding it by 0.8 points; and AP_L is 25.9%, exceeding it by 6.2 points. In conclusion, the method achieves better target-box positioning and excellent SAR ship instance segmentation accuracy.
The method thus overcomes the limited target-box positioning capability of the prior art and improves the instance segmentation accuracy of ships in SAR images.
Drawings
FIG. 1 is a schematic flow chart of the SAR image ship instance segmentation method based on a global semantic boundary attention network,
wherein 1 denotes the boundary attention prediction module and 2 denotes the global context information modeling module;
FIG. 2 shows the instance segmentation accuracy indices of the SAR image ship instance segmentation method based on the global semantic boundary attention network.
Detailed Description
The present invention is further described in detail with reference to fig. 1 and 2.
Step 1, initializing a data set
Obtain the HRSID dataset according to the HRSID dataset acquisition method in definition 1; denote the training set in the HRSID dataset as D_train and the test set as D_test.
Step 2, building a forward propagation network
Step 2.1, building ResNet-101 backbone network
As shown in fig. 1, a classical residual backbone network construction method in definition 2 is adopted to construct a residual network with 101 network layers, which is denoted as Res-101.
Step 2.2, building a regional recommendation network
As shown in fig. 1, a regional recommendation network is constructed with the classical regional recommendation network construction method in definition 3; the ResNet-101 backbone network Res-101 obtained in step 2.1 serves as a sub-network within it, and the constructed regional recommendation network is denoted RPN_0.
Step 2.3, building a feature extraction module
As shown in fig. 1, a feature extraction module is constructed by using the region-of-interest feature extraction module construction method in definition 4, and the constructed feature extraction module is recorded as FExtract.
Step 2.4, building a global context information modeling module
First, two convolutional layers, denoted conv1 and conv2, are built with the convolutional layer construction method in definition 5, and a pixel recombination module, denoted pixelshuffle, is built with the pixel recombination construction method in definition 6. According to the expression

softmax(z_i) = e^{z_i} / Σ_{j=1}^{C} e^{z_j}

define a Softmax layer, denoted softmax0, where z_i is the feature value of the ith node of the input feature map and C is the number of channels of the input feature map. Connect conv1, conv2, pixelshuffle and softmax0 in series, and denote the result kplayer. According to the expression

F'_l = Σ_n Σ_m W_{l'}(n, m) · F_{(i+n, j+m)}

construct the feature recombination layer, denoted czlayer, where (i, j) are the coordinates of position l, F_{(i+n, j+m)} is the feature vector at (i+n, j+m) in F, and W_{l'}(n, m) is the weight of the recombination kernel W_{l'} at (n, m). Combining kplayer and czlayer completes the construction of the content-aware feature recombination sub-network, which is denoted card.

Then, dilated convolution layers with dilation rates of 2, 3, 4 and 5, denoted d1, d2, d3 and d4 respectively, are built with the dilated convolution layer construction method in definition 7. A concatenation module, denoted concate, is built with the concatenation operation in definition 8, and a convolutional layer, denoted conv3, with the convolutional layer construction method in definition 5. Connect d1, d2, d3 and d4 in parallel and then in series with concate and conv3 in sequence; this completes the multi-receptive-field feature extraction sub-network, denoted mrblock.

Finally, a global feature self-attention sub-network, denoted sablock, is built with the global feature self-attention construction method in definition 9.

As shown in fig. 1, the content-aware feature recombination sub-network card, the multi-receptive-field feature extraction sub-network mrblock and the global feature self-attention sub-network sablock are connected in series, in that order, to obtain the global context information modeling module, denoted GCB.
Step 2.5, building a boundary attention prediction module
Build three fully connected layers, denoted fc1, fc2 and fc3, with the fully connected layer construction method of definition 10. Connecting fc1, fc2 and fc3 in series establishes the classification branch, denoted CLBranch; the classification result output by CLBranch is denoted s.

Build a convolution attention module, denoted CBAM, with the convolution attention module construction method of definition 11, and four convolutional layers, denoted conv4, conv5, conv6 and conv7, with the convolutional layer construction method of definition 5. By the same expression as softmax0, define two further Softmax layers, denoted softmax1 and softmax2, where z_i is the feature value of the ith node of the input feature map and C is the number of channels. Connect conv4, softmax1 and conv5 in series, denoted branchx, and conv6, softmax2 and conv7 in series, denoted branchy. Connecting branchx and branchy in parallel and appending them to the CBAM module completes the boundary attention feature extraction sub-network, denoted baff.

Build a boundary coarse positioning sub-network with the boundary coarse positioning sub-network construction method of definition 12, denoted bbcl; its four outputs are denoted s_x-right, s_x-left, s_y-right and s_y-left.

Build a boundary fine positioning sub-network with the boundary fine positioning sub-network construction method of definition 13, denoted brfl.
Taking the classification result s output by CLBranch and the outputs s_x-right, s_x-left, s_y-right and s_y-left of bbcl as input, compute

s* = s · (s_x-left + s_x-right + s_y-left + s_y-right) / 4

which completes the construction of the boundary-guided classification re-scoring sub-network, denoted cbcr.
As shown in fig. 1, baff, bbcl, brfl and cbcr are connected in series, in that order, completing the boundary attention prediction module, denoted BABP.
Step 2.6, building a mask subnetwork
As shown in fig. 1, a MASK subnetwork is constructed according to the MASK subnetwork construction method defined by definition 14, and the constructed MASK subnetwork is denoted as MASK.
Step 2.7, building an instance segmentation cascade network
Connect the feature extraction module FExtract from step 2.3, the global context information modeling module GCB from step 2.4, the boundary attention prediction module BABP from step 2.5 and the mask sub-network MASK from step 2.6 in series, in that order, to obtain the first instance segmentation network, denoted SEG1.
Repeat steps 2.3, 2.4, 2.5 and 2.6 and connect the resulting modules and sub-networks in series in the same order to obtain the second instance segmentation network, denoted SEG2.
Repeat steps 2.3, 2.4, 2.5 and 2.6 again and connect the resulting modules and sub-networks in series to obtain the third instance segmentation network, denoted SEG3.
Connect SEG1, SEG2 and SEG3 in series, in that order; this completes the instance segmentation cascade network, denoted CASEG_0.
Step 3, training the regional recommendation network
An iteration parameter epoch is set, and an initial epoch value is 1.
Step 3.1, forward propagation is carried out on the regional recommendation network
Take the training set D_train obtained in step 1 as the input of the regional recommendation network RPN_0. According to the forward propagation method in definition 16, feed D_train into RPN_0 for computation, and record the output of RPN_0 as Result0.
Step 3.2, sampling the forward propagation result
Taking Result0 obtained in step 3.1 and the training set D_train as input, compute the IoU value of each recommendation box in Result0 according to

IoU = Area(B ∩ B_gt) / Area(B ∪ B_gt)

where B is a recommendation box and B_gt is its matched ground-truth box. Take the outputs in Result0 with IoU greater than 0.5 as positive samples, denoted Result0p, and those with IoU less than 0.5 as negative samples, denoted Result0n. Count the total number of samples in Result0n, denoted M. Manually input the number of required negative samples, denoted N, and the number of intervals into which the IoU range is equally divided, denoted n_b; the number of samples in the ith IoU interval is M_i. Set the random sampling probability of the ith interval to

p_i = N / (n_b · M_i)

and randomly sample each IoU interval; record the sampling results over all negative-sample IoU intervals as Result0ns.

Count the number of samples in the positive sample set Result0p, denoted P. Set the random sampling probability to N/P and randomly sample Result0p; record the positive sampling result as Result0ps.
Step 3.3, training and optimizing the regional recommendation network
Taking the positive sampling result Result0ps and the negative sampling result Result0ns obtained in step 3.2 as input, train and optimize the regional recommendation network according to the classical Adam algorithm in definition 15, obtaining the trained and optimized regional recommendation network RPN_1.
Step 4, training the instance segmentation cascade network
Step 4.1, forward propagation through the instance segmentation cascade network
Take the training set D_train obtained in step 1 as the input of the instance segmentation cascade network CASEG_0. According to the forward propagation method in definition 16, feed D_train into CASEG_0 for computation, and record the output of CASEG_0 as Result1.
Step 4.2, training and optimizing the instance segmentation cascade network
Taking the output Result1 of the instance segmentation cascade network CASEG_0 obtained in step 4.1 as input, train and optimize the instance segmentation cascade network according to the classical Adam algorithm in definition 15, obtaining the trained and optimized instance segmentation cascade network CASEG_1.
Step 5, alternate training is carried out
Determine whether the epoch set in step 3 equals 12. If not, let epoch = epoch + 1, RPN_0 = RPN_1 and CASEG_0 = CASEG_1, repeat steps 3.1, 3.2, 3.3, 4.1 and 4.2 in sequence, and then return to step 5 to judge the epoch again. If the epoch equals 12, record the trained regional recommendation network RPN_1 together with the trained instance segmentation cascade network CASEG_1 as the network GCBAN, and proceed to step 6.
Step 6, evaluation method
Step 6.1, forward propagation
Using the network GCBAN obtained in step 5 and the test set D_test obtained in step 1 as input, obtain the detection result, denoted R, with the traditional forward propagation method of definition 16.

Taking the detection result R as input, remove the redundant boxes in R with the traditional non-maximum suppression method in definition 17, as follows:

Step (1): mark the highest-scoring box in the detection result R as BS.

Step (2): compute the overlap rate

IoU = Area(B ∩ BS) / Area(B ∪ BS)

between every remaining box B in R and BS, and discard the boxes with IoU > 0.5.

Step (3): select the highest-scoring box among the remaining boxes as the new BS, and repeat the computing-and-discarding process of step (2) until no box can be discarded; the boxes that remain constitute the final detection result, denoted R_F.
Step 6.2, index calculation
Using the detection result R_F obtained in step 6.1 as input, compute the precision P, the recall R and the precision-recall curve P(R) of the network with the traditional recall and precision calculation method in definition 18; then, using the formula

AP = ∫_0^1 P(R) dR

calculate the SAR ship instance segmentation average precision AP.

Claims (1)

1. A SAR ship instance segmentation method based on a global semantic boundary attention network is characterized by comprising the following steps:
step 1, initializing a data set
Obtaining the HRSID data set according to the traditional HRSID data set acquisition method, recording the test set in the HRSID data set as Dtest and the training set as Dtrain;
Step 2, building a forward propagation network
Step 2.1, building ResNet-101 backbone network
constructing a residual network with 101 network layers by adopting the classical residual backbone network construction method, recorded as Res-101;
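By way of non-limiting illustration, a 101-layer residual backbone can be instantiated from a standard library; the sketch below uses torchvision and strips the classification head, which is an assumption about how Res-101 is used here rather than the claimed construction.

```python
import torch
import torchvision

# Hedged sketch: 101-layer residual backbone with the classifier removed.
resnet = torchvision.models.resnet101(weights=None)
res101_backbone = torch.nn.Sequential(*list(resnet.children())[:-2])

feat = res101_backbone(torch.randn(1, 3, 800, 800))
print(feat.shape)  # torch.Size([1, 2048, 25, 25])
```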
step 2.2, building a regional recommendation network
Constructing a regional recommendation network by adopting a classical regional recommendation network construction method, taking the ResNet-101 backbone network Res-101 obtained in the step 2.1 as a sub-network in the regional recommendation network, and marking the constructed regional recommendation network as RPN0
Step 2.3, building a feature extraction module
constructing a feature extraction module by adopting the traditional region-of-interest feature extraction module construction method, recording the constructed feature extraction module as FExtract;
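Region-of-interest feature extraction of this kind is commonly realised with RoIAlign. The torchvision-based sketch below is illustrative only; the spatial_scale and the 7x7 output size are assumptions.

```python
import torch
from torchvision.ops import roi_align

feats = torch.randn(1, 256, 50, 50)                     # one feature map (assumed)
boxes = torch.tensor([[0, 10.0, 10.0, 200.0, 200.0]])   # (batch_idx, x1, y1, x2, y2)
roi_feats = roi_align(feats, boxes, output_size=(7, 7),
                      spatial_scale=1 / 16, sampling_ratio=2)
print(roi_feats.shape)  # torch.Size([1, 256, 7, 7])
```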
step 2.4, building a global context information modeling module
Firstly, constructing two convolution layers which are respectively marked as conv1 and conv2 by adopting a traditional convolution layer construction method, and then constructing a pixel recombination module which is marked as pixelshuffle by adopting a traditional pixel recombination construction method;
according to the expression

\( \mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}} \)

defining a Softmax layer, recorded as softmax0, wherein \(z_i\) denotes the i-th channel of the input feature map and C denotes the number of channels of the input feature map; connecting conv1, conv2, pixelshuffle and softmax0 in series, recorded as kplayer;
according to the expression

\( F'_{l'} = \sum_{n=-r}^{r} \sum_{m=-r}^{r} W_{l'}(n,m) \cdot F_{(i+n,\,j+m)} \)

constructing a feature reconstruction layer, recorded as czlayer; wherein \(F'_{l'}\) denotes the reconstructed feature vector at the target position l', l = (i, j) denotes the corresponding source position, \(F_{(i+n,\,j+m)}\) denotes the feature vector at (i + n, j + m) in F, and \(W_{l'}(n,m)\) denotes the weight at (n, m) in the reassembly kernel \(W_{l'}\); combining the kplayer and the czlayer together to complete the construction of the content-aware feature reassembly sub-network, recording the constructed content-aware feature reassembly sub-network as card;
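The feature reconstruction expression can then be sketched with unfold, gathering each k x k source neighbourhood and weighting it by the predicted kernel; the upscale factor of 2 matches the assumption in the previous sketch. Chaining the two sketches, kernel prediction followed by reassembly, mirrors the card sub-network's content-aware behaviour in outline.

```python
import torch
import torch.nn.functional as F

def feature_reassembly(features, kernels, k=5, scale=2):
    """Sketch of czlayer: weighted sum of each source k*k neighbourhood
    using predicted kernels of shape (B, k*k, scale*H, scale*W)."""
    B, C, H, W = features.shape
    patches = F.unfold(features, kernel_size=k, padding=k // 2)  # (B, C*k*k, H*W)
    patches = patches.view(B, C * k * k, H, W)
    # each target position (i', j') reads the neighbourhood of (i'//s, j'//s)
    patches = F.interpolate(patches, scale_factor=scale, mode="nearest")
    patches = patches.view(B, C, k * k, scale * H, scale * W)
    weights = kernels.unsqueeze(1)                               # (B, 1, k*k, sH, sW)
    return (patches * weights).sum(dim=2)                        # (B, C, sH, sW)
```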
then, constructing dilated convolution layers with dilation rates of 2, 3, 4 and 5 respectively by adopting the traditional dilated convolution layer construction method, recorded as d1, d2, d3 and d4 respectively; constructing a concatenation module by adopting the traditional concatenation operation construction method, recorded as concatee; constructing a convolution layer by adopting the traditional convolution layer construction method, recorded as conv3; connecting d1, d2, d3 and d4 in parallel and then connecting them in series with concatee and conv3 in sequence, namely completing the construction of the multi-receptive-field feature extraction sub-network, recording the constructed multi-receptive-field feature extraction sub-network as mrblock;
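A minimal sketch of the multi-receptive-field sub-network follows: four parallel 3x3 dilated convolutions with dilation rates 2, 3, 4 and 5, concatenation, and a fusing 1x1 convolution; the channel width of 256 and the 1x1 fusion kernel are assumptions.

```python
import torch
import torch.nn as nn

class MRBlock(nn.Module):
    """Sketch of mrblock: parallel dilated convolutions d1-d4,
    concatenation (concatee), and a fusion convolution conv3."""
    def __init__(self, c=256):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(c, c, kernel_size=3, padding=d, dilation=d)  # d1..d4
            for d in (2, 3, 4, 5)
        ])
        self.conv3 = nn.Conv2d(4 * c, c, kernel_size=1)            # fuse after concat

    def forward(self, x):
        return self.conv3(torch.cat([b(x) for b in self.branches], dim=1))

out = MRBlock()(torch.randn(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 256, 32, 32])
```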
finally, a traditional global feature self-attention construction method is adopted to construct a global feature self-attention sub-network, and the constructed global feature self-attention sub-network is marked as sablock;
connecting the content-aware feature reassembly sub-network card, the multi-receptive-field feature extraction sub-network mrblock and the global feature self-attention sub-network sablock in series in sequence, obtaining the global context information modeling module, recorded as GCB;
step 2.5, building a boundary attention prediction module
constructing three fully connected layers by adopting the traditional fully connected layer construction method, recorded as fc1, fc2 and fc3 respectively; connecting fc1, fc2 and fc3 in series to complete the construction of the classification branch, recorded as CLBranch, and recording the classification result output by CLBranch as s;
building a convolution attention module by adopting a traditional convolution attention module construction method, and recording the convolution attention module as CBAM; building four convolutional layers by adopting a traditional convolutional layer building method, wherein the convolutional layers are respectively marked as conv4, conv5, conv6 and conv7;
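For orientation, a standard minimal rendering of a convolutional block attention module (channel attention followed by spatial attention) is sketched below; the reduction ratio and 7x7 kernel are conventional defaults, not the patent's disclosed configuration.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal CBAM sketch: channel attention from a shared MLP over
    avg/max-pooled descriptors, then spatial attention from a conv
    over channel-wise avg/max maps."""
    def __init__(self, c=256, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(c, c // reduction, 1), nn.ReLU(),
            nn.Conv2d(c // reduction, c, 1),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True))
                           + self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        sa = torch.sigmoid(self.spatial(
            torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa
```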
according to the expression

\( \mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}} \)

two Softmax layers are defined, recorded as softmax1 and softmax2 respectively, wherein \(z_i\) denotes the i-th channel of the input feature map and C denotes the number of channels of the input feature map; connecting conv4, softmax1 and conv5 in series in sequence, recorded as branchx, and connecting conv6, softmax2 and conv7 in series in sequence, recorded as branchy; connecting branchx and branchy in parallel after the CBAM module, namely completing the construction of the boundary attention feature fusion sub-network, recorded as baff;
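The two parallel branches after CBAM can then be sketched as follows; the channel-wise softmax matches the expression above, while the summation of branchx and branchy is an assumed fusion rule, not a disclosed one.

```python
import torch
import torch.nn as nn

class BAFF(nn.Module):
    """Sketch of baff: CBAM followed by two parallel conv-softmax-conv
    branches whose outputs are summed (fusion rule assumed)."""
    def __init__(self, c=256):
        super().__init__()
        self.cbam = CBAM(c)  # CBAM sketch defined above
        self.branchx = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.Softmax(dim=1),
            nn.Conv2d(c, c, 3, padding=1))
        self.branchy = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.Softmax(dim=1),
            nn.Conv2d(c, c, 3, padding=1))

    def forward(self, x):
        x = self.cbam(x)
        return self.branchx(x) + self.branchy(x)
```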
establishing a boundary coarse localization sub-network by adopting the traditional boundary coarse localization sub-network construction method, recording the established boundary coarse localization sub-network as bbcl, and recording the four outputs of bbcl as Sx-right, Sx-left, Sy-right and Sy-left respectively;
constructing a boundary fine localization sub-network by adopting the traditional boundary fine localization sub-network construction method, recording the constructed boundary fine localization sub-network as brfl;
taking the classification result s output by CLBranch in step 2.5 and the outputs Sx-right, Sx-left, Sy-right and Sy-left of bbcl as input, according to the formula

\( s' = s \times \frac{S_{x\text{-}left} + S_{x\text{-}right} + S_{y\text{-}left} + S_{y\text{-}right}}{4} \)

calculating the re-scored classification confidence s', thereby completing the construction of the boundary-guided classification re-scoring sub-network, recorded as cbcr;
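Under the averaging reconstruction above (itself an assumption), the re-scoring of cbcr reduces to a one-line function:

```python
def rescore(s, sx_left, sx_right, sy_left, sy_right):
    """Sketch of cbcr: modulate the classification score s by the mean
    of the four boundary confidences (averaging rule is an assumption)."""
    return s * (sx_left + sx_right + sy_left + sy_right) / 4.0
```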
serially connecting baff, bbcl, brfl and cbcr in sequence to complete the construction of a boundary attention prediction module, and marking as BABP;
step 2.6, building a mask subnetwork
Constructing a MASK sub-network according to a traditional MASK sub-network construction method, and marking the constructed MASK sub-network as MASK;
step 2.7, building an instance segmentation cascade network
connecting the feature extraction module FExtract obtained in step 2.3, the global context information modeling module GCB obtained in step 2.4, the boundary attention prediction module BABP obtained in step 2.5 and the mask sub-network MASK obtained in step 2.6 in series in sequence to obtain the first instance segmentation network, recorded as SEG1;
repeating step 2.3, step 2.4, step 2.5 and step 2.6, and connecting the modules or sub-networks obtained in each step in series in sequence to obtain the second instance segmentation network, recorded as SEG2;
repeating step 2.3, step 2.4, step 2.5 and step 2.6, and connecting the modules or sub-networks obtained in each step in series in sequence to obtain the third instance segmentation network, recorded as SEG3;
connecting the first instance segmentation network SEG1, the second instance segmentation network SEG2 and the third instance segmentation network SEG3 in series in sequence, namely completing the instance segmentation cascade network, recorded as CASEG0;
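The three-stage cascade can be sketched as iterative refinement, each stage consuming the boxes produced by the previous one; the stage interface here is hypothetical.

```python
import torch.nn as nn

class CascadeInstanceSeg(nn.Module):
    """Sketch of CASEG0: three stages (SEG1-SEG3) applied in sequence,
    each refining the boxes produced by the previous stage."""
    def __init__(self, stages):
        super().__init__()
        self.stages = nn.ModuleList(stages)  # [SEG1, SEG2, SEG3]

    def forward(self, feats, boxes):
        masks = None
        for stage in self.stages:            # each: FExtract -> GCB -> BABP -> MASK
            boxes, masks = stage(feats, boxes)
        return boxes, masks
```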
Step 3, training the regional recommendation network
Setting an iteration parameter epoch, and initializing an epoch value to be 1;
step 3.1, forward propagation is carried out on the regional recommendation network
taking the training set Dtrain obtained in step 1 as the input of the regional recommendation network RPN0; according to the forward propagation method, sending the training set Dtrain into the regional recommendation network RPN0 for operation, and recording the output of the network RPN0 as Result0;
step 3.2, sampling the forward propagation result
taking the Result0 obtained in step 3.1 and the training set Dtrain as input, according to the formula

\( \mathrm{IoU} = \frac{\mathrm{area}(B \cap GT)}{\mathrm{area}(B \cup GT)} \)

calculating the IoU value of each recommended box B in Result0 against its ground-truth box GT; taking the outputs in Result0 with IoU greater than 0.5 as positive samples, recorded as Result0p; taking the outputs in Result0 with IoU less than 0.5 as negative samples, recorded as Result0n; counting the total number of samples in the negative sample set Result0n as M; manually inputting the number of required negative samples, recorded as N; manually inputting the number of intervals into which the IoU range is equally divided, recorded as nb, and recording the number of samples in the i-th IoU interval as Mi; setting the random sampling probability of the i-th interval as

\( p_i = \frac{N}{n_b \cdot M_i} \)
Randomly sampling each IOU interval, and recording the sampling results of all the IOU intervals of the negative samples as Result0ns;
counting the number of samples in the positive sample Result0P, and recording as P; setting a random sampling probability of
Figure RE-RE-FDA0003863488410000033
Randomly sampling Result0p, and recording a positive sample sampling Result as Result0ps;
step 3.3, training and optimizing the regional recommendation network
taking the positive sample sampling result Result0ps and the negative sample sampling result Result0ns obtained in step 3.2 as input, and training and optimizing the regional recommendation network according to the classical Adam algorithm; obtaining the trained and optimized regional recommendation network RPN1;
Step 4, training the instance segmentation cascade network
Step 4.1, forward propagation through the instance segmentation cascade network
taking the training set Dtrain obtained in step 1 as the input of the instance segmentation cascade network CASEG0; according to the traditional forward propagation method, sending the training set Dtrain into the instance segmentation cascade network CASEG0 for operation, and recording the output of CASEG0 as Result1;
step 4.2, training and optimizing the instance segmentation cascade network
taking the output Result1 of the instance segmentation cascade network CASEG0 obtained in step 4.1 as input, and training and optimizing the instance segmentation cascade network according to the classical Adam algorithm; obtaining the trained and optimized instance segmentation cascade network CASEG1;
Step 5, alternate training is carried out
judging whether the epoch set in step 3 is equal to 12; if epoch is not equal to 12, letting epoch = epoch + 1, RPN0 = RPN1, CASEG0 = CASEG1, repeating step 3.1, step 3.2, step 3.3, step 4.1 and step 4.2 in sequence, and then returning to step 5 to judge epoch again; if epoch is equal to 12, recording the trained regional recommendation network RPN1 and the trained instance segmentation cascade network CASEG1 as the network GCBAN, and then performing step 6;
step 6, evaluation method
Step 6.1, forward propagation
using the network GCBAN obtained in step 5 and the test set Dtest obtained in step 1 as input, adopting the traditional forward propagation method to obtain the detection result, recorded as R;
taking the detection result R as input, and removing redundant boxes in R by adopting the traditional non-maximum suppression method, specifically comprising the following steps:
step (1): recording the box with the highest score in the detection result R as BS;
step (2): for each remaining box B of the detection result R, adopting the calculation formula

\( \mathrm{IoU} = \frac{\mathrm{area}(BS \cap B)}{\mathrm{area}(BS \cup B)} \)

to calculate the overlap ratio IoU between BS and each box, and discarding the boxes with IoU > 0.5;
step (3): selecting the box with the highest score from the remaining boxes as the new BS;
repeating the IoU calculation and discarding process of step (2) until no box can be discarded, and recording the last remaining boxes as the final detection result RF;
Step 6.2, calculating the index
using the detection result RF obtained in step 6.1 as input, solving the precision P, the recall R and the precision-recall curve P(R) of the network by adopting the traditional recall and precision calculation method; using the formula

\( \mathrm{AP} = \int_0^1 P(R)\,\mathrm{d}R \)

calculating the balance-learning-based SAR ship instance segmentation precision indexes AP, AP50, AP75, APS, APM and APL.
CN202210472909.1A 2022-04-29 2022-04-29 SAR image ship instance segmentation method based on global semantic boundary attention network Pending CN115272842A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210472909.1A CN115272842A (en) 2022-04-29 2022-04-29 SAR image ship instance segmentation method based on global semantic boundary attention network


Publications (1)

Publication Number Publication Date
CN115272842A true CN115272842A (en) 2022-11-01

Family

ID=83760373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210472909.1A Pending CN115272842A (en) 2022-04-29 2022-04-29 SAR image ship instance segmentation method based on global semantic boundary attention network

Country Status (1)

Country Link
CN (1) CN115272842A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116402999A (en) * 2023-06-05 2023-07-07 电子科技大学 SAR (synthetic aperture radar) instance segmentation method combining quantum random number and deep learning
CN116402999B (en) * 2023-06-05 2023-09-15 电子科技大学 SAR (synthetic aperture radar) instance segmentation method combining quantum random number and deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination