CN113989672B - SAR image ship detection method based on balance learning

Info

Publication number: CN113989672B
Application number: CN202111268008.2A
Authority: CN
Inventors: 张晓玲; 柯潇; 张天文; 师君; 韦顺军
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2021-10-29
Filing date: 2021-10-29
Publication date: 2023-10-17
Anticipated expiration: 2041-10-29
Also published as: CN113989672A

Abstract

The invention discloses an SAR image ship detection method based on balance learning. The method is based on deep learning theory and mainly comprises a balanced scene learning mechanism, a balanced interval sampling mechanism, a balanced feature pyramid network and a balanced classification regression network. The balanced scene learning mechanism solves the image sample scene imbalance by augmenting the inshore samples; the balanced interval sampling mechanism divides the intersection over union (IOU) range into several intervals and samples each interval equally, so as to solve the positive and negative sample imbalance; the balanced feature pyramid network extracts features with stronger multi-scale detection capability through a feature enhancement method, so as to solve the ship scale feature imbalance; the balanced classification regression network solves the classification regression task imbalance by designing two different sub-networks for the classification and regression tasks. The invention overcomes the imbalance problems in the prior art and improves the detection precision of ships in SAR images.

Description

SAR image ship detection method based on balance learning

Technical Field

The invention belongs to the technical field of synthetic aperture radar (Synthetic Aperture Radar, SAR) image interpretation, and relates to an SAR image ship detection method based on balance learning.

Background

Synthetic aperture radar (SAR) is an advanced active microwave sensor for high-resolution earth observation and remains a leading technology in the field of marine monitoring. It is widely applied in military and civil fields such as maritime traffic control, disaster relief and fishery management. Although optical and hyperspectral satellites currently offer some monitoring services, SAR, with its all-day, all-weather operating capability, is better suited to the changeable climate of the ocean. SAR is therefore an essential remote sensing tool for maritime domain awareness.

Ships are the most important participants in the ocean. Ship detection is increasingly valued by scholars because of its great value in shipwreck rescue, maritime traffic control, fishery management and similar applications. Marine vessel monitoring has been actively studied since the United States launched the first SAR satellite, Seasat-1. In addition, the various SAR sensors now generate large volumes of data, so intelligent detection of ocean targets is urgently needed. Ship detection in SAR images has therefore become a research hotspot in the high-resolution earth observation community. See "Wang Zhiyong, Dou Hao, Tian Jinwen. Research on a rapid ship target detection method for SAR images [J]. Ship Electronic Engineering, 2016, 36(09): 27-30+88."

In recent years, with the rapid rise of deep learning (DL), many scholars in the SAR community have started to study DL-based detection methods. Compared with traditional feature-based methods, DL-based methods have the outstanding advantages of simplicity, full automation (i.e. no need for complex preliminary stages such as land-sea segmentation, coastline detection and speckle correction), high speed and high precision. Although their underlying principles are not yet fully understood, they free up productivity and greatly improve work efficiency, enabling a qualitative leap in the intelligent interpretation of SAR images. See "Du Lan, Wang Zhaocheng, Wang Yan, Wei Di, Li Lu. A review of research progress on single-channel SAR target detection and discrimination in complex scenes [J]. Journal of Radars, 2020, 9(01): 34-54."

However, existing deep-learning-based SAR ship detectors suffer from several imbalance problems that may impede further accuracy improvement. Specifically: 1) Image sample scene imbalance, i.e. the number of inshore ship image samples and the number of offshore ship image samples are unbalanced. In short, there are far fewer inshore samples than offshore samples. 2) Positive and negative sample imbalance, i.e. the numbers of positive (ship) and negative (background) samples are unbalanced. There are far more negative samples than positive samples. 3) Ship scale feature imbalance, i.e. the features of ships at different scales are unbalanced. Ship sizes in SAR images vary widely because of different spatial resolutions and ship categories. 4) Classification regression task imbalance, i.e. the difficulty of ship classification and that of ship position regression are unbalanced, the latter being much more difficult than the former.

Therefore, to solve these imbalance problems, an SAR image ship detection method based on balance learning is proposed. The method comprises four mechanisms for solving the imbalance problems: a balanced scene learning mechanism, a balanced interval sampling mechanism, a balanced feature pyramid network and a balanced classification regression network. Experimental results on the SSDD dataset show that the proposed method is superior to other deep-learning-based detection methods.

Disclosure of Invention

The invention belongs to the technical field of synthetic aperture radar (SAR) image interpretation and discloses a ship detection method based on balance learning, which solves the problems of image sample scene imbalance, positive and negative sample imbalance, ship scale feature imbalance and classification regression task imbalance in the prior art. The method is based on deep learning theory and mainly comprises four parts: a balanced scene learning mechanism, a balanced interval sampling mechanism, a balanced feature pyramid network and a balanced classification regression network. The balanced scene learning mechanism solves the image sample scene imbalance by augmenting the inshore ship samples; the balanced interval sampling mechanism divides the intersection over union (IOU) range into several intervals and samples each interval equally, so as to solve the positive and negative sample imbalance; the balanced feature pyramid network extracts features with stronger multi-scale detection capability through a feature enhancement method, so as to solve the ship scale feature imbalance; the balanced classification regression network solves the classification regression task imbalance by designing two different sub-networks for the classification and regression tasks. Experiments show that, on the SSDD dataset, the detection precision of the SAR image ship detection method based on balance learning is 95.25%, whereas that of the existing deep-learning-based SAR ship detection methods is 92.27%; the method based on balance learning therefore improves ship detection precision.

For convenience in describing the present invention, the following terms are first defined:

Definition 1: SSDD dataset acquisition method

The SSDD dataset is the SAR Ship Detection Dataset, the first open dataset for SAR ship detection. It contains 1160 SAR images from Sentinel-1, RadarSat-2 and TerraSAR-X, with an image size of 500×500 pixels, and 2551 ships in total. The smallest ship occupies 28 pixel² and the largest 62878 pixel² (pixel² denotes the product of the width and the height in pixels). In SSDD, the images whose file-name index ends in 1 or 9 (232 samples) are selected as the test set, and the remaining images (928 samples) are selected as the training set. The method for acquiring the SSDD dataset is described in "Li Jianwei, Qu Changwen, Peng Shujuan, Deng Bing. SAR image ship target detection based on convolutional neural networks [J]. Systems Engineering and Electronics, 2018, 40(09): 1953-1959."
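The train/test split rule above can be illustrated with a minimal sketch. It assumes the images are indexed 1 to 1160 and that the "suffix" is the last decimal digit of that index; the function name `split_ssdd` is hypothetical.

```python
def split_ssdd(image_ids):
    """Split image indices by their last digit: 1 or 9 goes to the test set."""
    test_ids = [i for i in image_ids if i % 10 in (1, 9)]
    train_ids = [i for i in image_ids if i % 10 not in (1, 9)]
    return train_ids, test_ids


# Assumption: SSDD images are numbered consecutively from 1 to 1160.
train_ids, test_ids = split_ssdd(range(1, 1161))
print(len(train_ids), len(test_ids))  # 928 232
```

Under this indexing assumption the counts agree with the 928 training and 232 test samples stated in the definition.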

Definition 2: classical GAN network construction method

The classical generative adversarial network (GAN, Generative Adversarial Networks) is a deep learning model and one of the most promising approaches of recent years for unsupervised learning on complex distributions. The framework contains two modules, a generative model G and a discriminative model D, which produce good output by learning through a game against each other. The original GAN theory does not require G and D to be neural networks, only functions that can fit the corresponding generation and discrimination; in practice, however, deep neural networks are generally used for both. A well-trained GAN can achieve fast scene feature extraction. The classical GAN construction method is described in "I. J. Goodfellow et al., "Generative adversarial nets," International Conference on Neural Information Processing Systems, pp. 2672-2680, 2014."

Definition 3: classical K-means clustering algorithm

The classical K-means clustering algorithm is an iterative clustering analysis algorithm commonly used for unsupervised classification tasks. The data are to be divided into K groups: K objects are randomly selected as the initial cluster centers, the distance between each object and each cluster center is calculated, and each object is assigned to the closest cluster center. A cluster center and the objects assigned to it represent one cluster. Each time a sample is assigned, the center of its cluster is recalculated from the objects currently in the cluster. This process repeats until a termination condition is met. The classical K-means clustering algorithm is described in "Li Tingting. Research on improved K-means clustering algorithms [D]. Anhui University, 2015."

Definition 4: classical Adam algorithm

The classical Adam algorithm is an extension of stochastic gradient descent and has recently been widely used in deep learning applications in computer vision and natural language processing. Adam differs from classical stochastic gradient descent: stochastic gradient descent maintains a single learning rate for all weight updates, and that learning rate does not change during training, whereas Adam maintains a learning rate for each network weight and adjusts it individually as learning progresses. The method calculates adaptive learning rates for different parameters from estimates of the first and second moments of the gradient. The classical Adam algorithm is described in "Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980."
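As an illustration of the moment estimates mentioned above, the following is a minimal sketch of the Adam update for a single scalar parameter. The hyperparameter values, the function name `adam_minimize` and the toy objective (x - 3)² are assumptions chosen for the example, not settings from this method.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter step size
    return x


# Toy objective (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 3))
```

In a real network the same update is applied element-wise to every weight, which is what gives each weight its own effective learning rate.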

Definition 5: classical forward propagation method

The forward propagation method is the most basic method in deep learning: it performs forward inference on the input according to the parameters and connections of the network to obtain the network output. The forward propagation method is described in "https://www.jianshu.com/p/f30c8daebebb".

Definition 6: classical residual error network construction method

The residual network is a convolutional neural network proposed by four scholars from Microsoft Research; it won the image classification and object recognition tasks of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2015. The residual network is easy to optimize and can improve accuracy by considerably increasing depth. Its residual blocks use skip connections, which alleviate the vanishing gradient problem caused by increasing depth in deep neural networks. The classical residual network construction method is described in "K. He et al., "Deep Residual Learning for Image Recognition," IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770-778."

Definition 7: traditional convolution kernel operations

A convolution kernel is a node that takes the weighted sum of the values within a small rectangular region of the input feature map or picture as its output. Each convolution kernel requires several manually specified parameters. One type of parameter is the length and width of the node matrix processed by the convolution kernel, which is also the size of the convolution kernel. The other type is the depth of the unit node matrix obtained by processing, which is also the depth of the convolution kernel. During the convolution operation, each convolution kernel slides over the input data; at each position the inner product of the whole kernel with the corresponding input data is calculated and passed through a non-linear function, and the results of all positions form a two-dimensional feature map. Each convolution kernel generates one two-dimensional feature map, and the feature maps generated by multiple convolution kernels are stacked to form a three-dimensional feature map. The conventional convolution kernel operation is described in "Fan Lili, Zhao Hongwei, Zhao Haoyu, Hu Huangshui, Wang Zhen. Overview of object detection research based on deep convolutional neural networks [J]. Optics and Precision Engineering, 2020, 28(05): 1152-1164."
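The sliding-window computation described above can be sketched for a single-channel input as follows. The sketch assumes stride 1, no padding and ReLU as the non-linear function; these choices and the name `conv2d` are illustrative assumptions.

```python
def conv2d(image, kernel):
    """Slide one kernel over a 2-D input (stride 1, no padding), then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Inner product of the kernel with the window at position (r, c).
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(max(acc, 0.0))  # ReLU non-linearity (assumption)
        out.append(row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[0, 1],
          [1, 0]]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[6.0, 8.0], [12.0, 14.0]]
```

Running several kernels over the same input and stacking their outputs gives the three-dimensional feature map mentioned in the definition.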

Definition 8: conventional cascading operation

Cascading (concatenation) is an important operation in network structure design. It combines features, either by fusing the features extracted by several convolutional feature extraction frameworks or by fusing the information of output layers, and thereby enhances the feature extraction capability of the network. The cascading method is described in 'https:// blog.csdn.net/alxe_map/arc/details/80506051utm_medium=distribution.pc_releas.none-task-blog CommendFromMachineLearnPai 2-3.channel_parameter & depth1-utm _source=distribution.pc_releas.none-task-blog CommendFromMachineLearnpai 2-3.channel_parameter'.

Definition 9: conventional upsampling operations

Upsampling is an operation that enlarges a picture or feature map. Mainstream upsampling generally uses interpolation, i.e. a suitable interpolation algorithm inserts new elements between the pixels of the original image. Among the mainstream interpolation algorithms, nearest-neighbor interpolation is the simplest and easiest to implement and was widely used early on; however, it produces noticeable jagged edges and mosaics in the new image. Bilinear interpolation has a smoothing effect and effectively overcomes the shortcomings of the nearest-neighbor method, but it degrades the high-frequency part of the image and blurs image details. At higher magnifications, higher-order interpolation such as bicubic and cubic spline interpolation performs better than lower-order interpolation. These interpolation algorithms make the interpolated pixel gray values continue the gray-level continuity of the original image, so that the gray levels of the enlarged image change smoothly and naturally. However, some pixels in an image differ abruptly in gray value from their neighbors, i.e. there are gray-level discontinuities. These pixels with abrupt gray value changes are the edge pixels that describe the contours or textures of objects in the image. The classical upsampling operation is described in "https:// blog.csdn.net/weixin_ 43960370/arc/details/106049708utm_term=% E5% 8d%b7%a7%af%e7%89%b9%e5%be%81%e5%9b%be%e4%b8%8a%e9%87%87%e6%a0%b7 & utm_medium = distribution.pc_agpage_search_result.none-task-blog-2-all-sobaidoup-1-106049708 & spm= 3001.4430".
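The simplest of the interpolation schemes named above, nearest-neighbor interpolation, can be sketched as follows for an integer scale factor. The function name `upsample_nearest` is hypothetical, and the text does not state which interpolation the method itself uses.

```python
def upsample_nearest(fmap, scale):
    """Enlarge a 2-D map by an integer factor by repeating each element."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(scale)]  # repeat columns
        out.extend(list(wide) for _ in range(scale))            # repeat rows
    return out


up = upsample_nearest([[1, 2],
                       [3, 4]], 2)
print(up)  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The blocky 2×2 patches in the output are the "mosaic" effect the definition attributes to this method; bilinear or bicubic interpolation would instead blend neighboring values.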

Definition 10: traditional pooling operations

Pooling is a very common operation in CNNs. The pooling layer imitates the human visual system to reduce the dimensionality of data and is also called subsampling or downsampling. When a convolutional neural network is constructed, pooling is often applied after a convolutional layer to reduce the dimensionality of the features output by that layer, which effectively reduces the network parameters and helps prevent overfitting. The classical pooling operation is described in "https://www.zhihu.com/query/303215483/answer/615115629".
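A minimal sketch of max pooling on a single-channel map is given below. It assumes non-overlapping square windows (stride equal to the window size); the name `max_pool2d` is hypothetical.

```python
def max_pool2d(fmap, size):
    """Non-overlapping max pooling: keep the largest value of each size x size window."""
    out = []
    for r in range(0, len(fmap) - size + 1, size):
        row = []
        for c in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[r + i][c + j]
                           for i in range(size) for j in range(size)))
        out.append(row)
    return out


pooled = max_pool2d([[1, 3, 2, 4],
                     [5, 6, 1, 2],
                     [7, 2, 9, 1],
                     [3, 4, 0, 8]], 2)
print(pooled)  # [[6, 4], [7, 9]]
```

A 4×4 map becomes 2×2, which is the dimensionality reduction the definition refers to.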

Definition 11: traditional regional recommendation network construction method

The regional recommendation network (region proposal network, RPN) is a subnetwork of Faster R-CNN that extracts the regions of a picture in which objects may be present. It is a fully convolutional network that takes the convolutional feature map output by the base network as input and outputs a target confidence score for each candidate box. The conventional regional recommendation network construction method is described in "Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149."

Definition 12: traditional fully connected layer method

The fully connected layer is a part of a convolutional neural network. Its input and output sizes are fixed, and each node is connected to all nodes of the previous layer in order to integrate the features extracted by the preceding layers. The fully connected layer method is described in "Haoren Wang, Haotian Shi, Ke Lin, Chengjin Qin, Liqun Zhao, Yixiang Huang, Chengliang Liu. A high-precision arrhythmia classification method based on dual fully connected neural network [J]. Biomedical Signal Processing and Control, 2020, 58."

Definition 13: traditional non-maximum suppression method

Non-maximum suppression is an algorithm for removing redundant detection boxes in the field of target detection. In the forward propagation result of a classical detection network, the same target often corresponds to several detection boxes. An algorithm is therefore needed to select the best-quality, highest-scoring detection box from the multiple detection boxes of the same target. Non-maximum suppression performs a local maximum search: boxes whose overlap ratio with a higher-scoring box exceeds a threshold are discarded. The non-maximum suppression method is described in "https://www.cnblogs.com/makefile/p/nms.html".
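The screening procedure above is commonly implemented greedily, as in the following sketch. Boxes are assumed to be `(x1, y1, x2, y2)` tuples and the 0.5 overlap threshold is an illustrative default; neither is specified by the definition.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IOU of about 0.82 and is suppressed; the third box does not overlap anything and is kept.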

Definition 14: traditional recall and precision calculation method

Recall R is the proportion of all positive samples that are correctly predicted, expressed as $R = \frac{TP}{TP + FN}$. Precision P is the proportion of the samples predicted as positive that are correct, expressed as $P = \frac{TP}{TP + FP}$. Here TP (true positive) denotes a positive sample predicted by the model to be positive; FN (false negative) denotes a positive sample predicted by the model to be negative; FP (false positive) denotes a negative sample predicted by the model to be positive. The traditional precision-recall curve P(R) is the function with R as the independent variable and P as the dependent variable. The calculation of these quantities is described in "Li Hang. Statistical Learning Methods [M]. Beijing: Tsinghua University Press, 2012."
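The two ratios can be computed directly from the counts, as in this small sketch. The counts in the example are made up for illustration, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative counts: 90 ships detected correctly, 10 false alarms, 30 ships missed.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(p, r)  # 0.9 0.75
```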

The invention discloses a ship detection method based on balance learning, which comprises the following steps:

Step 1, initializing the SSDD dataset

The order of the SAR images in the SSDD dataset is shuffled randomly to obtain a new SSDD dataset.

Step 2, scene augmentation by using balance scene learning mechanism

Step 2.1, extracting SSDD data set characteristics by using GAN network

A generative adversarial network is constructed by the classical GAN construction method in Definition 2 and denoted $\mathrm{GAN}_0$. With the new SSDD data obtained in step 1 as input, $\mathrm{GAN}_0$ is trained and optimized by the classical Adam algorithm in Definition 4; the trained and optimized generative adversarial network is denoted GAN.

Then, with the new SSDD data obtained in step 1 as input again, the trained and optimized network GAN is run by the traditional forward propagation method in Definition 5 to obtain the network output vectors $M = \{M_1, M_2, \ldots, M_i, \ldots, M_{1160}\}$, where $M_i$ is the output vector of the $i$-th picture in the new SSDD data.

The output $M$ is defined as the scene features of all pictures in the new SSDD dataset, and $M_i$ as the scene feature of the $i$-th picture in the new SSDD dataset.

Step 2.2, scene clustering is carried out

With the set $M$ of scene features of all pictures in the new SSDD data obtained in step 2.1 as input, the pictures in the new SSDD dataset are clustered by their scene features using the traditional K-means clustering algorithm in Definition 3, as follows.

Step 2.3, initializing parameters

The centroid parameters of the traditional K-means clustering algorithm in Definition 3 are initialized randomly for the first iteration and denoted $C_1^{(1)}$ and $C_2^{(1)}$.

The current iteration number is denoted $t$, $t = 1, 2, \ldots, I$, where $I$ is the maximum number of iterations of the K-means clustering algorithm and is initialized as $I = 1000$. The centroid parameters of the $t$-th iteration are denoted $C_1^{(t)}$ and $C_2^{(t)}$. An iteration convergence error $\varepsilon$ is initialized as one of the convergence conditions of the algorithm.

Step 2.4, performing iterative operation

First, the distance from the scene feature $M_i$ of the $i$-th picture to the first centroid $C_1^{(1)}$ in iteration 1 is calculated by the formula $d_{i,1}^{(1)} = \lVert M_i - C_1^{(1)} \rVert_2$.

The distance from the scene feature $M_i$ of the $i$-th picture to the second centroid $C_2^{(1)}$ in iteration 1 is calculated by the formula $d_{i,2}^{(1)} = \lVert M_i - C_2^{(1)} \rVert_2$.

$d_{i,1}^{(1)}$ and $d_{i,2}^{(1)}$ are compared. If $d_{i,1}^{(1)} > d_{i,2}^{(1)}$, the scene feature $M_i$ of the $i$-th picture is defined as belonging to the second class in iteration 1; otherwise it is defined as belonging to the first class.

After iteration 1, the set of all scene features of the first class is denoted $S_1^{(1)}$ and the set of all scene features of the second class is denoted $S_2^{(1)}$.

Let $t = 2$, then perform the following until convergence:

1) Let the centroid parameter $C_1^{(t)}$ of step $t$ be the arithmetic mean of the set $S_1^{(t-1)}$, and let the centroid parameter $C_2^{(t)}$ of step $t$ be the arithmetic mean of the set $S_2^{(t-1)}$.

2) Calculate the distance from the scene feature $M_i$ of the $i$-th picture to the first centroid in the $t$-th iteration by the formula $d_{i,1}^{(t)} = \lVert M_i - C_1^{(t)} \rVert_2$, and the distance to the second centroid by the formula $d_{i,2}^{(t)} = \lVert M_i - C_2^{(t)} \rVert_2$.

3) Compare $d_{i,1}^{(t)}$ and $d_{i,2}^{(t)}$. If $d_{i,1}^{(t)} > d_{i,2}^{(t)}$, the scene feature $M_i$ of the $i$-th picture belongs to the second class in the $t$-th iteration; otherwise it belongs to the first class. After the $t$-th iteration, the set of all scene features of the first class is denoted $S_1^{(t)}$ and the set of all scene features of the second class is denoted $S_2^{(t)}$. The clustering result of this iteration is denoted CLASS.

4) Calculate the change of the centroid parameters between this iteration and the previous one, denoted $\sigma$ and expressed as $\sigma = \lVert C_1^{(t)} - C_1^{(t-1)} \rVert_2 + \lVert C_2^{(t)} - C_2^{(t-1)} \rVert_2$. If $\sigma < \varepsilon$ or $t \ge I$, output the clustering result CLASS; otherwise let $t = t + 1$ and return to step 1) to continue the iteration.
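The two-centroid iteration of steps 2.3 and 2.4 can be sketched as follows. The sketch assumes Euclidean distance, takes the centroid change as the sum of the two centroid displacements, and draws the initial centroids from the data; the function names, the toy two-dimensional features and the seed are all assumptions for illustration.

```python
import math
import random


def mean_point(points):
    """Arithmetic mean of a list of equal-length feature vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def two_means(features, eps=1e-6, max_iter=1000, seed=0):
    """Cluster feature vectors into two classes (labels 1 and 2)."""
    rng = random.Random(seed)
    c1, c2 = rng.sample(features, 2)  # random initial centroids
    labels = []
    for _ in range(max_iter):
        # Assign each feature to the nearer centroid (class 2 if farther from c1).
        labels = [2 if math.dist(f, c1) > math.dist(f, c2) else 1 for f in features]
        s1 = [f for f, lab in zip(features, labels) if lab == 1]
        s2 = [f for f, lab in zip(features, labels) if lab == 2]
        new_c1 = mean_point(s1) if s1 else c1
        new_c2 = mean_point(s2) if s2 else c2
        sigma = math.dist(new_c1, c1) + math.dist(new_c2, c2)  # centroid change
        c1, c2 = new_c1, new_c2
        if sigma < eps:  # converged before reaching max_iter
            break
    return labels, (c1, c2)


# Toy stand-ins for the scene features M_i: two well-separated groups.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = two_means(features)
print(labels)
```

Which group receives label 1 depends on the random initialization, so the mapping from cluster label to inshore or offshore scene has to be decided after clustering.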

Step 2.5, scene augmentation

According to the clustering result CLASS obtained in step 2.4, all pictures in the new SSDD data are divided into two classes: the first class is the inshore scene pictures, denoted Data_1, and the second class is the offshore scene pictures, denoted Data_2. The number of pictures in Data_1 is defined as $N_1$ and the number of pictures in Data_2 as $N_2$.

If $N_2 > N_1$, $N_2 - N_1$ pictures are randomly selected, based on a Gaussian distribution, from the first-class inshore scene pictures Data_1 and mirrored by the traditional mirror operation; the $N_2 - N_1$ mirrored pictures are denoted Data_1extra. Data_1extra is then merged with the first-class inshore scene pictures Data_1, and the new picture set is output and denoted Data_1new. Data_2new = Data_2 is defined.

If $N_2 \le N_1$, $N_1 - N_2$ pictures are randomly selected, based on a Gaussian distribution, from the second-class offshore scene pictures Data_2 and mirrored by the traditional mirror operation; the $N_1 - N_2$ mirrored pictures are denoted Data_2extra. Data_2extra is then merged with the second-class offshore scene pictures Data_2, and the new picture set is output and denoted Data_2new. Data_1new = Data_1 is defined.

The new picture set is defined as Data_new = {Data_1new, Data_2new}.

Data_new is divided at a ratio of 7:3 into a training set, denoted Train, and a test set, denoted Test.
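A minimal sketch of the scene augmentation and the 7:3 split follows. Pictures are represented as lists of pixel rows, the mirror operation is taken to be a horizontal flip, and the extra pictures are drawn uniformly with replacement rather than from a Gaussian distribution; these are simplifying assumptions, and all function names are hypothetical.

```python
import random


def mirror(image):
    """Horizontal flip of an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def balance_scenes(data_1, data_2, seed=0):
    """Pad the smaller scene class with mirrored copies until both are equal in size."""
    rng = random.Random(seed)
    data_1_new, data_2_new = list(data_1), list(data_2)
    if len(data_2) > len(data_1):
        picks = rng.choices(data_1, k=len(data_2) - len(data_1))
        data_1_new += [mirror(img) for img in picks]
    elif len(data_1) > len(data_2):
        picks = rng.choices(data_2, k=len(data_1) - len(data_2))
        data_2_new += [mirror(img) for img in picks]
    return data_1_new, data_2_new


def split_7_3(samples, seed=0):
    """Shuffle and split samples into a 70% training part and a 30% test part."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Toy data: 3 inshore and 7 offshore "pictures", each a single row of two pixels.
inshore = [[[i, i + 1]] for i in range(3)]
offshore = [[[0, i]] for i in range(7)]
data_1_new, data_2_new = balance_scenes(inshore, offshore)
train_set, test_set = split_7_3(data_1_new + data_2_new)
print(len(data_1_new), len(data_2_new), len(train_set), len(test_set))  # 7 7 10 4
```

After balancing, both scene classes contribute the same number of pictures, so neither dominates the training set.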

Step 3, constructing a forward propagation network

Step 3.1, constructing the balanced feature pyramid network

A residual network with 50 layers is constructed by the classical residual network construction method in Definition 6 and denoted Res-50. The feature maps generated by the last layer at each of the different feature map sizes in Res-50 are denoted $F_1, F_2, F_3, F_4, F_5$ in order of feature map size from large to small.

$F_5$ is also denoted $P_5$.

Using the conventional convolution kernel operation in Definition 7, features are extracted from $F_4$ by a 1×1 convolution kernel, and the feature extraction result is denoted $E_4$;

using the conventional upsampling operation in Definition 9, $P_5$ is upsampled so that its feature map size matches that of $F_4$, and the result of the upsampling operation is denoted $U_5$;

using the conventional cascading operation in Definition 8, $E_4$ and $U_5$ are superposed, and the superposition result is denoted $P_4$.

Using the conventional convolution kernel operation in Definition 7, features are extracted from $F_3$ by a 1×1 convolution kernel, and the feature extraction result is denoted $E_3$;

using the conventional upsampling operation in Definition 9, $P_4$ is upsampled so that its feature map size matches that of $F_3$, and the result of the upsampling operation is denoted $U_4$;

using the conventional cascading operation in Definition 8, $E_3$ and $U_4$ are superposed, and the superposition result is denoted $P_3$.

Using the conventional convolution kernel operation in Definition 7, features are extracted from $F_2$ by a 1×1 convolution kernel, and the feature extraction result is denoted $E_2$;

using the conventional upsampling operation in Definition 9, $P_3$ is upsampled so that its feature map size matches that of $F_2$, and the result of the upsampling operation is denoted $U_3$;

using the conventional cascading operation in Definition 8, $E_2$ and $U_3$ are superposed, and the superposition result is denoted $P_2$.

Using the conventional convolution kernel operation in Definition 7, features are extracted from $F_1$ by a 1×1 convolution kernel, and the feature extraction result is denoted $E_1$;

using the conventional upsampling operation in Definition 9, $P_2$ is upsampled so that its feature map size matches that of $F_1$, and the result of the upsampling operation is denoted $U_2$;

using the conventional cascading operation in Definition 8, $E_1$ and $U_2$ are superposed, and the superposition result is denoted $P_1$.

Using the conventional upsampling operation in Definition 9, $P_5$ is upsampled so that its feature map size matches that of $P_3$, and the result of the upsampling operation is denoted $H_5$.

Using the conventional upsampling operation in Definition 9, $P_4$ is upsampled so that its feature map size matches that of $P_3$, and the result of the upsampling operation is denoted $H_4$.

$P_3$ is also denoted $H_3$.

Using the conventional pooling operation in Definition 10, $P_2$ is max-pooled so that its feature map size matches that of $P_3$, and the result of the pooling operation is denoted $H_2$.

Using the conventional pooling operation in Definition 10, $P_1$ is max-pooled so that its feature map size matches that of $P_3$, and the result of the pooling operation is denoted $H_1$.

For $H_1, H_2, H_3, H_4, H_5$, the feature map $I$ is calculated by the formula $I(i, j) = \frac{1}{5} \sum_{k=1}^{5} H_k(i, j)$, where $k$ is the subscript of $H$ and $(i, j)$ is the spatial sampling position on the feature map.
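The resize-and-average step can be sketched on single-channel maps as follows. The sketch assumes five levels whose side lengths halve from level 1 to level 5, nearest-neighbor upsampling and non-overlapping max pooling; the constant-valued maps and all function names are illustrative only.

```python
def upsample_nearest(fmap, scale):
    """Enlarge a 2-D map by an integer factor by repeating each element."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


def max_pool(fmap, size):
    """Shrink a 2-D map by taking the maximum of each size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]), size)]
            for r in range(0, len(fmap), size)]


def integrate(levels):
    """Element-wise average of equally sized maps: I(i, j) = mean_k H_k(i, j)."""
    n = len(levels)
    height, width = len(levels[0]), len(levels[0][0])
    return [[sum(level[i][j] for level in levels) / n for j in range(width)]
            for i in range(height)]


def constant_map(size, value):
    return [[float(value)] * size for _ in range(size)]


# Toy pyramid: level k has side 16, 8, 4, 2, 1 and constant value k.
p1, p2, p3, p4, p5 = (constant_map(s, v)
                      for s, v in [(16, 1), (8, 2), (4, 3), (2, 4), (1, 5)])
h = [max_pool(p1, 4), max_pool(p2, 2), p3,
     upsample_nearest(p4, 2), upsample_nearest(p5, 4)]
integrated = integrate(h)
print(len(integrated), len(integrated[0]), integrated[0][0])  # 4 4 3.0
```

Every level contributes with equal weight to the integrated map, which is the sense in which the multi-scale features are balanced.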

With the feature map $I$ as input, the feature map $O$ is calculated by the formula $O_i = \frac{1}{C(I)} \sum_{j} f(I_i, I_j)\, g(I_j)$, where $I_i$ is the feature at the $i$-th position of the feature map $I$; $O_i$ is the feature at the $i$-th position of the feature map $O$; $C(I) = \sum_{j} f(I_i, I_j)$ is the normalization factor; $f(I_i, I_j)$ is the function that calculates the similarity between $I_i$ and $I_j$, expressed as $f(I_i, I_j) = e^{\theta(I_i)^{\mathsf{T}} \phi(I_j)}$, where $\theta(I_i) = W_\theta I_i$ and $\phi(I_j) = W_\phi I_j$, and $W_\theta$ and $W_\phi$ are matrices learned by the 1×1 convolution operation in Definition 7; $g(I_j) = W_g I_j$, where $W_g$ is a matrix learned by the 1×1 convolution operation in Definition 7.
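The position-wise refinement can be sketched as a weighted average over all positions. The sketch assumes that the similarity is an exponential of the inner product (the embedded Gaussian form), that the normalization factor is the sum of the similarities, and that $W_\theta$, $W_\phi$ and $W_g$ are identity matrices instead of learned 1×1 convolutions; the function name `non_local` is hypothetical.

```python
import math


def non_local(features):
    """Refine each position as a similarity-weighted average of all positions.

    features: list of N position vectors I_i (each a list of floats).
    Assumption: theta, phi and g are identity mappings (no learned weights).
    """
    out = []
    for f_i in features:
        # f(I_i, I_j) = exp(theta(I_i) . phi(I_j)) for every position j.
        sims = [math.exp(sum(a * b for a, b in zip(f_i, f_j))) for f_j in features]
        norm = sum(sims)  # normalization factor C(I)
        out.append([sum(s * f_j[d] for s, f_j in zip(sims, features)) / norm
                    for d in range(len(f_i))])
    return out


attended = non_local([[1.0, 0.0], [0.0, 1.0]])
print(attended[0])
```

Each output vector is a convex combination of the inputs, so positions with similar features reinforce one another while the overall feature scale is preserved.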

After all the network operations of step 3.1 are completed, the balanced feature pyramid network is obtained and denoted Backhaul (the backbone network).

Step 3.2, building the regional recommendation network

A regional recommendation network is constructed by the traditional regional recommendation network construction method in Definition 11, with the Backhaul obtained in step 3.1 as the feature extraction layer, and is denoted $\mathrm{RPN}_0$.

Step 3.3, constructing the balanced classification regression network

Fully connected layers FC1 and FC2 are constructed by the traditional fully connected layer method in Definition 12, with the output of FC1 as the input of FC2; FC1 and FC2 together serve as the classification head, denoted Clhead.

Four convolutional layers, Conv1, Conv2, Conv3 and Conv4, are constructed by the traditional convolution kernel method in Definition 7, and a pooling layer, denoted Pooling, is constructed by the conventional pooling operation in Definition 10. The output of Conv1 is the input of Conv2, the output of Conv2 is the input of Conv3, the output of Conv3 is the input of Conv4, and the output of Conv4 is the input of Pooling. Conv1, Conv2, Conv3, Conv4 and Pooling together serve as the regression head, denoted Rehead. The classification head Clhead and the regression head Rehead take the same feature map as input and, together with the Backhaul, form the balanced classification regression network, denoted $\mathrm{BCRN}_0$.

Step 4, training area recommendation network

And setting an iteration parameter epoch, and initializing an epoch value to be 1.

Step 4.1, forward propagation is carried out on the regional recommendation network

Taking the training set Train of the augmented data set Data_new obtained in step 2 as the input of the region recommendation network RPN₀, Train is fed into RPN₀ using the conventional forward propagation method in definition 5, and the output of RPN₀ is recorded as Result0.

Step 4.2, performing balance interval sampling on the forward propagation result

Taking Result0 obtained in step 4.1 and the training set Train as inputs, the IOU value of each recommended box in Result0 is calculated with the formula $\mathrm{IOU} = \frac{\text{area of intersection of the recommended box and the ground-truth box}}{\text{area of their union}}$. Outputs in Result0 with IOU greater than 0.5 are taken as positive samples, denoted Result0p; outputs in Result0 with IOU less than 0.5 are taken as negative samples, denoted Result0n. The total number of samples in the negative samples Result0n is counted and denoted M. The required number of negative samples is entered manually and denoted N; the number of equal IOU intervals is entered manually and denoted $n_b$, and the number of samples in the i-th IOU interval is denoted $M_i$. The random sampling probability of the i-th interval is set to $p_i = \frac{N}{n_b} \cdot \frac{1}{M_i}$, each IOU interval is randomly sampled, and the sampling result over all IOU intervals of the negative samples is recorded as Result0ns.

The number of samples in the positive samples Result0p is counted and denoted P. Result0p is randomly sampled with the set random sampling probability [formula missing in the source text], and the positive sample sampling result is recorded as Result0ps.
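The balanced interval sampling of negative samples described in step 4.2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `balanced_interval_sample` and its parameters are hypothetical, and the per-interval probability (N / n_b) / M_i is an assumption reconstructed from the quantities N, n_b and M_i named in the text.

```python
import random


def balanced_interval_sample(neg_ious, n_required, n_bins, max_iou=0.5, seed=0):
    """Sample negatives so that each IoU interval contributes about equally.

    neg_ious   : IoU values of the negative proposals (all below max_iou)
    n_required : N, the number of negatives wanted in total
    n_bins     : n_b, the number of equal-width IoU intervals
    Returns the indices of the selected negatives.
    """
    rng = random.Random(seed)
    width = max_iou / n_bins

    # Group proposal indices by the IoU interval they fall into.
    bins = [[] for _ in range(n_bins)]
    for idx, iou in enumerate(neg_ious):
        k = min(int(iou / width), n_bins - 1)
        bins[k].append(idx)

    selected = []
    for members in bins:
        if not members:
            continue
        # Assumed probability for this interval: (N / n_b) / M_i, capped at 1.
        p = min(1.0, (n_required / n_bins) / len(members))
        selected.extend(i for i in members if rng.random() < p)
    return selected


# 100 easy negatives (low IoU) and 4 hard negatives (IoU close to 0.5).
neg = [0.05] * 100 + [0.45] * 4
chosen = balanced_interval_sample(neg, n_required=8, n_bins=2)
```

With plain random sampling the four hard negatives would rarely be chosen; here their interval has a quota of N / n_b = 4, so all four are kept while the crowded low-IoU interval is thinned out.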

Step 4.3, training and optimizing the regional recommendation network

And (3) taking the positive sample sampling Result0ps and the negative sample sampling Result0ns obtained in the step 4.2 as inputs, and training and optimizing the regional recommendation network by adopting the classical Adam algorithm in the definition 4. The region recommendation network RPN1 after training and optimization is obtained.

Step 5, training a balanced classification regression network

Step 5.1, forward propagation is carried out on the equilibrium classification regression network

Taking the training set Train of the augmented data set Data_new obtained in step 2 as the input of the balanced classification regression network BCRN₀, Train is fed into BCRN₀ using the conventional forward propagation method in definition 5, and the output of BCRN₀ is recorded as Result1.

Step 5.2, training and optimizing the balance classification regression network

Taking the balanced classification regression network BCRN₀ obtained in step 5.1 as input, the network is trained and optimized using the classical Adam algorithm in definition 4, yielding the trained and optimized balanced classification regression network BCRN₁.

Step 6, performing alternate training

Determine whether the epoch set in step 4 is equal to 12. If epoch is not equal to 12, let epoch = epoch + 1, RPN₀ = RPN₁ and BCRN₀ = BCRN₁, repeat steps 4.1, 4.2, 4.3, 5.1 and 5.2 in sequence, and then return to step 6 to check epoch again; if epoch is equal to 12, the trained region recommendation network RPN₁ and the trained balanced classification regression network BCRN₁ are together denoted as the network BL-Net, and the method proceeds to step 7.
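The control flow of the alternate training in steps 4 to 6 can be sketched as a simple loop. The names `alternate_training`, `train_rpn_epoch` and `train_bcrn_epoch` are hypothetical placeholders; the real training steps (forward propagation, sampling and Adam updates) are passed in as callables and are not implemented here.

```python
def alternate_training(train_rpn_epoch, train_bcrn_epoch, rpn, bcrn, max_epoch=12):
    """Alternate one RPN pass and one BCRN pass until epoch reaches max_epoch.

    train_rpn_epoch  : callable standing in for steps 4.1-4.3 (RPN0 -> RPN1)
    train_bcrn_epoch : callable standing in for steps 5.1-5.2 (BCRN0 -> BCRN1)
    """
    epoch = 1  # initialized in step 4
    while True:
        rpn = train_rpn_epoch(rpn)
        bcrn = train_bcrn_epoch(bcrn)
        if epoch == max_epoch:  # step 6: stop when epoch equals 12
            return rpn, bcrn    # together these form BL-Net
        epoch += 1              # otherwise RPN0 = RPN1, BCRN0 = BCRN1 and repeat


# Stand-in "networks" are counters, so the number of passes is visible.
rpn_final, bcrn_final = alternate_training(lambda n: n + 1, lambda n: n + 1, 0, 0)
```

Because epoch starts at 1 and the check happens after each pass, both networks receive exactly 12 training passes.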

Step 7, evaluation method

Step 7.1 Forward propagation

Taking the network BL-Net obtained in step 6 and the test set Test obtained in step 2.5 as inputs, the detection result is obtained using the conventional forward propagation method in definition 5 and denoted R.

Taking the detection result R as input, redundant boxes in R are removed using the conventional non-maximum suppression method in definition 13. The specific steps are as follows:

step (1): mark the box with the highest score in the detection result R as BS;

step (2): calculate the overlap ratio (IoU) between BS and every other box in R with the formula $\mathrm{IoU} = \frac{\text{area of intersection of BS and the compared box}}{\text{area of their union}}$, and discard the boxes with IoU > 0.5;

step (3): select the box with the highest score among the remaining boxes as the new BS;

repeat the IoU calculation and discarding of step (2) until no box can be discarded; the boxes that remain are the final detection result, denoted R_F.
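The non-maximum suppression procedure of steps (1) to (3) can be sketched as follows. This is a minimal plain-Python illustration under the assumption that each box is given as (x1, y1, x2, y2); the function names `iou` and `nms` are chosen here for illustration only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.5):
    """Return the indices of the boxes kept after non-maximum suppression."""
    # Candidates ordered from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # steps (1)/(3): highest-scoring remaining box
        keep.append(best)
        # Step (2): discard boxes that overlap the best box by more than threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is removed
```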

Step 7.2, calculating the index

Taking the detection result R_F obtained in step 7.1 as input, the precision P, the recall R and the precision-recall curve P(R) of the network are obtained using the conventional recall and precision calculation method in definition 14;

using the formula $\mathrm{mAP} = \int_0^1 P(R)\, dR$, the mean average precision mAP of SAR ship detection based on balance learning is calculated.
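The area under the precision-recall curve P(R) can be approximated numerically. The sketch below uses one common discretization (a rectangle rule over the ranked detections); it is an illustration under that assumption, and the function name `average_precision` and its inputs are hypothetical rather than taken from the patent.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Approximate the integral of P(R) over R in [0, 1] from ranked detections.

    scores           : confidence score of each detection
    is_true_positive : whether each detection matches a ground-truth ship
    num_ground_truth : total number of ground-truth ships
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth
        # Rectangle of height P and width equal to the recall increment.
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap


# Three detections, two ground-truth ships; the second detection is a false alarm.
ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_ground_truth=2)
```

For a single class such as ships, the mean average precision coincides with this single average precision value.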

The innovation of the invention is that four balance learning methods are introduced, namely the balanced scene learning mechanism, the balanced interval sampling mechanism, the balanced feature pyramid network and the balanced classification regression network, which address four imbalance problems in existing deep-learning-based SAR ship detection methods: image sample scene imbalance, positive and negative sample imbalance, ship scale feature imbalance and classification regression task imbalance. With this method, the SAR image ship detection mAP is 95.25%, exceeding the second-best SAR image ship detector by 3%; the inshore ship detection mAP is 84.79%, exceeding the second-best SAR image ship detector by 10%; and the offshore ship detection mAP is 99.62%, exceeding the second-best SAR image ship detector by 0.5%.

The invention has the advantages of overcoming the unbalance problem in the prior art and improving the detection precision of the ship in the SAR image.

Drawings

Fig. 1 is a schematic flow chart of a method for detecting a ship from a SAR image based on balance learning in the present invention.

Fig. 2 is a schematic diagram of the balanced classification regression network in the SAR image ship detection method based on balance learning in the present invention.

Fig. 3 is a diagram showing the detection accuracy of the SAR image ship detection method based on balance learning in the present invention.

Detailed Description

The invention will be described in further detail with reference to fig. 1, 2 and 3.

Step 1, initializing a data set

Step 2, scene augmentation by using balance scene learning mechanism

Step 2.1, extracting SSDD data set characteristics by using GAN network

As shown in fig. 1, a generative adversarial network GAN₀ is built according to the classical GAN construction method in definition 2. Taking the new SSDD data obtained in step 1 as input, GAN₀ is trained and optimized according to the classical Adam algorithm in definition 4, and the generative adversarial network obtained after training and optimization is denoted GAN.

Then, taking the new SSDD data obtained in step 1 as input again, the data are fed into the trained and optimized generative adversarial network GAN according to the conventional forward propagation method in definition 5 to obtain the output vectors of the network M = {M1, M2, …, Mi, …, M1160}, where Mi is the output vector of the i-th picture in the new SSDD data.

Step 2.2, scene clustering is carried out

step 2.3, initializing parameters

The current iteration number is defined as t, t = 1, 2, …, I, where I is the maximum number of iterations of the K-means clustering algorithm and is initialized to I = 1000. The centroid parameters of the t-th iteration are denoted $c_1^{(t)}$ and $c_2^{(t)}$. An iteration convergence error ε is initialized as one of the iteration convergence conditions of the algorithm.

Step 2.4, performing iterative operation

First, the distance from the scene feature $M^i$ of the i-th picture to the first centroid $c_1^{(1)}$ in iteration 1 is calculated with the formula $d_{i,1}^{(1)} = \lVert M^i - c_1^{(1)} \rVert_2$.

The distance from the scene feature $M^i$ of the i-th picture to the second centroid $c_2^{(1)}$ in iteration 1 is calculated with the formula $d_{i,2}^{(1)} = \lVert M^i - c_2^{(1)} \rVert_2$.

$d_{i,1}^{(1)}$ and $d_{i,2}^{(1)}$ are compared: if $d_{i,1}^{(1)} > d_{i,2}^{(1)}$, the scene feature $M^i$ of the i-th picture is defined as belonging to the second class in iteration 1; otherwise, it is defined as belonging to the first class.

The set of all scene features of the first class after iteration 1 is denoted $S_1^{(1)}$, and the set of all scene features of the second class is denoted $S_2^{(1)}$.

Let t = 2, then perform the following until convergence:

1) Let the centroid parameter $c_1^{(t)}$ of step t be the arithmetic mean of the set $S_1^{(t-1)}$, and let the centroid parameter $c_2^{(t)}$ of step t be the arithmetic mean of the set $S_2^{(t-1)}$.

2) The distance from the scene feature $M^i$ of the i-th picture to the first centroid in the t-th iteration is calculated with the formula $d_{i,1}^{(t)} = \lVert M^i - c_1^{(t)} \rVert_2$, and the distance to the second centroid in the t-th iteration with the formula $d_{i,2}^{(t)} = \lVert M^i - c_2^{(t)} \rVert_2$.

3) $d_{i,1}^{(t)}$ and $d_{i,2}^{(t)}$ are compared: if $d_{i,1}^{(t)} > d_{i,2}^{(t)}$, the scene feature $M^i$ of the i-th picture is defined as belonging to the second class in the t-th iteration; otherwise, it is defined as belonging to the first class. The set of all scene features of the first class after the t-th iteration is denoted $S_1^{(t)}$, and the set of all scene features of the second class is denoted $S_2^{(t)}$. The clustering result is output and denoted CLASS.

4) The change in the centroid parameters between this iteration and the previous iteration is calculated and denoted σ [formula missing in the source text]. If σ < ε or the maximum number of iterations I has been reached, the clustering result CLASS is output; otherwise, let t = t + 1 and return to step 1) to continue iterating.
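The two-class K-means iteration of steps 2.3 and 2.4 can be sketched in plain Python. This is an illustrative sketch with assumptions stated openly: Euclidean distance is used, the centroid change σ is taken as the sum of the two centroid displacements, and the name `kmeans_two_classes` and the toy feature vectors are hypothetical.

```python
import math


def mean_point(points):
    """Arithmetic mean of a non-empty list of equal-length vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans_two_classes(features, c1, c2, eps=1e-6, max_iter=1000):
    """Cluster feature vectors into class 1 or class 2.

    Returns (labels, c1, c2), where labels[i] is 1 or 2.
    """
    labels = []
    for _ in range(max_iter):
        # Assign to the second class only if the first centroid is farther away.
        labels = [2 if math.dist(f, c1) > math.dist(f, c2) else 1 for f in features]
        s1 = [f for f, lab in zip(features, labels) if lab == 1]
        s2 = [f for f, lab in zip(features, labels) if lab == 2]
        # New centroids are the arithmetic means; keep the old one if a class is empty.
        new_c1 = mean_point(s1) if s1 else c1
        new_c2 = mean_point(s2) if s2 else c2
        # Assumed definition of sigma: total displacement of both centroids.
        sigma = math.dist(c1, new_c1) + math.dist(c2, new_c2)
        c1, c2 = new_c1, new_c2
        if sigma < eps:
            break
    return labels, c1, c2


# Toy scene features: two well-separated groups.
feats = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, c1, c2 = kmeans_two_classes(feats, c1=(0.0, 0.0), c2=(10.0, 10.0))
```

In the patent, the feature vectors would be the GAN outputs M1 … M1160 and the two resulting classes are interpreted as inshore and offshore scenes.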

Step 2.5, scene augmentation

According to the clustering result CLASS obtained in step 2.4, all pictures in the new SSDD data are divided into two classes: the first class is inshore scene pictures, denoted Data₁, and the second class is offshore scene pictures, denoted Data₂. The number of pictures in Data₁ is defined as N₁ and the number of pictures in Data₂ as N₂.

If N₂ > N₁, then N₂ - N₁ pictures are randomly selected, based on a Gaussian distribution, from the first-class inshore scene pictures Data₁ and mirrored; the N₂ - N₁ mirrored pictures are denoted Data_1extra. The N₂ - N₁ mirrored pictures Data_1extra are then merged with the first-class inshore scene pictures Data₁, and the new picture set is output and denoted Data_1new. Define Data_2new = Data₂.

If N₂ <= N₁, then N₁ - N₂ pictures are randomly selected, based on a Gaussian distribution, from the second-class offshore scene pictures Data₂ and mirrored; the N₁ - N₂ mirrored pictures are denoted Data_2extra. The N₁ - N₂ mirrored pictures Data_2extra are then merged with the second-class offshore scene pictures Data₂, and the new picture set is output and denoted Data_2new. Define Data_1new = Data₁.

Define the new picture set Data_new = {Data_1new, Data_2new}.
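The scene augmentation of step 2.5 amounts to mirroring randomly chosen pictures of the smaller class until both classes have the same size. The sketch below is a simplified illustration: pictures are nested lists of pixel values, the selection is uniform with replacement rather than Gaussian-based as in the text, and the names `mirror` and `balance_scenes` are hypothetical.

```python
import random


def mirror(image):
    """Horizontally mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def balance_scenes(inshore, offshore, seed=0):
    """Return (inshore_new, offshore_new) with equal numbers of pictures."""
    rng = random.Random(seed)
    inshore, offshore = list(inshore), list(offshore)  # leave the inputs untouched
    if len(inshore) < len(offshore):
        small, large = inshore, offshore
    else:
        small, large = offshore, inshore
    deficit = len(large) - len(small)
    # Mirror randomly chosen pictures of the smaller class (assumption: uniform choice).
    extra = [mirror(rng.choice(small)) for _ in range(deficit)] if small else []
    small.extend(extra)
    return inshore, offshore


# One inshore picture and three offshore pictures, each 2x2 pixels.
inshore = [[[1, 2], [3, 4]]]
offshore = [[[0, 0], [0, 0]] for _ in range(3)]
inshore_new, offshore_new = balance_scenes(inshore, offshore)
```

After the call both classes contain three pictures, and the added inshore pictures are mirrored copies of the original one.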

Step 3, constructing a forward propagation network

Step 3.1, constructing a balance characteristic pyramid network

As shown in fig. 1, a residual network with 50 layers is constructed using the classical residual network construction method in definition 6 and denoted Res-50; the feature maps generated by the last layer of each stage of different size in Res-50 are denoted, from largest to smallest feature map size, F₁, F₂, F₃, F₄, F₅.

F₅ is also denoted P₅.

According to the convolution kernel operation in definition 7, features are extracted from F₄ with a 1×1 convolution kernel, and the feature extraction result is denoted E₄; according to the upsampling operation in definition 9, P₅ is upsampled so that its feature map size is consistent with that of F₄, and the result of the upsampling operation is denoted U₅; according to the cascade operation in definition 8, E₄ and U₅ are superposed, and the superposition result is denoted P₄.

According to the convolution kernel operation in definition 7, features are extracted from F₃ with a 1×1 convolution kernel, and the feature extraction result is denoted E₃; according to the upsampling operation in definition 9, P₄ is upsampled so that its feature map size is consistent with that of F₃, and the result of the upsampling operation is denoted U₄; according to the cascade operation in definition 8, E₃ and U₄ are superposed, and the superposition result is denoted P₃.

According to the convolution kernel operation in definition 7, features are extracted from F₂ with a 1×1 convolution kernel, and the feature extraction result is denoted E₂; according to the upsampling operation in definition 9, P₃ is upsampled so that its feature map size is consistent with that of F₂, and the result of the upsampling operation is denoted U₃; according to the cascade operation in definition 8, E₂ and U₃ are superposed, and the superposition result is denoted P₂.

According to the convolution kernel operation in definition 7, features are extracted from F₁ with a 1×1 convolution kernel, and the feature extraction result is denoted E₁; according to the upsampling operation in definition 9, P₂ is upsampled so that its feature map size is consistent with that of F₁, and the result of the upsampling operation is denoted U₂; according to the cascade operation in definition 8, E₁ and U₂ are superposed, and the superposition result is denoted P₁.

According to the upsampling operation in definition 9, P₅ is upsampled so that its feature map size is consistent with that of P₃, and the result of the upsampling operation is denoted H₅.

According to the upsampling operation in definition 9, P₄ is upsampled so that its feature map size is consistent with that of P₃, and the result of the upsampling operation is denoted H₄.

P₃ is also denoted H₃.

According to the pooling operation in definition 10, P₂ is max-pooled so that its feature map size is consistent with that of P₃, and the result of the pooling operation is denoted H₂.

According to the pooling operation in definition 10, P₁ is max-pooled so that its feature map size is consistent with that of P₃, and the result of the pooling operation is denoted H₁.

H₁, H₂, H₃, H₄, H₅ are combined according to the formula $I(i, j) = \frac{1}{5} \sum_{k=1}^{5} H_k(i, j)$ to obtain the feature map I, where k denotes the subscript of H and (i, j) denotes the spatial sampling position on the feature map.
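The rescale-and-average step that produces the feature map I can be sketched on single-channel maps. This is an illustration under stated assumptions: maps are square nested lists whose sizes differ by integer factors, nearest-neighbour upsampling stands in for the upsampling operation, and the function names are hypothetical.

```python
def upsample_nearest(fm, factor):
    """Nearest-neighbour upsampling of a 2D map by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in fm for _ in range(factor)]


def max_pool(fm, factor):
    """Non-overlapping max pooling of a 2D map by an integer factor."""
    h, w = len(fm) // factor, len(fm[0]) // factor
    return [[max(fm[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor))
             for j in range(w)]
            for i in range(h)]


def integrate(levels, target_index=2):
    """Average P1..P5 (ordered large to small) at the resolution of P3."""
    target = len(levels[target_index])
    resized = []
    for fm in levels:
        size = len(fm)
        if size > target:      # P1, P2: shrink by max pooling -> H1, H2
            resized.append(max_pool(fm, size // target))
        elif size < target:    # P4, P5: enlarge by upsampling -> H4, H5
            resized.append(upsample_nearest(fm, target // size))
        else:                  # P3 is used as H3 unchanged
            resized.append(fm)
    n = len(resized)
    return [[sum(r[i][j] for r in resized) / n for j in range(target)]
            for i in range(target)]


# Five constant maps with values 1..5 and sizes 16, 8, 4, 2, 1.
levels = [[[float(k)] * s for _ in range(s)] for k, s in zip(range(1, 6), [16, 8, 4, 2, 1])]
fused = integrate(levels)  # 4x4 map whose entries are all 3.0
```

Averaging at the middle resolution gives every pyramid level equal weight in the fused map, which is the balancing idea behind this step.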

Taking the feature map I as input, the feature map O is calculated according to the formula $O_i = \frac{1}{C(I)} \sum_{j} f(I_i, I_j)\, g(I_j)$, where $I_i$ denotes the feature at the i-th position of the feature map I; $O_i$ denotes the feature at the i-th position of the feature map O; $C(I)$ denotes the normalization factor; $f(I_i, I_j)$ is the function that computes the similarity between $I_i$ and $I_j$ and is expressed in terms of $\theta(I_i)$ and $\phi(I_j)$, where $\theta(I_i) = W_\theta I_i$, $\phi(I_j) = W_\phi I_j$, and $W_\theta$ and $W_\phi$ are matrices learned by the 1×1 convolution operation in definition 7; $g(I_j) = W_g I_j$, where $W_g$ is a matrix learned by the 1×1 convolution operation in definition 7.

All network operations in step 3.1 together constitute the balanced feature pyramid network, denoted Backbone.

Step 3.2, building an area recommendation network

According to the region recommendation network construction method in definition 11, with the Backbone obtained in step 3.1 as the feature extraction layer, a region recommendation network is constructed and denoted RPN₀.

Step 3.3, constructing a balance classification regression network

As shown in fig. 2, the balanced classification regression network is divided into a classification head Clhead and a regression head Rehead. Fully connected layers FC1 and FC2 are constructed according to the conventional fully connected layer method in definition 12, with the output of FC1 taken as the input of FC2; FC1 and FC2 together serve as the classification head, denoted Clhead. Four convolution layers, Conv1, Conv2, Conv3 and Conv4, are constructed according to the convolution kernel method in definition 7; in addition, a pooling layer is constructed according to the pooling operation in definition 10 and denoted Pooling. The output of Conv1 is taken as the input of Conv2, the output of Conv2 as the input of Conv3, the output of Conv3 as the input of Conv4, and the output of Conv4 as the input of Pooling. Conv1, Conv2, Conv3, Conv4 and Pooling together serve as the regression head, denoted Rehead. The classification head Clhead and the regression head Rehead take the same feature map as input and, together with the Backbone, form the balanced classification regression network, denoted BCRN₀.

Step 4, training area recommendation network

Taking the training set Train of the augmented data set Data_new obtained in step 2 as the input of the region recommendation network RPN₀, Train is fed into RPN₀ according to the forward propagation method in definition 5, and the output of RPN₀ is recorded as Result0.

Taking Result0 obtained in step 4.1 and the training set Train as inputs, the IOU value of each recommended box in Result0 is calculated according to the formula $\mathrm{IOU} = \frac{\text{area of intersection of the recommended box and the ground-truth box}}{\text{area of their union}}$. Outputs in Result0 with IOU greater than 0.5 are taken as positive samples, denoted Result0p; outputs in Result0 with IOU less than 0.5 are taken as negative samples, denoted Result0n. The total number of samples in the negative samples Result0n is counted and denoted M. The required number of negative samples is entered manually and denoted N; the number of equal IOU intervals is entered manually and denoted $n_b$, and the number of samples in the i-th IOU interval is denoted $M_i$. The random sampling probability of the i-th interval is set to $p_i = \frac{N}{n_b} \cdot \frac{1}{M_i}$, each IOU interval is randomly sampled, and the sampling result over all IOU intervals of the negative samples is recorded as Result0ns.

Step 4.3, training and optimizing the regional recommendation network

And (3) taking the positive sample sampling Result0ps and the negative sample sampling Result0ns obtained in the step 4.2 as inputs, and training and optimizing the regional recommendation network according to the classical Adam algorithm in the definition 4. The region recommendation network RPN1 after training and optimization is obtained.

Step 5, training a balanced classification regression network

Taking the training set Train of the augmented data set Data_new obtained in step 2 as the input of the balanced classification regression network BCRN₀, Train is fed into BCRN₀ according to the forward propagation method in definition 5, and the output of BCRN₀ is recorded as Result1.

Step 5.2, training and optimizing the balance classification regression network

Taking the balanced classification regression network BCRN₀ obtained in step 5.1 as input, the network is trained and optimized according to the classical Adam algorithm in definition 4, yielding the trained and optimized balanced classification regression network BCRN₁.

Step 6, performing alternate training

It is determined whether the epoch set in step 4 is equal to 12. If epoch is not equal to 12, then let epoch=epoch+1, RPN ₀ ＝RPN ₁ 、BCRN ₀ ＝BCRN ₁ Sequentially repeating the steps 4.1, 4.2, 4.3, 5.1 and 5.2, and then returning to the step 6 to judge the epoch again; if epoch is equal to 12, let the trained regional recommendation network RPN1 and the trained balanced classification regression network BCRN1 be denoted as the network BL-Net, and then go to step 7.

Step 7, evaluation method

Step 7.1 Forward propagation

step (3) selecting a frame BS with the highest score from the rest frames;

Step 7.2, calculating the index

As shown in fig. 3, taking the detection result R_F obtained in step 7.1 as input, the precision P, the recall R and the precision-recall curve P(R) of the network are calculated using the conventional recall and precision calculation method in definition 14; using the formula $\mathrm{mAP} = \int_0^1 P(R)\, dR$, the mean average precision mAP of SAR ship detection based on balance learning is calculated.

Claims

1. A ship detection method based on balance learning is characterized by comprising the following steps:

step 1, initializing a data set

Adopting a random method to adjust SAR image sequence in the SSDD data set to obtain a new SSDD data set;

step 2, scene augmentation by using balance scene learning mechanism

Step 2.1, extracting SSDD data set characteristics by using GAN network

According to the classical GAN construction method, a generative adversarial network GAN₀ is built; taking the new SSDD data obtained in step 1 as input, GAN₀ is trained and optimized according to the classical Adam algorithm, and the generative adversarial network obtained after training and optimization is denoted GAN;

then, taking the new SSDD data obtained in step 1 as input again, the data are fed into the trained and optimized generative adversarial network GAN according to the conventional forward propagation method to obtain the output vectors of the network M = {M1, M2, …, Mi, …, M1160}, where Mi is the output vector of the i-th picture in the new SSDD data;

Defining the output vector M as scene characteristics of all pictures in the new SSDD data set, and defining Mi as scene characteristics of the ith picture in the new SSDD data set;

step 2.2, scene clustering is carried out

Taking a set M of scene characteristics of all pictures in the new SSDD data obtained in the step 2.1 as input, adopting a traditional K-means clustering algorithm, and clustering the pictures in the new SSDD data set by means of the scene characteristics M:

step 2.3, initializing parameters

The centroid parameters of the traditional K-means clustering algorithm are randomly initialized, and the centroid parameters of the K-means clustering algorithm in the first iteration are denoted $c_1^{(1)}$ and $c_2^{(1)}$;

the current iteration number is defined as t, t = 1, 2, …, I, where I is the maximum number of iterations of the K-means clustering algorithm and is initialized to I = 1000; the centroid parameters of the t-th iteration are denoted $c_1^{(t)}$ and $c_2^{(t)}$; an iteration convergence error ε is initialized as one of the iteration convergence conditions of the algorithm;

step 2.4, performing iterative operation

First, the distance from the scene feature $M^i$ of the i-th picture to the first centroid $c_1^{(1)}$ in iteration 1 is calculated with the formula $d_{i,1}^{(1)} = \lVert M^i - c_1^{(1)} \rVert_2$;

the distance from the scene feature $M^i$ of the i-th picture to the second centroid $c_2^{(1)}$ in iteration 1 is calculated with the formula $d_{i,2}^{(1)} = \lVert M^i - c_2^{(1)} \rVert_2$;

$d_{i,1}^{(1)}$ and $d_{i,2}^{(1)}$ are compared: if $d_{i,1}^{(1)} > d_{i,2}^{(1)}$, the scene feature $M^i$ of the i-th picture is defined as belonging to the second class in iteration 1; otherwise, it is defined as belonging to the first class;

let t = 2, then perform the following until convergence:

1) let the centroid parameter $c_1^{(t)}$ of step t be the arithmetic mean of the set $S_1^{(t-1)}$, and let the centroid parameter $c_2^{(t)}$ of step t be the arithmetic mean of the set $S_2^{(t-1)}$;

3) $d_{i,1}^{(t)}$ and $d_{i,2}^{(t)}$ are compared: if $d_{i,1}^{(t)} > d_{i,2}^{(t)}$, the scene feature $M^i$ of the i-th picture is defined as belonging to the second class in the t-th iteration; otherwise, it is defined as belonging to the first class; the set of all scene features of the first class after the t-th iteration is denoted $S_1^{(t)}$, and the set of all scene features of the second class is denoted $S_2^{(t)}$; the clustering result is output and denoted CLASS;

4) the change in the centroid parameters between this iteration and the previous iteration is calculated and denoted σ [formula missing in the source text]; if σ < ε or the maximum number of iterations I has been reached, the clustering result CLASS is output; otherwise, let t = t + 1 and return to step 1) to continue iterating;

step 2.5, scene augmentation

According to the clustering result CLASS obtained in step 2.4, all pictures in the new SSDD data are divided into two classes: the first class is inshore scene pictures, denoted Data₁, and the second class is offshore scene pictures, denoted Data₂; the number of pictures in Data₁ is defined as N₁ and the number of pictures in Data₂ as N₂;

if N₂ > N₁, then N₂ - N₁ pictures are randomly selected, based on a Gaussian distribution, from the first-class inshore scene pictures Data₁ and mirrored, and the N₂ - N₁ mirrored pictures are denoted Data_1extra; the N₂ - N₁ mirrored pictures Data_1extra are then merged with the first-class inshore scene pictures Data₁, and the new picture set is output and denoted Data_1new; define Data_2new = Data₂;

if N₂ <= N₁, then N₁ - N₂ pictures are randomly selected, based on a Gaussian distribution, from the second-class offshore scene pictures Data₂ and mirrored, and the N₁ - N₂ mirrored pictures are denoted Data_2extra; the N₁ - N₂ mirrored pictures Data_2extra are then merged with the second-class offshore scene pictures Data₂, and the new picture set is output and denoted Data_2new; define Data_1new = Data₁;

define the new picture set Data_new = {Data_1new, Data_2new};

Data_new is divided into two parts at a ratio of 7:3 to obtain a training set, denoted Train, and a test set, denoted Test;

step 3, constructing a forward propagation network

Step 3.1, constructing a balance characteristic pyramid network

Adopting a classical residual network construction method to construct a residual network with the network layer number of 50, and marking the residual network as Res-50, and respectively marking the feature images generated by the last layer network with different sizes in the residual network Res-50 as F according to the feature image size from large to small ₁ ，F ₂ ，F ₃ ，F ₄ ，F ₅ ；

Will F ₅ Is also denoted as P ₅ ；

According to convolution and operation, F ₄ Feature extraction was performed by a 1×1 convolution sum, and the feature extraction result was denoted as E ₄ The method comprises the steps of carrying out a first treatment on the surface of the According to the up-sampling operation, P is processed by the up-sampling operation ₅ Is scaled to the feature map F in Res-50 ₄ The size is consistent, the size is 32 multiplied by 32, and the result after the up-sampling operation is recorded as U ₅ The method comprises the steps of carrying out a first treatment on the surface of the According to cascade operation, E ₄ And U ₅ Superposing, and marking the superposition result as P ₄ ；

According to convolution and operation, F ₃ Feature extraction was performed by a 1×1 convolution sum, and the feature extraction result was denoted as E ₃ The method comprises the steps of carrying out a first treatment on the surface of the According to the up-sampling operation, P is processed by the up-sampling operation ₄ Is scaled to the feature map F in Res-50 ₃ The size is consistent, the size is 64 multiplied by 64, and the result after the up-sampling operation is recorded as U ₄ The method comprises the steps of carrying out a first treatment on the surface of the According to cascade operation, E ₃ And U ₄ Superposing, and marking the superposition result as P ₃ ；

According to convolution and operation, F ₂ Feature extraction was performed by a 1×1 convolution sum, and the feature extraction result was denoted as E ₂ The method comprises the steps of carrying out a first treatment on the surface of the According to the up-sampling operation, P is processed by the up-sampling operation ₃ Is scaled to the feature map F in Res-50 ₂ The sizes are consistent, the size is 128 multiplied by 128, and the result after the up-sampling operation is recorded as U ₃ The method comprises the steps of carrying out a first treatment on the surface of the According to cascade operation, E ₂ And U ₃ Superposing, and marking the superposition result as P ₂ ；

According to convolution and operation, F ₁ Feature extraction was performed by a 1×1 convolution sum, and the feature extraction result was denoted as E ₁ The method comprises the steps of carrying out a first treatment on the surface of the According to the up-sampling operation, P is processed by the up-sampling operation ₂ Is scaled to the feature map F in Res-50 ₁ The sizes are consistent, the size is 256 multiplied by 256, and the result after the up-sampling operation is recorded as U ₂ The method comprises the steps of carrying out a first treatment on the surface of the According to cascade operation, E ₁ And U ₂ Superposing, and marking the superposition result as P ₁ ；

According to the up-sampling operation, P is processed by the up-sampling operation ₅ Is scaled to the feature map P ₃ The size is consistent, the size is 64 multiplied by 64, and the result after the up-sampling operation is recorded as H ₅ ；

According to the up-sampling operation, P is processed by the up-sampling operation ₄ Is scaled to the feature map P ₃ The size is consistent, the size is 64 multiplied by 64, and the result after the up-sampling operation is recorded as H ₄ ；

Will P ₃ Is also denoted as H ₅ ；

By pooling P by maximum pooling according to pooling operations ₂ Is scaled to the feature map P ₃ The size is consistent, the size is 64 multiplied by 64, and the result after the up-sampling operation is recorded as H ₂ ；

By pooling P by maximum pooling according to pooling operations ₁ Is scaled to the feature map P ₃ The size is consistent, the size is 64 multiplied by 64, and the result after the up-sampling operation is recorded as H ₁ ；

H₁, H₂, H₃, H₄, and H₅ are combined according to the formula to obtain a feature map I, where k denotes the subscript of H and (i, j) denotes the spatial sampling position on the feature map;
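A minimal Python sketch of this integration step is given below. It assumes that the integration formula is the element-wise arithmetic mean of the five rescaled maps, I(i, j) = (1/5) Σₖ Hₖ(i, j); this is an assumption, since the formula itself does not appear above, and the function name `integrate` is hypothetical.

```python
def integrate(maps):
    """Average same-sized 2-D maps H_1..H_n position by position (assumed formula)."""
    n = len(maps)
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / n for j in range(w)]
            for i in range(h)]


# Toy usage: five constant 2x2 maps with values 1..5.
h_maps = [[[float(k)] * 2 for _ in range(2)] for k in range(1, 6)]
i_map = integrate(h_maps)
```

Averaging gives every pyramid level the same weight in I, which is one way to realise the balancing of ship scale features that the method aims for.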

Taking the feature map I as input, a feature map O is computed according to the formula, where $I_i$ denotes the feature at the i-th position of feature map I; $O_i$ denotes the feature at the i-th position of feature map O; the formula includes a normalization factor; $f(I_i, I_j)$ is the function that computes the similarity between $I_i$ and $I_j$ and is expressed in terms of $\theta(I_i)$ and $\phi(I_j)$, where $\theta(I_i) = W_\theta I_i$ and $\phi(I_j) = W_\phi I_j$, with $W_\theta$ and $W_\phi$ being matrices learned by 1×1 convolution operations; $g(I_j) = W_g I_j$, where $W_g$ is a matrix learned by a 1×1 convolution operation;
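The refinement of I into O can be sketched in Python under explicit assumptions. The sketch assumes the non-local form O_i = (1/C) Σⱼ f(I_i, I_j) g(I_j), with the similarity taken as f(I_i, I_j) = exp(θ(I_i)ᵀ φ(I_j)) and the normalization factor as C = Σⱼ f(I_i, I_j); these choices are assumptions that fit the symbols defined above, not expressions recovered from the text. Positions are flattened into a list of feature vectors, and all names are hypothetical.

```python
import math


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x; stands in for a 1x1 convolution."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def non_local(features, w_theta, w_phi, w_g):
    """Refine a list of per-position feature vectors I_i into O_i."""
    theta = [matvec(w_theta, x) for x in features]
    phi = [matvec(w_phi, x) for x in features]
    g = [matvec(w_g, x) for x in features]
    out = []
    for t in theta:
        scores = [sum(a * b for a, b in zip(t, p)) for p in phi]
        top = max(scores)                                  # subtracted for numerical stability
        weights = [math.exp(s - top) for s in scores]      # f(I_i, I_j) up to a common factor
        norm = sum(weights)                                # assumed normalization factor
        out.append([sum(wt * gj[d] for wt, gj in zip(weights, g)) / norm
                    for d in range(len(g[0]))])
    return out
```

Subtracting the largest score before exponentiation leaves the normalized weights unchanged, because the common factor cancels between numerator and normalization factor.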

All network operations in step 3.1 together form the balanced feature pyramid network, which is denoted Backbone;

Step 3.2, building the region proposal network

Following the standard region proposal network construction method, the Backbone obtained in step 3.1 is used as the feature extraction layer to construct a region proposal network, denoted RPN₀;

Step 3.3, constructing the balanced classification regression network

The balanced classification regression network is divided into a classification head Clhead and a regression head Rehead. Fully connected layers FC1 and FC2 are constructed according to the conventional fully connected layer method, the output of FC1 is used as the input of FC2, and FC1 and FC2 together serve as the classification head, denoted Clhead; four convolution layers, namely Conv1, Conv2, Conv3, and Conv4, are constructed according to the convolution kernel method; meanwhile, a pooling layer is constructed according to the pooling operation and denoted Pooling; the output of Conv1 is used as the input of Conv2, the output of Conv2 as the input of Conv3, the output of Conv3 as the input of Conv4, and the output of Conv4 as the input of Pooling; Conv1, Conv2, Conv3, Conv4, and Pooling together serve as the regression head, denoted Rehead; the classification head Clhead and the regression head Rehead take the same feature map as input and, together with the Backbone, form the balanced classification regression network, denoted BCRN₀;
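The two-branch structure of Clhead and Rehead can be illustrated with a small, dependency-free Python sketch. Everything beyond the layer order is an assumption made for illustration: 3×3 kernels with zero padding, ReLU activations between layers, global average pooling as the Pooling layer, two classification outputs, four regression outputs, and all function names and weight values are hypothetical.

```python
def relu(v):
    return v if v > 0 else 0.0


def fc(x, weight, bias):
    """Fully connected layer: weight is [out][in], bias is [out]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def conv3x3(feat, weight):
    """Zero-padded 3x3 convolution. feat is [C_in][H][W]; weight is [C_out][C_in][3][3]."""
    c_in, h, w = len(feat), len(feat[0]), len(feat[0][0])
    out = []
    for kernel in weight:
        ch = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                s = 0.0
                for c in range(c_in):
                    for dy in range(3):
                        for dx in range(3):
                            yy, xx = y + dy - 1, x + dx - 1
                            if 0 <= yy < h and 0 <= xx < w:
                                s += kernel[c][dy][dx] * feat[c][yy][xx]
                ch[y][x] = s
        out.append(ch)
    return out


def clhead(feat, fc1, fc2):
    """Classification head: flatten -> FC1 -> ReLU -> FC2."""
    flat = [v for ch in feat for row in ch for v in row]
    hidden = [relu(v) for v in fc(flat, *fc1)]
    return fc(hidden, *fc2)


def rehead(feat, convs):
    """Regression head: Conv1 -> Conv2 -> Conv3 -> Conv4 -> Pooling (global average)."""
    x = feat
    for idx, weight in enumerate(convs):
        x = conv3x3(x, weight)
        if idx < len(convs) - 1:
            x = [[[relu(v) for v in row] for row in ch] for ch in x]
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]


def const_conv(c_out, c_in, value):
    """Hypothetical helper: a constant-valued 3x3 kernel bank for the toy example."""
    return [[[[value] * 3 for _ in range(3)] for _ in range(c_in)] for _ in range(c_out)]
```

Both heads receive the same feature map, for example `clhead(feat, fc1, fc2)` and `rehead(feat, convs)` on one input, so the classification and regression tasks are handled by separate sub-networks, as the step describes.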

Step 4, training the region proposal network

An iteration parameter epoch is set, and its value is initialized to 1;

The training set Train of the augmented data set Datanew obtained in step 2 is used as the input of the region proposal network RPN₀; following the forward propagation method, Train is fed into RPN₀ for computation, and the output of RPN₀ is recorded as Result0;

Taking the output Result0 obtained in step 4.1 and the training set Train as inputs, the IOU value of each proposal box in Result0 is computed according to the IOU formula; the outputs in Result0 whose IOU is greater than 0.5 are taken as positive samples and denoted Result0p; the outputs in Result0 whose IOU is less than 0.5 are taken as negative samples and denoted Result0n; the total number of samples in the negative samples Result0n is counted and denoted M; the required number of negative samples is entered manually and denoted N; the required number of equally divided IOU intervals is entered manually and denoted $n_b$; the number of samples in the i-th IOU interval is denoted $M_i$; a random sampling probability is set for the i-th interval, each IOU interval is randomly sampled, and the sampling results of all IOU intervals of the negative samples are denoted Result0ns;
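The IOU computation and the interval-balanced sampling of negative samples can be sketched in Python as follows. The sketch assumes boxes given as (x1, y1, x2, y2) corners and a per-interval sampling probability of N / n_b × 1 / M_i, capped at 1; this probability is an assumption consistent with the quantities N, n_b, and M_i named above, not an expression taken from the text, and the function names are hypothetical.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_negatives(neg_ious, n_required, n_bins, rng, max_iou=0.5):
    """Return indices of sampled negatives, drawing roughly equally from each IOU interval."""
    width = max_iou / n_bins
    bins = [[] for _ in range(n_bins)]
    for idx, value in enumerate(neg_ious):
        k = min(int(value / width), n_bins - 1)   # interval index of this sample
        bins[k].append(idx)
    chosen = []
    for members in bins:
        if not members:
            continue
        # Assumed probability for interval i: N / n_b * 1 / M_i, capped at 1.
        p = min(1.0, n_required / n_bins / len(members))
        chosen.extend(i for i in members if rng.random() < p)
    return chosen


# Toy usage with a seeded generator so that runs are repeatable.
picked = sample_negatives([0.02, 0.03, 0.04, 0.31, 0.45], n_required=4, n_bins=5,
                          rng=random.Random(0))
```

Compared with uniform random sampling over all M negatives, this scheme raises the chance of keeping negatives from sparsely populated intervals, which tend to be the higher-IOU ones.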

The number of samples in the positive samples Result0p is counted and denoted P; a random sampling probability is set for the positive samples, Result0p is randomly sampled with it, and the positive sampling result is denoted Result0ps;

Step 4.3, training and optimizing the region proposal network

Taking the positive sampling result Result0ps and the negative sampling result Result0ns obtained in step 4.2 as inputs, the region proposal network is trained and optimized according to the classical Adam algorithm, yielding the trained and optimized region proposal network RPN₁;
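For reference, a single update of the classical Adam algorithm can be written in a few lines of Python. The hyperparameter values below (learning rate 0.001, β₁ = 0.9, β₂ = 0.999, ε = 1e-8) are the commonly used defaults and are assumptions here, since the step above does not state them; the function name and the flat-list parameter layout are hypothetical.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step count. Returns new (params, m, v)."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g          # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)               # bias correction
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

In training, the gradients would come from back-propagating the network loss on the sampled positive and negative proposals, and `m`, `v`, and `t` would be carried over from one iteration to the next.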

Step 5, training the balanced classification regression network

The training set Train of the augmented data set Datanew obtained in step 2 is used as the input of the balanced classification regression network BCRN₀; following the forward propagation method, Train is fed into BCRN₀ for computation, and the output of BCRN₀ is recorded as Result1;

Step 5.2, training and optimizing the balanced classification regression network

Taking the output Result1 of the balanced classification regression network BCRN₀ obtained in step 5.1 as input, the balanced classification regression network is trained and optimized according to the classical Adam algorithm, yielding the trained and optimized balanced classification regression network BCRN₁;

Step 6, performing alternating training

It is judged whether the epoch set in step 4 is equal to 12; if epoch is not equal to 12, let epoch = epoch + 1, RPN₀ = RPN₁, and BCRN₀ = BCRN₁, repeat steps 4.1, 4.2, 4.3, 5.1, and 5.2 in sequence, and then return to step 6 to judge epoch again; if epoch is equal to 12, the trained region proposal network RPN₁ and the trained balanced classification regression network BCRN₁ are together denoted the network BL-Net, and step 7 is then performed;
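The control flow of steps 4 to 6 can be summarised by the following Python sketch. The two training callables stand in for steps 4.1 to 4.3 and steps 5.1 to 5.2 respectively; they and the function name `alternate_training` are hypothetical placeholders, not part of the method text.

```python
def alternate_training(train_rpn, train_bcrn, rpn0, bcrn0, max_epoch=12):
    """Alternate the two training stages until epoch reaches max_epoch."""
    epoch = 1                              # initialised in step 4
    while True:
        rpn1 = train_rpn(rpn0)             # steps 4.1, 4.2, 4.3
        bcrn1 = train_bcrn(bcrn0)          # steps 5.1, 5.2
        if epoch == max_epoch:             # step 6: stop condition
            return rpn1, bcrn1             # together these form BL-Net
        epoch += 1
        rpn0, bcrn0 = rpn1, bcrn1          # RPN0 = RPN1, BCRN0 = BCRN1


# Toy usage: each "training" call just counts how often it was invoked.
rounds_rpn, rounds_bcrn = alternate_training(lambda n: n + 1, lambda n: n + 1, 0, 0)
```

Because epoch starts at 1 and the check happens after each pass, both stages run exactly 12 times before BL-Net is returned.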

Step 7, evaluation method

Step 7.1, forward propagation

Taking the network BL-Net obtained in step 6 and the test set Tests obtained in step 2.5 as inputs, the detection result is obtained by the conventional forward propagation method and denoted R;

Taking the detection result R as input, redundant boxes in the detection result R are removed by the conventional non-maximum suppression method, with the following specific steps:

Step (3): the box BS with the highest score is selected from the remaining boxes;

The IOU computation and discarding of step (2) are repeated until no more boxes can be discarded, and the boxes that finally remain are taken as the final detection result, denoted $R^F$;
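A compact Python sketch of greedy non-maximum suppression is given below for illustration. It assumes the usual form of the procedure with an IOU threshold of 0.5 and boxes given as (x1, y1, x2, y2); the threshold value, the box format, and the function names are assumptions, since they are not stated above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept after greedy non-maximum suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                # highest-scoring remaining box
        keep.append(best)
        # Discard remaining boxes that overlap the selected box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Toy usage: the second box overlaps the first heavily, the third is separate.
kept = nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7])
```

The loop ends when no boxes remain to be examined, which corresponds to the condition that no further box can be discarded.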

Step 7.2, calculating the evaluation metrics

Taking the detection result $R^F$ obtained in step 7.1 as input, the precision P, the recall R, and the precision-recall curve P(R) of the network are computed by the conventional precision and recall calculation methods; the mean average precision mAP of SAR ship detection based on balance learning is then computed using the formula $mAP = \int_0^1 P(R)\,dR$.
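The metric computation can be sketched in Python as follows. The sketch assumes that each detection has already been marked as a true or false positive and sorted by descending score, and that the area under the precision-recall curve P(R) is approximated by summing precision over recall increments; the integration scheme and the function names are assumptions chosen for illustration.

```python
def pr_curve(is_tp, num_gt):
    """Precision and recall after each detection; is_tp is sorted by descending score."""
    tp = fp = 0
    precisions, recalls = [], []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    return precisions, recalls


def average_precision(precisions, recalls):
    """Approximate the area under P(R) on [0, 1] by summing P over recall increments."""
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


# Toy usage: three detections (hit, miss, hit) against two ground-truth ships.
prec, rec = pr_curve([True, False, True], num_gt=2)
ap = average_precision(prec, rec)
```

With ship as the only target class, the mean over classes contains a single term, so the value computed this way plays the role of mAP.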