CN113989672A - SAR image ship detection method based on balance learning - Google Patents

SAR image ship detection method based on balance learning

Info

Publication number: CN113989672A
Authority: CN (China)
Status: Granted
Application number: CN202111268008.2A
Other languages: Chinese (zh)
Other versions: CN113989672B (en)
Inventors: 张晓玲 (Zhang Xiaoling), 柯潇 (Ke Xiao), 张天文 (Zhang Tianwen), 师君 (Shi Jun), 韦顺军 (Wei Shunjun)
Current Assignee: University of Electronic Science and Technology of China
Original Assignee: University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China; priority to CN202111268008.2A; application granted and published as CN113989672B.
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/23213 — Pattern recognition; clustering techniques; non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with a fixed number of clusters, e.g. K-means clustering
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/045 — Neural networks; architectures, e.g. interconnection topology; combinations of networks
    • G06N 3/08 — Neural networks; learning methods


Abstract

The invention discloses a SAR image ship detection method based on balance learning. The method is based on deep learning theory and mainly comprises a balanced scene learning mechanism, a balanced interval sampling mechanism, a balanced feature pyramid network and a balanced classification regression network. The balanced scene learning mechanism solves the problem of unbalanced image sample scenes by augmenting the inshore samples; the balanced interval sampling mechanism solves the problem of unbalanced positive and negative samples by dividing the IoU range into several intervals and sampling an equal number of samples from each interval; the balanced feature pyramid network solves the problem of unbalanced ship scale features by extracting features with stronger multi-scale detection capability through a feature enhancement method; the balanced classification regression network solves the problem of the unbalanced classification and regression tasks by designing two different sub-networks for the classification and regression tasks. The method has the advantages of overcoming the imbalance problems in the prior art and improving the detection precision of ships in SAR images.

Description

SAR image ship detection method based on balance learning
Technical Field
The invention belongs to the technical field of Synthetic Aperture Radar (SAR) image interpretation, and relates to a SAR image ship detection method based on balance learning.
Background
Synthetic Aperture Radar (SAR) is an advanced active microwave sensor for high-resolution earth observation and remains a leading technology in the field of ocean monitoring. It is widely applied in military and civilian fields such as marine traffic control, disaster relief and fishery management. Although optical and hyperspectral satellites provide some monitoring services, SAR, with its all-day, all-weather working capability, is better suited to the changeable weather of the ocean. SAR is therefore an indispensable remote sensing tool in maritime situational awareness.
Ships are the most important participants in the ocean. Because of their huge value in shipwreck rescue, marine traffic control, fishery management and the like, they have attracted more and more attention from scholars. Research on marine vessel surveillance has developed vigorously since the launch of the first SAR satellite, Seasat-1, in the United States. Moreover, the data volume generated by the various SAR sensors in service is now enormous, and intelligent detection of marine targets is urgently needed. Therefore, SAR ship detection has become a research hotspot in the field of high-resolution earth observation. See "Wang Zaiyong, Chong Hao, Tian Jin. Research on fast detection methods for ship targets in SAR images [J]. Ship Electronic Engineering, 2016, 36(09): 27-30+88."
In recent years, with the rapid rise of deep learning (DL), many scholars in the SAR community have begun to study DL-based detection methods. Compared with traditional feature-based methods, DL-based methods have the outstanding advantages of simplicity, full automation (i.e., no complex preliminary stages such as land-sea segmentation, coastline detection and speckle correction), high speed and high accuracy. Although their underlying principles are not yet fully understood, they can liberate productivity and greatly improve work efficiency, enabling a qualitative leap in the intelligent interpretation of SAR images. See "Du Lan, Wang Zhaocheng, Wang Yan, Wei Di, Li Lu. Survey of research progress on target detection and discrimination of single-channel SAR images in complex scenes [J]. Journal of Radars, 2020, 9(01): 34-54."
However, existing deep-learning-based SAR ship detectors suffer from several imbalance problems that potentially prevent further accuracy improvements. Specifically: 1) The image sample scenes are unbalanced, i.e., the numbers of inshore and offshore image samples are unbalanced; simply stated, inshore ships have far fewer samples than offshore ships. 2) The positive and negative samples are unbalanced, i.e., the number of positive samples (ships) does not match the number of negative samples (background); there are far more negative samples than positive ones. 3) The ship scale features are unbalanced, i.e., the multi-scale ship features are unbalanced; ship sizes vary widely owing to different spatial resolutions and ship classes. 4) The classification and regression tasks are unbalanced, i.e., the difficulty of ship classification and of ship position regression is unbalanced, the latter being much harder than the former.
Therefore, to solve the above imbalance problems, a SAR image ship detection method based on balance learning is proposed. The method comprises four mechanisms for solving these problems: a balanced scene learning mechanism, a balanced interval sampling mechanism, a balanced feature pyramid network and a balanced classification regression network. Experimental results on the SSDD dataset show that the proposed method is superior to other deep-learning-based detection methods.
Disclosure of Invention
The invention belongs to the technical field of Synthetic Aperture Radar (SAR) image interpretation, and discloses a ship detection method based on balance learning for solving the problems of unbalanced image sample scenes, unbalanced positive and negative samples, unbalanced ship scale features and unbalanced classification and regression tasks in the prior art. The method is based on deep learning theory and mainly comprises a balanced scene learning mechanism, a balanced interval sampling mechanism, a balanced feature pyramid network and a balanced classification regression network. The balanced scene learning mechanism solves the problem of unbalanced sample scenes by augmenting the inshore ship samples; the balanced interval sampling mechanism solves the problem of unbalanced positive and negative samples by dividing the IoU range into several intervals and sampling an equal number of samples from each interval; the balanced feature pyramid network solves the problem of unbalanced ship scale features by extracting features with stronger multi-scale detection capability through a feature enhancement method; the balanced classification regression network solves the problem of the unbalanced classification and regression tasks by designing two different sub-networks for the classification and regression tasks. Experiments show that on the SSDD dataset the detection accuracy of the proposed balance-learning-based SAR image ship detection method is 95.25%, while that of an existing deep-learning-based SAR ship detection method is 92.27%; the proposed method thus improves ship detection accuracy.
For the convenience of describing the present invention, the following terms are first defined:
definition 1: SSDD data set acquisition method
The SSDD data set refers to the SAR Ship Detection Dataset; SSDD was the first publicly available SAR ship detection dataset. It contains 1160 SAR images in total from the Sentinel-1, RadarSat-2 and TerraSAR-X sensors, each about 500 × 500 pixels in size. SSDD contains 2551 ships, the smallest occupying 28 pixels² and the largest 62878 pixels² (pixels² is the product of the bounding-box width and height in pixels). In SSDD, images whose file-name index ends in 1 or 9 (232 samples) are chosen as the test set and the rest as the training set (928 samples). The SSDD acquisition method is detailed in "Li Jianwei, Qu Changwen, Peng Shujuan, Deng Bing. Ship target detection in SAR images based on convolutional neural network [J]. Systems Engineering and Electronics, 2018, 40(09): 1953-."
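As an illustration of the split rule above, a minimal sketch in Python; the exact file-naming pattern is an assumption beyond the stated suffix rule:

```python
def is_ssdd_test_image(filename: str) -> bool:
    """SSDD split rule above: images whose index ends in 1 or 9 form the test set."""
    stem = filename.rsplit(".", 1)[0]        # strip the extension
    return stem[-1] in ("1", "9")

print(is_ssdd_test_image("000121.jpg"))      # True  -> test set (232 samples)
print(is_ssdd_test_image("000122.jpg"))      # False -> training set (928 samples)
```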
Definition 2: classic GAN network construction method
The classic generative adversarial network (GAN) is a deep learning model and one of the most promising methods in recent years for unsupervised learning on complex distributions. The model produces good output through the mutual adversarial game of two modules in the framework: a generative model and a discriminative model. In the original GAN theory, G and D are not required to be neural networks, only functions that fit the corresponding generation and discrimination mappings; in practice, deep neural networks are generally used as G and D. A well-trained GAN can extract scene features rapidly. The classic GAN construction method is detailed in "I. J. Goodfellow et al., Generative adversarial networks, International Conference on Neural Information Processing Systems, pp. 2672-2680, 2014."
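For illustration, a minimal PyTorch sketch of the G/D adversarial game described above; the layer sizes, feature dimensions and learning rates are hypothetical, not taken from the patent:

```python
import torch
import torch.nn as nn

# Minimal G and D as small MLPs; all dimensions are illustrative only.
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 256))
D = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(real):                       # real: (batch, 256) feature vectors
    batch = real.size(0)
    fake = G(torch.randn(batch, 64))      # generate from latent noise
    # discriminator step: push real toward 1, fake toward 0
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    loss_d.backward(); opt_d.step()
    # generator step: make D label the fakes as real
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(batch, 1))
    loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

print(gan_step(torch.randn(8, 256)))      # one adversarial update on dummy data
```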
Definition 3: classic K-means clustering algorithm
The classic K-means clustering algorithm is an iterative clustering analysis algorithm often used for unsupervised classification tasks. Its steps are: decide in advance to divide the data into K groups, randomly select K objects as initial cluster centroids, then compute the distance between each object and each centroid and assign each object to its nearest centroid. A centroid and the objects assigned to it represent one cluster. Each time all samples have been assigned, the centroid of each cluster is recalculated from the objects currently in the cluster. This process repeats until some termination condition is met. The classic K-means clustering algorithm is detailed in "Li Ting. Research on improved K-means clustering algorithm [D]. Anhui University, 2015."
Definition 4: classical Adam algorithm
The classic Adam algorithm is an extension of stochastic gradient descent and has recently been widely adopted in deep learning applications in computer vision and natural language processing. Adam differs from classic stochastic gradient descent: stochastic gradient descent maintains a single learning rate for all weight updates, and the learning rate does not change during training, whereas Adam maintains a learning rate for each network weight and adapts it individually as learning progresses. The method computes adaptive learning rates for different parameters from estimates of the first and second moments of the gradients. The classic Adam algorithm is detailed in "Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980."
Definition 5: classical forward propagation method
The forward propagation method is the most basic method in deep learning; it performs forward inference on the input according to the parameters and connections of the network to obtain the network's output. The forward propagation method is detailed in "https://www.jianshu.com/p/f30c8daebebeb".
Definition 6: Classic residual network construction method
The residual network is a convolutional neural network proposed by four scholars from Microsoft Research; it won the image classification and object recognition tasks of the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The residual network is characterized by being easy to optimize and by improving accuracy with considerably increased depth. Its internal residual blocks use skip connections, which alleviates the vanishing-gradient problem caused by increasing depth in deep neural networks. The classic residual network construction method is detailed in "K. He et al., Deep Residual Learning for Image Recognition, IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770-778."
Definition 7: conventional convolution kernel operation
A convolution kernel is a node that weights and then sums the values within a small rectangular region of an input feature map or picture to produce one output. Each convolution kernel requires several manually specified parameters. One type of parameter is the length and width of the node matrix processed by the kernel; the size of this node matrix is the size of the convolution kernel. The other type is the depth of the output unit node matrix, which is the depth of the convolution kernel. During a convolution operation, each kernel slides over the input data, the inner product between the whole kernel and the corresponding position of the input data is computed and then passed through a nonlinear function to obtain the final result, and the results of all positions form a two-dimensional feature map. Each convolution kernel produces one two-dimensional feature map, and the feature maps produced by multiple kernels are stacked to form a three-dimensional feature map. The traditional convolution kernel operation is detailed in "Fan Li, Zhao Hongwei, Zhao Yu, Hu Huangshui, Wang Xin. Survey of object detection research based on deep convolutional neural networks [J]. Optics and Precision Engineering, 2020, 28(05): 1152-1164."
Definition 8: conventional cascading operation
Cascading (concatenation) is an important operation in network structure design. It is used to combine features: fusing the features extracted by several convolutional feature-extraction branches, or fusing the information of output layers, thereby enhancing the feature-extraction capability of the network. The cascade method is detailed in "https://blog.csdn.net/alxe_map/article/details/80506051".
Definition 9: conventional upsampling operations
Upsampling is an operation that enlarges a picture or feature map. It usually adopts interpolation, i.e., a suitable interpolation algorithm is used to insert new elements between the pixels of the original image. Among mainstream interpolation algorithms, nearest-neighbour interpolation is simple, easy to implement and was common in early applications, but it produces noticeable jagged edges and mosaics in the new image. Bilinear interpolation has a smoothing effect and effectively overcomes the defects of the nearest-neighbour method, but it degrades the high-frequency components of the image and blurs details. At higher magnifications, higher-order interpolation, such as bicubic and cubic-spline interpolation, works better than low-order interpolation. These algorithms let the interpolated pixel grey values continue the continuity of the grey-level variation of the original image, so that the enlarged image has naturally smooth grey transitions. However, in an image there are abrupt grey-value changes between some pixels and their neighbours, i.e., grey discontinuities; these pixels are the edge pixels that describe object contours or textures. The classic upsampling operation is detailed in "https://blog.csdn.net/weixin_43960370/article/details/106049708".
Definition 10: conventional pooling operations
The pooling operation (Pooling) is very common in CNNs; the pooling layer reduces the dimensionality of the data by imitating the human visual system. Pooling is also commonly called subsampling or downsampling. When building a convolutional neural network, a pooling operation is often used after a convolutional layer to reduce the feature dimension of the convolutional output, which effectively reduces network parameters and prevents over-fitting. Classic pooling is detailed in "https://www.zhihu.com/question/303215483/answer/615115629".
Definition 11: traditional regional recommendation network construction method
The regional recommendation network (i.e., the region proposal network, RPN) is a sub-network in Faster R-CNN for extracting the regions of a picture where targets may exist. It is a fully convolutional network that takes as input the convolutional feature map output by the backbone network and outputs a target confidence score for each candidate box. The traditional regional recommendation network construction method is detailed in "Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149."
Definition 12: Conventional fully connected layer method
The fully connected layer is a component of convolutional neural networks. Its input and output sizes are fixed, and each node is connected to all nodes of the previous layer; it is used to integrate the extracted features. The fully connected layer method is detailed in "Haoren Wang, Haotian Shi, Ke Lin, Chengjin Qin, Liqun Zhao, Yixiang Huang, Chengliang Liu. A high-precision arrhythmia classification method based on a dual fully connected network [J]. Biomedical Signal Processing and Control, 2020, 58."
Definition 13: conventional non-maxima suppression method
The non-maximum suppression method is an algorithm used in target detection to remove redundant detection boxes. In the forward propagation result of a typical detection network, the same target often corresponds to several detection boxes, so an algorithm is needed to select the best-quality, highest-scoring box among the multiple boxes of the same target. Non-maximum suppression performs a local maximum search by computing the overlap (IoU) against a threshold. Non-maximum suppression methods are detailed in "https://www.cnblogs.com/makefile/p/nms."
Definition 14: Traditional recall and precision calculation method
Recall R refers to the proportion of all positive samples that are correctly predicted, expressed as

$$R = \frac{TP}{TP + FN}$$

Precision P refers to the proportion of the results predicted as positive that are correct, expressed as

$$P = \frac{TP}{TP + FP}$$

where TP (true positive) denotes a positive sample predicted as positive by the model; FN (false negative) denotes a positive sample predicted as negative by the model; FP (false positive) denotes a negative sample predicted as positive by the model. The precision-recall curve P(R) is the function with R as the independent variable and P as the dependent variable. The calculation of these quantities is detailed in "Li Hang. Statistical Learning Methods [M]. Beijing: Tsinghua University Press, 2012."
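A minimal sketch of these definitions in Python/NumPy; the binary array encoding (1 = positive/ship, 0 = negative/background) is an assumption for illustration:

```python
import numpy as np

def precision_recall(pred, label):
    """pred, label: binary arrays, 1 = positive (ship), 0 = negative (background)."""
    pred, label = np.asarray(pred), np.asarray(label)
    tp = np.sum((pred == 1) & (label == 1))   # positives predicted positive
    fn = np.sum((pred == 0) & (label == 1))   # positives predicted negative
    fp = np.sum((pred == 1) & (label == 0))   # negatives predicted positive
    r = tp / (tp + fn) if tp + fn else 0.0    # R = TP / (TP + FN)
    p = tp / (tp + fp) if tp + fp else 0.0    # P = TP / (TP + FP)
    return p, r

print(precision_recall([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))   # (0.666..., 0.666...)
```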
The invention discloses a ship detection method based on balance learning, which comprises the following steps:
step 1, initializing SSDD data set
Randomly shuffle the order of the SAR images in the SSDD dataset to obtain a new SSDD dataset.
Step 2, carrying out scene augmentation by utilizing a balanced scene learning mechanism
Step 2.1, extracting SSDD data set characteristics by using GAN network
Build the generative adversarial network GAN0 using the classic GAN construction method in definition 2. Taking the new SSDD data obtained in step 1 as input, train and optimize GAN0 with the classic Adam algorithm in definition 4; the trained and optimized generative adversarial network is denoted GAN.
Then, taking the new SSDD data obtained in step 1 as input again, feed it into the trained and optimized GAN according to the traditional forward propagation method in definition 5, obtaining the output vectors of the network M = {M1, M2, …, Mi, …, M1160}, where Mi is the output vector of the i-th picture in the new SSDD data.
Define the output vector set M as the scene features of all pictures in the new SSDD dataset, and Mi as the scene feature of the i-th picture in the new SSDD dataset.
Step 2.2, clustering scenes
Taking the set M of scene features of all pictures in the new SSDD data obtained in step 2.1 as input, cluster the pictures in the new SSDD dataset by their scene features M using the traditional K-means clustering algorithm in definition 3:
step 2.3, initializing parameters
For the centroid parameters of the traditional K-means clustering algorithm in definition 3, randomly initialize the two centroids of the first iteration, denoted $c_1^{(1)}$ and $c_2^{(1)}$. Define the current iteration number as t, t = 1, 2, …, I, where I is the maximum number of iterations of the K-means algorithm, initialized as I = 1000. Define the centroids of the t-th iteration as $c_1^{(t)}$ and $c_2^{(t)}$. Initialize the iteration convergence error ε as one of the convergence conditions of the algorithm.
Step 2.4, carrying out iterative operation
First, using the formula

$$d_{i,1}^{(1)} = \left\| M_i - c_1^{(1)} \right\|$$

compute the distance from the scene feature $M_i$ of the i-th picture to the first centroid $c_1^{(1)}$ of the 1st iteration, denoted $d_{i,1}^{(1)}$. Using the formula

$$d_{i,2}^{(1)} = \left\| M_i - c_2^{(1)} \right\|$$

compute the distance from $M_i$ to the second centroid $c_2^{(1)}$ of the 1st iteration, denoted $d_{i,2}^{(1)}$.

Compare $d_{i,1}^{(1)}$ and $d_{i,2}^{(1)}$: if $d_{i,1}^{(1)} > d_{i,2}^{(1)}$, define the scene feature $M_i$ of the i-th picture as belonging to the second class in the 1st iteration; otherwise define it as belonging to the first class.

Define: after the 1st iteration, the set of all scene features of the first class is $S_1^{(1)}$, and the set of all scene features of the second class is $S_2^{(1)}$.
Then let t = 2 and perform the following until convergence:

1) Let the centroid $c_1^{(t)}$ of step t be the arithmetic mean of the set $S_1^{(t-1)}$, and let the centroid $c_2^{(t)}$ of step t be the arithmetic mean of the set $S_2^{(t-1)}$.

2) Using the formula

$$d_{i,1}^{(t)} = \left\| M_i - c_1^{(t)} \right\|$$

compute the distance from the scene feature $M_i$ of the i-th picture to the first centroid $c_1^{(t)}$ of the t-th iteration, denoted $d_{i,1}^{(t)}$; using the formula

$$d_{i,2}^{(t)} = \left\| M_i - c_2^{(t)} \right\|$$

compute the distance from $M_i$ to the second centroid $c_2^{(t)}$ of the t-th iteration, denoted $d_{i,2}^{(t)}$.

3) Compare $d_{i,1}^{(t)}$ and $d_{i,2}^{(t)}$: if $d_{i,1}^{(t)} > d_{i,2}^{(t)}$, define the scene feature $M_i$ as belonging to the second class in the t-th iteration; otherwise define it as belonging to the first class. Define: after the t-th iteration, the set of all scene features of the first class is $S_1^{(t)}$ and the set of all scene features of the second class is $S_2^{(t)}$. Output the clustering result, denoted CLASS.

4) Compute the change of the centroids between this iteration and the previous one, denoted σ, expressed as

$$\sigma = \left\| c_1^{(t)} - c_1^{(t-1)} \right\| + \left\| c_2^{(t)} - c_2^{(t-1)} \right\|$$

If σ < ε or t ≥ I, output the clustering result CLASS; otherwise let t = t + 1 and return to step 1) to continue iterating.
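For concreteness, a compact NumPy sketch of the two-centroid clustering loop of steps 2.3-2.4; the feature dimension, seed and ε are illustrative, and it assumes neither cluster becomes empty during iteration:

```python
import numpy as np

def two_class_kmeans(M, eps=1e-4, max_iter=1000, seed=0):
    """Steps 2.3-2.4: random centroid init, nearest-centroid assignment,
    arithmetic-mean centroid update, stop when the centroid shift sigma < eps."""
    rng = np.random.default_rng(seed)
    c = M[rng.choice(len(M), size=2, replace=False)].copy()  # centroids c1, c2
    for t in range(max_iter):
        # d[i, k]: distance from scene feature M_i to centroid c_{k+1}
        d = np.linalg.norm(M[:, None, :] - c[None, :, :], axis=2)
        CLASS = (d[:, 0] > d[:, 1]).astype(int)   # 1 = second class, 0 = first
        c_new = np.stack([M[CLASS == k].mean(axis=0) for k in (0, 1)])
        sigma = np.linalg.norm(c_new - c, axis=1).sum()      # centroid change
        c = c_new
        if sigma < eps:
            break
    return CLASS

features = np.random.rand(1160, 32)   # stand-in for the GAN scene features
labels = two_class_kmeans(features)   # 0 = first class, 1 = second class
print(np.bincount(labels))
```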
Step 2.5, carrying out scene amplification
According to the clustering result CLASS obtained in step 2.4, divide all pictures in the new SSDD data into two classes: the first class, inshore scene pictures, denoted Data1, and the second class, offshore scene pictures, denoted Data2. Define the number of pictures in Data1 as N1 and the number of pictures in Data2 as N2.

If N2 > N1, randomly select, based on a Gaussian distribution, N2 - N1 pictures from the inshore scene pictures Data1 and apply the traditional mirror operation, obtaining N2 - N1 mirrored pictures, denoted Data1extra. Then merge the mirrored pictures Data1extra with the inshore scene pictures Data1 and output the new picture set, denoted Data1new. Define Data2new = Data2.

If N2 <= N1, randomly select, based on a Gaussian distribution, N1 - N2 pictures from the offshore scene pictures Data2 and apply the traditional mirror operation, obtaining N1 - N2 mirrored pictures, denoted Data2extra. Then merge the mirrored pictures Data2extra with the offshore scene pictures Data2 and output the new picture set, denoted Data2new. Define Data1new = Data1.

Define the new picture set Datanew = {Data1new, Data2new}.

Divide Datanew into two parts at a 7:3 ratio, obtaining a training set denoted Train and a test set denoted Test.
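A sketch of step 2.5 in Python/NumPy; for simplicity the random picture selection here is uniform rather than Gaussian-distributed, and pictures are plain 2-D arrays:

```python
import numpy as np

def balance_scenes(data1, data2, rng=None):
    """Step 2.5 sketch: data1 = inshore pictures, data2 = offshore pictures
    (lists of 2-D arrays). Mirror randomly chosen pictures of the smaller
    class until both classes contain the same number of pictures."""
    rng = rng or np.random.default_rng(0)
    small, large = (data1, data2) if len(data1) < len(data2) else (data2, data1)
    idx = rng.choice(len(small), size=len(large) - len(small))  # uniform here
    extra = [np.fliplr(small[i]) for i in idx]    # traditional mirror operation
    return small + extra, large

def split_train_test(pictures, ratio=0.7, rng=None):
    """Split the merged picture set Datanew into Train and Test at 7:3."""
    rng = rng or np.random.default_rng(0)
    order = rng.permutation(len(pictures))
    cut = int(ratio * len(pictures))
    return [pictures[i] for i in order[:cut]], [pictures[i] for i in order[cut:]]

inshore = [np.zeros((500, 500)) for _ in range(300)]
offshore = [np.zeros((500, 500)) for _ in range(860)]
balanced_inshore, _ = balance_scenes(inshore, offshore)
print(len(balanced_inshore))   # 860
```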
Step 3, building a forward propagation network
Step 3.1, building a balanced feature pyramid network
Using the classic residual network construction method in definition 6, build a residual network with 50 layers, denoted Res-50. Denote the feature maps of different sizes generated by the last layer of each stage of Res-50, from large to small by feature-map size, as F1, F2, F3, F4, F5.

Denote F5 also as P5.

Using the traditional convolution kernel operation in definition 7, perform feature extraction on F4 with a 1 × 1 convolution kernel; denote the result as E4. Using the traditional upsampling operation in definition 9, upsample P5 so that its feature size matches F4; denote the result as U5. Using the traditional cascade operation in definition 8, superimpose E4 and U5; denote the result as P4.

Using the traditional convolution kernel operation in definition 7, perform feature extraction on F3 with a 1 × 1 convolution kernel; denote the result as E3. Using the traditional upsampling operation in definition 9, upsample P4 so that its feature size matches F3; denote the result as U4. Using the traditional cascade operation in definition 8, superimpose E3 and U4; denote the result as P3.

Using the traditional convolution kernel operation in definition 7, perform feature extraction on F2 with a 1 × 1 convolution kernel; denote the result as E2. Using the traditional upsampling operation in definition 9, upsample P3 so that its feature size matches F2; denote the result as U3. Using the traditional cascade operation in definition 8, superimpose E2 and U3; denote the result as P2.

Using the traditional convolution kernel operation in definition 7, perform feature extraction on F1 with a 1 × 1 convolution kernel; denote the result as E1. Using the traditional upsampling operation in definition 9, upsample P2 so that its feature size matches F1; denote the result as U2. Using the cascade operation in definition 8, superimpose E1 and U2; denote the result as P1.

Using the traditional upsampling operation in definition 9, upsample P5 so that its feature size matches P3; denote the result as H5.

Using the traditional upsampling operation in definition 9, upsample P4 so that its feature size matches P3; denote the result as H4.

Denote P3 also as H3.

Using the traditional pooling operation in definition 10, max-pool P2 so that its feature size matches P3; denote the result as H2.

Using the traditional pooling operation in definition 10, max-pool P1 so that its feature size matches P3; denote the result as H1.
For H1, H2, H3, H4, H5, compute the feature map I by the formula

$$I_{(i,j)} = \frac{1}{5} \sum_{k=1}^{5} H_{k,(i,j)}$$

where k denotes the index of H and (i, j) denotes the spatial sampling position on the feature map.

Taking the feature map I as input, compute the feature map O by the formula

$$O_i = \frac{1}{C(I)} \sum_{\forall j} f(I_i, I_j)\, g(I_j)$$

where $I_i$ denotes the feature at the i-th position of the feature map I; $O_i$ denotes the feature at the i-th position of the feature map O; $C(I) = \sum_{\forall j} f(I_i, I_j)$ denotes the normalization factor; and $f(I_i, I_j)$ is the function used to compute the similarity between $I_i$ and $I_j$, expressed as

$$f(I_i, I_j) = e^{\theta(I_i)^{T} \phi(I_j)}$$

where $\theta(I_i) = W_\theta I_i$ and $\phi(I_j) = W_\phi I_j$, with $W_\theta$ and $W_\phi$ matrices learned by the 1 × 1 convolution operation in definition 7; $g(I_j) = W_g I_j$, where $W_g$ is a matrix learned by the 1 × 1 convolution operation in definition 7.
After all the network operations in step 3.1 are completed, the balanced feature pyramid network is obtained, denoted Backbone.
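A sketch of the balancing part of step 3.1 in PyTorch, assuming equal channel counts across the pyramid levels and the standard embedded-Gaussian non-local block; the channel halving inside the block and the residual connection are common conventions, not specified by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BalancedFeaturePyramid(nn.Module):
    """Resize P1..P5 to the size of P3, average them into I, then refine I with
    an embedded-Gaussian non-local block (theta, phi, g are 1x1 convolutions)."""
    def __init__(self, channels=256):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels // 2, 1)
        self.phi = nn.Conv2d(channels, channels // 2, 1)
        self.g = nn.Conv2d(channels, channels // 2, 1)
        self.out = nn.Conv2d(channels // 2, channels, 1)

    def forward(self, pyramid):              # pyramid: [P1, P2, P3, P4, P5]
        size = pyramid[2].shape[-2:]          # spatial size of P3
        # H_k: max-pool the finer levels, upsample the coarser ones
        H = [F.adaptive_max_pool2d(p, size) if p.shape[-1] > size[-1]
             else F.interpolate(p, size=size, mode='nearest') for p in pyramid]
        I = torch.stack(H).mean(dim=0)        # I = (1/5) * sum_k H_k
        n, c, h, w = I.shape
        th = self.theta(I).flatten(2).transpose(1, 2)    # (n, hw, c/2)
        ph = self.phi(I).flatten(2)                      # (n, c/2, hw)
        g = self.g(I).flatten(2).transpose(1, 2)         # (n, hw, c/2)
        attn = torch.softmax(th @ ph, dim=-1)  # = exp(theta^T phi) / C(I)
        O = self.out((attn @ g).transpose(1, 2).reshape(n, c // 2, h, w))
        return O + I                           # residual add, as in non-local blocks

levels = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8, 4)]   # P1..P5
print(BalancedFeaturePyramid()(levels).shape)   # torch.Size([1, 256, 16, 16])
```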
Step 3.2, building a regional recommendation network
Using the traditional regional recommendation network construction method in definition 11, take the Backbone obtained in step 3.1 as the feature extraction layer and build a regional recommendation network, denoted RPN0.
Step 3.3, building a balance classification regression network
Using the traditional fully connected layer method in definition 12, build fully connected layers FC1 and FC2, taking the output of FC1 as the input of FC2; FC1 and FC2 together serve as the classification head, denoted Clhead.

Using the traditional convolution kernel method in definition 7, build four convolutional layers, namely Conv1, Conv2, Conv3 and Conv4; meanwhile, build a pooling layer using the traditional pooling operation in definition 10, denoted Pooling. Take the output of Conv1 as the input of Conv2, the output of Conv2 as the input of Conv3, the output of Conv3 as the input of Conv4, and the output of Conv4 as the input of Pooling. Conv1, Conv2, Conv3, Conv4 and Pooling together serve as the regression head, denoted Rehead. The classification head Clhead and the regression head Rehead share the same feature map input and, together with the Backbone, form the balanced classification regression network, denoted BCRN0.
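A PyTorch sketch of the two heads of step 3.3; the RoI feature size, hidden width and the final box-offset layer are assumptions for illustration. Separating the heads lets the easier classification task use lightweight fully connected layers while the harder regression task gets a deeper convolutional stack:

```python
import torch
import torch.nn as nn

# Sketch of step 3.3, assuming 256-channel 7x7 RoI features (sizes hypothetical).
class ClHead(nn.Module):                  # classification head: FC1 -> FC2
    def __init__(self, in_dim=256 * 7 * 7, num_classes=2):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, 1024)
        self.fc2 = nn.Linear(1024, num_classes)      # ship / background scores
    def forward(self, x):
        return self.fc2(self.fc1(x.flatten(1)).relu())

class ReHead(nn.Module):                  # regression head: Conv1..Conv4 -> Pooling
    def __init__(self, channels=256):
        super().__init__()
        self.convs = nn.Sequential(*[nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(4)])                       # Conv1..Conv4
        self.pool = nn.AdaptiveAvgPool2d(1)           # Pooling layer
        self.reg = nn.Linear(channels, 4)             # assumed box-offset output
    def forward(self, x):
        return self.reg(self.pool(self.convs(x)).flatten(1))

roi = torch.randn(2, 256, 7, 7)           # two RoI feature maps
print(ClHead()(roi).shape, ReHead()(roi).shape)   # [2, 2] and [2, 4]
```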
Step 4, training area recommendation network
An iteration parameter epoch is set, and an initial epoch value is 1.
Step 4.1, forward propagation is carried out on the regional recommendation network
Taking the training set Train of the augmented dataset Datanew obtained in step 2 as the input of the regional recommendation network RPN0, feed Train into RPN0 using the traditional forward propagation method in definition 5 and record the output of RPN0 as Result0.
Step 4.2, carrying out balance interval sampling on the forward propagation result
Taking the output Result0 obtained in step 4.1 and the training set Train as input, compute the IoU value of each recommendation box in Result0 with the formula

$$IoU = \frac{area(B_p \cap B_{gt})}{area(B_p \cup B_{gt})}$$

where $B_p$ denotes a recommendation box and $B_{gt}$ denotes the corresponding ground-truth box. Take the recommendation boxes in Result0 whose IoU is greater than 0.5 as positive samples, denoted Result0p; take those whose IoU is less than 0.5 as negative samples, denoted Result0n. Count the total number of samples in the negative samples Result0n, denoted M. Manually input the number of required negative samples, denoted N; manually input the number of intervals into which the IoU range is equally divided, denoted $n_b$; the number of samples in the i-th IoU interval is $M_i$. Set the random sampling probability of the i-th interval as

$$p_i = \frac{N}{n_b \cdot M_i}$$

randomly sample within each IoU interval accordingly, and denote the sampling results over all IoU intervals of the negative samples as Result0ns.

Count the number of samples in the positive samples Result0p, denoted P. Set the random sampling probability as

$$p_{pos} = \frac{N}{P}$$

randomly sample Result0p, and denote the positive-sample sampling result as Result0ps.
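A NumPy sketch of the negative-branch sampling of step 4.2, under the reconstruction p_i = N / (n_b · M_i) given above:

```python
import numpy as np

def balanced_interval_sampling(ious, num_neg, n_b=3, rng=None):
    """Step 4.2 sketch: split the negative IoU range [0, 0.5) into n_b equal
    intervals and sample the i-th interval with probability
    p_i = num_neg / (n_b * M_i), so hard negatives (higher IoU) are no longer
    drowned out by the far more numerous easy ones."""
    rng = rng or np.random.default_rng(0)
    ious = np.asarray(ious)
    neg = np.where(ious < 0.5)[0]                  # negative candidates
    edges = np.linspace(0.0, 0.5, n_b + 1)
    picked = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        bin_idx = neg[(ious[neg] >= lo) & (ious[neg] < hi)]
        M_i = len(bin_idx)
        if M_i == 0:
            continue
        p_i = min(1.0, num_neg / (n_b * M_i))      # per-interval probability
        picked.append(bin_idx[rng.random(M_i) < p_i])
    return np.concatenate(picked) if picked else np.array([], dtype=int)

sampled = balanced_interval_sampling(np.random.rand(1000) * 0.6, num_neg=128)
print(len(sampled))   # roughly 128 negatives, spread over the IoU intervals
```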
Step 4.3, training and optimizing the regional recommendation network
Taking the positive-sample sampling result Result0ps and the negative-sample sampling result Result0ns obtained in step 4.2 as input, train and optimize the regional recommendation network with the classic Adam algorithm in definition 4, obtaining the trained and optimized regional recommendation network RPN1.
Step 5, training the balance classification regression network
Step 5.1, forward propagation is carried out on the balance classification regression network
Taking the training set Train of the augmented dataset Datanew obtained in step 2 as the input of the balanced classification regression network BCRN0, feed Train into BCRN0 using the traditional forward propagation method in definition 5 and record the output of BCRN0 as Result1.
Step 5.2, training and optimizing the balance classification regression network
Taking the output Result1 of the balanced classification regression network BCRN0 obtained in step 5.1 as input, train and optimize the balanced classification regression network with the classic Adam algorithm in definition 4, obtaining the trained and optimized balanced classification regression network BCRN1.
Step 6, alternate training is carried out
Determine whether the epoch set in step 4 equals 12. If epoch is not equal to 12, let epoch = epoch + 1 and RPN0 = RPN1, BCRN0 = BCRN1, repeat step 4.1, step 4.2, step 4.3, step 5.1 and step 5.2 in order, then return to step 6 to check the epoch again; if epoch equals 12, denote the trained regional recommendation network RPN1 together with the trained balanced classification regression network BCRN1 as the network BL-Net, then go to step 7.
Step 7, evaluation method
Step 7.1, Forward propagation
Taking the network BL-Net obtained in step 6 and the test set Test obtained in step 2.5 as input, obtain the detection result using the traditional forward propagation method in definition 5, denoted R.
Taking the detection result R as input, remove the redundant boxes in R with the traditional non-maximum suppression method in definition 13. The specific steps are:

Step (1): take the box with the highest score in the detection result R, denoted BS.

Step (2): using the formula

$$IoU = \frac{area(BS \cap B)}{area(BS \cup B)}$$

compute the overlap IoU between BS and every remaining box B in the detection result R, and discard the boxes with IoU > 0.5.

Step (3): among the remaining boxes, select the box with the highest score as the new BS.

Repeat the IoU computation and discarding of step (2) and step (3) until no box can be discarded; the remaining boxes are the final detection result, denoted $R_F$.
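A self-contained NumPy sketch of the greedy NMS procedure of step 7.1; boxes are assumed to be in (x1, y1, x2, y2) form:

```python
import numpy as np

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box BS,
    discard every remaining box whose IoU with BS exceeds thresh, repeat."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        bs, rest = order[0], order[1:]      # current best box BS
        keep.append(int(bs))
        ix1, iy1 = np.maximum(x1[bs], x1[rest]), np.maximum(y1[bs], y1[rest])
        ix2, iy2 = np.minimum(x2[bs], x2[rest]), np.minimum(y2[bs], y2[rest])
        inter = np.maximum(0, ix2 - ix1) * np.maximum(0, iy2 - iy1)
        iou = inter / (areas[bs] + areas[rest] - inter)   # IoU of BS vs rest
        order = rest[iou <= thresh]         # discard boxes with IoU > thresh
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]])
print(nms(boxes, np.array([0.9, 0.8, 0.7])))   # [0, 2]: box 1 overlaps box 0
```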
Step 7.2, calculating indexes
Taking the detection result $R_F$ obtained in step 7.1 as input, compute the precision P, the recall R and the precision-recall curve P(R) of the network using the traditional recall and precision calculation method in definition 14.

Using the formula

$$mAP = \int_0^1 P(R)\, dR$$

compute the mean average precision mAP of the balance-learning-based SAR ship detection.
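A minimal NumPy sketch of this integral using the trapezoidal rule; the endpoint padding is an assumption for illustration:

```python
import numpy as np

def average_precision(recall, precision):
    """Numerically integrate the P(R) curve: mAP = integral of P(R) dR over [0, 1].
    recall must be sorted ascending; the trapezoidal rule is used."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([precision[0]], precision, [0.0]))
    return float(np.trapz(p, r))

# toy curve: precision falls as recall rises
print(average_precision(np.array([0.2, 0.5, 0.9]), np.array([1.0, 0.8, 0.6])))
```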
The innovation of the invention is the introduction of four balance learning methods, namely a balanced scene learning mechanism, a balanced interval sampling mechanism, a balanced feature pyramid network and a balanced classification regression network, which solve the four imbalance problems of existing deep-learning-based SAR ship detection methods: unbalanced image sample scenes, unbalanced positive and negative samples, unbalanced ship scale features and unbalanced classification and regression tasks. The SAR image ship detection mAP of the method is 95.25%, exceeding the second-best SAR image ship detector by 3 percentage points; the inshore ship detection mAP is 84.79%, exceeding the second-best detector by 10 percentage points; the offshore ship detection mAP is 99.62%, exceeding the second-best detector by 0.5 percentage points.
The method has the advantages of overcoming the unbalance problem in the prior art and improving the detection precision of the ship in the SAR image.
Drawings
Fig. 1 is a schematic flow chart of a SAR image ship detection method based on balance learning in the present invention.
Fig. 2 is a schematic diagram of the balanced classification regression network in the SAR image ship detection method based on balance learning of the present invention.
Fig. 3 shows the detection accuracy of the SAR image ship detection method based on balance learning in the present invention.
Detailed Description
The invention is described in further detail below with reference to fig. 1,2 and 3.
Step 1, initializing a data set
Randomly shuffle the order of the SAR images in the SSDD dataset to obtain a new SSDD dataset.
Step 2, carrying out scene augmentation by utilizing a balanced scene learning mechanism
Step 2.1, extracting SSDD data set characteristics by using GAN network
As shown in fig. 1, build the generative adversarial network GAN0 according to the classic GAN construction method in definition 2. Taking the new SSDD data obtained in step 1 as input, train and optimize GAN0 according to the classic Adam algorithm in definition 4; the trained and optimized generative adversarial network is denoted GAN.
Then, taking the new SSDD data obtained in step 1 as input again, feed it into the trained and optimized GAN according to the traditional forward propagation method in definition 5, obtaining the output vectors of the network M = {M1, M2, …, Mi, …, M1160}, where Mi is the output vector of the i-th picture in the new SSDD data.
Define the output vector set M as the scene features of all pictures in the new SSDD dataset, and Mi as the scene feature of the i-th picture in the new SSDD dataset.
Step 2.2, clustering scenes
Taking the set M of scene features of all pictures in the new SSDD data obtained in step 2.1 as input, cluster the pictures in the new SSDD dataset by their scene features M using the traditional K-means clustering algorithm in definition 3:
step 2.3, initializing parameters
For the centroid parameters of the traditional K-means clustering algorithm in definition 3, randomly initialize the two centroids of the first iteration, denoted $c_1^{(1)}$ and $c_2^{(1)}$. Define the current iteration number as t, t = 1, 2, …, I, where I is the maximum number of iterations of the K-means algorithm, initialized as I = 1000. Define the centroids of the t-th iteration as $c_1^{(t)}$ and $c_2^{(t)}$. Initialize the iteration convergence error ε as one of the convergence conditions of the algorithm.
Step 2.4, carrying out iterative operation
First, using the formula

$$d_{i,1}^{(1)} = \left\| M_i - c_1^{(1)} \right\|$$

compute the distance from the scene feature $M_i$ of the i-th picture to the first centroid $c_1^{(1)}$ of the 1st iteration, denoted $d_{i,1}^{(1)}$. Using the formula

$$d_{i,2}^{(1)} = \left\| M_i - c_2^{(1)} \right\|$$

compute the distance from $M_i$ to the second centroid $c_2^{(1)}$ of the 1st iteration, denoted $d_{i,2}^{(1)}$.

Compare $d_{i,1}^{(1)}$ and $d_{i,2}^{(1)}$: if $d_{i,1}^{(1)} > d_{i,2}^{(1)}$, define the scene feature $M_i$ of the i-th picture as belonging to the second class in the 1st iteration; otherwise define it as belonging to the first class.

Define: after the 1st iteration, the set of all scene features of the first class is $S_1^{(1)}$, and the set of all scene features of the second class is $S_2^{(1)}$.
Then let t be 2, perform the following until convergence:
1) Let the centroid $c_1^{(t)}$ of step t be the arithmetic mean of the set $S_1^{(t-1)}$, and let the centroid $c_2^{(t)}$ of step t be the arithmetic mean of the set $S_2^{(t-1)}$.

2) Using the formula

$$d_{i,1}^{(t)} = \left\| M_i - c_1^{(t)} \right\|$$

compute the distance from the scene feature $M_i$ of the i-th picture to the first centroid $c_1^{(t)}$ of the t-th iteration, denoted $d_{i,1}^{(t)}$; using the formula

$$d_{i,2}^{(t)} = \left\| M_i - c_2^{(t)} \right\|$$

compute the distance from $M_i$ to the second centroid $c_2^{(t)}$ of the t-th iteration, denoted $d_{i,2}^{(t)}$.

3) Compare $d_{i,1}^{(t)}$ and $d_{i,2}^{(t)}$: if $d_{i,1}^{(t)} > d_{i,2}^{(t)}$, define the scene feature $M_i$ as belonging to the second class in the t-th iteration; otherwise define it as belonging to the first class. Define: after the t-th iteration, the set of all scene features of the first class is $S_1^{(t)}$ and the set of all scene features of the second class is $S_2^{(t)}$. Output the clustering result, denoted CLASS.

4) Compute the change of the centroids between this iteration and the previous one, denoted σ, expressed as

$$\sigma = \left\| c_1^{(t)} - c_1^{(t-1)} \right\| + \left\| c_2^{(t)} - c_2^{(t-1)} \right\|$$

If σ < ε or t ≥ I, output the clustering result CLASS; otherwise let t = t + 1 and return to step 1) to continue iterating.
Step 2.5, carrying out scene amplification
According to the clustering result CLASS obtained in step 2.4, divide all pictures in the new SSDD data into two classes: the first class, inshore scene pictures, denoted Data1, and the second class, offshore scene pictures, denoted Data2. Define the number of pictures in Data1 as N1 and the number of pictures in Data2 as N2.

If N2 > N1, randomly select, based on a Gaussian distribution, N2 - N1 pictures from the inshore scene pictures Data1 and apply the mirror operation, obtaining N2 - N1 mirrored pictures, denoted Data1extra. Then merge the mirrored pictures Data1extra with the inshore scene pictures Data1 and output the new picture set, denoted Data1new. Define Data2new = Data2.

If N2 <= N1, randomly select, based on a Gaussian distribution, N1 - N2 pictures from the offshore scene pictures Data2 and apply the mirror operation, obtaining N1 - N2 mirrored pictures, denoted Data2extra. Then merge the mirrored pictures Data2extra with the offshore scene pictures Data2 and output the new picture set, denoted Data2new. Define Data1new = Data1.

Define the new picture set Datanew = {Data1new, Data2new}.

Divide Datanew into two parts at a 7:3 ratio, obtaining a training set denoted Train and a test set denoted Test.
Step 3, building a forward propagation network
Step 3.1, building a balanced feature pyramid network
As shown in fig. 1, build a residual network with 50 layers according to the classic residual network construction method in definition 6, denoted Res-50. Denote the feature maps of different sizes generated by the last layer of each stage of Res-50, from large to small by feature-map size, as F1, F2, F3, F4, F5.

Denote F5 also as P5.

According to the convolution kernel operation in definition 7, perform feature extraction on F4 with a 1 × 1 convolution kernel; denote the result as E4. According to the upsampling operation in definition 9, upsample P5 so that its feature size matches F4; denote the result as U5. According to the cascade operation in definition 8, superimpose E4 and U5; denote the result as P4.

According to the convolution kernel operation in definition 7, perform feature extraction on F3 with a 1 × 1 convolution kernel; denote the result as E3. According to the upsampling operation in definition 9, upsample P4 so that its feature size matches F3; denote the result as U4. According to the cascade operation in definition 8, superimpose E3 and U4; denote the result as P3.

According to the convolution kernel operation in definition 7, perform feature extraction on F2 with a 1 × 1 convolution kernel; denote the result as E2. According to the upsampling operation in definition 9, upsample P3 so that its feature size matches F2; denote the result as U3. According to the cascade operation in definition 8, superimpose E2 and U3; denote the result as P2.

According to the convolution kernel operation in definition 7, perform feature extraction on F1 with a 1 × 1 convolution kernel; denote the result as E1. According to the upsampling operation in definition 9, upsample P2 so that its feature size matches F1; denote the result as U2. According to the cascade operation in definition 8, superimpose E1 and U2; denote the result as P1.

According to the upsampling operation in definition 9, upsample P5 so that its feature size matches P3; denote the result as H5.

According to the upsampling operation in definition 9, upsample P4 so that its feature size matches P3; denote the result as H4.

Denote P3 also as H3.

According to the pooling operation in definition 10, max-pool P2 so that its feature size matches P3; denote the result as H2.

According to the pooling operation in definition 10, max-pool P1 so that its feature size matches P3; denote the result as H1.
For H1, H2, H3, H4, H5, compute the feature map I according to the formula

$$I_{(i,j)} = \frac{1}{5} \sum_{k=1}^{5} H_{k,(i,j)}$$

where k denotes the index of H and (i, j) denotes the spatial sampling position on the feature map.

Taking the feature map I as input, compute the feature map O according to the formula

$$O_i = \frac{1}{C(I)} \sum_{\forall j} f(I_i, I_j)\, g(I_j)$$

where $I_i$ denotes the feature at the i-th position of the feature map I; $O_i$ denotes the feature at the i-th position of the feature map O; $C(I) = \sum_{\forall j} f(I_i, I_j)$ denotes the normalization factor; and $f(I_i, I_j)$ is the function used to compute the similarity between $I_i$ and $I_j$, expressed as

$$f(I_i, I_j) = e^{\theta(I_i)^{T} \phi(I_j)}$$

where $\theta(I_i) = W_\theta I_i$ and $\phi(I_j) = W_\phi I_j$, with $W_\theta$ and $W_\phi$ matrices learned by the 1 × 1 convolution operation in definition 7; $g(I_j) = W_g I_j$, where $W_g$ is a matrix learned by the 1 × 1 convolution operation in definition 7.
All the network operations in step 3.1 together constitute the balanced feature pyramid network, denoted Backbone.
Step 3.2, building a regional recommendation network
According to the regional recommendation network construction method in definition 11, take the Backbone obtained in step 3.1 as the feature extraction layer and build a regional recommendation network, denoted RPN0.
Step 3.3, building a balance classification regression network
As shown in fig. 2, the balanced classification regression network is divided into two parts, a classification head Clhead and a regression head Rehead. According to the traditional fully connected layer method in definition 12, build fully connected layers FC1 and FC2, taking the output of FC1 as the input of FC2; FC1 and FC2 together serve as the classification head, denoted Clhead. According to the convolution kernel method in definition 7, build four convolutional layers, namely Conv1, Conv2, Conv3 and Conv4; meanwhile, build a pooling layer according to the pooling operation in definition 10, denoted Pooling. Take the output of Conv1 as the input of Conv2, the output of Conv2 as the input of Conv3, the output of Conv3 as the input of Conv4, and the output of Conv4 as the input of Pooling. Conv1, Conv2, Conv3, Conv4 and Pooling together serve as the regression head, denoted Rehead. The classification head Clhead and the regression head Rehead share the same feature map input and, together with the Backbone, form the balanced classification regression network, denoted BCRN0.
Step 4, training area recommendation network
An iteration parameter epoch is set, and an initial epoch value is 1.
Step 4.1, forward propagation is carried out on the regional recommendation network
Taking the training set Train of the augmented dataset Datanew obtained in step 2 as the input of the regional recommendation network RPN0, feed Train into RPN0 according to the forward propagation method in definition 5 and record the output of RPN0 as Result0.
Step 4.2, carrying out balance interval sampling on the forward propagation result
Taking Result0 obtained in step 4.1 and the training set Train as input, the IoU value of each proposal box in Result0 is calculated according to the formula

$$IoU = \frac{area(B_p \cap B_{gt})}{area(B_p \cup B_{gt})}$$

where $B_p$ is a proposal box and $B_{gt}$ the corresponding ground-truth box. Proposals in Result0 with IoU greater than 0.5 are taken as positive samples, denoted Result0p; proposals in Result0 with IoU less than 0.5 are taken as negative samples, denoted Result0n. The total number of samples in the negative samples Result0n is counted and denoted M. The number of required negative samples, denoted N, is input manually; the number of intervals into which the IoU range is equally divided, denoted $n_b$, is also input manually, and the number of samples in the i-th IoU interval is $M_i$. The random sampling probability of the i-th interval is set as

$$p_i = \frac{N}{n_b} \cdot \frac{1}{M_i}$$

Each IoU interval is sampled randomly, and the sampling results over all IoU intervals of the negative samples are recorded as Result0ns.

The number of samples in the positive samples Result0p is counted and denoted P. The random sampling probability is set as

$$p_{pos} = \frac{N}{P}$$

Result0p is sampled randomly, and the positive-sample sampling result is recorded as Result0ps.
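As an illustration of step 4.2, a minimal NumPy sketch of the IoU-balanced negative sampling follows; dividing [0, 0.5) into $n_b$ equal intervals and drawing each negative independently with probability $p_i = N/(n_b M_i)$ mirrors the formulas above, while the function name and array layout are assumptions.

```python
import numpy as np

def balanced_interval_sample(ious, N, n_b, rng=np.random.default_rng(0)):
    """IoU-balanced negative sampling: split negatives (IoU < 0.5) into
    n_b equal IoU intervals and keep each sample of interval i with
    probability p_i = N / (n_b * M_i), so about N negatives survive."""
    neg = np.flatnonzero(ious < 0.5)          # indices of negative proposals
    edges = np.linspace(0.0, 0.5, n_b + 1)    # equal IoU intervals
    picked = []
    for i in range(n_b):
        in_bin = neg[(ious[neg] >= edges[i]) & (ious[neg] < edges[i + 1])]
        M_i = len(in_bin)
        if M_i == 0:
            continue                           # empty interval, nothing to draw
        p_i = min(1.0, N / (n_b * M_i))        # per-sample draw probability
        picked.append(in_bin[rng.random(M_i) < p_i])
    return np.concatenate(picked) if picked else np.array([], dtype=int)
```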
Step 4.3, training and optimizing the regional recommendation network
Taking the positive-sample sampling result Result0ps and the negative-sample sampling result Result0ns obtained in step 4.2 as input, the region proposal network is trained and optimized according to the classical Adam algorithm in definition 4, yielding the trained and optimized region proposal network RPN1.
Step 5, training the balance classification regression network
Step 5.1, forward propagation is carried out on the balance classification regression network
Taking the training set Train of the augmented data set Datanew obtained in step 2 as the input of the balanced classification regression network BCRN0, the training set Train is sent into BCRN0 for computation according to the forward propagation method in definition 5, and the output of BCRN0 is recorded as Result1.
Step 5.2, training and optimizing the balance classification regression network
Taking the output Result1 of the balanced classification regression network BCRN0 obtained in step 5.1 as input, the balanced classification regression network is trained and optimized according to the classical Adam algorithm in definition 4, yielding the trained and optimized balanced classification regression network BCRN1.
Step 6, alternate training is carried out
Whether the epoch set in step 4 equals 12 is determined. If epoch is not equal to 12, let epoch = epoch + 1, RPN0 = RPN1 and BCRN0 = BCRN1, repeat steps 4.1, 4.2, 4.3, 5.1 and 5.2 in order, and then return to step 6 to judge epoch again; if epoch equals 12, denote the trained region proposal network RPN1 together with the trained balanced classification regression network BCRN1 as the network BL-Net, and go to step 7.
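The alternating schedule of steps 4-6 can be summarized by the following sketch; train_rpn and train_bcrn are hypothetical stand-ins for the sampling and Adam optimization of steps 4.1-4.3 and 5.1-5.2, so the loop shows only the control flow.

```python
# Hypothetical stand-ins so the schedule below runs as written; in the
# method these perform the forward pass, sampling and Adam updates.
def train_rpn(rpn, train_set): return rpn      # steps 4.1-4.3 (placeholder)
def train_bcrn(bcrn, train_set): return bcrn   # steps 5.1-5.2 (placeholder)

def alternate_training(rpn, bcrn, train_set, epochs=12):
    """Alternating schedule of step 6: one RPN pass, then one BCRN pass,
    repeated until epoch == 12; the pair then forms BL-Net."""
    for epoch in range(1, epochs + 1):
        rpn = train_rpn(rpn, train_set)
        bcrn = train_bcrn(bcrn, train_set)
    return rpn, bcrn
```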
Step 7, evaluation method
Step 7.1, Forward propagation
Taking the network BL-Net obtained in step 6 and the test set Test obtained in step 2.5 as input, the detection result is obtained by the traditional forward propagation method in definition 5 and denoted R.
Taking the detection result R as input, redundant boxes in R are removed by the conventional non-maximum suppression method in definition 13, as follows:

Step (1): mark the box with the highest score in the detection result R as BS.

Step (2): using the formula

$$IoU = \frac{area(BS \cap B)}{area(BS \cup B)}$$

calculate the overlap ratio IoU between BS and every remaining box B of the detection result R, and discard the boxes with IoU > 0.5.

Step (3): among the remaining boxes, select the box BS with the highest score.

Step (4): repeat the IoU calculation and discarding process of step (2) until no box can be discarded; the boxes that finally remain constitute the final detection result, denoted $R_F$.
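A minimal NumPy sketch of this suppression loop follows; the (x1, y1, x2, y2) box layout and the function signature are illustrative assumptions.

```python
import numpy as np

def nms(boxes, scores, thresh=0.5):
    """Non-maximum suppression of step 7.1: keep the highest-scoring box
    BS, drop boxes with IoU(BS, box) > thresh, repeat on the remainder.
    boxes: (N, 4) array of (x1, y1, x2, y2); returns kept indices."""
    order = np.argsort(scores)[::-1]           # boxes by descending score
    keep = []
    while order.size > 0:
        bs = order[0]                          # highest-scoring remaining box
        keep.append(bs)
        xx1 = np.maximum(boxes[bs, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[bs, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[bs, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[bs, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_bs = (boxes[bs, 2] - boxes[bs, 0]) * (boxes[bs, 3] - boxes[bs, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_bs + areas - inter)
        order = order[1:][iou <= thresh]       # discard IoU > thresh, keep rest
    return keep
```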
Step 7.2, calculating indexes
As shown in Fig. 3, taking the detection result $R_F$ obtained in step 7.1 as input, the precision P, the recall R and the precision-recall curve P(R) of the network are calculated by the traditional recall and precision calculation method in definition 14. Using the formula

$$mAP = \int_0^1 P(R)\,dR$$

the average detection accuracy mAP of balance-learning-based SAR ship detection is calculated.
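A minimal sketch of this integral follows; approximating the area under a monotonised precision-recall curve by trapezoidal sums is a common evaluation convention assumed here, not prescribed by the method.

```python
import numpy as np

def average_precision(recall, precision):
    """mAP of step 7.2 as the area under the precision-recall curve,
    mAP = integral_0^1 P(R) dR; recall is assumed sorted increasing."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]   # enforce non-increasing P(R)
    return np.trapz(p, r)                      # trapezoidal integral of P(R)
```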

Claims (1)

1. A ship detection method based on balance learning is characterized by comprising the following steps:
step 1, initializing SSDD data set
Adjusting the SAR image sequence in the SSDD data set by adopting a random method to obtain a new SSDD data set;
step 2, carrying out scene augmentation by utilizing a balanced scene learning mechanism
Step 2.1, extracting SSDD data set characteristics by using GAN network
A generative adversarial network GAN0 is built by the classical GAN network construction method; taking the new SSDD data obtained in step 1 as input, GAN0 is trained and optimized with the classical Adam algorithm, giving a trained and optimized generative adversarial network, denoted GAN;

then, taking the new SSDD data obtained in step 1 as input again, the new SSDD data are fed into the trained and optimized GAN by the conventional forward propagation method to obtain the output vectors of the network, M = {M1, M2, ..., Mi, ..., M1160}, where Mi is the output vector of the i-th picture in the new SSDD data;

the output vector set M is defined as the scene features of all pictures in the new SSDD data set, and Mi as the scene feature of the i-th picture in the new SSDD data set;
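By way of illustration only, the sketch below uses a small discriminator-style convolutional encoder as a stand-in for the trained GAN that maps each picture to its scene feature Mi; the layer widths and the 64-dimensional feature size are assumptions, not the patented GAN architecture.

```python
import torch.nn as nn

class SceneEncoder(nn.Module):
    """Stand-in for the trained GAN of step 2.1: a small convolutional
    network whose output vector serves as the scene feature Mi of
    picture i. All layer sizes are illustrative assumptions."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x):            # x: (B, 1, H, W) batch of SAR pictures
        return self.body(x)          # (B, feat_dim) scene features M
```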
step 2.2, clustering scenes
Taking the set M of scene characteristics of all pictures in the new SSDD data obtained in the step 2.1 as input, adopting a traditional K-means clustering algorithm, and clustering the pictures in the new SSDD data set by means of the scene characteristics M:
step 2.3, initializing parameters
For the centroid parameters of the traditional K-means clustering algorithm, the centroid parameters of the first iteration are randomly initialized and denoted $\mu_1^{(1)}$ and $\mu_2^{(1)}$; the current iteration number is defined as t, t = 1, 2, ..., I, where I is the maximum number of iterations of the K-means clustering algorithm, initialized as I = 1000; the centroid parameters of the t-th iteration are defined as $\mu_1^{(t)}$ and $\mu_2^{(t)}$; an iteration convergence error ε is initialized as one of the iteration convergence conditions of the algorithm;
step 2.4, carrying out iterative operation
First, using the formula

$$d_{i,1}^{(1)} = \left\| M_i - \mu_1^{(1)} \right\|_2$$

the distance, denoted $d_{i,1}^{(1)}$, from the scene feature $M_i$ of the i-th picture to the first centroid $\mu_1^{(1)}$ of the 1st iteration is calculated; using the formula

$$d_{i,2}^{(1)} = \left\| M_i - \mu_2^{(1)} \right\|_2$$

the distance, denoted $d_{i,2}^{(1)}$, from the scene feature $M_i$ of the i-th picture to the second centroid $\mu_2^{(1)}$ of the 1st iteration is calculated; $d_{i,1}^{(1)}$ and $d_{i,2}^{(1)}$ are compared: if $d_{i,1}^{(1)} > d_{i,2}^{(1)}$, the scene feature $M_i$ of the i-th picture is defined to belong to the second class in the 1st iteration; otherwise the scene feature $M_i$ of the i-th picture is defined to belong to the first class;

it is defined that, after the 1st iteration, the set of all scene features of the first class is $S_1^{(1)}$ and the set of all scene features of the second class is $S_2^{(1)}$;
Then let t be 2, perform the following until convergence:
1) the centroid parameter $\mu_1^{(t)}$ of step t is set to the arithmetic mean of the set $S_1^{(t-1)}$, and the centroid parameter $\mu_2^{(t)}$ of step t is set to the arithmetic mean of the set $S_2^{(t-1)}$;

2) using the formula

$$d_{i,1}^{(t)} = \left\| M_i - \mu_1^{(t)} \right\|_2$$

the distance, denoted $d_{i,1}^{(t)}$, from the scene feature $M_i$ of the i-th picture to the first centroid $\mu_1^{(t)}$ of the t-th iteration is calculated; using the formula

$$d_{i,2}^{(t)} = \left\| M_i - \mu_2^{(t)} \right\|_2$$

the distance, denoted $d_{i,2}^{(t)}$, from the scene feature $M_i$ of the i-th picture to the second centroid $\mu_2^{(t)}$ of the t-th iteration is calculated;

3) $d_{i,1}^{(t)}$ and $d_{i,2}^{(t)}$ are compared: if $d_{i,1}^{(t)} > d_{i,2}^{(t)}$, the scene feature $M_i$ of the i-th picture is defined to belong to the second class in the t-th iteration; otherwise the scene feature $M_i$ of the i-th picture is defined to belong to the first class; it is defined that, after the t-th iteration, the set of all scene features of the first class is $S_1^{(t)}$ and the set of all scene features of the second class is $S_2^{(t)}$; the clustering result is output and denoted CLASS;

4) the variation of the centroid parameters between this iteration and the previous one, denoted σ, is calculated as

$$\sigma = \left\| \mu_1^{(t)} - \mu_1^{(t-1)} \right\|_2 + \left\| \mu_2^{(t)} - \mu_2^{(t-1)} \right\|_2$$

if σ < ε or t ≥ I, the clustering result CLASS is output; otherwise let t = t + 1 and return to step 1) to continue the iteration;
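A compact NumPy sketch of this two-centroid clustering loop follows; the Euclidean distances, class means and centroid-shift stopping rule match the steps above, while the function signature and the assumption that both classes stay non-empty are illustrative.

```python
import numpy as np

def two_class_kmeans(M, I=1000, eps=1e-4, rng=np.random.default_rng(0)):
    """Two-centroid K-means of steps 2.3-2.4: assign each scene feature
    Mi to its nearer centroid, recompute centroids as class means, stop
    when the total centroid shift sigma < eps or t reaches I."""
    M = np.asarray(M, dtype=float)                 # (n, d) scene features
    mu = M[rng.choice(len(M), 2, replace=False)]   # random initial centroids
    labels = np.zeros(len(M), dtype=int)
    for t in range(1, I + 1):
        d = np.linalg.norm(M[:, None, :] - mu[None, :, :], axis=2)
        labels = d.argmin(axis=1)                  # nearer centroid wins
        # class means (assumes both classes remain non-empty)
        new_mu = np.array([M[labels == k].mean(axis=0) for k in (0, 1)])
        sigma = np.linalg.norm(new_mu - mu)        # total centroid shift
        mu = new_mu
        if sigma < eps:
            break
    return labels, mu                              # labels play the role of CLASS
```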
step 2.5, carrying out scene amplification
According to the clustering result CLASS obtained in step 2.4, all pictures in the new SSDD data are divided into two classes: the first class consists of inshore scene pictures, denoted Data1, and the second class of offshore scene pictures, denoted Data2; the number of pictures in Data1 is defined as N1 and the number of pictures in Data2 as N2;

if N2 > N1, then N2 - N1 pictures are randomly selected, based on a Gaussian distribution, from the inshore scene pictures Data1 and subjected to the traditional mirror operation, giving N2 - N1 mirrored pictures, denoted Data1extra; the mirrored pictures Data1extra are then merged with the inshore scene pictures Data1, and the new picture set is output and denoted Data1new; Data2new = Data2 is defined;

if N2 <= N1, then N1 - N2 pictures are randomly selected, based on a Gaussian distribution, from the offshore scene pictures Data2 and subjected to the traditional mirror operation, giving N1 - N2 mirrored pictures, denoted Data2extra; the mirrored pictures Data2extra are then merged with the offshore scene pictures Data2, and the new picture set is output and denoted Data2new; Data1new = Data1 is defined;

the new picture set Datanew = {Data1new, Data2new} is defined;

Datanew is divided into two parts in a 7:3 ratio, giving a training set, denoted Train, and a test set, denoted Test;
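The class-balancing mirror augmentation of step 2.5 can be sketched as follows; for brevity the pictures to mirror are drawn uniformly at random rather than with the Gaussian-based selection of the claim, and the list-of-arrays layout is an assumption.

```python
import numpy as np

def balance_scenes(data1, data2, rng=np.random.default_rng(0)):
    """Scene amplification of step 2.5: mirror randomly chosen pictures
    of the smaller class until both classes have equal size.
    data1 / data2: lists of 2-D arrays (inshore / offshore pictures).
    Returns (augmented smaller class, larger class), in that order."""
    small, big = (data1, data2) if len(data1) < len(data2) else (data2, data1)
    # draw (with replacement) enough pictures to close the gap, then mirror
    extra = [np.fliplr(small[i])                  # traditional mirror operation
             for i in rng.choice(len(small), len(big) - len(small))]
    return small + extra, big
```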
step 3, building a forward propagation network
Step 3.1, building a balanced feature pyramid network
A residual network with 50 layers is constructed by the classical residual network construction method and denoted Res-50; the feature maps of different sizes generated by the last layers of the residual network Res-50 are denoted, from large to small according to feature-map size, F1, F2, F3, F4, F5;
F5 is alternatively denoted P5;

Using the conventional convolution kernel operation, feature extraction is performed on F4 with a 1×1 convolution kernel, and the result is denoted E4; using the conventional upsampling operation, P5 is upsampled so that its feature size is consistent with F4, and the result is denoted U5; using the conventional concatenation operation, E4 and U5 are superimposed, and the result is denoted P4;

Using the conventional convolution kernel operation, feature extraction is performed on F3 with a 1×1 convolution kernel, and the result is denoted E3; using the conventional upsampling operation, P4 is upsampled so that its feature size is consistent with F3, and the result is denoted U4; using the conventional concatenation operation, E3 and U4 are superimposed, and the result is denoted P3;

Using the conventional convolution kernel operation, feature extraction is performed on F2 with a 1×1 convolution kernel, and the result is denoted E2; using the conventional upsampling operation, P3 is upsampled so that its feature size is consistent with F2, and the result is denoted U3; using the conventional concatenation operation, E2 and U3 are superimposed, and the result is denoted P2;

Using the conventional convolution kernel operation, feature extraction is performed on F1 with a 1×1 convolution kernel, and the result is denoted E1; using the conventional upsampling operation, P2 is upsampled so that its feature size is consistent with F1, and the result is denoted U2; using the concatenation operation, E1 and U2 are superimposed, and the result is denoted P1;

Using the conventional upsampling operation, P5 is upsampled so that its feature size is consistent with P3, and the result is denoted H5; using the conventional upsampling operation, P4 is upsampled so that its feature size is consistent with P3, and the result is denoted H4; P3 is alternatively denoted H3; using the conventional pooling operation, P2 is max-pooled so that its feature size is consistent with P3, and the result is denoted H2; using the conventional pooling operation, P1 is max-pooled so that its feature size is consistent with P3, and the result is denoted H1;
For H1, H2, H3, H4, H5, the feature map I is calculated by the formula

$$I(i,j) = \frac{1}{5} \sum_{k=1}^{5} H_k(i,j)$$

where k denotes the subscript of H and (i, j) denotes the spatial sampling position on the feature map;

taking the feature map I as input, the feature map O is calculated by the formula

$$O_i = \frac{1}{\mathcal{C}(I)} \sum_{\forall j} f(I_i, I_j)\, g(I_j)$$

where $I_i$ denotes the feature at the i-th position of feature map I and $O_i$ the feature at the i-th position of feature map O; $\mathcal{C}(I) = \sum_{\forall j} f(I_i, I_j)$ represents the normalization factor; $f(I_i, I_j)$ is used to calculate the similarity between $I_i$ and $I_j$ and is expressed as

$$f(I_i, I_j) = e^{\theta(I_i)^{T} \phi(I_j)}$$

where $\theta(I_i) = W_\theta I_i$ and $\phi(I_j) = W_\phi I_j$, with $W_\theta$ and $W_\phi$ matrices learned by a 1×1 convolution operation; $g(I_j) = W_g I_j$, where $W_g$ is a matrix learned by a 1×1 convolution operation;
after all the network operations in step 3.1 are completed, the balanced feature pyramid network is obtained and denoted Backbone;
step 3.2, building a regional recommendation network
A region proposal network is constructed by the traditional region proposal network construction method with the Backbone obtained in step 3.1 as the feature extraction layer, and denoted RPN0;
Step 3.3, building a balance classification regression network
Constructing full link layers FC1 and FC2 by adopting a traditional full link layer method, taking the output of FC1 as the input of FC2, taking FC1 and FC2 as classification heads and marking as Clhead;
constructing four convolutional layers by adopting a traditional convolutional kernel method, wherein the four convolutional layers are Conv1, Conv2, Conv3 and Conv 4; at the same time, traditional pooling operations are used to build a pooling layer, denoted asPooling; the output of Conv1 is taken as the input of Conv2, the output of Conv2 is taken as the input of Conv3, the output of Conv3 is taken as the input of Conv4, and the output of Conv4 is taken as the input of Pooling; conv1, Conv2, Conv3, Conv4 and Pooling are used as regression heads and are marked as Rehead; the Classification head Clhead and the regression head Rehead have the same characteristic diagram input, and together with the backhaul, the Classification head Clhead and the regression head Rehead form a balanced classification regression network which is marked as BCRN0
Step 4, training area recommendation network
Setting an iteration parameter epoch, and initializing an epoch value to be 1;
step 4.1, forward propagation is carried out on the regional recommendation network
Taking the training set Train of the augmented data set Datanew obtained in step 2 as the input of the region proposal network RPN0, the training set Train is sent into RPN0 for computation by the traditional forward propagation method, and the output of RPN0 is recorded as Result0;
step 4.2, carrying out balance interval sampling on the forward propagation result
Taking Result0 obtained in step 4.1 and the training set Train as input, the IoU value of each proposal box in Result0 is calculated with the formula

$$IoU = \frac{area(B_p \cap B_{gt})}{area(B_p \cup B_{gt})}$$

where $B_p$ is a proposal box and $B_{gt}$ the corresponding ground-truth box; the proposals in Result0 with IoU greater than 0.5 are taken as positive samples, denoted Result0p; the proposals in Result0 with IoU less than 0.5 are taken as negative samples, denoted Result0n; the total number of samples in the negative samples Result0n is counted and denoted M; the number of required negative samples, denoted N, is input manually; the number of intervals into which the IoU range is equally divided, denoted $n_b$, is input manually, and the number of samples in the i-th IoU interval is $M_i$; the random sampling probability of the i-th interval is set as

$$p_i = \frac{N}{n_b} \cdot \frac{1}{M_i}$$

each IoU interval is sampled randomly, and the sampling results over all IoU intervals of the negative samples are recorded as Result0ns;

the number of samples in the positive samples Result0p is counted and denoted P; the random sampling probability is set as

$$p_{pos} = \frac{N}{P}$$

Result0p is sampled randomly, and the positive-sample sampling result is recorded as Result0ps;
step 4.3, training and optimizing the regional recommendation network
taking the positive-sample sampling result Result0ps and the negative-sample sampling result Result0ns obtained in step 4.2 as input, the region proposal network is trained and optimized with the classical Adam algorithm, giving the trained and optimized region proposal network RPN1;
step 5, training the balance classification regression network
Step 5.1, forward propagation is carried out on the balance classification regression network
Taking the training set Train of the augmented data set Datanew obtained in step 2 as the input of the balanced classification regression network BCRN0, the training set Train is sent into BCRN0 for computation by the traditional forward propagation method, and the output of BCRN0 is recorded as Result1;
step 5.2, training and optimizing the balance classification regression network
taking the output Result1 of the balanced classification regression network BCRN0 obtained in step 5.1 as input, the balanced classification regression network is trained and optimized according to the classical Adam algorithm, giving the trained and optimized balanced classification regression network BCRN1;
step 6, alternate training is carried out
Whether the epoch set in step 4 equals 12 is judged; if epoch is not equal to 12, let epoch = epoch + 1, RPN0 = RPN1 and BCRN0 = BCRN1, repeat steps 4.1, 4.2, 4.3, 5.1 and 5.2 in order, and then return to step 6 to judge epoch again; if epoch equals 12, denote the trained region proposal network RPN1 together with the trained balanced classification regression network BCRN1 as the network BL-Net, and go to step 7.
Step 7, evaluation method
Step 7.1, Forward propagation
Taking the network BL-Net obtained in step 6 and the test set Test obtained in step 2.5 as input, the detection result is obtained by the traditional forward propagation method and denoted R;
taking the detection result R as input, redundant boxes in R are removed by the traditional non-maximum suppression method, specifically as follows:

step (1): the box with the highest score in the detection result R is marked as BS;

step (2): using the formula

$$IoU = \frac{area(BS \cap B)}{area(BS \cup B)}$$

the overlap ratio IoU between BS and every remaining box B of the detection result R is calculated, and the boxes with IoU > 0.5 are discarded;

step (3): among the remaining boxes, the box BS with the highest score is selected;

step (4): the IoU calculation and discarding process of step (2) is repeated until no box can be discarded; the boxes that finally remain constitute the final detection result, denoted $R_F$;
Step 7.2, calculating indexes
taking the detection result $R_F$ obtained in step 7.1 as input, the precision P, the recall R and the precision-recall curve P(R) of the network are calculated by the traditional recall and precision calculation method;

using the formula

$$mAP = \int_0^1 P(R)\,dR$$

the average detection accuracy mAP of balance-learning-based SAR ship detection is calculated.
CN202111268008.2A 2021-10-29 2021-10-29 SAR image ship detection method based on balance learning Active CN113989672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111268008.2A CN113989672B (en) 2021-10-29 2021-10-29 SAR image ship detection method based on balance learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111268008.2A CN113989672B (en) 2021-10-29 2021-10-29 SAR image ship detection method based on balance learning

Publications (2)

Publication Number Publication Date
CN113989672A true CN113989672A (en) 2022-01-28
CN113989672B CN113989672B (en) 2023-10-17

Family

ID=79744053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111268008.2A Active CN113989672B (en) 2021-10-29 2021-10-29 SAR image ship detection method based on balance learning

Country Status (1)

Country Link
CN (1) CN113989672B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998759A (en) * 2022-05-27 2022-09-02 电子科技大学 High-precision SAR ship detection method based on visual transform


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200278465A1 (en) * 2017-09-12 2020-09-03 Schlumberger Technology Corporation Seismic image data interpretation system
CN110490158A (en) * 2019-08-23 2019-11-22 安徽大学 A kind of robust human face alignment schemes based on multistage model
CN110826428A (en) * 2019-10-22 2020-02-21 电子科技大学 Ship detection method in high-speed SAR image
CN112285712A (en) * 2020-10-15 2021-01-29 电子科技大学 Method for improving detection precision of ship on shore in SAR image
CN113378813A (en) * 2021-05-28 2021-09-10 陕西大智慧医疗科技股份有限公司 Modeling and target detection method and device based on attention balance feature pyramid

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
TIANWEN ZHANG et al.: "Balance Scene Learning Mechanism for Offshore and Inshore Ship Detection in SAR Images" *
TIANWEN ZHANG et al.: "Balanced Feature Pyramid Network for Ship Detection in Synthetic Aperture Radar Images" *
ZHANG Tianwen et al.: "一种大场景SAR图像中舰船检测虚警抑制方法" (A false-alarm suppression method for ship detection in large-scene SAR images) *


Also Published As

Publication number Publication date
CN113989672B (en) 2023-10-17

Similar Documents

Publication Publication Date Title
CN110135267B (en) Large-scene SAR image fine target detection method
US20230169623A1 (en) Synthetic aperture radar (sar) image target detection method
CN110232350B (en) Real-time water surface multi-moving-object detection and tracking method based on online learning
CN111797717B (en) High-speed high-precision SAR image ship detection method
CN112285712B (en) Method for improving detection precision of coasting ship in SAR image
CN108038445B (en) SAR automatic target identification method based on multi-view deep learning framework
CN108596108B (en) Aerial remote sensing image change detection method based on triple semantic relation learning
CN111723693B (en) Crowd counting method based on small sample learning
WO2022028031A1 (en) Contour shape recognition method
CN110826428A (en) Ship detection method in high-speed SAR image
CN107274416A (en) High spectrum image conspicuousness object detection method based on spectrum gradient and hierarchical structure
CN113705331B (en) SAR ship detection method based on quaternary feature pyramid network
CN113298129A (en) Polarized SAR image classification method based on superpixel and graph convolution network
CN113344103A (en) Hyperspectral remote sensing image ground object classification method based on hypergraph convolution neural network
CN112508066A (en) Hyperspectral image classification method based on residual error full convolution segmentation network
Xi et al. Semi-supervised graph prototypical networks for hyperspectral image classification
CN110334584B (en) Gesture recognition method based on regional full convolution network
CN113989672A (en) SAR image ship detection method based on balance learning
CN115272670A (en) SAR image ship instance segmentation method based on mask attention interaction
CN114898464B (en) Lightweight accurate finger language intelligent algorithm identification method based on machine vision
CN113902975B (en) Scene perception data enhancement method for SAR ship detection
CN115272842A (en) SAR image ship instance segmentation method based on global semantic boundary attention network
CN113534146A (en) Radar video image target automatic detection method and system
CN112668403A (en) Fine-grained ship image target identification method for multi-feature area
CN114998759A (en) High-precision SAR ship detection method based on visual transform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant